AI RESEARCH PAPERS & ACADEMIC SOURCES
- UrbanNav: Learning Language-Guided Urban Navigation from Web-Scale Human Trajectories : Abstract: Navigating complex urban environments using natural language instructions poses significant challenges for embodied agents, including noisy language instructions, ambiguous spatial reference...
- Simultaneous Tactile-Visual Perception for Learning Multimodal Robot Manipulation : Abstract: Robotic manipulation requires both rich multimodal perception and effective learning frameworks to handle complex real-world tasks. See-through-skin (STS) sensors, which combine tactile and ...
- YOPO-Nav: Visual Navigation using 3DGS Graphs from One-Pass Videos : Abstract: Visual navigation has emerged as a practical alternative to traditional robotic navigation pipelines that rely on detailed mapping and path planning. However, constructing and maintaining 3D...
- Adversarial-Robustness-Guided Graph Pruning : Abstract: Graph learning plays a central role in many data mining and machine learning tasks, such as manifold learning, data representation and analysis, dimensionality reduction, clustering, and vis...
- RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations : Abstract: We present RELOCATE, a simple training-free baseline designed to perform the challenging task of visual query localization in long videos. To eliminate the need for task-specific training an...
- Foveation Improves Payload Capacity in Steganography : Abstract: Steganography finds its use in visual media, such as providing metadata and watermarking. With support of efficient latent representations and foveated rendering, we trained models that impr...
- InfoMotion: A Graph-Based Approach to Video Dataset Distillation for Echocardiography : Abstract: Echocardiography plays a critical role in the diagnosis and monitoring of cardiovascular diseases as a non-invasive real-time assessment of cardiac structure and function. However, the gro...
- FunPhase: A Periodic Functional Autoencoder for Motion Generation via Phase Manifolds : Abstract: Learning natural body motion remains challenging due to the strong coupling between spatial geometry and temporal dynamics. Embedding motion in phase manifolds, latent spaces that capture lo...
- UniPart: Part-Level 3D Generation with Unified 3D Geom-Seg Latents : Abstract: Part-level 3D generation is essential for applications requiring decomposable and structured 3D synthesis. However, existing methods either rely on implicit part segmentation with limited gr...
- Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation : Abstract: Recent vision language models (VLMs) like CLIP have demonstrated impressive anomaly detection performance under significant distribution shift by utilizing high-level semantic information th...
- MODA: The First Challenging Benchmark for Multispectral Object Detection in Aerial Images : Abstract: Aerial object detection faces significant challenges in real-world scenarios, such as small objects and extensive background interference, which limit the performance of RGB-based detectors ...
- StateSpace-SSL: Linear-Time Self-supervised Learning for Plant Disease Detection : Abstract: Self-supervised learning (SSL) is attractive for plant disease detection as it can exploit large collections of unlabeled leaf images, yet most existing SSL methods are built on CNNs or visi...
- Gradient-Guided Learning Network for Infrared Small Target Detection : Abstract: Recently, infrared small target detection has attracted extensive attention. However, due to the small size and the lack of intrinsic features of infrared small targets, the existing methods...
- Masked Registration and Autoencoding of CT Images for Predictive Tibia Reconstruction : Abstract: Surgical planning for complex tibial fractures can be challenging for surgeons, as the 3D structure of the later desirable bone alignment may be difficult to imagine. To assist in such pla...
- A Dual-Domain Convolutional Network for Hyperspectral Single-Image Super-Resolution : Abstract: This study presents a lightweight dual-domain super-resolution network (DDSRNet) that combines Spatial-Net with the discrete wavelet transform (DWT). Specifically, our proposed model compris...
- Building Reasonable Inference for Vision-Language Models in Blind Image Quality Assessment : Abstract: Recent progress in BIQA has been driven by VLMs, whose semantic reasoning abilities suggest that they might extract visual features, generate descriptive text, and infer quality in a human-l...
- From Graphs to Gates: DNS-HyXNet, A Lightweight and Deployable Sequential Model for Real-Time DNS Tunnel Detection : Abstract: Domain Name System (DNS) tunneling remains a covert channel for data exfiltration and command-and-control communication. Although graph-based methods such as GraphTunnel achieve strong accur...
- Investigate the Low-level Visual Perception in Vision-Language based Image Quality Assessment : Abstract: Recent advances in Image Quality Assessment (IQA) have leveraged Multi-modal Large Language Models (MLLMs) to generate descriptive explanations. However, despite their strong visual percepti...
- Seeing Soil from Space: Towards Robust and Scalable Remote Soil Nutrient Analysis : Abstract: Environmental variables are increasingly affecting agricultural decision-making, yet accessible and scalable tools for soil assessment remain limited. This study presents a robust and scalab...
- Content-Adaptive Image Retouching Guided by Attribute-Based Text Representation : Abstract: Image retouching has received significant attention due to its ability to achieve high-quality visual content. Existing approaches mainly rely on uniform pixel-wise color mapping across enti...
- UnReflectAnything: RGB-Only Highlight Removal by Rendering Synthetic Specular Supervision : Abstract: Specular highlights distort appearance, obscure texture, and hinder geometric reasoning in both natural and surgical imagery. We present UnReflectAnything, an RGB-only framework that removes...
- CS3D: An Efficient Facial Expression Recognition via Event Vision : Abstract: Responsive and accurate facial expression recognition is crucial to human-robot interaction for daily service robots. Nowadays, event cameras are becoming more widely adopted as they surpass...
- FROMAT: Multiview Material Appearance Transfer via Few-Shot Self-Attention Adaptation : Abstract: Multiview diffusion models have rapidly emerged as a powerful tool for content creation with spatial consistency across viewpoints, offering rich visual realism without requiring explicit ge...
- Beyond Sequences: A Benchmark for Atomic Hand-Object Interaction Using a Static RNN Encoder : Abstract: Reliably predicting human intent in hand-object interactions is an open challenge for computer vision. Our research concentrates on a fundamental sub-problem: the fine-grained classification...
- Benchmarking SAM2-based Trackers on FMOX : Abstract: Several object tracking pipelines extending Segment Anything Model 2 (SAM2) have been proposed in the past year, where the approach is to follow and segment the object from a single exemplar...
- Kaapana: A Comprehensive Open-Source Platform for Integrating AI in Medical Imaging Research Environments : Abstract: Developing generalizable AI for medical imaging requires both access to large, multi-center datasets and standardized, reproducible tooling within research environments. However, leveraging ...
- VHOI: Controllable Video Generation of Human-Object Interactions from Sparse Trajectories via Motion Densification : Abstract: Synthesizing realistic human-object interactions (HOI) in video is challenging due to the complex, instance-specific interaction dynamics of both humans and objects. Incorporating controllab...
- IF-Bench: Benchmarking and Enhancing MLLMs for Infrared Images with Generative Visual Prompting : Abstract: Recent advances in multimodal large language models (MLLMs) have led to impressive progress across various benchmarks. However, their capability in understanding infrared images remains unex...
- An Automated Tip-and-Cue Framework for Optimized Satellite Tasking and Visual Intelligence : Abstract: The proliferation of satellite constellations, coupled with reduced tasking latency and diverse sensor capabilities, has expanded the opportunities for automated Earth observation. This pape...
- Unconsciously Forget: Mitigating Memorization; Without Knowing What is being Memorized : Abstract: Recent advances in generative models have demonstrated an exceptional ability to produce highly realistic images. However, previous studies show that generated images often resemble the trai...
- LiM-YOLO: Less is More with Pyramid Level Shift and Normalized Auxiliary Branch for Ship Detection in Optical Remote Sensing Imagery : Abstract: Applying general-purpose object detectors to ship detection in satellite imagery presents significant challenges due to the extreme scale disparity and morphological anisotropy of maritime t...
- Stylized Meta-Album: Group-bias injection with style transfer to study robustness against distribution shifts : Abstract: We introduce Stylized Meta-Album (SMA), a new image classification meta-dataset comprising 24 datasets (12 content datasets, and 12 stylized datasets), designed to advance studies on out-of-...
- FastPose-ViT: A Vision Transformer for Real-Time Spacecraft Pose Estimation : Abstract: Estimating the 6-degrees-of-freedom (6DoF) pose of a spacecraft from a single image is critical for autonomous operations like in-orbit servicing and space debris removal. Existing state-of-...
- Modality-Specific Enhancement and Complementary Fusion for Semi-Supervised Multi-Modal Brain Tumor Segmentation : Abstract: Semi-supervised learning (SSL) has become a promising direction for medical image segmentation, enabling models to learn from limited labeled data alongside abundant unlabeled samples. Howev...
- DynaIP: Dynamic Image Prompt Adapter for Scalable Zero-shot Personalized Text-to-Image Generation : Abstract: Personalized Text-to-Image (PT2I) generation aims to produce customized images based on reference images. A prominent interest pertains to the integration of an image prompt adapter to facil...
- From Detection to Anticipation: Online Understanding of Struggles across Various Tasks and Activities : Abstract: Understanding human skill performance is essential for intelligent assistive systems, with struggle recognition offering a natural cue for identifying user difficulties. While prior work foc...
- UniUGP: Unifying Understanding, Generation, and Planning For End-to-end Autonomous Driving : Abstract: Autonomous driving (AD) systems struggle in long-tail scenarios due to limited world knowledge and weak visual dynamic modeling. Existing vision-language-action (VLA)-based methods cannot le...
- Diffusion Posterior Sampler for Hyperspectral Unmixing with Spectral Variability Modeling : Abstract: Linear spectral mixture models (LMM) provide a concise form to disentangle the constituent materials (endmembers) and their corresponding proportions (abundance) in a single pixel. The criti...
- Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs : Abstract: Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literature, yet existing benchmarks eit...
- VisualActBench: Can VLMs See and Act like a Human? : Abstract: Vision-Language Models (VLMs) have achieved impressive progress in perceiving and describing visual environments. However, their ability to proactively reason and act based solely on visual ...
- NordFKB: a fine-grained benchmark dataset for geospatial AI in Norway : Abstract: We present NordFKB, a fine-grained benchmark dataset for geospatial AI in Norway, derived from the authoritative, highly accurate, national Felles KartdataBase (FKB). The dataset contains hi...
- Splatent: Splatting Diffusion Latents for Novel View Synthesis : Abstract: Radiance field representations have recently been explored in the latent space of VAEs that are commonly used by diffusion models. This direction offers efficient rendering and seamless inte...
- ReViSE: Towards Reason-Informed Video Editing in Unified Models with Self-Reflective Learning : Abstract: Video unified models exhibit strong capabilities in understanding and generation, yet they struggle with reason-informed visual editing even when equipped with powerful internal vision-langu...
- GAINS: Gaussian-based Inverse Rendering from Sparse Multi-View Captures : Abstract: Recent advances in Gaussian Splatting-based inverse rendering extend Gaussian primitives with shading parameters and physically grounded light transport, enabling high-quality material recov...
- Agreement Disagreement Guided Knowledge Transfer for Cross-Scene Hyperspectral Imaging : Abstract: Knowledge transfer plays a crucial role in cross-scene hyperspectral imaging (HSI). However, existing studies often overlook the challenges of gradient conflicts and dominant gradients that ...
- Residual Primitive Fitting of 3D Shapes with SuperFrusta : Abstract: We introduce a framework for converting 3D shapes into compact and editable assemblies of analytic primitives, directly addressing the persistent trade-off between reconstruction fidelity an...
- A Distributed Framework for Privacy-Enhanced Vision Transformers on the Edge : Abstract: Nowadays, visual intelligence tools have become ubiquitous, offering all kinds of convenience and possibilities. However, these tools have high computational requirements that exceed the cap...
- Development and Testing for Perception Based Autonomous Landing of a Long-Range QuadPlane : Abstract: QuadPlanes combine the range efficiency of fixed-wing aircraft with the maneuverability of multi-rotor platforms for long-range autonomous missions. In GPS-denied or cluttered urban environm...
- Sequential Testing for Descriptor-Agnostic LiDAR Loop Closure in Repetitive Environments : Abstract: We propose a descriptor-agnostic, multi-frame loop closure verification method that formulates LiDAR loop closure as a truncated Sequential Probability Ratio Test (SPRT). Instead of deciding...
- LiePrune: Lie Group and Quantum Geometric Dual Representation for One-Shot Structured Pruning of Quantum Neural Networks : Abstract: Quantum neural networks (QNNs) and parameterized quantum circuits (PQCs) are key building blocks for near-term quantum machine learning. However, their scalability is constrained by excessiv...
- ViTA-Seg: Vision Transformer for Amodal Segmentation in Robotics : Abstract: Occlusions in robotic bin picking compromise accurate and reliable grasp planning. We present ViTA-Seg, a class-agnostic Vision Transformer framework for real-time amodal segmentation that l...
- Imitative Membership Inference Attack : Abstract: A Membership Inference Attack (MIA) assesses how much a target machine learning model reveals about its training data by determining whether specific query instances were part of the trainin...
- Enhancing Reliability across Short and Long-Form QA via Reinforcement Learning : Abstract: While reinforcement learning has unlocked unprecedented complex reasoning in large language models, it has also amplified their propensity for hallucination, creating a critical trade-off be...
- Targeting Misalignment: A Conflict-Aware Framework for Reward-Model-based LLM Alignment : Abstract: Reward-model-based fine-tuning is a central paradigm in aligning Large Language Models with human preferences. However, such approaches critically rely on the assumption that proxy reward mo...
- Training-free Context-adaptive Attention for Efficient Long Context Modeling : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across a wide range of natural language processing tasks. These capabilities stem primarily from the self-attention mec...
- Language models as tools for investigating the distinction between possible and impossible natural languages : Abstract: We argue that language models (LMs) have strong potential as investigative tools for probing the distinction between possible and impossible natural languages and thus uncovering the inducti...
- Knowledge-Augmented Large Language Model Agents for Explainable Financial Decision-Making : Abstract: This study investigates an explainable reasoning method for financial decision-making based on knowledge-enhanced large language model agents. To address the limitations of traditional finan...
- Advancing Text Classification with Large Language Models and Neural Attention Mechanisms : Abstract: This study proposes a text classification algorithm based on large language models, aiming to address the limitations of traditional methods in capturing long-range dependencies, understandi...
- Source Coverage and Citation Bias in LLM-based vs. Traditional Search Engines : Abstract: LLM-based Search Engines (LLM-SEs) introduce a new paradigm for information seeking. Unlike Traditional Search Engines (TSEs) (e.g., Google), these systems summarize results, often providin...
- Systematic Framework of Application Methods for Large Language Models in Language Sciences : Abstract: Large Language Models (LLMs) are transforming language sciences. However, their widespread deployment currently suffers from methodological fragmentation and a lack of systematic soundness. ...
- Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale : Abstract: This article presents the creation of an Estonian-language dataset for document-level subjectivity, analyzes the resulting annotations, and reports an initial experiment of automatic subject...
- MentraSuite: Post-Training Large Language Models for Mental Health Reasoning and Assessment : Abstract: Mental health disorders affect hundreds of millions globally, and the Web now serves as a primary medium for accessing support, information, and assessment. Large language models (LLMs) offe...
- Neurosymbolic Information Extraction from Transactional Documents : Abstract: This paper presents a neurosymbolic framework for information extraction from documents, evaluated on transactional documents. We introduce a schema-based approach that integrates symbolic v...
- d-TreeRPO: Towards More Reliable Policy Optimization for Diffusion Language Models : Abstract: Reliable reinforcement learning (RL) for diffusion large language models (dLLMs) requires both accurate advantage estimation and precise estimation of prediction probabilities. Existing RL m...
- FineFreq: A Multilingual Character Frequency Dataset from Web-Scale Text : Abstract: We present FineFreq, a large-scale multilingual character frequency dataset derived from the FineWeb and FineWeb2 corpora, covering over 1900 languages and spanning 2013-2025. The dataset co...
- MOA: Multi-Objective Alignment for Role-Playing Agents : Abstract: Role-playing agents (RPAs) must simultaneously master many conflicting skills -- following multi-turn instructions, exhibiting domain knowledge, and adopting a consistent linguistic style. E...
- DeepSeek's WEIRD Behavior: The cultural alignment of Large Language Models and the effects of prompt language and cultural prompting : Abstract: Culture is a core component of human-to-human interaction and plays a vital role in how we perceive and interact with others. Advancements in the effectiveness of Large Language Models (LLMs...
- ChronusOmni: Improving Time Awareness of Omni Large Language Models : Abstract: Time awareness is a fundamental ability of omni large language models, especially for understanding long videos and answering complex questions. Previous approaches mainly target vision-lang...
- Mitigating Social Bias in English and Urdu Language Models Using PRM-Guided Candidate Selection and Sequential Refinement : Abstract: Large language models (LLMs) increasingly mediate human communication, decision support, content creation, and information retrieval. Despite impressive fluency, these systems frequently pro...
- Large Language Models as Search Engines: Societal Challenges : Abstract: Large Language Models (LLMs) may one day replace search engines as the primary portal to information on the Web. In this article, we investigate the societal challenges that such a change co...
- Guiding LLMs to Generate High-Fidelity and High-Quality Counterfactual Explanations for Text Classification : Abstract: The need for interpretability in deep learning has driven interest in counterfactual explanations, which identify minimal changes to an instance that change a model's prediction. Current cou...
- Two Causal Principles for Improving Visual Dialog : Abstract: This paper unravels the design tricks adopted by us, the champion team MReaL-BDAI, for Visual Dialog Challenge 2019: two causal principles for improving Visual Dialog (VisDial). By "improvin...
- Improving Topic Relevance Model by Mix-structured Summarization and LLM-based Data Augmentation : Abstract: Topic relevance between query and document is a very important part of social search, which can evaluate the degree of matching between document and user's requirement. In most social search...
- Leveraging Machine Learning to Identify Gendered Stereotypes and Body Image Concerns on Diet and Fitness Online Forums : Abstract: The pervasive expectations about ideal body types in Western society can lead to body image concerns, dissatisfaction, and in extreme cases, eating disorders and other psychopathologies rela...
- An Efficient Test-Time Scaling Approach for Image Generation : Abstract: Image generation has emerged as a mainstream application of large generative AI models. Just as test-time compute and reasoning have helped language models improve their capabilities, simila...
- Enhancing Knowledge Transfer in Hyperspectral Image Classification via Cross-scene Knowledge Integration : Abstract: Knowledge transfer has strong potential to improve hyperspectral image (HSI) classification, yet two inherent challenges fundamentally restrict effective cross-domain transfer: spectral vari...
- Diffusion Model Regularized Implicit Neural Representation for CT Metal Artifact Reduction : Abstract: Computed tomography (CT) images are often severely corrupted by artifacts in the presence of metals. Existing supervised metal artifact reduction (MAR) approaches suffer from performance ins...
- A Survey of Body and Face Motion: Datasets, Performance Evaluation Metrics and Generative Techniques : Abstract: Body and face motion play an integral role in communication. They convey crucial information on the participants. Advances in generative modeling and multi-modal learning have enabled motion...
- An Approach for Detection of Entities in Dynamic Media Contents : Abstract: The notion of learning underlies almost every evolution of Intelligent Agents. In this paper, we present an approach for searching and detecting a given entity in a video sequence. Specifica...
- Learning to Remove Lens Flare in Event Camera : Abstract: Event cameras have the potential to revolutionize vision systems with their high temporal resolution and dynamic range, yet they remain susceptible to lens flare, a fundamental optical artif...
- ConceptPose: Training-Free Zero-Shot Object Pose Estimation using Concept Vectors : Abstract: Object pose estimation is a fundamental task in computer vision and robotics, yet most methods require extensive, dataset-specific training. Concurrently, large-scale vision language models ...
- Adaptive Thresholding for Visual Place Recognition using Negative Gaussian Mixture Statistics : Abstract: Visual place recognition (VPR) is an important component technology for camera-based mapping and navigation applications. This is a challenging problem because images of the same place may a...
- AgentComp: From Agentic Reasoning to Compositional Mastery in Text-to-Image Models : Abstract: Text-to-image generative models have achieved remarkable visual quality but still struggle with compositionality: accurately capturing object relationships, attribute bindings, and fine-gra...
- Explaining the Unseen: Multimodal Vision-Language Reasoning for Situational Awareness in Underground Mining Disasters : Abstract: Underground mining disasters produce pervasive darkness, dust, and collapses that obscure vision and make situational awareness difficult for humans and conventional systems. To address this...
- Food Image Generation on Multi-Noun Categories : Abstract: Generating realistic food images for categories with multiple nouns is surprisingly challenging. For instance, the prompt "egg noodle" may result in images that incorrectly contain both eggs...
- GimbalDiffusion: Gravity-Aware Camera Control for Video Generation : Abstract: Recent progress in text-to-video generation has achieved remarkable realism, yet fine-grained control over camera motion and orientation remains elusive. Existing approaches typically encode...
- SuperF: Neural Implicit Fields for Multi-Image Super-Resolution : Abstract: High-resolution imagery is often hindered by limitations in sensor technology, atmospheric conditions, and costs. Such challenges occur in satellite remote sensing, but also with handheld ca...
- GTAvatar: Bridging Gaussian Splatting and Texture Mapping for Relightable and Editable Gaussian Avatars : Abstract: Recent advancements in Gaussian Splatting have enabled increasingly accurate reconstruction of photorealistic head avatars, opening the door to numerous applications in visual effects, video...
- View-on-Graph: Zero-shot 3D Visual Grounding via Vision-Language Reasoning on Scene Graphs : Abstract: 3D visual grounding (3DVG) identifies objects in 3D scenes from language descriptions. Existing zero-shot approaches leverage 2D vision-language models (VLMs) by converting 3D spatial inform...
- Enabling Next-Generation Consumer Experience with Feature Coding for Machines : Abstract: As consumer devices become increasingly intelligent and interconnected, efficient data transfer solutions for machine tasks have become essential. This paper presents an overview of the late...
- Efficient Feature Compression for Machines with Global Statistics Preservation : Abstract: The split-inference paradigm divides an artificial intelligence (AI) model into two parts. This necessitates the transfer of intermediate feature data between the two halves. Here, effective...
- OmniPSD: Layered PSD Generation with Diffusion Transformer : Abstract: Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly chal...
- ROI-Packing: Efficient Region-Based Compression for Machine Vision : Abstract: This paper introduces ROI-Packing, an efficient image compression method tailored specifically for machine vision. By prioritizing regions of interest (ROI) critical to end-task accuracy and...
- MoRel: Long-Range Flicker-Free 4D Motion Modeling via Anchor Relay-based Bidirectional Blending with Hierarchical Densification : Abstract: Recent advances in 4D Gaussian Splatting (4DGS) have extended the high-speed rendering capability of 3D Gaussian Splatting (3DGS) into the temporal domain, enabling real-time rendering of dy...
- LongT2IBench: A Benchmark for Evaluating Long Text-to-Image Generation with Graph-structured Annotations : Abstract: The increasing popularity of long Text-to-Image (T2I) generation has created an urgent need for automatic and interpretable models that can evaluate the image-text alignment in long prompt s...
- Dynamic Facial Expressions Analysis Based Parkinson's Disease Auxiliary Diagnosis : Abstract: Parkinson's disease (PD), a prevalent neurodegenerative disorder, significantly affects patients' daily functioning and social interactions. To facilitate a more efficient and accessible dia...
- LoGoColor: Local-Global 3D Colorization for 360° Scenes : Abstract: Single-channel 3D reconstruction is widely used in fields such as robotics and medical imaging. While this line of work excels at reconstructing 3D geometry, the outputs are not colored 3D m...
- FoundIR-v2: Optimizing Pre-Training Data Mixtures for Image Restoration Foundation Model : Abstract: Recent studies have witnessed significant advances in image restoration foundation models driven by improvements in the scale and quality of pre-training data. In this work, we find that the...
- MelanomaNet: Explainable Deep Learning for Skin Lesion Classification : Abstract: Automated skin lesion classification using deep learning has shown remarkable accuracy, yet clinical adoption remains limited due to the "black box" nature of these models. We present Melano...
- Traffic Scene Small Target Detection Method Based on YOLOv8n-SPTS Model for Autonomous Driving : Abstract: This paper focuses on the key issue in autonomous driving: small target recognition in dynamic perception. Existing algorithms suffer from poor detection performance due to missing small tar...
- VABench: A Comprehensive Benchmark for Audio-Video Generation : Abstract: Recent advances in video generation have been remarkable, enabling models to produce visually compelling videos with synchronized audio. While existing video generation benchmarks provide co...
- From SAM to DINOv2: Towards Distilling Foundation Models to Lightweight Baselines for Generalized Polyp Segmentation : Abstract: Accurate polyp segmentation during colonoscopy is critical for the early detection of colorectal cancer and still remains challenging due to significant size, shape, and color variations, an...
- Transformer-Driven Multimodal Fusion for Explainable Suspiciousness Estimation in Visual Surveillance : Abstract: Suspiciousness estimation is critical for proactive threat detection and ensuring public safety in complex environments. This work introduces a large-scale annotated dataset, USE50k, along w...
- Benchmarking Real-World Medical Image Classification with Noisy Labels: Challenges, Practice, and Outlook : Abstract: Learning from noisy labels remains a major challenge in medical image analysis, where annotation demands expert knowledge and substantial inter-observer variability often leads to inconsiste...
- UniLS: End-to-End Audio-Driven Avatars for Unified Listening and Speaking : Abstract: Generating lifelike conversational avatars requires modeling not just isolated speakers, but the dynamic, reciprocal interaction of speaking and listening. However, modeling the listener is ...
- Relightable and Dynamic Gaussian Avatar Reconstruction from Monocular Video : Abstract: Modeling relightable and animatable human avatars from monocular video is a long-standing and challenging task. Recently, Neural Radiance Field (NeRF) and 3D Gaussian Splatting (3DGS) method...
- TextGuider: Training-Free Guidance for Text Rendering via Attention Alignment : Abstract: Despite recent advances, diffusion-based text-to-image models still struggle with accurate text rendering. Several studies have proposed fine-tuning or training-free refinement methods for a...
- Video-QTR: Query-Driven Temporal Reasoning Framework for Lightweight Video Understanding : Abstract: The rapid development of multimodal large-language models (MLLMs) has significantly expanded the scope of visual language reasoning, enabling unified systems to interpret and describe comple...
- StereoWorld: Geometry-Aware Monocular-to-Stereo Video Generation : Abstract: The growing adoption of XR devices has fueled strong demand for high-quality stereo video, yet its production remains costly and artifact-prone. To address this challenge, we present StereoW...
- ASSIST-3D: Adapted Scene Synthesis for Class-Agnostic 3D Instance Segmentation : Abstract: Class-agnostic 3D instance segmentation tackles the challenging task of segmenting all object instances, including previously unseen ones, without semantic class reliance. Current methods st...
- FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement : Abstract: Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherentl...
- Perception-Inspired Color Space Design for Photo White Balance Editing : Abstract: White balance (WB) is a key step in the image signal processor (ISP) pipeline that mitigates color casts caused by varying illumination and restores the scene's true colors. Currently, sRGB-...
- Wasserstein-Aligned Hyperbolic Multi-View Clustering : Abstract: Multi-view clustering (MVC) aims to uncover the latent structure of multi-view data by learning view-common and view-specific information. Although recent studies have explored hyperbolic re...
- Generative Point Cloud Registration : Abstract: In this paper, we propose a novel 3D registration paradigm, Generative Point Cloud Registration, which bridges advanced 2D generative models with 3D matching tasks to enhance registration pe...
- DirectSwap: Mask-Free Cross-Identity Training and Benchmarking for Expression-Consistent Video Head Swapping : Abstract: Video head swapping aims to replace the entire head of a video subject, including facial identity, head shape, and hairstyle, with that of a reference image, while preserving the target body...
- Label-free Motion-Conditioned Diffusion Model for Cardiac Ultrasound Synthesis : Abstract: Ultrasound echocardiography is essential for the non-invasive, real-time assessment of cardiac function, but the scarcity of labelled data, driven by privacy restrictions and the complexity ...
- Rates and architectures for learning geometrically non-trivial operators : Abstract: Deep learning methods have proven capable of recovering operators between high-dimensional spaces, such as solution maps of PDEs and similar objects in mathematical physics, from very few tr...
- Federated Distillation Assisted Vehicle Edge Caching Scheme Based on Lightweight DDPM : Abstract: Vehicle edge caching is a promising technology that can significantly reduce the latency for vehicle users (VUs) to access content by pre-caching user-interested content at edge nodes. It is...
- Black-Box Behavioral Distillation Breaks Safety Alignment in Medical LLMs : Abstract: As medical large language models (LLMs) become increasingly integrated into clinical workflows, concerns around alignment robustness, and safety are escalating. Prior work on model extractio...
- Cauchy-Schwarz Fairness Regularizer : Abstract: Group fairness in machine learning is often enforced by adding a regularizer that reduces the dependence between model predictions and sensitive attributes. However, existing regularizers ar...
- Contextual Dynamic Pricing with Heterogeneous Buyers : Abstract: We initiate the study of contextual dynamic pricing with a heterogeneous population of buyers, where a seller repeatedly posts prices (over $T$ rounds) that depend on the observable $d$-dime...
- QuanvNeXt: An end-to-end quanvolutional neural network for EEG-based detection of major depressive disorder : Abstract: This study presents QuanvNeXt, an end-to-end fully quanvolutional model for EEG-based depression diagnosis. QuanvNeXt incorporates a novel Cross Residual block, which reduces feature homogen...
- Latent-Autoregressive GP-VAE Language Model : Abstract: We investigate a fully Latent AutoRegressive scheme based on a Gaussian Process (GP) integrated into a Variational Autoencoder (VAE). In this setting, sequential dynamics are transferred fro...
- Semantic-Aware Cooperative Communication and Computation Framework in Vehicular Networks : Abstract: Semantic Communication (SC) combined with Vehicular edge computing (VEC) provides an efficient edge task processing paradigm for Internet of Vehicles (IoV). Focusing on highway scenarios, th...
- Membership and Dataset Inference Attacks on Large Audio Generative Models : Abstract: Generative audio models, based on diffusion and autoregressive architectures, have advanced rapidly in both quality and expressiveness. This progress, however, raises pressing copyright conc...
- A data-driven approach to linking design features with manufacturing process data for sustainable product development : Abstract: The growing adoption of Industrial Internet of Things (IIoT) technologies enables automated, real-time collection of manufacturing process data, unlocking new opportunities for data-driven p...
- Training One Model to Master Cross-Level Agentic Actions via Reinforcement Learning : Abstract: The paradigm of agentic AI is shifting from engineered complex workflows to post-training native models. However, existing agents are typically confined to static, predefined action spaces--...
- Mixture of Lookup Key-Value Experts : Abstract: Recent research has developed several LLM architectures suitable for inference on end-user devices, such as the Mixture of Lookup Experts (MoLE) (Jie et al., 2025). A key feature o...
- Physics-Aware Heterogeneous GNN Architecture for Real-Time BESS Optimization in Unbalanced Distribution Systems : Abstract: Battery energy storage systems (BESS) have become increasingly vital in three-phase unbalanced distribution grids for maintaining voltage stability and enabling optimal dispatch. However, ex...
- Predicting Polymer Solubility in Solvents Using SMILES Strings : Abstract: Understanding and predicting polymer solubility in various solvents is critical for applications ranging from recycling to pharmaceutical formulation. This work presents a deep learning fram...
- TinyDéjàVu: Smaller Memory Footprint & Faster Inference on Sensor Data Streams with Always-On Microcontrollers : Abstract: Always-on sensors are increasingly expected to embark a variety of tiny neural networks and to continuously perform inference on time-series of the data they sense. In order to fit lifetime ...
- Knowledge Diversion for Efficient Morphology Control and Policy Transfer : Abstract: Universal morphology control aims to learn a universal policy that generalizes across heterogeneous agent morphologies, with Transformer-based controllers emerging as a popular choice. Howev...
- Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers : Abstract: Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks ar...
- Incorporating Fairness in Neighborhood Graphs for Fair Spectral Clustering : Abstract: Graph clustering plays a pivotal role in unsupervised learning methods like spectral clustering, yet traditional methods for graph clustering often perpetuate bias through unfair graph const...
- Predicting the Containment Time of California Wildfires Using Machine Learning : Abstract: California's wildfire season keeps getting worse over the years, overwhelming the emergency response teams. These fires cause massive destruction to both property and human life. Because of ...
- Conformal Bandits: Bringing statistical validity and reward efficiency to the small-gap regime : Abstract: We introduce Conformal Bandits, a novel framework integrating Conformal Prediction (CP) into bandit problems, a classic paradigm for sequential decision-making under uncertainty. Traditional...
- HPM-KD: Hierarchical Progressive Multi-Teacher Framework for Knowledge Distillation and Efficient Model Compression : Abstract: Knowledge Distillation (KD) has emerged as a promising technique for model compression but faces critical limitations: (1) sensitivity to hyperparameters requiring extensive manual tuning, (...
- Analysis of Dirichlet Energies as Over-smoothing Measures : Abstract: We analyze the distinctions between two functionals often used as over-smoothing measures: the Dirichlet energies induced by the unnormalized graph Laplacian and the normalized graph Laplaci...
- Exploring Protein Language Model Architecture-Induced Biases for Antibody Comprehension : Abstract: Recent advances in protein language models (PLMs) have demonstrated remarkable capabilities in understanding protein sequences. However, the extent to which different model architectures cap...
- Closing the Train-Test Gap in World Models for Gradient-Based Planning : Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at i...
- Controlling Steering Angle for Cooperative Self-driving Vehicles utilizing CNN and LSTM-based Deep Networks : Abstract: A fundamental challenge in autonomous vehicles is adjusting the steering angle at different road conditions. Recent state-of-the-art solutions addressing this challenge include deep learning...
- Robustness and Adaptability of Reinforcement Learning based Cooperative Autonomous Driving in Mixed-autonomy Traffic : Abstract: Building autonomous vehicles (AVs) is a complex problem, but enabling them to operate in the real world where they will be surrounded by human-driven vehicles (HVs) is extremely challenging....
- Learning-based social coordination to improve safety and robustness of cooperative autonomous vehicles in mixed traffic : Abstract: It is expected that autonomous vehicles (AVs) and heterogeneous human-driven vehicles (HVs) will coexist on the same road. The safety and reliability of AVs will depend on their social awarene...
- Online Inference of Constrained Optimization: Primal-Dual Optimality and Sequential Quadratic Programming : Abstract: We study online statistical inference for the solutions of stochastic optimization problems with equality and inequality constraints. Such problems are prevalent in statistics and machine le...
- Multivariate time series prediction using clustered echo state network : Abstract: Many natural and physical processes can be understood by analyzing multiple system variables evolving, forming a multivariate time series. Predicting such time series is challenging due to t...
- FuXi-Nowcast: Meet the longstanding challenge of convective initiation in nowcasting : Abstract: Accurate nowcasting of convective storms remains a major challenge for operational forecasting, particularly for convective initiation and the evolution of high-impact rainfall and strong wi...
- Deterministic World Models for Verification of Closed-loop Vision-based Systems : Abstract: Verifying closed-loop vision-based control systems remains a fundamental challenge due to the high dimensionality of images and the difficulty of modeling visual environments. While generati...
- Luxical: High-Speed Lexical-Dense Text Embeddings : Abstract: Frontier language model quality increasingly hinges on our ability to organize web-scale text corpora for training. Today's dominant tools trade off speed and flexibility: lexical classifier...
- Interpretable machine learning of halo gas density profiles: a sensitivity analysis of cosmological hydrodynamical simulations : Abstract: Stellar and AGN-driven feedback processes affect the distribution of gas on a wide range of scales, from within galaxies well into the intergalactic medium. Yet, it remains unclear how feedb...
- SIP: Site in Pieces- A Dataset of Disaggregated Construction-Phase 3D Scans for Semantic Segmentation and Scene Understanding : Abstract: Accurate 3D scene interpretation in active construction sites is essential for progress monitoring, safety assessment, and digital twin development. LiDAR is widely used in construction beca...
- Banach neural operator for Navier-Stokes equations : Abstract: Classical neural networks are known for their ability to approximate mappings between finite-dimensional spaces, but they fall short in capturing complex operator dynamics across infinite-di...
- Causal Attribution of Model Performance Gaps in Medical Imaging Under Distribution Shifts : Abstract: Deep learning models for medical image segmentation suffer significant performance drops due to distribution shifts, but the causal mechanisms behind these drops remain poorly understood. We...
- Understanding temperature tuning in energy-based models : Abstract: Generative models of complex systems often require post-hoc parameter adjustments to produce useful outputs. For example, energy-based models for protein design are sampled at an artificiall...
- WTNN: Weibull-Tailored Neural Networks for survival analysis : Abstract: The Weibull distribution is a commonly adopted choice for modeling the survival of systems subject to maintenance over time. When only proxy indicators and censored observations are availabl...
- Robust and Sparse Estimation of Unbounded Density Ratio under Heavy Contamination : Abstract: We examine the non-asymptotic properties of robust density ratio estimation (DRE) in contaminated settings. Weighted DRE is the most promising among existing methods, exhibiting doubly stron...
- Impact of Positional Encoding: Clean and Adversarial Rademacher Complexity for Transformers under In-Context Regression : Abstract: Positional encoding (PE) is a core architectural component of Transformers, yet its impact on the Transformer's generalization and robustness remains unclear. In this work, we provide the fi...
- Distributional Shrinkage II: Optimal Transport Denoisers with Higher-Order Scores : Abstract: We revisit the signal denoising problem through the lens of optimal transport: the goal is to recover an unknown scalar signal distribution $X \sim P$ from noisy observations $Y = X + \sigma Z$, w...
- Meta-learning three-factor plasticity rules for structured credit assignment with sparse feedback : Abstract: Biological neural networks learn complex behaviors from sparse, delayed feedback using local synaptic plasticity, yet the mechanisms enabling structured credit assignment remain elusive. In ...
- Detection and Localization of Subdural Hematoma Using Deep Learning on Computed Tomography : Abstract: Background. Subdural hematoma (SDH) is a common neurosurgical emergency, with increasing incidence in aging populations. Rapid and accurate identification is essential to guide timely interv...
- Generalizable Collaborative Search-and-Capture in Cluttered Environments via Path-Guided MAPPO and Directional Frontier Allocation : Abstract: Collaborative pursuit-evasion in cluttered environments presents significant challenges due to sparse rewards and constrained Fields of View (FOV). Standard Multi-Agent Reinforcement Learnin...
- WarmServe: Enabling One-for-Many GPU Prewarming for Multi-LLM Serving : Abstract: Deploying multiple models within shared GPU clusters is promising for improving resource efficiency in large language model (LLM) serving. Existing multi-LLM serving systems optimize GPU uti...
- Estimation of Stochastic Optimal Transport Maps : Abstract: The optimal transport (OT) map is a geometry-driven transformation between high-dimensional probability distributions which underpins a wide range of tasks in statistics, applied probability...
- Transport Novelty Distance: A Distributional Metric for Evaluating Material Generative Models : Abstract: Recent advances in generative machine learning have opened new possibilities for the discovery and design of novel materials. However, as these models become more sophisticated, the need for...
- Transformers for Tabular Data: A Training Perspective of Self-Attention via Optimal Transport : Abstract: This thesis examines self-attention training through the lens of Optimal Transport (OT) and develops an OT-based alternative for tabular classification. The study tracks intermediate project...
- Don't Throw Away Your Beams: Improving Consistency-based Uncertainties in LLMs via Beam Search : Abstract: Consistency-based methods have emerged as an effective approach to uncertainty quantification (UQ) in large language models. These methods typically rely on several generations obtained via ...
- Comparative Analysis of Hash-based Malware Clustering via K-Means : Abstract: With the adoption of multiple digital devices in everyday life, the cyber-attack surface has increased. Adversaries are continuously exploring new avenues to exploit them and deploy malware....
- SynthPix: A lightspeed PIV images generator : Abstract: We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX. SynthPix supports the ...
- OxEnsemble: Fair Ensembles for Low-Data Classification : Abstract: We address the problem of fair classification in settings where data is scarce and unbalanced across demographic groups. Such low-data regimes are common in domains like medical imaging, whe...
- Interpreto: An Explainability Library for Transformers : Abstract: Interpreto is a Python library for post-hoc explainability of text HuggingFace models, from early BERT variants to LLMs. It provides two complementary families of methods: attributions and c...
- Optimal certification of constant-local Hamiltonians : Abstract: We study the problem of certifying local Hamiltonians from real-time access to their dynamics. Given oracle access to $e^{-itH}$ for an unknown $k$-local Hamiltonian $H$ and a fully specifie...
- M3Net: A Multi-Metric Mixture of Experts Network Digital Twin with Graph Neural Networks : Abstract: The rise of 5G/6G network technologies promises to enable applications like autonomous vehicles and virtual reality, resulting in a significant increase in connected devices and necessarily ...
- OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations : Abstract: This paper presents OnCoCo 1.0, a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrative system of categories, designed to improve...
- A roadmap of geospatial soil quality analysis systems : Abstract: Soil quality (SQ) plays a crucial role in sustainable agriculture, environmental conservation, and land-use planning. Traditional SQ assessment techniques rely on costly, labor-intensive sam...
- Fast Factorized Learning: Powered by In-Memory Database Systems : Abstract: Learning models over factorized joins avoids redundant computations by identifying and pre-computing shared cofactors. Previous work has investigated the performance gain when computing cofa...
- TCNN: Triple Convolutional Neural Network Models for Retrieval-based Question Answering System in E-commerce : Abstract: Automatic question-answering (QA) systems have boomed during the last few years, and commonly used techniques can be roughly categorized into Information Retrieval (IR)-based and generation-base...
- Information-Theoretic Active Correlation Clustering : Abstract: Correlation clustering is a flexible framework for partitioning data based solely on pairwise similarity or dissimilarity information, without requiring the number of clusters as input. Howe...
- Hard Work Does Not Always Pay Off: Poisoning Attacks on Neural Architecture Search : Abstract: We study the robustness of data-centric methods to find neural network architectures, known as neural architecture search (NAS), against data poisoning. To audit this robustness, we design a...
- Entropy-Informed Weighting Channel Normalizing Flow for Deep Generative Models : Abstract: Normalizing Flows (NFs) are widely used in deep generative models for their exact likelihood estimation and efficient sampling. However, they require substantial memory since the latent sp...
- Point Neuron Learning: A New Physics-Informed Neural Network Architecture : Abstract: Machine learning and neural networks have advanced numerous research domains, but challenges such as large training data requirements and inconsistent model performance hinder their applicat...
- Memory Injection Attacks on LLM Agents via Query-Only Interaction : Abstract: Agents powered by large language models (LLMs) have demonstrated strong capabilities in a wide range of complex, real-world applications. However, LLM agents with a compromised memory bank m...
- Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses : Abstract: How related are the representations learned by neural language models, translation models, and language tagging tasks? We answer this question by adapting an encoder-decoder transfer learnin...
- Self Distillation Fine-Tuning of Protein Language Models Improves Versatility in Protein Design : Abstract: Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (P...
- Improved Physics-Driven Neural Network to Solve Inverse Scattering Problems : Abstract: This paper presents an improved physics-driven neural network (IPDNN) framework for solving electromagnetic inverse scattering problems (ISPs). A new Gaussian-localized oscillation-suppressi...
- A Granular Framework for Construction Material Price Forecasting: Econometric and Machine-Learning Approaches : Abstract: The persistent volatility of construction material prices poses significant risks to cost estimation, budgeting, and project delivery, underscoring the urgent need for granular and scalable ...
- KGOT: Unified Knowledge Graph and Optimal Transport Pseudo-Labeling for Molecule-Protein Interaction Prediction : Abstract: Predicting molecule-protein interactions (MPIs) is a fundamental task in computational biology, with crucial applications in drug discovery and molecular function annotation. However, existi...
- CFLight: Enhancing Safety with Traffic Signal Control through Counterfactual Learning : Abstract: Traffic accidents result in millions of injuries and fatalities globally, with a significant number occurring at intersections each year. Traffic Signal Control (TSC) is an effective strateg...
- Are Hypervectors Enough? Single-Call LLM Reasoning over Knowledge Graphs : Abstract: Recent advances in large language models (LLMs) have enabled strong reasoning over both structured and unstructured knowledge. When grounded on knowledge graphs (KGs), however, prevailing pi...
- Hands-on Evaluation of Visual Transformers for Object Recognition and Detection : Abstract: Convolutional Neural Networks (CNNs) for computer vision sometimes struggle with understanding images in a global context, as they mainly focus on local patterns. On the other hand, Vision T...
- Graph-Based Bayesian Optimization for Quantum Circuit Architecture Search with Uncertainty Calibrated Surrogates : Abstract: Quantum circuit design is a key bottleneck for practical quantum machine learning on complex, real-world data. We present an automated framework that discovers and refines variational quantu...
- Stanford Sleep Bench: Evaluating Polysomnography Pre-training Methods for Sleep Foundation Models : Abstract: Polysomnography (PSG), the gold standard test for sleep analysis, generates vast amounts of multimodal clinical data, presenting an opportunity to leverage self-supervised representation lea...
- ImageTalk: Designing a Multimodal AAC Text Generation System Driven by Image Recognition and Natural Language Generation : Abstract: People living with Motor Neuron Disease (plwMND) frequently encounter speech and motor impairments that necessitate a reliance on augmentative and alternative communication (AAC) systems. Th...
- Rethinking Chain-of-Thought Reasoning for Videos : Abstract: Chain-of-thought (CoT) reasoning has been highly successful in solving complex tasks in natural language processing, and recent multimodal large language models (MLLMs) have extended this pa...
- Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection : Abstract: Hate speech spreads widely online, harming individuals and communities, making automatic detection essential for large-scale moderation, yet detecting it remains difficult. Part of the chall...
- Drawback of Enforcing Equivariance and its Compensation via the Lens of Expressive Power : Abstract: Equivariant neural networks encode symmetry as an inductive bias and have achieved strong empirical performance in wide domains. However, their expressive power remains not well understood. ...
- The Ky Fan Norms and Beyond: Dual Norms and Combinations for Matrix Optimization : Abstract: In this article, we explore the use of various matrix norms for optimizing functions of weight matrices, a crucial problem in training large language models. Moving beyond the spectral norm ...
- Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies : Abstract: This work presents a conceptual study on the application of Multi-Agent Reinforcement Learning (MARL) for decentralized control of unmanned aerial vehicles to relay a critical data package t...
- Ethics Readiness of Artificial Intelligence: A Practical Evaluation Method : Abstract: We present Ethics Readiness Levels (ERLs), a four-level, iterative method to track how ethical reflection is implemented in the design of AI systems. ERLs bridge high-level ethical principle...
- Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs : Abstract: LLMs are useful because they generalize so well. But can you have too much of a good thing? We show that a small amount of finetuning in narrow contexts can dramatically shift behavior outsi...
- Circuits, Features, and Heuristics in Molecular Transformers : Abstract: Transformers generate valid and diverse chemical structures, but little is known about the mechanisms that enable these models to capture the rules of molecular representation. We present a ...
- Quantifying Uncertainty in Machine Learning-Based Pervasive Systems: Application to Human Activity Recognition : Abstract: The recent convergence of pervasive computing and machine learning has given rise to numerous services, impacting almost all areas of economic and social activity. However, the use of AI tec...
- PathCo-LatticE: Pathology-Constrained Lattice-Of Experts Framework for Fully-supervised Few-Shot Cardiac MRI Segmentation : Abstract: Few-shot learning (FSL) mitigates data scarcity in cardiac MRI segmentation but typically relies on semi-supervised techniques sensitive to domain shifts and validation bias, restricting zer...
- CHEM: Estimating and Understanding Hallucinations in Deep Learning for Image Processing : Abstract: U-Net and other U-shaped architectures have achieved significant success in image deconvolution tasks. However, challenges have emerged, as these methods might generate unrealistic artifacts...
- Composing Concepts from Images and Videos via Concept-prompt Binding : Abstract: Visual concept composition, which aims to integrate different elements from images and videos into a single, coherent visual output, still falls short in accurately extracting complex concep...
- LLMs in Interpreting Legal Documents : Abstract: This chapter explores the application of Large Language Models in the legal domain, showcasing their potential to optimise and augment traditional legal tasks by analysing possible use cases...
- MedForget: Hierarchy-Aware Multimodal Unlearning Testbed for Medical AI : Abstract: Pretrained Multimodal Large Language Models (MLLMs) are increasingly deployed in medical AI systems for clinical reasoning, diagnosis support, and report generation. However, their training ...
- FlipLLM: Efficient Bit-Flip Attacks on Multimodal LLMs using Reinforcement Learning : Abstract: Generative Artificial Intelligence models, such as Large Language Models (LLMs) and Large Vision Models (VLMs), exhibit state-of-the-art performance but remain vulnerable to hardware-based t...
- Provably Learning from Modern Language Models via Low Logit Rank : Abstract: While modern language models and their inner workings are incredibly complex, recent work (Golowich, Liu & Shetty; 2025) has proposed a simple and potentially tractable abstraction for them ...
- Visual Heading Prediction for Autonomous Aerial Vehicles : Abstract: The integration of Unmanned Aerial Vehicles (UAVs) and Unmanned Ground Vehicles (UGVs) is increasingly central to the development of intelligent autonomous systems for applications such as s...
- STACHE: Local Black-Box Explanations for Reinforcement Learning Policies : Abstract: Reinforcement learning agents often behave unexpectedly in sparse-reward or safety-critical environments, creating a strong need for reliable debugging and verification tools. In this paper,...
- Efficient Continual Learning in Neural Machine Translation: A Low-Rank Adaptation Approach : Abstract: Continual learning in Neural Machine Translation (NMT) faces the dual challenges of catastrophic forgetting and the high computational cost of retraining. This study establishes Low-Rank Ada...
- Supervised learning pays attention : Abstract: In-context learning with attention enables large neural networks to make context-specific predictions by selectively focusing on relevant examples. Here, we adapt this idea to supervised lea...
- FALCON: Few-step Accurate Likelihoods for Continuous Flows : Abstract: Scalable sampling of molecular states in thermodynamic equilibrium is a long-standing challenge in statistical physics. Boltzmann Generators tackle this problem by pairing a generative model...
- LISN: Language-Instructed Social Navigation with VLM-based Controller Modulating : Abstract: Towards human-robot coexistence, socially aware navigation is significant for mobile robots. Yet existing studies on this area focus mainly on path efficiency and pedestrian collision avoida...
- HeLoFusion: An Efficient and Scalable Encoder for Modeling Heterogeneous and Multi-Scale Interactions in Trajectory Prediction : Abstract: Multi-agent trajectory prediction in autonomous driving requires a comprehensive understanding of complex social dynamics. Existing methods, however, often struggle to capture the full richn...
- ALIGN: A Vision-Language Framework for High-Accuracy Accident Location Inference through Geo-Spatial Neural Reasoning : Abstract: Reliable geospatial information on road accidents is vital for safety analysis and infrastructure planning, yet most low- and middle-income countries continue to face a critical shortage of ...
- Active Inference in Discrete State Spaces from First Principles : Abstract: We seek to clarify the concept of active inference by disentangling it from the Free Energy Principle. We show how the optimizations that need to be carried out in order to implement active ...
- Persona-based Multi-Agent Collaboration for Brainstorming : Abstract: We demonstrate the importance of persona-based multi-agent brainstorming for both diverse topics and subject matter ideation. Prior work has shown that generalized multi-agent collaboration...
- A survey on the impacts of recommender systems on users, items, and human-AI ecosystems : Abstract: Recommendation systems and assistants (in short, recommenders) influence through online platforms most actions of our daily lives, suggesting items or providing solutions based on users' pre...
- Optimal Transportation by Orthogonal Coupling Dynamics : Abstract: Many numerical and learning algorithms rely on the solution of the Monge-Kantorovich problem and Wasserstein distances, which provide appropriate distributional metrics. While the natural ap...
- Self-Supervised Learning and Opportunistic Inference for Continuous Monitoring of Freezing of Gait in Parkinson's Disease : Abstract: Parkinson's disease (PD) is a progressive neurological disorder that impacts the quality of life significantly, making in-home monitoring of motor symptoms such as Freezing of Gait (FoG) cri...
- Optimizing Algorithms for Mobile Health Interventions with Active Querying Optimization : Abstract: Reinforcement learning in mobile health (mHealth) interventions requires balancing intervention efficacy with user burden, particularly when state measurements (for example, user surveys or ...
- DW-KNN: A Transparent Local Classifier Integrating Distance Consistency and Neighbor Reliability : Abstract: K-Nearest Neighbors (KNN) is one of the most used ML classifiers. However, if we observe closely, standard distance-weighted KNN and relative variants assume all 'k' neighbors are equally re...
- SEA: Spectral Edge Attacks on Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) achieve strong performance on graph-structured data, but are notoriously vulnerable to small, carefully crafted perturbations of the graph structure. Most existi...
- StructuredDNA: A Bio-Physical Framework for Energy-Aware Transformer Routing : Abstract: The rapid scaling of large computational models has led to a critical increase in energy and compute costs. Inspired by biological systems where structure and function emerge from low-energy...
- Graph Deep Learning for Intracranial Aneurysm Blood Flow Simulation and Risk Assessment : Abstract: Intracranial aneurysms remain a major cause of neurological morbidity and mortality worldwide, where rupture risk is tightly coupled to local hemodynamics, particularly wall shear stress and ...
- Improving Multi-Class Calibration through Normalization-Aware Isotonic Techniques : Abstract: Accurate and reliable probability predictions are essential for multi-class supervised learning tasks, where well-calibrated models enable rational decision-making. While isotonic regression...
- A Diffusion-Based Framework for High-Resolution Precipitation Forecasting over CONUS : Abstract: Accurate precipitation forecasting is essential for hydrometeorological risk management, especially for anticipating extreme rainfall that can lead to flash flooding and infrastructure damag...
- Contrast transfer functions help quantify neural network out-of-distribution generalization in HRTEM : Abstract: Neural networks, while effective for tackling many challenging scientific tasks, are not known to perform well out-of-distribution (OOD), i.e., within domains which differ from their trainin...
- Modular Deep-Learning-Based Early Warning System for Deadly Heatwave Prediction : Abstract: Severe heatwaves in urban areas significantly threaten public health, calling for establishing early warning strategies. Despite predicting occurrence of heatwaves and attributing historical...
- GS-KAN: Parameter-Efficient Kolmogorov-Arnold Networks via Sprecher-Type Shared Basis Functions : Abstract: The Kolmogorov-Arnold representation theorem offers a theoretical alternative to Multi-Layer Perceptrons (MLPs) by placing learnable univariate functions on edges rather than nodes. While re...
- Natural Geometry of Robust Data Attribution: From Convex Models to Deep Networks : Abstract: Data attribution methods identify which training examples are responsible for a model's predictions, but their sensitivity to distributional perturbations undermines practical reliability. W...
- Learning Unmasking Policies for Diffusion Language Models : Abstract: Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during in...
- Spectral Embedding via Chebyshev Bases for Robust DeepONet Approximation : Abstract: Deep Operator Networks (DeepONets) have become a central tool in data-driven operator learning, providing flexible surrogates for nonlinear mappings arising in partial differential equations...
- Understanding the Failure Modes of Transformers through the Lens of Graph Neural Networks : Abstract: Transformers and more specifically decoder-only transformers dominate modern LLM architectures. While they have been shown to work exceptionally well, they are not without issues, resulting in su...
- Contrastive Learning for Semi-Supervised Deep Regression with Generalized Ordinal Rankings from Spectral Seriation : Abstract: Contrastive learning methods enforce label distance relationships in feature space to improve representation capability for regression models. However, these methods highly depend on label i...
- Goal inference with Rao-Blackwellized Particle Filters : Abstract: Inferring the eventual goal of a mobile agent from noisy observations of its trajectory is a fundamental estimation problem. We initiate the study of such intent inference using a variant of...
- Self-Supervised Learning with Gaussian Processes : Abstract: Self-supervised learning (SSL) is a machine learning paradigm where models learn to understand the underlying structure of data without explicit supervision from labeled samples. The acquire...
- When AI Gives Advice: Evaluating AI and Human Responses to Online Advice-Seeking for Well-Being : Abstract: Seeking advice is a core human behavior that the Internet has reinvented twice: first through forums and Q&A communities that crowdsource public guidance, and now through large language mod...
- Assessing the Human-Likeness of LLM-Driven Digital Twins in Simulating Health Care System Trust : Abstract: Serving as an emerging and powerful tool, Large Language Model (LLM)-driven Human Digital Twins are showing great potential in healthcare system research. However, its actual simulation abil...
- Beyond Technical Debt: How AI-Assisted Development Creates Comprehension Debt in Resource-Constrained Indie Teams : Abstract: Junior indie game developers in distributed, part-time teams lack production frameworks suited to their specific context, as traditional methodologies are often inaccessible. This study intr...
- Noise-Robust Abstractive Compression in Retrieval-Augmented Language Models : Abstract: Abstractive compression utilizes smaller language models to condense query-relevant context, reducing computational costs in retrieval-augmented generation (RAG). However, retrieved document...
- The Linguistic Architecture of Reflective Thought: Evaluation of a Large Language Model as a Tool to Isolate the Formal Structure of Mentalization : Abstract: Background: Mentalization integrates cognitive, affective, and intersubjective components. Large Language Models (LLMs) display an increasing ability to generate reflective texts, raising qu...
- AI Co-Artist: A LLM-Powered Framework for Interactive GLSL Shader Animation Evolution : Abstract: Creative coding and real-time shader programming are at the forefront of interactive digital art, enabling artists, designers, and enthusiasts to produce mesmerizing, complex visual effects ...
- Learning When to Ask: Simulation-Trained Humanoids for Mental-Health Diagnosis : Abstract: Testing humanoid robots with users is slow, causes wear, and limits iteration and diversity. Yet screening agents must master conversational timing, prosody, backchannels, and what to attend...
- SimClinician: A Multimodal Simulation Testbed for Reliable Psychologist AI Collaboration in Mental Health Diagnosis : Abstract: AI-based mental health diagnosis is often judged by benchmark accuracy, yet in practice its value depends on how psychologists respond: whether they accept, adjust, or reject AI suggestions. ...
- An Electrocardiogram Multi-task Benchmark with Comprehensive Evaluations and Insightful Findings : Abstract: In the process of patient diagnosis, non-invasive measurements are widely used due to their low risks and quick results. Electrocardiogram (ECG), as a non-invasive method to collect heart ac...
- LLM4XCE: Large Language Models for Extremely Large-Scale Massive MIMO Channel Estimation : Abstract: Extremely large-scale massive multiple-input multiple-output (XL-MIMO) is a key enabler for sixth-generation (6G) networks, offering massive spatial degrees of freedom. Despite these advanta...
- LUMOS: Large User MOdels for User Behavior Prediction : Abstract: User behavior prediction at scale remains a critical challenge for online B2C platforms. Traditional approaches rely heavily on task-specific models and domain-specific feature engineering. ...
- EEG-Bench: A Benchmark for EEG Foundation Models in Clinical Applications : Abstract: We introduce a unified benchmarking framework focused on evaluating EEG-based foundation models in clinical applications. The benchmark spans 11 well-defined diagnostic tasks across 14 publi...
- Resolving Conflicts in Lifelong Learning via Aligning Updates in Subspaces : Abstract: Low-Rank Adaptation (LoRA) enables efficient Continual Learning but often suffers from catastrophic forgetting due to destructive interference between tasks. Our analysis reveals that this d...
- Financial Instruction Following Evaluation (FIFE) : Abstract: Language Models (LMs) struggle with complex, interdependent instructions, particularly in high-stakes domains like finance where precision is critical. We introduce FIFE, a novel, high-diffi...
- CluCERT: Certifying LLM Robustness via Clustering-Guided Denoising Smoothing : Abstract: Recent advancements in Large Language Models (LLMs) have led to their widespread adoption in daily applications. Despite their impressive capabilities, they remain vulnerable to adversarial ...
- Learning Robust Representations for Malicious Content Detection via Contrastive Sampling and Uncertainty Estimation : Abstract: We propose the Uncertainty Contrastive Framework (UCF), a Positive-Unlabeled (PU) representation learning framework that integrates uncertainty-aware contrastive loss, adaptive temperature s...
- Enhancing Automatic Speech Recognition Through Integrated Noise Detection Architecture : Abstract: This research presents a novel approach to enhancing automatic speech recognition systems by integrating noise detection capabilities directly into the recognition architecture. Building upo...
- Peek-a-Boo Reasoning: Contrastive Region Masking in MLLMs : Abstract: We introduce Contrastive Region Masking (CRM), a training free diagnostic that reveals how multimodal large language models (MLLMs) depend on specific visual regions at each step of chain-of...
- Institutional AI Sovereignty Through Gateway Architecture: Implementation Report from Fontys ICT : Abstract: To counter fragmented, high-risk adoption of commercial AI tools, we built and ran an institutional AI platform in a six-month, 300-user pilot, showing that a university of applied sciences ...
- What Happens When: Learning Temporal Orders of Events in Videos : Abstract: Video Large Multimodal Models (VLMMs) have shown impressive performance in video understanding, yet their ability to accurately capture the temporal order of multiple events remains underexp...
- Training Multi-Image Vision Agents via End2End Reinforcement Learning : Abstract: Recent VLM-based agents aim to replicate OpenAI O3's "thinking with images" via tool use, but most open-source methods limit input to a single image, falling short on real-world multi-image...
- Mitigating Bias with Words: Inducing Demographic Ambiguity in Face Recognition Templates by Text Encoding : Abstract: Face recognition (FR) systems are often prone to demographic biases, partially due to the entanglement of demographic-specific information with identity-relevant features in facial embedding...
- Consist-Retinex: One-Step Noise-Emphasized Consistency Training Accelerates High-Quality Retinex Enhancement : Abstract: Diffusion models have achieved remarkable success in low-light image enhancement through Retinex-based decomposition, yet their requirement for hundreds of iterative sampling steps severely ...
- HSCP: A Two-Stage Spectral Clustering Framework for Resource-Constrained UAV Identification : Abstract: With the rapid development of Unmanned Aerial Vehicles (UAVs) and the increasing complexity of low-altitude security threats, traditional UAV identification methods struggle to extract relia...
- RAG-HAR: Retrieval Augmented Generation-based Human Activity Recognition : Abstract: Human Activity Recognition (HAR) underpins applications in healthcare, rehabilitation, fitness tracking, and smart environments, yet existing deep learning approaches demand dataset-specific...
- Explainable Fundus Image Curation and Lesion Detection in Diabetic Retinopathy : Abstract: Diabetic Retinopathy (DR) affects individuals with long-term diabetes. Without early diagnosis, DR can lead to vision loss. Fundus photography captures the structure of the retina along with...
- 3DID: Direct 3D Inverse Design for Aerodynamics with Physics-Aware Optimization : Abstract: Inverse design aims to design the input variables of a physical system to optimize a specified objective function, typically formulated as a search or optimization problem. However, in 3D do...
- Enhanced Chest Disease Classification Using an Improved CheXNet Framework with EfficientNetV2-M and Optimization-Driven Learning : Abstract: The interpretation of Chest X-ray is an important diagnostic issue in clinical practice and especially in the resource-limited setting where the shortage of radiologists plays a role in dela...
- Demo: Generative AI helps Radiotherapy Planning with User Preference : Abstract: Radiotherapy planning is a highly complex process that often varies significantly across institutions and individual planners. Most existing deep learning approaches for 3D dose prediction r...
- DermETAS-SNA LLM: A Dermatology Focused Evolutionary Transformer Architecture Search with StackNet Augmented LLM Assistant : Abstract: Our work introduces the DermETAS-SNA LLM Assistant that integrates Dermatology-focused Evolutionary Transformer Architecture Search with StackNet Augmented LLM. The assistant dynamically lea...
- A Physics-Constrained, Design-Driven Methodology for Defect Dataset Generation in Optical Lithography : Abstract: The efficacy of Artificial Intelligence (AI) in micro/nano manufacturing is fundamentally constrained by the scarcity of high-quality and physically grounded training data for defect inspect...
- Digital Modeling of Spatial Pathway Activity from Histology Reveals Tumor Microenvironment Heterogeneity : Abstract: Spatial transcriptomics (ST) enables simultaneous mapping of tissue morphology and spatially resolved gene expression, offering unique opportunities to study tumor microenvironment heterogen...
- Llama-based source code vulnerability detection: Prompt engineering vs Fine tuning : Abstract: The significant increase in software production, driven by the acceleration of development cycles over the past two decades, has led to a steady rise in software vulnerabilities, as shown by...
- Towards Lossless Ultimate Vision Token Compression for VLMs : Abstract: Visual language models encounter challenges in computational efficiency and latency, primarily due to the substantial redundancy in the token representations of high-resolution images and vi...
- Monitoring Deployed AI Systems in Health Care : Abstract: Post-deployment monitoring of artificial intelligence (AI) systems in health care is essential to ensure their safety, quality, and sustained benefit, and to support governance decisions abou...
- ShelfAware: Real-Time Visual-Inertial Semantic Localization in Quasi-Static Environments with Low-Cost Sensors : Abstract: Many indoor workspaces are quasi-static: global layout is stable but local semantics change continually, producing repetitive geometry, dynamic clutter, and perceptual noise that defeat visi...
- ORCA: Open-ended Response Correctness Assessment for Audio Question Answering : Abstract: Evaluating open-ended responses from large audio language models (LALMs) is challenging because human annotators often genuinely disagree on answer correctness due to multiple valid interpre...
- KD-OCT: Efficient Knowledge Distillation for Clinical-Grade Retinal OCT Classification : Abstract: Age-related macular degeneration (AMD) and choroidal neovascularization (CNV)-related conditions are leading causes of vision loss worldwide, with optical coherence tomography (OCT) serving ...
- Beyond the Hype: Comparing Lightweight and Deep Learning Models for Air Quality Forecasting : Abstract: Accurate forecasting of urban air pollution is essential for protecting public health and guiding mitigation policies. While Deep Learning (DL) and hybrid pipelines dominate recent research,...
- Mental Models of Autonomy and Sentience Shape Reactions to AI : Abstract: Narratives about artificial intelligence (AI) entangle autonomy, the capacity to self-govern, with sentience, the capacity to sense and feel. AI agents that perform tasks autonomously and co...
- Masked Generative Policy for Robotic Control : Abstract: We present Masked Generative Policy (MGP), a novel framework for visuomotor imitation learning. We represent actions as discrete tokens, and train a conditional masked transformer that gener...
- Evolving Excellence: Automated Optimization of LLM-based Agents : Abstract: Agentic AI systems built on large language models (LLMs) offer significant potential for automating complex workflows, from software development to customer support. However, LLM agents ofte...
- Semantic Trajectory Generation for Goal-Oriented Spacecraft Rendezvous : Abstract: Reliable real-time trajectory generation is essential for future autonomous spacecraft. While recent progress in nonconvex guidance and control is paving the way for onboard autonomous traje...
- Knowledge-Guided Large Language Model for Automatic Pediatric Dental Record Understanding and Safe Antibiotic Recommendation : Abstract: Accurate interpretation of pediatric dental clinical records and safe antibiotic prescribing remain persistent challenges in dental informatics. Traditional rule-based clinical decision supp...
- Integrated Pipeline for Coronary Angiography With Automated Lesion Profiling, Virtual Stenting, and 100-Vessel FFR Validation : Abstract: Coronary angiography is the main tool for assessing coronary artery disease, but visual grading of stenosis is variable and only moderately related to ischaemia. Wire based fractional flow r...
- Detecting Hallucinations in Graph Retrieval-Augmented Generation via Attention Patterns and Semantic Alignment : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) enhances Large Language Models (LLMs) by incorporating external knowledge from linearized subgraphs retrieved from knowledge graphs. How...
- MindShift: Analyzing Language Models' Reactions to Psychological Prompts : Abstract: Large language models (LLMs) hold the potential to absorb and reflect personality traits and attitudes specified by users. In our study, we investigated this potential using robust psychomet...
- WonderZoom: Multi-Scale 3D World Generation : Abstract: We present WonderZoom, a novel approach to generating 3D scenes with contents across multiple spatial scales from a single image. Existing 3D world generation models remain limited to single...
- AI-Driven Expansion and Application of the Alexandria Database : Abstract: We present a novel multi-stage workflow for computational materials discovery that achieves a 99% success rate in identifying compounds within 100 meV/atom of thermodynamic stability, with a...
- Prompt-Based Continual Compositional Zero-Shot Learning : Abstract: We tackle continual adaptation of vision-language models to new attributes, objects, and their compositions in Compositional Zero-Shot Learning (CZSL), while preventing forgetting of prior k...
- Learning Patient-Specific Disease Dynamics with Latent Flow Matching for Longitudinal Imaging Generation : Abstract: Understanding disease progression is a central clinical challenge with direct implications for early diagnosis and personalized treatment. While recent generative approaches have attempted t...
- WOLF: Werewolf-based Observations for LLM Deception and Falsehoods : Abstract: Deception is a fundamental challenge for multi-agent reasoning: effective systems must strategically conceal information while detecting misleading behavior in others. Yet most evaluations r...
- Understanding Mental States in Active and Autonomous Driving with EEG : Abstract: Understanding how driver mental states differ between active and autonomous driving is critical for designing safe human-vehicle interfaces. This paper presents the first EEG-based compariso...
- Towards Optimal Valve Prescription for Transcatheter Aortic Valve Replacement (TAVR) Surgery: A Machine Learning Approach : Abstract: Transcatheter Aortic Valve Replacement (TAVR) has emerged as a minimally invasive treatment option for patients with severe aortic stenosis, a life-threatening cardiovascular condition. Mult...
- LLMs for Analog Circuit Design Continuum (ACDC) : Abstract: Large Language Models (LLMs) and transformer architectures have shown impressive reasoning and generation capabilities across diverse natural language tasks. However, their reliability and r...
- Tensor-Compressed and Fully-Quantized Training of Neural PDE Solvers : Abstract: Physics-Informed Neural Networks (PINNs) have emerged as a promising paradigm for solving partial differential equations (PDEs) by embedding physical laws into neural network training object...
- CORE: A Conceptual Reasoning Layer for Large Language Models : Abstract: Large language models handle single-turn generation well, but multi-turn interactions still require the model to reconstruct user intent and task state from an expanding token history becaus...
- A Clinically Interpretable Deep CNN Framework for Early Chronic Kidney Disease Prediction Using Grad-CAM-Based Explainable AI : Abstract: Chronic Kidney Disease (CKD) constitutes a major global medical burden, marked by the gradual deterioration of renal function, which results in the impaired clearance of metabolic waste and ...
- GLACIA: Instance-Aware Positional Reasoning for Glacial Lake Segmentation via Multimodal Large Language Model : Abstract: Glacial lake monitoring bears great significance in mitigating the anticipated risk of Glacial Lake Outburst Floods. However, existing segmentation methods based on convolutional neural netw...
- FBA$^2$D: Frequency-based Black-box Attack for AI-generated Image Detection : Abstract: The prosperous development of Artificial Intelligence-Generated Content (AIGC) has brought people's anxiety about the spread of false information on social media. Designing detectors for fil...
- Identifying Bias in Machine-generated Text Detection : Abstract: The meteoric rise in text generation capability has been accompanied by parallel growth in interest in machine-generated text detection: the capability to identify whether a given text was g...
- Hetero-SplitEE: Split Learning of Neural Networks with Early Exits for Heterogeneous IoT Devices : Abstract: The continuous scaling of deep neural networks has fundamentally transformed machine learning, with larger models demonstrating improved performance across diverse tasks. This growth in mode...
- Functional Percolation: A Perspective on Criticality of Form and Function : Abstract: Understanding the physical constraints and minimal conditions that enable information processing in extended systems remains a central challenge across disciplines, from neuroscience and art...
- Simultaneous Genetic Evolution of Neural Networks for Optimal SFC Embedding : Abstract: The reliance of organisations on computer networks is enabled by network programmability, which is typically achieved through Service Function Chaining. These chains virtualise network funct...
- Efficiency-Aware Computational Intelligence for Resource-Constrained Manufacturing Toward Edge-Ready Deployment : Abstract: Industrial cyber physical systems operate under heterogeneous sensing, stochastic dynamics, and shifting process conditions, producing data that are often incomplete, unlabeled, imbalanced, ...
- Branching Strategies Based on Subgraph GNNs: A Study on Theoretical Promise versus Practical Reality : Abstract: Graph Neural Networks (GNNs) have emerged as a promising approach for "learning to branch" in Mixed-Integer Linear Programming (MILP). While standard Message-Passing GNNs (MPNNs) are effic...
- Log NeRF: Comparing Spaces for Learning Radiance Fields : Abstract: Neural Radiance Fields (NeRF) have achieved remarkable results in novel view synthesis, typically using sRGB images for supervision. However, little attention has been paid to the color spac...
- BugSweeper: Function-Level Detection of Smart Contract Vulnerabilities Using Graph Neural Networks : Abstract: The rapid growth of Ethereum has made it more important to quickly and accurately detect smart contract vulnerabilities. While machine-learning-based methods have shown some promise, many st...
- CONCUR: A Framework for Continual Constrained and Unconstrained Routing : Abstract: AI tasks differ in complexity and are best addressed with different computation strategies (e.g., combinations of models and decoding methods). Hence, an effective routing system that maps t...
- GAIR: GUI Automation via Information-Joint Reasoning and Group Reflection : Abstract: Building AI systems for GUI automation tasks has attracted remarkable research efforts, where MLLMs are leveraged for processing user requirements and giving operations. However, GUI automation...
- Towards Resilient Transportation: A Conditional Transformer for Accident-Informed Traffic Forecasting : Abstract: Traffic prediction remains a key challenge in spatio-temporal data mining, despite progress in deep learning. Accurate forecasting is hindered by the complex influence of external factors su...
- H2R-Grounder: A Paired-Data-Free Paradigm for Translating Human Interaction Videos into Physically Grounded Robot Videos : Abstract: Robots that learn manipulation skills from everyday human videos could acquire broad capabilities without tedious robot data collection. We propose a video-to-video translation framework tha...
- ODMA: On-Demand Memory Allocation Framework for LLM Serving on LPDDR-Class Accelerators : Abstract: Serving large language models (LLMs) on accelerators with poor random-access bandwidth (e.g., LPDDR5-based) is limited by current memory managers. Static pre-allocation wastes memory, while ...
- CourtPressGER: A German Court Decision to Press Release Summarization Dataset : Abstract: Official court press releases from Germany's highest courts present and explain judicial rulings to the public, as well as to expert audiences. Prior NLP efforts emphasize technical headnote...
- Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model : Abstract: Class-incremental learning requires a learning system to continually learn knowledge of new classes and meanwhile try to preserve previously learned knowledge of old classes. As current stat...
- Advancing Research via Human-AI Interactive Theorem Proving : Abstract: We investigate how large language models can be used as research tools in scientific computing while preserving mathematical rigor. We propose a human-in-the-loop workflow for interactive th...
- Cytoplasmic Strings Analysis in Human Embryo Time-Lapse Videos using Deep Learning Framework : Abstract: Infertility is a major global health issue, and while in-vitro fertilization has improved treatment outcomes, embryo selection remains a critical bottleneck. Time-lapse imaging enables conti...
- Privacy-Preserving Computer Vision for Industry: Three Case Studies in Human-Centric Manufacturing : Abstract: The adoption of AI-powered computer vision in industry is often constrained by the need to balance operational utility with worker privacy. Building on our previously proposed privacy-preser...
- Temporal-Spatial Tubelet Embedding for Cloud-Robust MSI Reconstruction using MSI-SAR Fusion: A Multi-Head Self-Attention Video Vision Transformer Approach : Abstract: Cloud cover in multispectral imagery (MSI) significantly hinders early-season crop mapping by corrupting spectral information. Existing Vision Transformer (ViT)-based time-series reconstructi...
- Color encoding in Latent Space of Stable Diffusion Models : Abstract: Recent advances in diffusion-based generative models have achieved remarkable visual fidelity, yet a detailed understanding of how specific perceptual attributes - such as color and shape - ...
- Advancing LLM-Based Security Automation with Customized Group Relative Policy Optimization for Zero-Touch Networks : Abstract: Zero-Touch Networks (ZTNs) represent a transformative paradigm toward fully automated and intelligent network management, providing the scalability and adaptability required for the complexi...
- RouteRAG: Efficient Retrieval-Augmented Generation from Text and Graph via Reinforcement Learning : Abstract: Retrieval-Augmented Generation (RAG) integrates non-parametric knowledge into Large Language Models (LLMs), typically from unstructured texts and structured graphs. While recent progress has...
- Representation Invariance and Allocation: When Subgroup Balance Matters : Abstract: Unequal representation of demographic groups in training data poses challenges to model generalisation across populations. Standard practice assumes that balancing subgroup representation op...
- NeuroSketch: An Effective Framework for Neural Decoding via Systematic Architectural Optimization : Abstract: Neural decoding, a critical component of Brain-Computer Interface (BCI), has recently attracted increasing research interest. Previous research has focused on leveraging signal processing an...
- SWEnergy: An Empirical Study on Energy Efficiency in Agentic Issue Resolution Frameworks with SLMs : Abstract: Context. LLM-based autonomous agents in software engineering rely on large, proprietary models, limiting local deployment. This has spurred interest in Small Language Models (SLMs), but thei...
- System Report for CCL25-Eval Task 10: Prompt-Driven Large Language Model Merge for Fine-Grained Chinese Hate Speech Detection : Abstract: The proliferation of hate speech on Chinese social media poses urgent societal risks, yet traditional systems struggle to decode context-dependent rhetorical strategies and evolving slang. T...
- The Gender Code: Gendering the Global Governance of Artificial Intelligence : Abstract: This paper examines how international AI governance frameworks address gender issues and gender-based harms. The analysis covers binding regulations, such as the EU AI Act; soft law instrume...
- Lazy Diffusion: Mitigating spectral collapse in generative diffusion-based stable autoregressive emulation of turbulent flows : Abstract: Turbulent flows possess broadband, power-law spectra in which multiscale interactions couple high-wavenumber fluctuations to large-scale dynamics. Although diffusion-based generative models o...
- Auto-BenchmarkCard: Automated Synthesis of Benchmark Documentation : Abstract: We present Auto-BenchmarkCard, a workflow for generating validated descriptions of AI benchmarks. Benchmark documentation is often incomplete or inconsistent, making it difficult to interpre...
- Calibrated Trust in Dealing with LLM Hallucinations: A Qualitative Study : Abstract: Hallucinations are outputs by Large Language Models (LLMs) that are factually incorrect yet appear plausible [1]. This paper investigates how such hallucinations influence users' trust in LL...
- AI TIPS 2.0: A Comprehensive Framework for Operationalizing AI Governance : Abstract: The deployment of AI systems faces three critical governance challenges that current frameworks fail to adequately address. First, organizations struggle with inadequate risk assessment at t...
- A Categorical Analysis of Large Language Models and Why LLMs Circumvent the Symbol Grounding Problem : Abstract: This paper presents a formal, categorical framework for analysing how humans and large language models (LLMs) transform content into truth-evaluated propositions about a state space of possi...
- SDialog: A Python Toolkit for End-to-End Agent Building, User Simulation, Dialog Generation, and Evaluation : Abstract: We present SDialog, an MIT-licensed open-source Python toolkit that unifies dialog generation, evaluation and mechanistic interpretability into a single end-to-end framework for building and...
- Visual Categorization Across Minds and Models: Cognitive Analysis of Human Labeling and Neuro-Symbolic Integration : Abstract: Understanding how humans and AI systems interpret ambiguous visual stimuli offers critical insight into the nature of perception, reasoning, and decision-making. This paper examines image la...
- Architectures for Building Agentic AI : Abstract: This chapter argues that the reliability of agentic and generative AI is chiefly an architectural property. We define agentic systems as goal-directed, tool-using decision makers operating i...
- Toward Closed-loop Molecular Discovery via Language Model, Property Alignment and Strategic Search : Abstract: Drug discovery is a time-consuming and expensive process, with traditional high-throughput and docking-based virtual screening hampered by low success rates and limited scalability. Recent a...
- An End-to-end Planning Framework with Agentic LLMs and PDDL : Abstract: We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL (Planning Domai...
- Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions : Abstract: Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall clock time is limited but best performance is desired. In envir...
- Analyzing Planner Design Trade-offs for MAPF under Realistic Simulation : Abstract: Multi-Agent Path Finding (MAPF) algorithms are increasingly deployed in industrial warehouses and automated manufacturing facilities, where robots must operate reliably under real-world phys...
- RIFT: A Scalable Methodology for LLM Accelerator Fault Assessment using Reinforcement Learning : Abstract: The massive scale of modern AI accelerators presents critical challenges to traditional fault assessment methodologies, which face prohibitive computational costs and provide poor coverage o...
- Interpretation as Linear Transformation: A Cognitive-Geometric Model of Belief and Meaning : Abstract: This paper develops a geometric framework for modeling belief, motivation, and influence across cognitively heterogeneous agents. Each agent is represented by a personalized value space, a v...
- Comparing AI Agents to Cybersecurity Professionals in Real-World Penetration Testing : Abstract: We present the first comprehensive evaluation of AI agents against human cybersecurity professionals in a live enterprise environment. We evaluate ten cybersecurity professionals alongside s...
- Human-in-the-Loop and AI: Crowdsourcing Metadata Vocabulary for Materials Science : Abstract: Metadata vocabularies are essential for advancing FAIR and FARR data principles, but their development is constrained by limited human resources and inconsistent standardization practices. This...
- SCOPE: Language Models as One-Time Teacher for Hierarchical Planning in Text Environments : Abstract: Long-term planning in complex, text-based environments presents significant challenges due to open-ended action spaces, ambiguous observations, and sparse feedback. Recent research suggests ...
- Bayesian Networks, Markov Networks, Moralisation, Triangulation: a Categorical Perspective : Abstract: Moralisation and Triangulation are transformations that allow one to switch between different ways of factoring a probability distribution into a graphical model. Moralisation allows one to view a Bay...
- Altruistic Maneuver Planning for Cooperative Autonomous Vehicles Using Multi-agent Advantage Actor-Critic : Abstract: With the adoption of autonomous vehicles on our roads, we will witness a mixed-autonomy environment where autonomous and human-driven vehicles must learn to co-exist by sharing the same road...
- Prediction-aware and Reinforcement Learning based Altruistic Cooperative Driving : Abstract: Autonomous vehicle (AV) navigation in the presence of human-driven vehicles (HVs) is challenging, as HVs continuously update their policies in response to AVs. In order to navigate safely in...
- Agentic AI as Undercover Teammates: Argumentative Knowledge Construction in Hybrid Human-AI Collaborative Learning : Abstract: Generative artificial intelligence (AI) agents are increasingly embedded in collaborative learning environments, yet their impact on the processes of argumentative knowledge construction rem...
- Motion2Meaning: A Clinician-Centered Framework for Contestable LLM in Parkinson's Disease Gait Interpretation : Abstract: AI-assisted gait analysis holds promise for improving Parkinson's Disease (PD) care, but current clinical dashboards lack transparency and offer no meaningful way for clinicians to interroga...
- A Principle-based Framework for the Development and Evaluation of Large Language Models for Health and Wellness : Abstract: The incorporation of generative artificial intelligence into personal health applications presents a transformative opportunity for personalized, data-driven health and fitness guidance, yet...
Research Sources: 347 | Generated: 12/12/2025
