AI Research News Feeds for December 17th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Erasing CLIP Memories: Non-Destructive, Data-Free Zero-Shot class Unlearning in CLIP Models : Abstract: We introduce a novel, closed-form approach for selective unlearning in multimodal models, specifically targeting pretrained models such as CLIP. Our method leverages nullspace projection to ...
SketchAssist: A Practical Assistant for Semantic Edits and Precise Local Redrawing : Abstract: Sketch editing is central to digital illustration, yet existing image editing systems struggle to preserve the sparse, style-sensitive structure of line art while supporting both high-level ...
CIS-BA: Continuous Interaction Space Based Backdoor Attack for Object Detection in the Real-World : Abstract: Object detection models deployed in real-world applications such as autonomous driving face serious threats from backdoor attacks. Despite their practical effectiveness,existing methods are ...
FastDDHPose: Towards Unified, Efficient, and Disentangled 3D Human Pose Estimation : Abstract: Recent approaches for monocular 3D human pose estimation (3D HPE) have achieved leading performance by directly regressing 3D poses from 2D keypoint sequences. Despite the rapid progress in ...
Improving Semantic Uncertainty Quantification in LVLMs with Semantic Gaussian Processes : Abstract: Large Vision-Language Models (LVLMs) often produce plausible but unreliable outputs, making robust uncertainty estimation essential. Recent work on semantic uncertainty estimates relies on e...
Spherical Voronoi: Directional Appearance as a Differentiable Partition of the Sphere : Abstract: Radiance field methods (e.g. 3D Gaussian Splatting) have emerged as a powerful paradigm for novel view synthesis, yet their appearance modeling often relies on Spherical Harmonics (SH), whic...
Fracture Morphology Classification: Local Multiclass Modeling for Multilabel Complexity : Abstract: Between $15\,\%$ and $45\,\%$ of children experience a fracture during their growth years, making accurate diagnosis essential. Fracture morphology, alongside location and fragment angle, is...
Beyond a Single Light: A Large-Scale Aerial Dataset for Urban Scene Reconstruction Under Varying Illumination : Abstract: Recent advances in Neural Radiance Fields and 3D Gaussian Splatting have demonstrated strong potential for large-scale UAV-based 3D reconstruction tasks by fitting the appearance of images. ...
DRAW2ACT: Turning Depth-Encoded Trajectories into Robotic Demonstration Videos : Abstract: Video diffusion models provide powerful real-world simulators for embodied AI but remain limited in controllability for robotic manipulation. Recent works on trajectory-conditioned video gen...
History-Enhanced Two-Stage Transformer for Aerial Vision-and-Language Navigation : Abstract: Aerial Vision-and-Language Navigation (AVLN) requires Unmanned Aerial Vehicle (UAV) agents to localize targets in large-scale urban environments based on linguistic instructions. While succe...
OmniGen: Unified Multimodal Sensor Generation for Autonomous Driving : Abstract: Autonomous driving has seen remarkable advancements, largely driven by extensive real-world data collection. However, acquiring diverse and corner-case data remains costly and inefficient. G...
Multi-View MRI Approach for Classification of MGMT Methylation in Glioblastoma Patients : Abstract: The presence of MGMT promoter methylation significantly affects how well chemotherapy works for patients with Glioblastoma Multiforme (GBM). Currently, confirmation of MGMT promoter methylat...
ViBES: A Conversational Agent with Behaviorally-Intelligent 3D Virtual Body : Abstract: Human communication is inherently multimodal and social: words, prosody, and body language jointly carry intent. Yet most prior systems model human behavior as a translation task co-speech g...
4D-RaDiff: Latent Diffusion for 4D Radar Point Cloud Generation : Abstract: Automotive radar has shown promising developments in environment perception due to its cost-effectiveness and robustness in adverse weather conditions. However, the limited availability of a...
Elastic3D: Controllable Stereo Video Conversion with Guided Latent Decoding : Abstract: The growing demand for immersive 3D content calls for automated monocular-to-stereo video conversion. We present Elastic3D, a controllable, direct end-to-end method for upgrading a conventio...
Enhancing Visual Programming for Visual Reasoning via Probabilistic Graphs : Abstract: Recently, Visual Programming (VP) based on large language models (LLMs) has rapidly developed and demonstrated significant potential in complex Visual Reasoning (VR) tasks. Previous works to...
DriverGaze360: OmniDirectional Driver Attention with Object-Level Guidance : Abstract: Predicting driver attention is a critical problem for developing explainable autonomous driving systems and understanding driver behavior in mixed human-autonomous vehicle traffic scenarios....
Zoom-Zero: Reinforced Coarse-to-Fine Video Understanding via Temporal Zoom-in : Abstract: Grounded video question answering (GVQA) aims to localize relevant temporal segments in videos and generate accurate answers to a given question; however, large video-language models (LVLMs)...
SS4D: Native 4D Generative Model via Structured Spacetime Latents : Abstract: We present SS4D, a native 4D generative model that synthesizes dynamic 3D objects directly from monocular video. Unlike prior approaches that construct 4D representations by optimizing over ...
PSMamba: Progressive Self-supervised Vision Mamba for Plant Disease Recognition : Abstract: Self-supervised Learning (SSL) has become a powerful paradigm for representation learning without manual annotations. However, most existing frameworks focus on global alignment and struggle...
Vector Prism: Animating Vector Graphics by Stratifying Semantic Structure : Abstract: Scalable Vector Graphics (SVG) are central to modern web design, and the demand to animate them continues to grow as web environments become increasingly dynamic. Yet automating the animatio...
HGS: Hybrid Gaussian Splatting with Static-Dynamic Decomposition for Compact Dynamic View Synthesis : Abstract: Dynamic novel view synthesis (NVS) is essential for creating immersive experiences. Existing approaches have advanced dynamic NVS by introducing 3D Gaussian Splatting (3DGS) with implicit de...
Mimicking Human Visual Development for Learning Robust Image Representations : Abstract: The human visual system is remarkably adept at adapting to changes in the input distribution; a capability modern convolutional neural networks (CNNs) still struggle to match. Drawing inspir...
Unified Semantic Transformer for 3D Scene Understanding : Abstract: Holistic 3D scene understanding involves capturing and parsing unstructured 3D environments. Due to the inherent complexity of the real world, existing models have predominantly been develop...
Optimizing Rank for High-Fidelity Implicit Neural Representations : Abstract: Implicit Neural Representations (INRs) based on vanilla Multi-Layer Perceptrons (MLPs) are widely believed to be incapable of representing high-frequency content. This has directed research ...
EcoScapes: LLM-Powered Advice for Crafting Sustainable Cities : Abstract: Climate adaptation is vital for the sustainability and sometimes the mere survival of our urban areas. However, small cities often struggle with limited personnel resources and integrating v...
Broadening View Synthesis of Dynamic Scenes from Constrained Monocular Videos : Abstract: In dynamic Neural Radiance Fields (NeRF) systems, state-of-the-art novel view synthesis methods often fail under significant viewpoint deviations, producing unstable and unrealistic renderin...
LCMem: A Universal Model for Robust Image Memorization Detection : Abstract: Recent advances in generative image modeling have achieved visual realism sufficient to deceive human experts, yet their potential for privacy preserving data sharing remains insufficiently ...
The Devil is in Attention Sharing: Improving Complex Non-rigid Image Editing Faithfulness via Attention Synergy : Abstract: Training-free image editing with large diffusion models has become practical, yet faithfully performing complex non-rigid edits (e.g., pose or shape changes) remains highly challenging. We i...
Score-Based Turbo Message Passing for Plug-and-Play Compressive Imaging : Abstract: Message-passing algorithms have been adapted for compressive imaging by incorporating various off-the-shelf image denoisers. However, these denoisers rely largely on generic or hand-crafted ...
S2D: Sparse-To-Dense Keymask Distillation for Unsupervised Video Instance Segmentation : Abstract: In recent years, the state-of-the-art in unsupervised video instance segmentation has heavily relied on synthetic video data, generated from object-centric image datasets such as ImageNet. H...
A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning : Abstract: Affordance prediction, which identifies interaction regions on objects based on language instructions, is critical for embodied AI. Prevailing end-to-end models couple high-level reasoning a...
SuperCLIP: CLIP with Simple Classification Supervision : Abstract: Contrastive Language-Image Pretraining (CLIP) achieves strong generalization in vision-language tasks by aligning images and texts in a shared embedding space. However, recent findings show ...
SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition : Abstract: In this work we present SignIT, a new dataset to study the task of Italian Sign Language (LIS) recognition. The dataset is composed of 644 videos covering 3.33 hours. We manually annotated v...
Native Intelligence Emerges from Large-Scale Clinical Practice: A Retinal Foundation Model with Deployment Efficiency : Abstract: Current retinal foundation models remain constrained by curated research datasets that lack authentic clinical context, and require extensive task-specific optimization for each application,...
DASP: Self-supervised Nighttime Monocular Depth Estimation with Domain Adaptation of Spatiotemporal Priors : Abstract: Self-supervised monocular depth estimation has achieved notable success under daytime conditions. However, its performance deteriorates markedly at night due to low visibility and varying il...
HiFi-Portrait: Zero-shot Identity-preserved Portrait Generation with High-fidelity Multi-face Fusion : Abstract: Recent advancements in diffusion-based technologies have made significant strides, particularly in identity-preserved portrait generation (IPG). However, when using multiple reference images...
TAT: Task-Adaptive Transformer for All-in-One Medical Image Restoration : Abstract: Medical image restoration (MedIR) aims to recover high-quality medical images from their low-quality counterparts. Recent advancements in MedIR have focused on All-in-One models capable of s...
FoodLogAthl-218: Constructing a Real-World Food Image Dataset Using Dietary Management Applications : Abstract: Food image classification models are crucial for dietary management applications because they reduce the burden of manual meal logging. However, most publicly available datasets for training...
LLM-driven Knowledge Enhancement for Multimodal Cancer Survival Prediction : Abstract: Current multimodal survival prediction methods typically rely on pathology images (WSIs) and genomic data, both of which are high-dimensional and redundant, making it difficult to extract di...
TUMTraf EMOT: Event-Based Multi-Object Tracking Dataset and Baseline for Traffic Scenarios : Abstract: In Intelligent Transportation Systems (ITS), multi-object tracking is primarily based on frame-based cameras. However, these cameras tend to perform poorly under dim lighting and high-speed ...
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling : Abstract: This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between spee...
Distill Video Datasets into Images : Abstract: Dataset distillation aims to synthesize compact yet informative datasets that allow models trained on them to achieve performance comparable to training on the full dataset. While this appro...
AMD-HookNet++: Evolution of AMD-HookNet with Hybrid CNN-Transformer Feature Enhancement for Glacier Calving Front Segmentation : Abstract: The dynamics of glaciers and ice shelf fronts significantly impact the mass balance of ice sheets and coastal sea levels. To effectively monitor glacier conditions, it is crucial to consiste...
Adaptable Segmentation Pipeline for Diverse Brain Tumors with Radiomic-guided Subtyping and Lesion-Wise Model Ensemble : Abstract: Robust and generalizable segmentation of brain tumors on multi-parametric magnetic resonance imaging (MRI) remains difficult because tumor types differ widely. The BraTS 2025 Lighthouse Chal...
ViRC: Enhancing Visual Interleaved Mathematical CoT with Reason Chunking : Abstract: CoT has significantly enhanced the reasoning ability of LLMs while it faces challenges when extended to multimodal domains, particularly in mathematical tasks. Existing MLLMs typically perfo...
Enhancing Visual Sentiment Analysis via Semiotic Isotopy-Guided Dataset Construction : Abstract: Visual Sentiment Analysis (VSA) is a challenging task due to the vast diversity of emotionally salient images and the inherent difficulty of acquiring sufficient data to capture this variabi...
ART: Articulated Reconstruction Transformer : Abstract: We introduce ART, Articulated Reconstruction Transformer -- a category-agnostic, feed-forward model that reconstructs complete 3D articulated objects from only sparse, multi-state RGB images...
CRISP: Contact-Guided Real2Sim from Monocular Video with Planar Scene Primitives : Abstract: We introduce CRISP, a method that recovers simulatable human motion and scene geometry from monocular video. Prior work on joint human-scene reconstruction relies on data-driven priors and j...
MemFlow: Flowing Adaptive Memory for Consistent and Efficient Long Video Narratives : Abstract: The core challenge for streaming video generation is maintaining the content consistency in long context, which poses high requirement for the memory design. Most existing solutions maintain...
CLAIM: Camera-LiDAR Alignment with Intensity and Monodepth : Abstract: In this paper, we unleash the potential of the powerful monodepth model in camera-LiDAR calibration and propose CLAIM, a novel method of aligning data from the camera and LiDAR. Given the in...
Expert Switching for Robust AAV Landing: A Dual-Detector Framework in Simulation : Abstract: Reliable helipad detection is essential for Autonomous Aerial Vehicle (AAV) landing, especially under GPS-denied or visually degraded conditions. While modern detectors such as YOLOv8 offer ...
Establishing Stochastic Object Models from Noisy Data via Ambient Measurement-Integrated Diffusion : Abstract: Task-based measures of image quality (IQ) are critical for evaluating medical imaging systems, which must account for randomness including anatomical variability. Stochastic object models (S...
A Comprehensive Safety Metric to Evaluate Perception in Autonomous Systems : Abstract: Complete perception of the environment and its correct interpretation is crucial for autonomous vehicles. Object perception is the main component of automotive surround sensing. Various metr...
VICTOR: Dataset Copyright Auditing in Video Recognition Systems : Abstract: Video recognition systems are increasingly being deployed in daily life, such as content recommendation and security monitoring. To enhance video recognition development, many institutions h...
Test Time Optimized Generalized AI-based Medical Image Registration Method : Abstract: Medical image registration is critical for aligning anatomical structures across imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound. Among ...
WaveSim: A Wavelet-based Multi-scale Similarity Metric for Weather and Climate Fields : Abstract: We introduce WaveSim, a multi-scale similarity metric for the evaluation of spatial fields in weather and climate applications. WaveSim exploits wavelet transforms to decompose input fields ...
EVOLVE-VLA: Test-Time Training from Environment Feedback for Vision-Language-Action Models : Abstract: Achieving truly adaptive embodied intelligence requires agents that learn not just by imitating static demonstrations, but by continuously improving through environmental interaction, which ...
LSM: A Comprehensive Metric for Assessing the Safety of Lane Detection Systems in Autonomous Driving : Abstract: Comprehensive perception of the vehicle's environment and correct interpretation of the environment are crucial for the safe operation of autonomous vehicles. The perception of surrounding o...
A Unified Framework with Multimodal Fine-tuning for Remote Sensing Semantic Segmentation : Abstract: Multimodal remote sensing data, acquired from diverse sensors, offer a comprehensive and integrated perspective of the Earth's surface. Leveraging multimodal fusion techniques, semantic segm...
Theoretical Guarantees of Learning Ensembling Strategies with Applications to Time Series Forecasting : Abstract: Ensembling is among the most popular tools in machine learning (ML) due to its effectiveness in minimizing variance and thus improving generalization. Most ensembling methods for black-box b...
Differentially Private Knowledge Distillation via Synthetic Text Generation : Abstract: Large Language models (LLMs) are achieving state-of-the-art performance in many different downstream tasks. However, the increasing urgency of data privacy puts pressure on practitioners to ...
I-Diff: Structural Regularization for High-Fidelity Diffusion Models : Abstract: Denoising Diffusion Probabilistic Models (DDPMs) have significantly advanced generative AI, achieving impressive results in high-quality image and data generation. However, enhancing fidelit...
Unsupervised Representation Learning from Sparse Transformation Analysis : Abstract: There is a vast literature on representation learning based on principles such as coding efficiency, statistical independence, causality, controllability, or symmetry. In this paper we propo...
Transparent Networks for Multivariate Time Series : Abstract: Transparent models, which provide inherently interpretable predictions, are receiving significant attention in high-stakes domains. However, despite much real-world data being collected as t...
A Lipschitz spaces view of infinitely wide shallow neural networks : Abstract: We revisit the mean field parametrization of shallow neural networks, using signed measures on unbounded parameter spaces and duality pairings that take into account the regularity and growt...
On uniqueness in structured model learning : Abstract: This paper addresses the problem of uniqueness in learning physical laws for systems of partial differential equations (PDEs). Contrary to most existing approaches, it considers a framework ...
FiNERweb: Datasets and Artifacts for Scalable Multilingual Named Entity Recognition : Abstract: Recent multilingual named entity recognition (NER) work has shown that large language models (LLMs) can provide effective synthetic supervision, yet such datasets have mostly appeared as by-...
Structure-Aware Decoding Mechanisms for Complex Entity Extraction with Large-Scale Language Models : Abstract: This paper proposes a structure-aware decoding method based on large language models to address the difficulty of traditional approaches in maintaining both semantic integrity and structural...
What Affects the Effective Depth of Large Language Models? : Abstract: The scaling of large language models (LLMs) emphasizes increasing depth, yet performance gains diminish with added layers. Prior work introduces the concept of "effective depth", arguing tha...
A Unified Sparse Attention via Multi-Granularity Compression : Abstract: Efficient long-context understanding and reasoning are increasingly vital for large language model (LLM) applications such as multi-turn dialogue and program analysis. However, the core self...
Multilingual and Continuous Backchannel Prediction: A Cross-lingual Study : Abstract: We present a multilingual, continuous backchannel prediction model for Japanese, English, and Chinese, and use it to investigate cross-linguistic timing behavior. The model is Transformer-ba...
CogMem: A Cognitive Memory Architecture for Sustained Multi-Turn Reasoning in Large Language Models : Abstract: Large language models (LLMs) excel at single-turn reasoning but often lose accuracy and coherence over extended, multi-turn interactions. Recent evaluations such as TurnBench highlight recur...
Astraea: A State-Aware Scheduling Engine for LLM-Powered Agents : Abstract: Large Language Models (LLMs) are increasingly being deployed as intelligent agents. Their multi-stage workflows, which alternate between local computation and calls to external network servi...
Two CFG Nahuatl for automatic corpora expansion : Abstract: The aim of this article is to introduce two Context-Free Grammars (CFG) for Nawatl Corpora expansion. Nawatl is an Amerindian language (it is a National Language of Mexico) of the $π$-langua...
Inflation Attitudes of Large Language Models : Abstract: This paper investigates the ability of Large Language Models (LLMs), specifically GPT-3.5-turbo (GPT), to form inflation perceptions and expectations based on macroeconomic price signals. We...
Linguists should learn to love speech-based deep learning models : Abstract: Futrell and Mahowald present a useful framework bridging technology-oriented deep learning systems and explanation-oriented linguistic theories. Unfortunately, the target article's focus on ...
VersatileFFN: Achieving Parameter Efficiency in LLMs via Adaptive Wide-and-Deep Reuse : Abstract: The rapid scaling of Large Language Models (LLMs) has achieved remarkable performance, but it also leads to prohibitive memory costs. Existing parameter-efficient approaches such as pruning ...
Agreement Between Large Language Models and Human Raters in Essay Scoring: A Research Synthesis : Abstract: Despite the growing promise of large language models (LLMs) in automatic essay scoring (AES), empirical findings regarding their reliability compared to human raters remain mixed. Following ...
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing : Abstract: Multi-token generation has emerged as a promising paradigm for accelerating transformer-based large model inference. Recent efforts primarily explore diffusion Large Language Models (dLLMs) ...
MMGR: Multi-Modal Generative Reasoning : Abstract: Video foundation models generate visually realistic and temporally coherent content, but their reliability as world simulators depends on whether they capture physical, logical, and spatial ...
Shakespeare, Entropy and Educated Monkeys : Abstract: It has often been said, correctly, that a monkey forever randomly typing on a keyboard would eventually produce the complete works of William Shakespeare. Almost just as often it has been po...
HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices : Abstract: Current multimodal large lanauge models possess strong perceptual and reasoning capabilities, however high computational and memory requirements make them difficult to deploy directly on on-...
RecGPT-V2 Technical Report : Abstract: Large language models (LLMs) have demonstrated remarkable potential in transforming recommender systems from implicit behavioral pattern matching to explicit intent reasoning. While RecGPT-V...
Segmental Attention Decoding With Long Form Acoustic Encodings : Abstract: We address the fundamental incompatibility of attention-based encoder-decoder (AED) models with long-form acoustic encodings. AED models trained on segmented utterances learn to encode absol...
Can Finetuing LLMs on Small Human Samples Increase Heterogeneity, Alignment, and Belief-Action Coherence? : Abstract: There is ongoing debate about whether large language models (LLMs) can serve as substitutes for human participants in survey and experimental research. While recent work in fields such as ma...
Nexels: Neurally-Textured Surfels for Real-Time Novel View Synthesis with Sparse Geometries : Abstract: Though Gaussian splatting has achieved impressive results in novel view synthesis, it requires millions of primitives to model highly textured scenes, even when the geometry of the scene is ...
MoLingo: Motion-Language Alignment for Text-to-Motion Generation : Abstract: We introduce MoLingo, a text-to-motion (T2M) model that generates realistic, lifelike human motion by denoising in a continuous latent space. Recent works perform latent space diffusion, eit...
Coarse-to-Fine Hierarchical Alignment for UAV-based Human Detection using Diffusion Models : Abstract: Training object detectors demands extensive, task-specific annotations, yet this requirement becomes impractical in UAV-based human detection due to constantly shifting target distributions ...
SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning : Abstract: As humans, we are natural any-horizon reasoners, i.e., we can decide whether to iteratively skim long videos or watch short ones in full when necessary for a given task. With this in mind, o...
Route-DETR: Pairwise Query Routing in Transformers for Object Detection : Abstract: Detection Transformer (DETR) offers an end-to-end solution for object detection by eliminating hand-crafted components like non-maximum suppression. However, DETR suffers from inefficient qu...
An evaluation of SVBRDF Prediction from Generative Image Models for Appearance Modeling of 3D Scenes : Abstract: Digital content creation is experiencing a profound change with the advent of deep generative models. For texturing, conditional image generators now allow the synthesis of realistic RGB ima...
From Unlearning to UNBRANDING: A Benchmark for Trademark-Safe Text-to-Image Generation : Abstract: The rapid progress of text-to-image diffusion models raises significant concerns regarding the unauthorized reproduction of trademarked content. While prior work targets general concepts (e....
Quality-Driven and Diversity-Aware Sample Expansion for Robust Marine Obstacle Segmentation : Abstract: Marine obstacle detection demands robust segmentation under challenging conditions, such as sun glitter, fog, and rapidly changing wave patterns. These factors degrade image quality, while t...
XAI-Driven Diagnosis of Generalization Failure in State-Space Cerebrovascular Segmentation Models: A Case Study on Domain Shift Between RSNA and TopCoW Datasets : Abstract: The clinical deployment of deep learning models in medical imaging is severely hindered by domain shift. This challenge, where a high-performing model fails catastrophically on external data...
FocalComm: Hard Instance-Aware Multi-Agent Perception : Abstract: Multi-agent collaborative perception (CP) is a promising paradigm for improving autonomous driving safety, particularly for vulnerable road users like pedestrians, via robust 3D perception. ...
Repurposing 2D Diffusion Models for 3D Shape Completion : Abstract: We present a framework that adapts 2D diffusion models for 3D shape completion from incomplete point clouds. While text-to-image diffusion models have achieved remarkable success with abunda...
Sparse-LaViDa: Sparse Multimodal Discrete Diffusion Language Models : Abstract: Masked Discrete Diffusion Models (MDMs) have achieved strong performance across a wide range of multimodal tasks, including image understanding, generation, and editing. However, their infer...
Deep Learning Perspective of Scene Understanding in Autonomous Robots : Abstract: This paper provides a review of deep learning applications in scene understanding in autonomous robots, including innovations in object detection, semantic and instance segmentation, depth e...
Unleashing the Power of Image-Tabular Self-Supervised Learning via Breaking Cross-Tabular Barriers : Abstract: Multi-modal learning integrating medical images and tabular data has significantly advanced clinical decision-making in recent years. Self-Supervised Learning (SSL) has emerged as a powerful...
Robust Single-shot Structured Light 3D Imaging via Neural Feature Decoding : Abstract: We consider the problem of active 3D imaging using single-shot structured light systems, which are widely employed in commercial 3D sensing devices such as Apple Face ID and Intel RealSense....
ASAP-Textured Gaussians: Enhancing Textured Gaussians with Adaptive Sampling and Anisotropic Parameterization : Abstract: Recent advances have equipped 3D Gaussian Splatting with texture parameterizations to capture spatially varying attributes, improving the performance of both appearance modeling and downstre...
SELECT: Detecting Label Errors in Real-world Scene Text Data : Abstract: We introduce SELECT (Scene tExt Label Errors deteCTion), a novel approach that leverages multi-modal training to detect label errors in real-world scene text datasets. Utilizing an image-tex...
Bridging Fidelity-Reality with Controllable One-Step Diffusion for Image Super-Resolution : Abstract: Recent diffusion-based one-step methods have shown remarkable progress in the field of image super-resolution, yet they remain constrained by three critical limitations: (1) inferior fidelit...
GaussianPlant: Structure-aligned Gaussian Splatting for 3D Reconstruction of Plants : Abstract: We present a method for jointly recovering the appearance and internal structure of botanical plants from multi-view images based on 3D Gaussian Splatting (3DGS). While 3DGS exhibits robust ...
Quality-Aware Framework for Video-Derived Respiratory Signals : Abstract: Video-based respiratory rate (RR) estimation is often unreliable due to inconsistent signal quality across extraction methods. We present a predictive, quality-aware framework that integrate...
AnchorHOI: Zero-shot Generation of 4D Human-Object Interaction via Anchor-based Prior Distillation : Abstract: Despite significant progress in text-driven 4D human-object interaction (HOI) generation with supervised methods, the scalability remains limited by the scarcity of large-scale 4D HOI datase...
OUSAC: Optimized Guidance Scheduling with Adaptive Caching for DiT Acceleration : Abstract: Diffusion models have emerged as the dominant paradigm for high-quality image generation, yet their computational expense remains substantial due to iterative denoising. Classifier-Free Guid...
ViewMask-1-to-3: Multi-View Consistent Image Generation via Multimodal Diffusion Models : Abstract: Multi-view image generation from a single image and text description remains challenging due to the difficulty of maintaining geometric consistency across different viewpoints. Existing appr...
Selective, Controlled and Domain-Agnostic Unlearning in Pretrained CLIP: A Training- and Data-Free Approach : Abstract: Pretrained models like CLIP have demonstrated impressive zero-shot classification capabilities across diverse visual domains, spanning natural images, artistic renderings, and abstract repre...
MFE-GAN: Efficient GAN-based Framework for Document Image Enhancement and Binarization with Multi-scale Feature Extraction : Abstract: Document image enhancement and binarization are commonly performed prior to document analysis and recognition tasks for improving the efficiency and accuracy of optical character recognition...
Consistent Instance Field for Dynamic Scene Understanding : Abstract: We introduce Consistent Instance Field, a continuous and probabilistic spatio-temporal representation for dynamic scene understanding. Unlike prior methods that rely on discrete tracking or ...
Prediction of Respiratory Syncytial Virus-Associated Hospitalizations Using Machine Learning Models Based on Environmental Data : Abstract: Respiratory syncytial virus (RSV) is a leading cause of hospitalization among young children, with outbreaks strongly influenced by environmental conditions. This study developed a machine l...
RAST-MoE-RL: A Regime-Aware Spatio-Temporal MoE Framework for Deep Reinforcement Learning in Ride-Hailing : Abstract: Ride-hailing platforms face the challenge of balancing passenger waiting times with overall system efficiency under highly uncertain supply-demand conditions. Adaptive delayed matching creat...
Enhancing Semi-Supervised Multi-View Graph Convolutional Networks via Supervised Contrastive Learning and Self-Training : Abstract: The advent of graph convolutional network (GCN)-based multi-view learning provides a powerful framework for integrating structural information from heterogeneous views, enabling effective mo...
Constrained Policy Optimization via Sampling-Based Weight-Space Projection : Abstract: Safety-critical learning requires policies that improve performance without leaving the safe operating regime. We study constrained policy learning where model parameters must satisfy unknow...
The Double Life of Code World Models: Provably Unmasking Malicious Behavior Through Execution Traces : Abstract: Large language models (LLMs) increasingly generate code with minimal human oversight, raising critical concerns about backdoor injection and malicious behavior. We present Cross-Trace Verifi...
Explainable reinforcement learning from human feedback to improve alignment : Abstract: A common and effective strategy for humans to improve an unsatisfactory outcome in daily life is to find a cause of this outcome and correct the cause. In this paper, we investigate whether ...
Topologically-Stabilized Graph Neural Networks: Empirical Robustness Across Domains : Abstract: Graph Neural Networks (GNNs) have become the standard for graph representation learning but remain vulnerable to structural perturbations. We propose a novel framework that integrates persis...
Dropout Neural Network Training Viewed from a Percolation Perspective : Abstract: In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, firs...
Measuring Uncertainty Calibration : Abstract: We make two contributions to the problem of estimating the $L_1$ calibration error of a binary classifier from a finite dataset. First, we provide an upper bound for any classifier where the...
Let's (not) just put things in Context: Test-Time Training for Long-Context LLMs : Abstract: Progress on training and architecture strategies has enabled LLMs with millions of tokens in context length. However, empirical evidence suggests that such long-context LLMs can consume far ...
Capturing reduced-order quantum many-body dynamics out of equilibrium via neural ordinary differential equations : Abstract: Out-of-equilibrium quantum many-body systems exhibit rapid correlation buildup that underlies many emerging phenomena. Exact wave-function methods to describe this scale exponentially with p...
Adaptive digital twins for predictive decision-making: Online Bayesian learning of transition dynamics : Abstract: This work shows how adaptivity can enhance value realization of digital twins in civil engineering. We focus on adapting the state transition models within digital twins represented through ...
Sliding Window Recurrences for Sequence Models : Abstract: Multi-hybrid architectures are poised to take over language modeling due to better quality and performance. We introduce a hierarchical decomposition framework for linear recurrences that al...
A Complete Guide to Spherical Equivariant Graph Transformers : Abstract: Spherical equivariant graph neural networks (EGNNs) provide a principled framework for learning on three-dimensional molecular and biomolecular systems, where predictions must respect the ro...
Pattern-Guided Diffusion Models : Abstract: Diffusion models have shown promise in forecasting future data from multivariate time series. However, few existing methods account for recurring structures, or patterns, that appear within ...
A Single Architecture for Representing Invariance Under Any Space Group : Abstract: Incorporating known symmetries in data into machine learning models has consistently improved predictive accuracy, robustness, and generalization. However, achieving exact invariance to spec...
Accelerating MHC-II Epitope Discovery via Multi-Scale Prediction in Antigen Presentation : Abstract: Antigenic epitope presented by major histocompatibility complex II (MHC-II) proteins plays an essential role in immunotherapy. However, compared to the more widely studied MHC-I in computati...
EXAONE Path 2.5: Pathology Foundation Model with Multi-Omics Alignment : Abstract: Cancer progression arises from interactions across multiple biological layers, especially beyond morphological and across molecular layers that remain invisible to image-only models. To capt...
Multivariate Time Series Forecasting with Hybrid Euclidean-SPD Manifold Graph Neural Networks : Abstract: Multivariate Time Series (MTS) forecasting plays a vital role in various real-world applications, such as traffic management and predictive maintenance. Existing approaches typically model M...
FusAD: Time-Frequency Fusion with Adaptive Denoising for General Time Series Analysis : Abstract: Time series analysis plays a vital role in fields such as finance, healthcare, industry, and meteorology, underpinning key tasks including classification, forecasting, and anomaly detection....
Derivative-Informed Fourier Neural Operator: Universal Approximation and Applications to PDE-Constrained Optimization : Abstract: We present approximation theories and efficient training methods for derivative-informed Fourier neural operators (DIFNOs) with applications to PDE-constrained optimization. A DIFNO is an FN...
Cornserve: Efficiently Serving Any-to-Any Multimodal Models : Abstract: We present Cornserve, an efficient online serving system for an emerging class of multimodal models called Any-to-Any models. Any-to-Any models accept combinations of text and multimodal dat...
A First-Order Logic-Based Alternative to Reward Models in RLHF : Abstract: Reinforcement Learning from Human Feedback (RLHF) plays a crucial role in aligning large language models (LLMs) with human values and preferences. However, the quality and stability of the t...
On Improving Deep Active Learning with Formal Verification : Abstract: Deep Active Learning (DAL) aims to reduce labeling costs in neural-network training by prioritizing the most informative unlabeled samples for annotation. Beyond selecting which samples to l...
Optimizing the Adversarial Perturbation with a Momentum-based Adaptive Matrix : Abstract: Generating adversarial examples (AEs) can be formulated as an optimization problem. Among various optimization-based attacks, the gradient-based PGD and the momentum-based MI-FGSM have garne...
Random-Bridges as Stochastic Transports for Generative Models : Abstract: This paper motivates the use of random-bridges -- stochastic processes conditioned to take target distributions at fixed timepoints -- in the realm of generative modelling. Herein, random-br...
Understanding the Gain from Data Filtering in Multimodal Contrastive Learning : Abstract: The success of modern multimodal representation learning relies on internet-scale datasets. Due to the low quality of a large fraction of raw web data, data curation has become a critical st...
Physically consistent model learning for reaction-diffusion systems : Abstract: This paper addresses the problem of learning reaction-diffusion (RD) systems from data while ensuring physical consistency and well-posedness of the learned models. Building on a regularizat...
FLAME: Flow Enhanced Legendre Memory Models for General Time Series Forecasting : Abstract: In this work, we introduce FLAME, a family of extremely lightweight and capable Time Series Foundation Models, which support both deterministic and probabilistic forecasting via generative p...
Implicit Bias and Invariance: How Hopfield Networks Efficiently Learn Graph Orbits : Abstract: Many learning problems involve symmetries, and while invariance can be built into neural architectures, it can also emerge implicitly when training on group-structured data. We study this ph...
Black-Box Auditing of Quantum Model: Lifted Differential Privacy with Quantum Canaries : Abstract: Quantum machine learning (QML) promises significant computational advantages, yet models trained on sensitive data risk memorizing individual records, creating serious privacy vulnerabilitie...
SuperWing: a comprehensive transonic wing dataset for data-driven aerodynamic design : Abstract: Machine-learning surrogate models have shown promise in accelerating aerodynamic design, yet progress toward generalizable predictors for three-dimensional wings has been limited by the scar...
GRAFT: Grid-Aware Load Forecasting with Multi-Source Textual Alignment and Fusion : Abstract: Electric load is simultaneously affected across multiple time scales by exogenous factors such as weather and calendar rhythms, sudden events, and policies. Therefore, this paper proposes GR...
Dual-Axis RCCL: Representation-Complete Convergent Learning for Organic Chemical Space : Abstract: Machine learning is profoundly reshaping molecular and materials modeling; however, given the vast scale of chemical space (10^30-10^60), it remains an open scientific question whether model...
Bridging Artificial Intelligence and Data Assimilation: The Data-driven Ensemble Forecasting System ClimaX-LETKF : Abstract: While machine learning-based weather prediction (MLWP) has achieved significant advancements, research on assimilating real observations or ensemble forecasts within MLWP models remains limi...
AnySleep: a channel-agnostic deep learning system for high-resolution sleep staging in multi-center cohorts : Abstract: Sleep is essential for good health throughout our lives, yet studying its dynamics requires manual sleep staging, a labor-intensive step in sleep research and clinical care. Across centers, ...
Kinetic-Mamba: Mamba-Assisted Predictions of Stiff Chemical Kinetics : Abstract: Accurate chemical kinetics modeling is essential for combustion simulations, as it governs the evolution of complex reaction pathways and thermochemical states. In this work, we introduce Ki...
Improving Slow Transfer Predictions: Generative Methods Compared : Abstract: Monitoring data transfer performance is a crucial task in scientific computing networks. By predicting performance early in the communication phase, potentially sluggish transfers can be ide...
Synthetic Electrogram Generation with Variational Autoencoders for ECGI : Abstract: Atrial fibrillation (AF) is the most prevalent sustained cardiac arrhythmia, and its clinical assessment requires accurate characterization of atrial electrical activity. Noninvasive electro...
Counterfactual Explanations for Time Series Should be Human-Centered and Temporally Coherent in Interventions : Abstract: Counterfactual explanations are increasingly proposed as interpretable mechanisms to achieve algorithmic recourse. However, current counterfactual techniques for time series classification a...
Hybrid Iterative Solvers with Geometry-Aware Neural Preconditioners for Parametric PDEs : Abstract: The convergence behavior of classical iterative solvers for parametric partial differential equations (PDEs) is often highly sensitive to the domain and specific discretization of PDEs. Prev...
Hierarchical Persistence Velocity for Network Anomaly Detection: Theory and Applications to Cryptocurrency Markets : Abstract: We introduce the Overlap-Weighted Hierarchical Normalized Persistence Velocity (OW-HNPV), a novel topological data analysis method for detecting anomalies in time-varying networks. Unlike ex...
ParaFormer: A Generalized PageRank Graph Transformer for Graph Representation Learning : Abstract: Graph Transformers (GTs) have emerged as a promising graph learning tool, leveraging their all-pair connected property to effectively capture global information. To address the over-smoothin...
Beyond Lipschitz Continuity and Monotonicity: Fractal and Chaotic Activation Functions in Echo State Networks : Abstract: Contemporary reservoir computing relies heavily on smooth, globally Lipschitz continuous activation functions, limiting applications in defense, disaster response, and pharmaceutical modelin...
Early Warning Index for Patient Deteriorations in Hospitals : Abstract: Hospitals lack automated systems to harness the growing volume of heterogeneous clinical and operational data to effectively forecast critical events. Early identification of patients at ris...
Sim2Real Reinforcement Learning for Soccer skills : Abstract: This thesis work presents a more efficient and effective approach to training control-related tasks for humanoid robots using Reinforcement Learning (RL). The traditional RL methods are limi...
Modular connectivity in neural networks emerges from Poisson noise-motivated regularisation, and promotes robustness and compositional generalisation : Abstract: Circuits in the brain commonly exhibit modular architectures that factorise complex tasks, resulting in the ability to compositionally generalise and reduce catastrophic forgetting. In contr...
Smart Surveillance: Identifying IoT Device Behaviours using ML-Powered Traffic Analysis : Abstract: The proliferation of Internet of Things (IoT) devices has grown exponentially in recent years, introducing significant security challenges. Accurate identification of the types of IoT device...
Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network : Abstract: Fiber reinforcement and polymer matrix respond differently to manufacturing conditions due to mismatch in coefficient of thermal expansion and matrix shrinkage during curing of thermosets. T...
Time-aware UNet and super-resolution deep residual networks for spatial downscaling : Abstract: Satellite data of atmospheric pollutants are often available only at coarse spatial resolution, limiting their applicability in local-scale environmental analysis and decision-making. Spatia...
Improving the Plausibility of Pressure Distributions Synthesized from Depth through Generative Modeling : Abstract: Monitoring contact pressure in hospital beds is essential for preventing pressure ulcers and enabling real-time patient assessment. Current methods can predict pressure maps but often lack p...
Unreasonable effectiveness of unsupervised learning in identifying Majorana topology : Abstract: In unsupervised learning, the training data for deep learning does not come with any labels, thus forcing the algorithm to discover hidden patterns in the data for discerning useful informat...
BiCoRec: Bias-Mitigated Context-Aware Sequential Recommendation Model : Abstract: Sequential recommendation models aim to learn from users evolving preferences. However, current state-of-the-art models suffer from an inherent popularity bias. This study developed a novel ...
Safe Online Control-Informed Learning : Abstract: This paper proposes a Safe Online Control-Informed Learning framework for safety-critical autonomous systems. The framework unifies optimal control, parameter estimation, and safety constrai...
Simultaneous and Proportional Finger Motion Decoding Using Spatial Features from High-Density Surface Electromyography : Abstract: Restoring natural and intuitive hand function requires simultaneous and proportional control (SPC) of multiple degrees of freedom (DoFs). This study systematically evaluated the multichannel...
Group-Theoretic Reinforcement Learning of Dynamical Decoupling Sequences : Abstract: Dynamical decoupling seeks to mitigate phase decoherence in qubits by applying a carefully designed sequence of effectively instantaneous electromagnetic pulses. Although analytic solutions ...
KLO-Net: A Dynamic K-NN Attention U-Net with CSP Encoder for Efficient Prostate Gland Segmentation from MRI : Abstract: Real-time deployment of prostate MRI segmentation on clinical workstations is often bottlenecked by computational load and memory footprint. Deep learning-based prostate gland segmentation a...
Olmo 3 : Abstract: We introduce Olmo 3, a family of state-of-the-art, fully-open language models at the 7B and 32B parameter scales. Olmo 3 model construction targets long-context reasoning, function calling, ...
Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics : Abstract: Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Appl...
On the Hardness of Conditional Independence Testing In Practice : Abstract: Tests of conditional independence (CI) underpin a number of important problems in machine learning and statistics, from causal discovery to evaluation of predictor fairness and out-of-distri...
Physics-Informed Machine Learning for Two-Phase Moving-Interface and Stefan Problems : Abstract: The Stefan problem is a classical free-boundary problem that models phase-change processes and poses computational challenges due to its moving interface and nonlinear temperature-phase coup...
ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning : Abstract: With their high information density and intuitive readability, charts have become the de facto medium for data analysis and communication across disciplines. Recent multimodal large language...
A Deep Dive into Function Inlining and its Security Implications for ML-based Binary Analysis : Abstract: A function inlining optimization is a widely used transformation in modern compilers, which replaces a call site with the callee's body in need. While this transformation improves performanc...
Scalable Frameworks for Real-World Audio-Visual Speech Recognition : Abstract: The practical deployment of Audio-Visual Speech Recognition (AVSR) systems is fundamentally challenged by significant performance degradation in real-world environments, characterized by unp...
Joint Multimodal Contrastive Learning for Robust Spoken Term Detection and Keyword Spotting : Abstract: Acoustic Word Embeddings (AWEs) improve the efficiency of speech retrieval tasks such as Spoken Term Detection (STD) and Keyword Spotting (KWS). However, existing approaches suffer from limi...
Weighted Conformal Prediction Provides Adaptive and Valid Mask-Conditional Coverage for General Missing Data Mechanisms : Abstract: Conformal prediction (CP) offers a principled framework for uncertainty quantification, but it fails to guarantee coverage when faced with missing covariates. In addressing the heterogeneity...
Ladder Up, Memory Down: Low-Cost Fine-Tuning With Side Nets : Abstract: Fine-tuning large language models (LLMs) is often limited by the memory available on commodity GPUs. Parameter-efficient fine-tuning (PEFT) methods such as QLoRA reduce the number of trainab...
TUN: Detecting Significant Points in Persistence Diagrams with Deep Learning : Abstract: Persistence diagrams (PDs) provide a powerful tool for understanding the topology of the underlying shape of a point cloud. However, identifying which points in PDs encode genuine signals re...
Improving the Accuracy of Amortized Model Comparison with Self-Consistency : Abstract: Amortized Bayesian inference (ABI) offers fast, scalable approximations to posterior densities by training neural surrogates on data simulated from the statistical model. However, ABI method...
Continual Learning at the Edge: An Agnostic IIoT Architecture : Abstract: The exponential growth of Internet-connected devices has presented challenges to traditional centralized computing systems due to latency and bandwidth limitations. Edge computing has evolve...
From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification : Abstract: In this work, we revisit dictionary-based sparse regression, in particular, Sequential Threshold Least Squares (STLS), and propose a score-guided library selection to provide practical guida...
Pattern Recognition of Aluminium Arbitrage in Global Trade Data : Abstract: As the global economy transitions toward decarbonization, the aluminium sector has become a focal point for strategic resource management. While policies such as the Carbon Border Adjustment...
Hybrid Ensemble Method for Detecting Cyber-Attacks in Water Distribution Systems Using the BATADAL Dataset : Abstract: The cybersecurity of Industrial Control Systems that manage critical infrastructure such as Water Distribution Systems has become increasingly important as digital connectivity expands. BATA...
C-ing Clearly: Enhanced Binary Code Explanations using C code : Abstract: Large Language Models (LLMs) typically excel at coding tasks involving high-level programming languages, as opposed to lower-level programming languages, such as assembly. We propose a synth...
Sound and Music Biases in Deep Music Transcription Models: A Systematic Analysis : Abstract: Automatic Music Transcription (AMT) -- the task of converting music audio into note representations -- has seen rapid progress, driven largely by deep learning systems. Due to the limited av...
LLmFPCA-detect: LLM-powered Multivariate Functional PCA for Anomaly Detection in Sparse Longitudinal Texts : Abstract: Sparse longitudinal (SL) textual data arises when individuals generate text repeatedly over time (e.g., customer reviews, occasional social media posts, electronic medical records across vis...
TiME: Tiny Monolingual Encoders for Efficient NLP Pipelines : Abstract: Today, a lot of research on language models is focused on large, general-purpose models. However, many NLP pipelines only require models with a well-defined, small set of capabilities. While...
CHIP: Adaptive Compliance for Humanoid Control through Hindsight Perturbation : Abstract: Recent progress in humanoid robots has unlocked agile locomotion skills, including backflipping, running, and crawling. Yet it remains challenging for a humanoid robot to perform forceful ma...
The Trust in AI-Generated Health Advice (TAIGHA) Scale and Short Version (TAIGHA-S): Development and Validation Study : Abstract: Artificial Intelligence tools such as large language models are increasingly used by the public to obtain health information and guidance. In health-related contexts, following or rejecting ...
A Threshold-Triggered Deep Q-Network-Based Framework for Self-Healing in Autonomic Software-Defined IIoT-Edge Networks : Abstract: Stochastic disruptions such as flash events arising from benign traffic bursts and switch thermal fluctuations are major contributors to intermittent service degradation in software-defined ...
From YOLO to VLMs: Advancing Zero-Shot and Few-Shot Detection of Wastewater Treatment Plants Using Satellite Imagery in MENA Region : Abstract: In regions of the Middle East and North Africa (MENA), there is a high demand for wastewater treatment plants (WWTPs), crucial for sustainable water management. Precise identification of WWT...
Semantic Mismatch and Perceptual Degradation: A New Perspective on Image Editing Immunity : Abstract: Text-guided image editing via diffusion models, while powerful, raises significant concerns about misuse, motivating efforts to immunize images against unauthorized edits using imperceptible...
A data-physics hybrid generative model for patient-specific post-stroke motor rehabilitation using wearable sensor data : Abstract: Dynamic prediction of locomotor capacity after stroke is crucial for tailoring rehabilitation, yet current assessments provide only static impairment scores and do not indicate whether patie...
Criminal Liability in AI-Enabled Autonomous Vehicles: A Comparative Study : Abstract: AI revolutionizes transportation through autonomous vehicles (AVs) but introduces complex criminal liability issues regarding infractions. This study employs a comparative legal analysis of ...
Step-Tagging: Toward controlling the generation of Language Reasoning Models through step monitoring : Abstract: The field of Language Reasoning Models (LRMs) has been very active over the past few years with advances in training and inference techniques enabling LRMs to reason longer, and more accurat...
Dual Attention Guided Defense Against Malicious Edits : Abstract: Recent progress in text-to-image diffusion models has transformed image editing via text prompts, yet this also introduces significant ethical challenges from potential misuse in creating de...
Towards Transferable Defense Against Malicious Image Edits : Abstract: Recent approaches employing imperceptible perturbations in input images have demonstrated promising potential to counter malicious manipulations in diffusion-based image editing systems. How...
Enhancing Interpretability for Vision Models via Shapley Value Optimization : Abstract: Deep neural networks have demonstrated remarkable performance across various domains, yet their decision-making processes remain opaque. Although many explanation methods are dedicated to br...
Causal Structure Learning for Dynamical Systems with Theoretical Score Analysis : Abstract: Real world systems evolve in continuous-time according to their underlying causal relationships, yet their dynamics are often unknown. Existing approaches to learning such dynamics typically...
RePo: Language Models with Context Re-Positioning : Abstract: In-context learning is fundamental to modern Large Language Models (LLMs); however, prevailing architectures impose a rigid and fixed contextual structure by assigning linear or constant pos...
DISCODE: Distribution-Aware Score Decoder for Robust Automatic Evaluation of Image Captioning : Abstract: Large vision-language models (LVLMs) have shown impressive performance across a broad range of multimodal tasks. However, robust image caption evaluation using LVLMs remains challenging, par...
Effect of Document Packing on the Latent Multi-Hop Reasoning Capabilities of Large Language Models : Abstract: The standard practice for training large language models involves packing multiple documents together to optimize computational efficiency. However, the impact of this process on the models'...
Reasoning-Style Poisoning of LLM Agents via Stealthy Style Transfer: Process-Level Attacks and Runtime Monitoring in RSV Space : Abstract: Large Language Model (LLM) agents relying on external retrieval are increasingly deployed in high-stakes environments. While existing adversarial attacks primarily focus on content falsifica...
TACK Tunnel Data (TTD): A Benchmark Dataset for Deep Learning-Based Defect Detection in Tunnels : Abstract: Tunnels are essential elements of transportation infrastructure, but are increasingly affected by ageing and deterioration mechanisms such as cracking. Regular inspections are required to en...
SASQ: Static Activation Scaling for Quantization-Aware Training in Large Language Models : Abstract: Large language models (LLMs) excel at natural language tasks but face deployment challenges due to their growing size outpacing GPU memory advancements. Model quantization mitigates this iss...
CAPRMIL: Context-Aware Patch Representations for Multiple Instance Learning : Abstract: In computational pathology, weak supervision has become the standard for deep learning due to the gigapixel scale of WSIs and the scarcity of pixel-level annotations, with Multiple Instance ...
Dual Language Models: Balancing Training Efficiency and Overfitting Resilience : Abstract: This paper combines autoregressive and masked-diffusion training objectives without any architectural modifications, resulting in flexible language models that outperform single-objective mo...
VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models : Abstract: The rapid advancement of large language models (LLMs) has enabled new possibilities for applying artificial intelligence within the legal domain. Nonetheless, the complexity, hierarchical or...
CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer : Abstract: Image retrieval-based cross-view geo-localization (IRCVGL) aims to match images captured from significantly different viewpoints, such as satellite and street-level images. Existing methods ...
Polypersona: Persona-Grounded LLM for Synthetic Survey Responses : Abstract: This paper introduces PolyPersona, a generative framework for synthesizing persona-conditioned survey responses across multiple domains. The framework instruction-tunes compact chat models u...
Residual GRU+MHSA: A Lightweight Hybrid Recurrent Attention Model for Cardiovascular Disease Detection : Abstract: Cardiovascular disease (CVD) remains the leading cause of mortality worldwide, underscoring the need for reliable and efficient predictive tools that support early intervention. Traditional ...
Low-Resource, High-Impact: Building Corpora for Inclusive Language Technologies : Abstract: This tutorial (https://tum-nlp.github.io/low-resource-tutorial) is designed for NLP practitioners, researchers, and developers working with multilingual and low-resource languages who seek t...
Towards Nepali-language LLMs: Efficient GPT training with a Nepali BPE tokenizer : Abstract: Nepali, a low-resource language spoken by over 32 million people, continues to face challenges in natural language processing (NLP) due to its complex grammar, agglutinative morphology, and ...
FakeRadar: Probing Forgery Outliers to Detect Unknown Deepfake Videos : Abstract: In this paper, we propose FakeRadar, a novel deepfake video detection framework designed to address the challenges of cross-domain generalization in real-world scenarios. Existing detection ...
Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes : Abstract: Many practical decision-making problems involve tasks whose success depends on the entire system history, rather than on achieving a state with desired properties. Markovian Reinforcement Le...
JMMMU-Pro: Image-based Japanese Multi-discipline Multimodal Understanding Benchmark via Vibe Benchmark Construction : Abstract: This paper introduces JMMMU-Pro, an image-based Japanese Multi-discipline Multimodal Understanding Benchmark, and Vibe Benchmark Construction, a scalable construction method. Following the e...
MuseCPBench: an Empirical Study of Music Editing Methods through Music Context Preservation : Abstract: Music editing plays a vital role in modern music production, with applications in film, broadcasting, and game development. Recent advances in music generation models have enabled diverse ed...
A Multicenter Benchmark of Multiple Instance Learning Models for Lymphoma Subtyping from HE-stained Whole Slide Images : Abstract: Timely and accurate lymphoma diagnosis is essential for guiding cancer treatment. Standard diagnostic practice combines hematoxylin and eosin (HE)-stained whole slide images with immunohisto...
gridfm-datakit-v1: A Python Library for Scalable and Realistic Power Flow and Optimal Power Flow Data Generation : Abstract: We introduce gridfm-datakit-v1, a Python library for generating realistic and diverse Power Flow (PF) and Optimal Power Flow (OPF) datasets for training Machine Learning (ML) solvers. Existi...
VASA-3D: Lifelike Audio-Driven Gaussian Head Avatars from a Single Image : Abstract: We propose VASA-3D, an audio-driven, single-shot 3D head avatar generator. This research tackles two major challenges: capturing the subtle expression details present in real human faces, an...
Bias-Variance Trade-off for Clipped Stochastic First-Order Methods: From Bounded Variance to Infinite Mean : Abstract: Stochastic optimization is fundamental to modern machine learning. Recent research has extended the study of stochastic first-order methods (SFOMs) from light-tailed to heavy-tailed noise, w...
Spoken DialogSum: An Emotion-Rich Conversational Dataset for Spoken Dialogue Summarization : Abstract: Recent audio language models can follow long conversations. However, research on emotion-aware or spoken dialogue summarization is constrained by the lack of data that links speech, summarie...
Native and Compact Structured Latents for 3D Generation : Abstract: Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture asset...
Spherical Leech Quantization for Visual Tokenization and Generation : Abstract: Non-parametric quantization has received much attention due to its efficiency on parameters and scalability to a large codebook. In this paper, we present a unified formulation of different ...
TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs : Abstract: This paper does not introduce a novel method but instead establishes a straightforward, incremental, yet essential baseline for video temporal grounding (VTG), a core capability in video und...
Efficient Reinforcement Learning with Semantic and Token Entropy for LLM Reasoning : Abstract: Reinforcement learning with verifiable rewards (RLVR) has demonstrated superior performance in enhancing the reasoning capability of large language models (LLMs). However, this accuracy-orie...
Decoding Emotional Trajectories: A Temporal-Semantic Network Approach for Latent Depression Assessment in Social Media : Abstract: The early identification and intervention of latent depression are of significant societal importance for mental health governance. While current automated detection methods based on social ...
Question Answering Over Spatio-Temporal Knowledge Graph : Abstract: Spatio-temporal knowledge graphs (STKGs) enhance traditional KGs by integrating temporal and spatial annotations, enabling precise reasoning over questions with spatio-temporal dependencies....
Runtime Analysis of Evolutionary Diversity Optimization on the Multi-objective (LeadingOnes, TrailingZeros) Problem : Abstract: Diversity optimization is the class of optimization problems in which we aim to find a diverse set of good solutions. One of the frequently-used approaches to solve such problems is to use e...
CrossPT-EEG: A Benchmark for Cross-Participant and Cross-Time Generalization of EEG-based Visual Decoding : Abstract: Exploring brain activity in relation to visual perception provides insights into the biological representation of the world. While functional magnetic resonance imaging (fMRI) and magnetoenc...
Enhancing Long-term RAG Chatbots with Psychological Models of Memory Importance and Forgetting : Abstract: While Retrieval-Augmented Generation (RAG) has shown promise in enhancing long-term conversations, the increasing memory load as conversations progress degrades retrieval accuracy. Drawing o...
Online Multi-modal Root Cause Identification in Microservice Systems : Abstract: Root Cause Analysis (RCA) is essential for pinpointing the root causes of failures in microservice systems. Traditional data-driven RCA methods are typically limited to offline applications ...
Chase Anonymisation: Privacy-Preserving Knowledge Graphs with Logical Reasoning : Abstract: We propose a novel framework to enable Knowledge Graphs (KGs) sharing while ensuring that information that should remain private is not directly released nor indirectly exposed via derived k...
Holistic Utility Preference Learning for Listwise Alignment : Abstract: Aligning large language models with human preferences is essential for improving interaction quality and safety by ensuring outputs better reflect human values. A promising strategy involves...
Renal Cell Carcinoma subtyping: learning from multi-resolution localization : Abstract: Renal Cell Carcinoma is typically asymptomatic at the early stages for many patients. This leads to a late diagnosis of the tumor, where the curability likelihood is lower, and makes the mor...
RepoTransBench: A Real-World Multilingual Benchmark for Repository-Level Code Translation : Abstract: Repository-level code translation refers to translating an entire code repository from one programming language to another while preserving the functionality of the source repository. Many b...
Property-Isometric Variational Autoencoders for Sequence Modeling and Design : Abstract: Biological sequence design (DNA, RNA, or peptides) with desired functional properties has applications in discovering novel nanomaterials, biosensors, antimicrobial drugs, and beyond. One co...
A Knowledge Graph-based Retrieval-Augmented Generation Framework for Algorithm Selection in the Facility Layout Problem : Abstract: Selecting a solution algorithm for the Facility Layout Problem (FLP), an NP-hard optimization problem with multiobjective trade-off, is a complex task that requires deep expert knowledge. Th...
Convergence dynamics of Agent-to-Agent Interactions with Misaligned objectives : Abstract: We develop and analyze a theoretical framework for agent-to-agent interactions in a simplified in-context linear regression setting. In our model, each agent is instantiated as a single-laye...
Physics-Guided Deep Learning for Heat Pump Stress Detection: A Comprehensive Analysis on When2Heat Dataset : Abstract: Heat pump systems are critical components in modern energy-efficient buildings, yet their operational stress detection remains challenging due to complex thermodynamic interactions and limit...
Mitigating Catastrophic Forgetting in Mathematical Reasoning Finetuning through Mixed Training : Abstract: When finetuning large language models for specialized tasks such as mathematical reasoning, models exhibit catastrophic forgetting, losing previously learned capabilities. We investigate thi...
Variational Physics-Informed Ansatz for Reconstructing Hidden Interaction Networks from Steady States : Abstract: The interaction structure of a complex dynamical system governs its collective behavior, yet existing reconstruction methods struggle with nonlinear, heterogeneous, and higher-order coupling...
Predictive Modeling of Flood-Prone Areas Using SAR and Environmental Variables : Abstract: Flooding is one of the most destructive natural hazards worldwide, posing serious risks to ecosystems, infrastructure, and human livelihoods. This study combines Synthetic Aperture Radar (SA...
Delete and Retain: Efficient Unlearning for Document Classification : Abstract: Machine unlearning aims to efficiently remove the influence of specific training data from a model without full retraining. While much progress has been made in unlearning for LLMs, document...
Universal Reasoning Model : Abstract: Universal transformers (UTs) have been widely used for complex reasoning tasks such as ARC-AGI and Sudoku, yet the specific sources of their performance gains remain underexplored. In this w...
Writing in Symbiosis: Mapping Human Creative Agency in the AI Era : Abstract: The proliferation of Large Language Models (LLMs) raises a critical question about what it means to be human when we share an increasingly symbiotic relationship with persuasive and creative...
Enhancing Transparency and Traceability in Healthcare AI: The AI Product Passport : Abstract: Objective: To develop the AI Product Passport, a standards-based framework improving transparency, traceability, and compliance in healthcare AI via lifecycle-based documentation. Materials ...
Safe2Harm: Semantic Isomorphism Attacks for Jailbreaking Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated exceptional performance across various tasks, but their security vulnerabilities can be exploited by attackers to generate harmful content, cau...
Scaling and Transferability of Annealing Strategies in Large Language Model Training : Abstract: Learning rate scheduling is crucial for training large language models, yet understanding the optimal annealing strategies across different model configurations remains challenging. In this ...
Federated Few-Shot Learning for Epileptic Seizure Detection Under Privacy Constraints : Abstract: Many deep learning approaches have been developed for EEG-based seizure detection; however, most rely on access to large centralized annotated datasets. In clinical practice, EEG data are of...
Made-in China, Thinking in America:U.S. Values Persist in Chinese LLMs : Abstract: As large language models increasingly mediate access to information and facilitate decision-making, they are becoming instruments in soft power competitions between global actors such as the...
Graph AI generates neurological hypotheses validated in molecular, organoid, and clinical systems : Abstract: Neurological diseases are the leading global cause of disability, yet most lack disease-modifying treatments. We present PROTON, a heterogeneous graph transformer that generates testable hyp...
Time-Constrained Recommendations: Reinforcement Learning Strategies for E-Commerce : Abstract: Unlike traditional recommendation tasks, finite user time budgets introduce a critical resource constraint, requiring the recommender system to balance item relevance and evaluation cost. Fo...
CurvaDion: Curvature-Adaptive Distributed Orthonormalization : Abstract: As language models scale to trillions of parameters, distributed training across many GPUs becomes essential, yet gradient synchronization over high-bandwidth, low-latency networks remains a...
Composite Classifier-Free Guidance for Multi-Modal Conditioning in Wind Dynamics Super-Resolution : Abstract: Various weather modelling problems (e.g., weather forecasting, optimizing turbine placements, etc.) require ample access to high-resolution, highly accurate wind data. Acquiring such high-re...
Exploring the Modular Integration of "AI + Architecture" Pedagogy in Undergraduate Design Education: A Case Study of Architectural Design III/IV Courses at Zhejiang University : Abstract: This study investigates AI integration in architectural education through a teaching experiment in Zhejiang University's 2024-25 grade three undergraduate design studio. Adopting a dual-modu...
Complex Mathematical Expression Recognition: Benchmark, Large-Scale Dataset and Strong Baseline : Abstract: Mathematical Expression Recognition (MER) has made significant progress in recognizing simple expressions, but the robust recognition of complex mathematical expressions with many tokens and...
PIS: A Generalized Physical Inversion Solver for Arbitrary Sparse Observations via Set-Conditioned Diffusion : Abstract: Estimation of PDE-constrained physical parameters from limited indirect measurements is inherently ill-posed, particularly when observations are sparse, irregular, and constrained by real-wo...
Low-Rank Compression of Language Models via Differentiable Rank Selection : Abstract: Approaches for compressing large-language models using low-rank decomposition have made strides, particularly with the introduction of activation and loss-aware SVD, which improves the trade...
Plug-and-Play Parameter-Efficient Tuning of Embeddings for Federated Recommendation : Abstract: With the rise of cloud-edge collaboration, recommendation services are increasingly trained in distributed environments. Federated Recommendation (FR) enables such multi-end collaborative tr...
DARTs: A Dual-Path Robust Framework for Anomaly Detection in High-Dimensional Multivariate Time Series : Abstract: Multivariate time series anomaly detection (MTSAD) aims to accurately identify and localize complex abnormal patterns in the large-scale industrial control systems. While existing approaches...
TF-MCL: Time-frequency Fusion and Multi-domain Cross-Loss for Self-supervised Depression Detection : Abstract: In recent years, there has been a notable increase in the use of supervised detection methods of major depressive disorder (MDD) based on electroencephalogram (EEG) signals. However, the pro...
Instilling Organisational Values in Firefighters through Simulation-Based Training : Abstract: In firefighting and other emergency operations, decisions made under pressure carry profound ethical weight and can significantly impact incident outcomes and firefighter safety. Traditional...
Human-AI Collaboration Mechanism Study on AIGC Assisted Image Production for Special Coverage : Abstract: Artificial Intelligence Generated Content (AIGC) assisting image production triggers controversy in journalism while attracting attention from media agencies. Key issues involve misinformati...
The Laminar Flow Hypothesis: Detecting Jailbreaks via Semantic Turbulence in Large Language Models : Abstract: As Large Language Models (LLMs) become ubiquitous, the challenge of securing them against adversarial "jailbreaking" attacks has intensified. Current defense strategies often rely on computa...
DL$^3$M: A Vision-to-Language Framework for Expert-Level Medical Reasoning through Deep Learning and Large Language Models : Abstract: Medical image classifiers detect gastrointestinal diseases well, but they do not explain their decisions. Large language models can generate clinical text, yet they struggle with visual reas...
Toward Noise-Aware Audio Deepfake Detection: Survey, SNR-Benchmarks, and Practical Recipes : Abstract: Deepfake audio detection has progressed rapidly with strong pre-trained encoders (e.g., WavLM, Wav2Vec2, MMS). However, performance in realistic capture conditions - background noise (domest...
A Spatio-Temporal Hybrid Quantum-Classical Graph Convolutional Neural Network Approach for Urban Taxi Destination Prediction : Abstract: We propose a Hybrid Spatio-Temporal Quantum Graph Convolutional Network (H-STQGCN) algorithm by combining the strengths of quantum computing and classical deep learning to predict the taxi d...
Why Text Prevails: Vision May Undermine Multimodal Medical Decision Making : Abstract: With the rapid progress of large language models (LLMs), advanced multimodal large language models (MLLMs) have demonstrated impressive zero-shot capabilities on vision-language tasks. In th...
Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis : Abstract: Financial sentiment analysis enhances market understanding; however, standard natural language processing approaches encounter significant challenges when applied to small datasets. This stu...
The algorithmic muse and the public domain: Why copyrights legal philosophy precludes protection for generative AI outputs : Abstract: Generative AI (GenAI) outputs are not copyrightable. This article argues why. We bypass conventional doctrinal analysis that focuses on black letter law notions of originality and authorship...
MIDUS: Memory-Infused Depth Up-Scaling : Abstract: Scaling large language models (LLMs) demands approaches that increase capacity without incurring excessive parameter growth or inference cost. Depth Up-Scaling (DUS) has emerged as a promisi...
STAR: STacked AutoRegressive Scheme for Unified Multimodal Learning : Abstract: Multimodal large language models (MLLMs) play a pivotal role in advancing the quest for general artificial intelligence. However, achieving unified target for multimodal understanding and ge...
Network-Wide Traffic Volume Estimation from Speed Profiles using a Spatio-Temporal Graph Neural Network with Directed Spatial Attention : Abstract: Existing traffic volume estimation methods typically address either forecasting traffic on sensor-equipped roads or spatially imputing missing volumes using nearby sensors. While forecasting...
Towards Deep Learning Surrogate for the Forward Problem in Electrocardiology: A Scalable Alternative to Physics-Based Models : Abstract: The forward problem in electrocardiology, computing body surface potentials from cardiac electrical activity, is traditionally solved using physics-based models such as the bidomain or monod...
Beyond Procedural Compliance: Human Oversight as a Dimension of Well-being Efficacy in AI Governance : Abstract: Major AI ethics guidelines and laws, including the EU AI Act, call for effective human oversight, but do not define it as a distinct and developable capacity. This paper introduces human ove...
EEG-D3: A Solution to the Hidden Overfitting Problem of Deep Learning Models : Abstract: Deep learning for decoding EEG signals has gained traction, with many claims to state-of-the-art accuracy. However, despite the convincing benchmark performance, successful translation to re...
VajraV1 -- The most accurate Real Time Object Detector of the YOLO family : Abstract: Recent years have seen significant advances in real-time object detection, with the release of YOLOv10, YOLO11, YOLOv12, and YOLOv13 between 2024 and 2025. This technical report presents the...
Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging : Abstract: Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter...
Verification-Guided Context Optimization for Tool Calling via Hierarchical LLMs-as-Editors : Abstract: Tool calling enables large language models (LLMs) to interact with external environments through tool invocation, providing a practical way to overcome the limitations of pretraining. Howeve...
Privacy-Enhancing Infant Cry Classification with Federated Transformers and Denoising Regularization : Abstract: Infant cry classification can aid early assessment of infant needs. However, deployment of such solutions is limited by privacy concerns around audio data, sensitivity to background noise, a...
OPTIMA: Optimal One-shot Pruning for LLMs via Quadratic Programming Reconstruction : Abstract: Post-training model pruning is a promising solution, yet it faces a trade-off: simple heuristics that zero weights are fast but degrade accuracy, while principled joint optimization methods ...
One Permutation Is All You Need: Fast, Reliable Variable Importance and Model Stress-Testing : Abstract: Reliable estimation of feature contributions in machine learning models is essential for trust, transparency and regulatory compliance, especially when models are proprietary or otherwise op...
Generative AI for Video Translation: A Scalable Architecture for Multilingual Video Conferencing : Abstract: The real-time deployment of cascaded generative AI pipelines for applications like video translation is constrained by significant system-level challenges. These include the cumulative laten...
Assessing High-Risk Systems: An EU AI Act Verification Framework : Abstract: A central challenge in implementing the AI Act and other AI-relevant regulations in the EU is the lack of a systematic approach to verify their legal mandates. Recent surveys show that this ...
Exploring Machine Learning, Deep Learning, and Explainable AI Methods for Seasonal Precipitation Prediction in South America : Abstract: Forecasting meteorological variables is challenging due to the complexity of their processes, requiring advanced models for accuracy. Accurate precipitation forecasts are vital for society. ...
Intelligent matter consisting of active particles : Abstract: In this book chapter, we review how systems of simple motile agents can be used as a pathway to intelligent systems. It is a well known result from nature that large groups of entities follo...
Context Branching for LLM Conversations: A Version Control Approach to Exploratory Programming : Abstract: Large Language Models (LLMs) have become integral to software engineering workflows, yet their effectiveness degrades significantly in multi-turn conversations. Recent studies demonstrate an...
Hierarchical Multi-agent Large Language Model Reasoning for Autonomous Functional Materials Discovery : Abstract: Artificial intelligence is reshaping scientific exploration, but most methods automate procedural tasks without engaging in scientific reasoning, limiting autonomy in discovery. We introduce...
Informing Acquisition Functions via Foundation Models for Molecular Discovery : Abstract: Bayesian Optimization (BO) is a key methodology for accelerating molecular discovery by estimating the mapping from molecules to their properties while seeking the optimal candidate. Typical...
Multi-Agent Collaborative Framework for Intelligent IT Operations: An AOI System with Context-Aware Compression and Dynamic Task Scheduling : Abstract: The proliferation of cloud-native architectures, characterized by microservices and dynamic orchestration, has rendered modern IT infrastructures exceedingly complex and volatile. This compl...
Memo2496: Expert-Annotated Dataset and Dual-View Adaptive Framework for Music Emotion Recognition : Abstract: Music Emotion Recogniser (MER) research faces challenges due to limited high-quality annotated datasets and difficulties in addressing cross-track feature drift. This work presents two prima...
Professional Software Developers Don't Vibe, They Control: AI Agent Use for Coding in 2025 : Abstract: The rise of AI agents is transforming how software can be built. The promise of agents is that developers might write code quicker, delegate multiple tasks to different agents, and even writ...
KFS-Bench: Comprehensive Evaluation of Key Frame Sampling in Long Video Understanding : Abstract: We propose KFS-Bench, the first benchmark for key frame sampling in long video question answering (QA), featuring multi-scene annotations to enable direct and robust evaluation of sampling s...
PerfCoder: Large Language Models for Interpretable Code Performance Optimization : Abstract: Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in rea...
Sample-Efficient Robot Skill Learning for Construction Tasks: Benchmarking Hierarchical Reinforcement Learning and Vision-Language-Action VLA Model : Abstract: This study evaluates two leading approaches for teaching construction robots new skills to understand their applicability for construction automation: a Vision-Language-Action (VLA) model an...
ACE-SLAM: Scene Coordinate Regression for Neural Implicit Real-Time SLAM : Abstract: We present a novel neural RGB-D Simultaneous Localization And Mapping (SLAM) system that learns an implicit map of the scene in real time. For the first time, we explore the use of Scene Coo...
OmniDrive-R1: Reinforcement-driven Interleaved Multi-modal Chain-of-Thought for Trustworthy Vision-Language Autonomous Driving : Abstract: The deployment of Vision-Language Models (VLMs) in safety-critical domains like autonomous driving (AD) is critically hindered by reliability failures, most notably object hallucination. Thi...
FacEDiT: Unified Talking Face Editing and Generation via Facial Motion Infilling : Abstract: Talking face editing and face generation have often been studied as distinct problems. In this work, we propose viewing both not as separate tasks but as subtasks of a unifying formulation, ...
Real-time prediction of workplane illuminance distribution for daylight-linked controls using non-intrusive multimodal deep learning : Abstract: Daylight-linked controls (DLCs) have significant potential for energy savings in buildings, especially when abundant daylight is available and indoor workplane illuminance can be accurately ...
Efficient-DLM: From Autoregressive to Diffusion Language Models, and Beyond in Speed : Abstract: Diffusion language models (dLMs) have emerged as a promising paradigm that enables parallel, non-autoregressive generation, but their learning efficiency lags behind that of autoregressive (...
SDAR-VL: Stable and Efficient Block-wise Diffusion for Vision-Language Understanding : Abstract: Block-wise discrete diffusion offers an attractive balance between parallel generation and causal dependency modeling, making it a promising backbone for vision-language modeling. However, i...
SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations : Abstract: Mixture of Experts (MoE) models have emerged as the de facto architecture for scaling up language models without significantly increasing the computational cost. Recent MoE models demonstrat...
Arithmetic-Intensity-Aware Quantization : Abstract: As modern neural networks become increasingly memory-bound, inference throughput is limited by DRAM bandwidth rather than compute. We present Arithmetic-Intensity-Aware Quantization (AIQ), a...
ProtoFlow: Interpretable and Robust Surgical Workflow Modeling with Learned Dynamic Scene Graph Prototypes : Abstract: Purpose: Detailed surgical recognition is critical for advancing AI-assisted surgery, yet progress is hampered by high annotation costs, data scarcity, and a lack of interpretable models. Wh...
Neurosymbolic Inference On Foundation Models For Remote Sensing Text-to-image Retrieval With Complex Queries : Abstract: Text-to-image retrieval in remote sensing (RS) has advanced rapidly with the rise of large vision-language models (LVLMs) tailored for aerial and satellite imagery, culminating in remote sen...
SportsGPT: An LLM-driven Framework for Interpretable Sports Motion Assessment and Training Guidance : Abstract: Existing intelligent sports analysis systems mainly focus on "scoring and visualization," often lacking automatic performance diagnosis and interpretable training guidance. Recent advances o...
UIXPOSE: Mobile Malware Detection via Intention-Behaviour Discrepancy Analysis : Abstract: We introduce UIXPOSE, a source-code-agnostic framework that operates on both compiled and open-source apps. This framework applies Intention Behaviour Alignment (IBA) to mobile malware analy...
LAPPI: Interactive Optimization with LLM-Assisted Preference-Based Problem Instantiation : Abstract: Many real-world tasks, such as trip planning or meal planning, can be formulated as combinatorial optimization problems. However, using optimization solvers is difficult for end users becaus...
TorchTraceAP: A New Benchmark Dataset for Detecting Performance Anti-Patterns in Computer Vision Models : Abstract: Identifying and addressing performance anti-patterns in machine learning (ML) models is critical for efficient training and inference, but it typically demands deep expertise spanning system...
PathFinder: Advancing Path Loss Prediction for Single-to-Multi-Transmitter Scenario : Abstract: Radio path loss prediction (RPP) is critical for optimizing 5G networks and enabling IoT, smart city, and similar applications. However, current deep learning-based RPP methods lack proactiv...
IntentMiner: Intent Inversion Attack via Tool Call Analysis in the Model Context Protocol : Abstract: The rapid evolution of Large Language Models (LLMs) into autonomous agents has led to the adoption of the Model Context Protocol (MCP) as a standard for discovering and invoking external too...
A Comparative Analysis of Retrieval-Augmented Generation Techniques for Bengali Standard-to-Dialect Machine Translation Using LLMs : Abstract: Translating from a standard language to its regional dialects is a significant NLP challenge due to scarce data and linguistic variation, a problem prominent in the Bengali language. This pa...
Towards Explainable Quantum AI: Informing the Encoder Selection of Quantum Neural Networks via Visualization : Abstract: Quantum Neural Networks (QNNs) represent a promising fusion of quantum computing and neural network architectures, offering speed-ups and efficient processing of high-dimensional, entangled ...
End-to-End Learning-based Video Streaming Enhancement Pipeline: A Generative AI Approach : Abstract: The primary challenge of video streaming is to balance high video quality with smooth playback. Traditional codecs are well tuned for this trade-off, yet their inability to use context means...
Understanding and Improving Hyperbolic Deep Reinforcement Learning : Abstract: The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose,...
Error Bound Analysis of Physics-Informed Neural Networks-Driven T2 Quantification in Cardiac Magnetic Resonance Imaging : Abstract: Physics-Informed Neural Networks (PINN) are emerging as a promising approach for quantitative parameter estimation of Magnetic Resonance Imaging (MRI). While existing deep learning methods c...
Estimating problem difficulty without ground truth using Large Language Model comparisons : Abstract: Recent advances in the finetuning of large language models (LLMs) have significantly improved their performance on established benchmarks, emphasizing the need for increasingly difficult, sy...
PentestEval: Benchmarking LLM-based Penetration Testing with Modular and Stage-Level Design : Abstract: Penetration testing is essential for assessing and strengthening system security against real-world threats, yet traditional workflows remain highly manual, expertise-intensive, and difficul...
Beyond MMD: Evaluating Graph Generative Models with Geometric Deep Learning : Abstract: Graph generation is a crucial task in many fields, including network science and bioinformatics, as it enables the creation of synthetic graphs that mimic the properties of real-world networ...
From Context to EDUs: Faithful and Structured Context Compression via Elementary Discourse Unit Decomposition : Abstract: Managing extensive context remains a critical bottleneck for Large Language Models (LLMs), particularly in applications like long-document question answering and autonomous agents where leng...
Explainable Preference Learning: a Decision Tree-based Surrogate Model for Preferential Bayesian Optimization : Abstract: Current Preferential Bayesian Optimization methods rely on Gaussian Processes (GPs) as surrogate models. These models are hard to interpret, struggle with handling categorical data, and are ...
SPARQL-LLM: Real-Time SPARQL Query Generation from Natural Language Questions : Abstract: The advent of large language models is contributing to the emergence of novel approaches that promise to better tackle the challenge of generating structured queries, such as SPARQL queries,...
Leveraging LLMs for Structured Data Extraction from Unstructured Patient Records : Abstract: Manual chart review remains an extremely time-consuming and resource-intensive component of clinical research, requiring experts to extract often complex information from unstructured electr...
Blind Radio Mapping via Spatially Regularized Bayesian Trajectory Inference : Abstract: Radio maps enable intelligent wireless applications by capturing the spatial distribution of channel characteristics. However, conventional construction methods demand extensive location-lab...
Adjudicator: Correcting Noisy Labels with a KG-Informed Council of LLM Agents : Abstract: The performance of production machine learning systems is fundamentally limited by the quality of their training data. In high-stakes industrial applications, noisy labels can degrade perfor...
LoopBench: Discovering Emergent Symmetry Breaking Strategies with LLM Swarms : Abstract: Large Language Models (LLMs) are increasingly being utilized as autonomous agents, yet their ability to coordinate in distributed systems remains poorly understood. We introduce \textbf{Loop...
AI-Powered Annotation Pipelines for Stabilizing Large Language Models: A Human-AI Synergy Approach : Abstract: LLM implementations are failing in highly regulated industries owing to instability issues, inconsistent reasoning, hallucinations and performance variability, especially in workflows. These...
Meta Hierarchical Reinforcement Learning for Scalable Resource Management in O-RAN : Abstract: The increasing complexity of modern applications demands wireless networks capable of real time adaptability and efficient resource management. The Open Radio Access Network (O-RAN) architec...
ValuePilot: A Two-Phase Framework for Value-Driven Decision-Making : Abstract: Personalized decision-making is essential for human-AI interaction, enabling AI agents to act in alignment with individual users' value preferences. As AI systems expand into real-world appl...
Compressed Causal Reasoning: Quantization and GraphRAG Effects on Interventional and Counterfactual Accuracy : Abstract: Causal reasoning in Large Language Models spanning association, intervention, and counterfactual inference is essential for reliable decision making in high stakes settings. As deployment sh...
State-Dependent Refusal and Learned Incapacity in RLHF-Aligned Language Models : Abstract: Large language models (LLMs) are widely deployed as general-purpose tools, yet extended interaction can reveal behavioral patterns not captured by standard quantitative benchmarks. We presen...
Mathematics and Coding are Universal AI Benchmarks : Abstract: We study the special role of mathematics and coding inside the moduli space of psychometric batteries for AI agents. Building on the AAI framework and GVU dynamics from previous works, we de...
Semantic Grounding Index: Geometric Bounds on Context Engagement in RAG Systems : Abstract: When retrieval-augmented generation (RAG) systems hallucinate, what geometric trace does this leave in embedding space? We introduce the Semantic Grounding Index (SGI), defined as the ratio ...
EvoLattice: Persistent Internal-Population Evolution through Multi-Alternative Quality-Diversity Graph Representations for LLM-Guided Program Discovery : Abstract: Large language models (LLMs) are increasingly used to evolve programs and multi-agent systems, yet most existing approaches rely on overwrite-based mutations that maintain only a single cand...
MURIM: Multidimensional Reputation-based Incentive Mechanism for Federated Learning : Abstract: Federated Learning (FL) has emerged as a leading privacy-preserving machine learning paradigm, enabling participants to share model updates instead of raw data. However, FL continues to face...
Evaluating Frontier LLMs on PhD-Level Mathematical Reasoning: A Benchmark on a Textbook in Theoretical Computer Science about Randomized Algorithms : Abstract: The rapid advancement of large language models (LLMs) has led to significant breakthroughs in automated mathematical reasoning and scientific discovery. Georgiev, G${ó}$mez-Serrano, Tao, and...
ReflCtrl: Controlling LLM Reflection via Representation Engineering : Abstract: Large language models (LLMs) with Chain-of-Thought (CoT) reasoning have achieved strong performance across diverse tasks, including mathematics, coding, and general reasoning. A distinctive ...
Sparsity-Controllable Dynamic Top-p MoE for Large Foundation Model Pre-training : Abstract: Sparse Mixture-of-Experts (MoE) architectures effectively scale model capacity by activating only a subset of experts for each input token. However, the standard Top-k routing strategy impos...
MobileWorldBench: Towards Semantic World Modeling For Mobile Agents : Abstract: World models have shown great utility in improving the task performance of embodied agents. While prior work largely focuses on pixel-space world models, these approaches face practical limi...
Evaluating Small Language Models for Agentic On-Farm Decision Support Systems : Abstract: Large Language Models (LLM) hold potential to support dairy scholars and farmers by supporting decision-making and broadening access to knowledge for stakeholders with limited technical expe...
Intention Chain-of-Thought Prompting with Dynamic Routing for Code Generation : Abstract: Large language models (LLMs) exhibit strong generative capabilities and have shown great potential in code generation. Existing chain-of-thought (CoT) prompting methods enhance model reasoni...
OpenDataArena: A Fair and Open Arena for Benchmarking Post-Training Dataset Value : Abstract: The rapid evolution of Large Language Models (LLMs) is predicated on the quality and diversity of post-training datasets. However, a critical dichotomy persists: while models are rigorously ...
RADAR: Accelerating Large Language Model Inference With RL-Based Dynamic Draft Trees : Abstract: Inference with modern Large Language Models (LLMs) is expensive and slow, and speculative sampling has emerged as an effective solution to this problem, however, the number of the calls to t...
Grammar Search for Multi-Agent Systems : Abstract: Automatic search for Multi-Agent Systems has recently emerged as a key focus in agentic AI research. Several prior approaches have relied on LLM-based free-form search over the code space. I...
HydroGEM: A Self Supervised Zero Shot Hybrid TCN Transformer Foundation Model for Continental Scale Streamflow Quality Control : Abstract: Real-time streamflow monitoring networks generate millions of observations annually, yet maintaining data quality across thousands of remote sensors remains labor-intensive. We introduce Hyd...
Optimizing Multi-Tier Supply Chain Ordering with a Hybrid Liquid Neural Network and Extreme Gradient Boosting Model : Abstract: Supply chain management (SCM) faces significant challenges like demand fluctuations and the bullwhip effect. Traditional methods and even state-of-the-art LLMs struggle with benchmarks like ...
Incentivizing Tool-augmented Thinking with Images for Medical Image Analysis : Abstract: Recent reasoning based medical MLLMs have made progress in generating step by step textual reasoning chains. However, they still struggle with complex tasks that necessitate dynamic and iter...
Georeferencing complex relative locality descriptions with large language models : Abstract: Georeferencing text documents has typically relied on either gazetteer-based methods to assign geographic coordinates to place names, or on language modelling approaches that associate textu...
G\"odel's Poetry : Abstract: Formal, automated theorem proving has long been viewed as a challenge to artificial intelligence. We introduce here a new approach to computer theorem proving, one that employs specialized l...
Leveraging LLMs for Collaborative Ontology Engineering in Parkinson Disease Monitoring and Alerting : Abstract: This paper explores the integration of Large Language Models (LLMs) in the engineering of a Parkinson's Disease (PD) monitoring and alerting ontology through four key methodologies: One Shot...
TiCard: Deployable EXPLAIN-only Residual Learning for Cardinality Estimation : Abstract: Cardinality estimation is a key bottleneck for cost-based query optimization, yet deployable improvements remain difficult: classical estimators miss correlations, while learned estimators o...
Massive Editing for Large Language Models Based on Dynamic Weight Generation : Abstract: Knowledge Editing (KE) is a field that studies how to modify some knowledge in Large Language Models (LLMs) at a low cost (compared to pre-training). Currently, performing large-scale edits ...
PortAgent: LLM-driven Vehicle Dispatching Agent for Port Terminals : Abstract: Vehicle Dispatching Systems (VDSs) are critical to the operational efficiency of Automated Container Terminals (ACTs). However, their widespread commercialization is hindered due to their lo...
Seismology modeling agent: A smart assistant for geophysical researchers : Abstract: To address the steep learning curve and reliance on complex manual file editing and command-line operations in the traditional workflow of the mainstream open-source seismic wave simulation ...
Context-Picker: Dynamic context selection using multi-stage reinforcement learning : Abstract: In long-context question answering (LCQA), determining the optimal amount of context for a given query is a significant challenge. Including too few passages may omit critical information, w...
Model-First Reasoning LLM Agents: Reducing Hallucinations through Explicit Problem Modeling : Abstract: Large Language Models (LLMs) often struggle with complex multi-step planning tasks, showing high rates of constraint violations and inconsistent solutions. Existing strategies such as Chain-...
Sparse Multi-Modal Transformer with Masking for Alzheimer's Disease Classification : Abstract: Transformer-based multi-modal intelligent systems often suffer from high computational and energy costs due to dense self-attention, limiting their scalability under resource constraints. Th...
Dynamic Learning Rate Scheduling based on Loss Changes Leads to Faster Convergence : Abstract: Despite significant advances in optimizers for training, most research works use common scheduler choices like Cosine or exponential decay. In this paper, we study \emph{GreedyLR}, a novel s...

Research Sources: 361 | Generated: 12/18/2025