AI RESEARCH PAPERS & ACADEMIC SOURCES
- VideoAfford: Grounding 3D Affordance from Human-Object-Interaction Videos via Multimodal Large Language Model : Abstract: 3D affordance grounding aims to highlight the actionable regions on 3D objects, which is crucial for robotic manipulation. Previous research primarily focused on learning affordance knowledg...
- Time2General: Learning Spatiotemporal Invariant Representations for Domain-Generalization Video Semantic Segmentation : Abstract: Domain Generalized Video Semantic Segmentation (DGVSS) is trained on a single labeled driving domain and is directly deployed on unseen domains without target labels and test-time adaptation...
- TreeCUA: Efficiently Scaling GUI Automation with Tree-Structured Verifiable Evolution : Abstract: Effectively scaling GUI automation is essential for computer-use agents (CUAs); however, existing work primarily focuses on scaling GUI grounding rather than the more crucial GUI planning, w...
- Semi-supervised Liver Segmentation and Patch-based Fibrosis Staging with Registration-aided Multi-parametric MRI : Abstract: Liver fibrosis poses a substantial challenge in clinical practice, emphasizing the necessity for precise liver segmentation and accurate disease staging. Based on the CARE Liver 2025 Track 4...
- GenSeg-R1: RL-Driven Vision-Language Grounding for Fine-Grained Referring Segmentation : Abstract: We study fine-grained referring image segmentation via a decoupled reason-then-segment pipeline. A vision-language model (VLM) receives an image and a natural-language query, reasons about t...
- Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models : Abstract: Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods face challenges in generating animatable geometry, while rigging techniques lack fin...
- From Lightweight CNNs to SpikeNets: Benchmarking Accuracy-Energy Tradeoffs with Pruned Spiking SqueezeNet : Abstract: Spiking Neural Networks (SNNs) are increasingly studied as energy-efficient alternatives to Convolutional Neural Networks (CNNs), particularly for edge intelligence. However, prior work has ...
- Toward Fine-Grained Facial Control in 3D Talking Head Generation : Abstract: Audio-driven talking head generation is a core component of digital avatars, and 3D Gaussian Splatting has shown strong performance in real-time rendering of high-fidelity talking heads. How...
- Robust Vision Systems for Connected and Autonomous Vehicles: Security Challenges and Attack Vectors : Abstract: This article investigates the robustness of vision systems in Connected and Autonomous Vehicles (CAVs), which is critical for developing Level-5 autonomous driving capabilities. Safe and rel...
- Where Do Images Come From? Analyzing Captions to Geographically Profile Datasets : Abstract: Recent studies show that text-to-image models often fail to generate geographically representative images, raising concerns about the representativeness of their training data and motivating...
- SciFlow-Bench: Evaluating Structure-Aware Scientific Diagram Generation via Inverse Parsing : Abstract: Scientific diagrams convey explicit structural information, yet modern text-to-image models often produce visually plausible but structurally incorrect results. Existing benchmarks either re...
- CompSplat: Compression-aware 3D Gaussian Splatting for Real-world Video : Abstract: High-quality novel view synthesis (NVS) from real-world videos is crucial for applications such as cultural heritage preservation, digital twins, and immersive media. However, real-world vid...
- SAKED: Mitigating Hallucination in Large Vision-Language Models via Stability-Aware Knowledge Enhanced Decoding : Abstract: Hallucinations in Large Vision-Language Models (LVLMs) pose significant security and reliability risks in real-world applications. Inspired by the observation that humans are more error-pron...
- ARK: A Dual-Axis Multimodal Retrieval Benchmark along Reasoning and Knowledge : Abstract: Existing multimodal retrieval benchmarks largely emphasize semantic matching on daily-life images and offer limited diagnostics of professional knowledge and complex reasoning. To address th...
- Kelix Technique Report : Abstract: Autoregressive large language models (LLMs) scale well by expressing diverse tasks as sequences of discrete natural-language tokens and training with next-token prediction, which unifies com...
- Reason-IAD: Knowledge-Guided Dynamic Latent Reasoning for Explainable Industrial Anomaly Detection : Abstract: Industrial anomaly detection demands precise reasoning over fine-grained defect patterns. However, existing multimodal large language models (MLLMs), pretrained on general-domain data, often...
- Free-GVC: Towards Training-Free Extreme Generative Video Compression with Temporal Coherence : Abstract: Building on recent advances in video generation, generative video compression has emerged as a new paradigm for achieving visually pleasing reconstructions. However, existing methods exhibit...
- BabyMamba-HAR: Lightweight Selective State Space Models for Efficient Human Activity Recognition on Resource Constrained Devices : Abstract: Human activity recognition (HAR) on wearable and mobile devices is constrained by memory footprint and computational budget, yet competitive accuracy must be maintained across heterogeneous ...
- MVISTA-4D: View-Consistent 4D World Model with Test-Time Action Inference for Robotic Manipulation : Abstract: World-model-based imagine-then-act becomes a promising paradigm for robotic manipulation, yet existing approaches typically support either purely image-based forecasting or reasoning over pa...
- AdaTSQ: Pushing the Pareto Frontier of Diffusion Transformers via Temporal-Sensitivity Quantization : Abstract: Diffusion Transformers (DiTs) have emerged as the state-of-the-art backbone for high-fidelity image and video generation. However, their massive computational cost and memory footprint hinde...
- SARS: A Novel Face and Body Shape and Appearance Aware 3D Reconstruction System extends Morphable Models : Abstract: 3D Morphable Models (3DMMs) take 2D images as inputs and recreate the structure and physical appearance of 3D objects, especially human faces and bodies. 3D...
- A benchmark for video-based laparoscopic skill analysis and assessment : Abstract: Laparoscopic surgery is a complex surgical technique that requires extensive training. Recent advances in deep learning have shown promise in supporting this training by enabling automatic v...
- Monocular Normal Estimation via Shading Sequence Estimation : Abstract: Monocular normal estimation aims to estimate the normal map from a single RGB image of an object under arbitrary lights. Existing methods rely on deep models to directly predict normal maps....
- GeoFormer: A Swin Transformer-Based Framework for Scene-Level Building Height and Footprint Estimation from Sentinel Imagery : Abstract: Accurate three-dimensional urban data are critical for climate modelling, disaster risk assessment, and urban planning, yet remain scarce due to reliance on proprietary sensors or poor cross...
- Unbalanced optimal transport for robust longitudinal lesion evolution with registration-aware and appearance-guided priors : Abstract: Evaluating lesion evolution in longitudinal CT scans of cancer patients is essential for assessing treatment response, yet establishing reliable lesion correspondence across time remains ch...
- VersaViT: Enhancing MLLM Vision Backbones via Task-Guided Optimization : Abstract: Multimodal Large Language Models (MLLMs) have recently achieved remarkable success in visual-language understanding, demonstrating superior high-level semantic alignment within their vision ...
- Bladder Vessel Segmentation using a Hybrid Attention-Convolution Framework : Abstract: Urinary bladder cancer surveillance requires tracking tumor sites across repeated interventions, yet the deformable and hollow bladder lacks stable landmarks for orientation. While blood ves...
- Learning to Detect Baked Goods with Limited Supervision : Abstract: Monitoring leftover products provides valuable insights that can be used to optimize future production. This is especially important for German bakeries because freshly baked goods have a ve...
- Efficient Special Stain Classification : Abstract: Stains are essential in histopathology to visualize specific tissue characteristics, with Haematoxylin and Eosin (H&E) serving as the clinical standard. However, pathologists frequently ut...
- Faster-GS: Analyzing and Improving Gaussian Splatting Optimization : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have focused on accelerating optimization while preserving reconstruction quality. However, many proposed methods entangle implementation-leve...
- Perception with Guarantees: Certified Pose Estimation via Reachability Analysis : Abstract: Agents in cyber-physical systems are increasingly entrusted with safety-critical tasks. Ensuring safety of these agents often requires localizing the pose for subsequent actions. Pose estima...
- Fake-HR1: Rethinking reasoning of vision language model for synthetic image detection : Abstract: Recent studies have demonstrated that incorporating Chain-of-Thought (CoT) reasoning into the detection process can enhance a model's ability to detect synthetic images. However, excessively...
- Simple Image Processing and Similarity Measures Can Link Data Samples across Databases through Brain MRI : Abstract: Head Magnetic Resonance Imaging (MRI) is routinely collected and shared for research under strict regulatory frameworks. These frameworks require removing potential identifiers before sharin...
- Spatio-Temporal Attention for Consistent Video Semantic Segmentation in Automated Driving : Abstract: Deep neural networks, especially transformer-based architectures, have achieved remarkable success in semantic segmentation for environmental perception. However, existing models process vid...
- Can Image Splicing and Copy-Move Forgery Be Detected by the Same Model? Forensim: An Attention-Based State-Space Approach : Abstract: We introduce Forensim, an attention-based state-space framework for image forgery detection that jointly localizes both manipulated (target) and source regions. Unlike traditional approaches...
- 4RC: 4D Reconstruction via Conditional Querying Anytime and Anywhere : Abstract: We present 4RC, a unified feed-forward framework for 4D reconstruction from monocular videos. Unlike existing approaches that typically decouple motion from geometry or produce limited 4D at...
- VideoWorld 2: Learning Transferable Knowledge from Real-world Videos : Abstract: Learning transferable knowledge from unlabeled video data and applying it in new environments is a fundamental capability of intelligent agents. This work presents VideoWorld 2, which extend...
- ConsID-Gen: View-Consistent and Identity-Preserving Image-to-Video Generation : Abstract: Image-to-Video generation (I2V) animates a static image into a temporally coherent video sequence following textual instructions, yet preserving fine-grained object identity under changing v...
- Quantum Multiple Rotation Averaging : Abstract: Multiple rotation averaging (MRA) is a fundamental optimization problem in 3D vision and robotics that aims to recover globally consistent absolute rotations from noisy relative measurements...
- SAGE: Scalable Agentic 3D Scene Generation for Embodied AI : Abstract: Real-world data collection for embodied agents remains costly and unsafe, calling for scalable, realistic, and simulator-ready 3D environments. However, existing scene-generation systems oft...
- Mamba-FCS: Joint Spatio-Frequency Feature Fusion, Change-Guided Attention, and SeK Loss for Enhanced Semantic Change Detection in Remote Sensing : Abstract: Semantic Change Detection (SCD) from remote sensing imagery requires models balancing extensive spatial context, computational efficiency, and sensitivity to class-imbalanced land-cover tran...
- SAS-Net: Scene-Appearance Separation Network for Robust Spatiotemporal Registration in Bidirectional Photoacoustic Microscopy : Abstract: High-speed optical-resolution photoacoustic microscopy (OR-PAM) with bidirectional scanning enables rapid functional brain imaging but introduces severe spatiotemporal misalignment from co...
- Towards Human-AI Accessibility Mapping in India: VLM-Guided Annotations and POI-Centric Analysis in Chandigarh : Abstract: Project Sidewalk is a web-based platform that enables crowdsourcing accessibility of sidewalks at city-scale by virtually walking through city streets using Google Street View. The tool has ...
- Understanding and Enhancing Encoder-based Adversarial Transferability against Large Vision-Language Models : Abstract: Large vision-language models (LVLMs) have achieved impressive success across multimodal tasks, but their reliance on visual inputs exposes them to significant adversarial threats. Existing e...
- LLM-Grounded Dynamic Task Planning with Hierarchical Temporal Logic for Human-Aware Multi-Robot Collaboration : Abstract: While Large Language Models (LLMs) enable non-experts to specify open-world multi-robot tasks, the generated plans often lack kinematic feasibility and are not efficient, especially in long-h...
- AnyTouch 2: General Optical Tactile Representation Learning For Dynamic Tactile Perception : Abstract: Real-world contact-rich manipulation demands robots to perceive temporal tactile feedback, capture subtle surface deformations, and reason about object properties as well as force dynamics. ...
- VLA-JEPA: Enhancing Vision-Language-Action Model with Latent World Model : Abstract: Pretraining Vision-Language-Action (VLA) policies on internet-scale video is appealing, yet current latent-action objectives often learn the wrong thing: they remain anchored to pixel variat...
- Deep Learning-Based Object Pose Estimation: A Comprehensive Survey : Abstract: Object pose estimation is a fundamental computer vision problem with broad applications in augmented reality and robotics. Over the past decade, deep learning models, due to their superior a...
- MpoxSLDNet: A Novel CNN Model for Detecting Monkeypox Lesions and Performance Comparison with Pre-trained Models : Abstract: Monkeypox virus (MPXV) is a zoonotic virus that poses a significant threat to public health, particularly in remote parts of Central and West Africa. Early detection of monkeypox lesions is ...
- Story-Iter: A Training-free Iterative Paradigm for Long Story Visualization : Abstract: This paper introduces Story-Iter, a new training-free iterative paradigm to enhance long-story generation. Unlike existing methods that rely on fixed reference images to construct a complete...
- Driving as a Diagnostic Tool: Scenario-based Cognitive Assessment in Older Drivers from Driving Video : Abstract: We introduce scenario-based cognitive status identification in older drivers from naturalistic driving videos, leveraging large vision models. In recent times, cognitive decline including De...
- LingxiDiagBench: A Multi-Agent Framework for Benchmarking LLMs in Chinese Psychiatric Consultation and Diagnosis : Abstract: Mental disorders are highly prevalent worldwide, but the shortage of psychiatrists and the inherent subjectivity of interview-based diagnosis create substantial barriers to timely and consis...
- SWE-AGI: Benchmarking Specification-Driven Software Construction with MoonBit in the Era of Autonomous Agents : Abstract: Although large language models (LLMs) have demonstrated impressive coding capabilities, their ability to autonomously build production-scale software from explicit specifications remains an ...
- AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms : Abstract: Vericoding refers to the generation of formally verified code from rigorous specifications. Recent AI models show promise in vericoding, but a unified methodology for cross-paradigm evaluati...
- Would a Large Language Model Pay Extra for a View? Inferring Willingness to Pay from Subjective Choices : Abstract: As Large Language Models (LLMs) are increasingly deployed in applications such as travel assistance and purchasing support, they are often required to make subjective choices on behalf of us...
- Covo-Audio Technical Report : Abstract: In this work, we present Covo-Audio, a 7B-parameter end-to-end LALM that directly processes continuous audio inputs and generates audio outputs within a single unified architecture. Through ...
- Code2World: A GUI World Model via Renderable Code Generation : Abstract: Autonomous GUI agents interact with environments by perceiving interfaces and executing actions. As a virtual sandbox, the GUI World model empowers agents with human-like foresight by enabli...
- QP-OneModel: A Unified Generative LLM for Multi-Task Query Understanding in Xiaohongshu Search : Abstract: Query Processing (QP) bridges user intent and content supply in large-scale Social Network Service (SNS) search engines. Traditional QP systems rely on pipelines of isolated discriminative m...
- Overview of the TREC 2025 RAGTIME Track : Abstract: The principal goal of the RAG TREC Instrument for Multilingual Evaluation (RAGTIME) track at TREC is to study report generation from multilingual source documents. The track has created a do...
- CAPID: Context-Aware PII Detection for Question-Answering Systems : Abstract: Detecting personally identifiable information (PII) in user queries is critical for ensuring privacy in question-answering systems. Current approaches mainly redact all PII, disregarding the...
- EAMET: Robust Massive Model Editing via Embedding Alignment Optimization : Abstract: Model editing techniques are essential for efficiently updating knowledge in large language models (LLMs). However, the effectiveness of existing approaches degrades in massive editing scena...
- SAGE: An Agentic Explainer Framework for Interpreting SAE Features in Language Models : Abstract: Large language models (LLMs) have achieved remarkable progress, yet their internal mechanisms remain largely opaque, posing a significant challenge to their safe and reliable deployment. Spa...
- Online Density-Based Clustering for Real-Time Narrative Evolution Monitoring : Abstract: Automated narrative intelligence systems for social media monitoring face significant scalability challenges when relying on batch clustering methods to process continuous data streams. We i...
- Survey of Video Diffusion Models: Foundations, Implementations, and Applications : Abstract: Recent advances in diffusion models have revolutionized video generation, offering superior temporal consistency and visual quality compared to traditional generative adversarial networks-ba...
- Agent Banana: High-Fidelity Image Editing with Agentic Thinking and Tooling : Abstract: We study instruction-based image editing under professional workflows and identify three persistent challenges: (i) editors often over-edit, modifying content beyond the user's intent; (ii) ...
- SemanticMoments: Training-Free Motion Similarity via Third Moment Features : Abstract: Retrieving videos based on semantic motion is a fundamental, yet unsolved, problem. Existing video representation approaches overly rely on static appearance and scene context rather than mo...
- A Hybrid Deterministic Framework for Named Entity Extraction in Broadcast News Video : Abstract: The growing volume of video-based news content has heightened the need for transparent and reliable methods to extract on-screen information. Yet the variability of graphical layouts, typogr...
- All-in-One Conditioning for Text-to-Image Synthesis : Abstract: Accurate interpretation and visual representation of complex prompts involving multiple objects, attributes, and spatial relationships is a critical challenge in text-to-image synthesis. Des...
- Wearable environmental sensing to forecast how legged systems will interact with upcoming terrain : Abstract: Computer-vision (CV) has been used for environmental classification during gait and is often used to inform control in assistive systems; however, the ability to predict how the foot will co...
- VLM-UQBench: A Benchmark for Modality-Specific and Cross-Modality Uncertainties in Vision Language Models : Abstract: Uncertainty quantification (UQ) is vital for ensuring that vision-language models (VLMs) behave safely and reliably. A central challenge is to localize uncertainty to its source, determining...
- VLM-Guided Iterative Refinement for Surgical Image Segmentation with Foundation Models : Abstract: Surgical image segmentation is essential for robot-assisted surgery and intraoperative guidance. However, existing methods are constrained to predefined categories, produce one-shot predicti...
- Rethinking Global Text Conditioning in Diffusion Transformers : Abstract: Diffusion transformers typically incorporate textual information via attention layers and a modulation mechanism using a pooled text embedding. Nevertheless, recent approaches discard modula...
- A Deep Multi-Modal Method for Patient Wound Healing Assessment : Abstract: Hospitalization of patients is one of the major factors for high wound care costs. Most patients do not acquire a wound which needs immediate hospitalization. However, due to factors such as...
- GAFR-Net: A Graph Attention and Fuzzy-Rule Network for Interpretable Breast Cancer Image Classification : Abstract: Accurate classification of breast cancer histopathology images is pivotal for early oncological diagnosis and therapeutic intervention. However, conventional deep learning architectures often...
- Deep Modeling and Interpretation for Bladder Cancer Classification : Abstract: Deep models based on vision transformer (ViT) and convolutional neural network (CNN) have demonstrated remarkable performance on natural datasets. However, these models may not be similar in...
- Kyrtos: A methodology for automatic deep analysis of graphic charts with curves in technical documents : Abstract: Deep Understanding of Technical Documents (DUTD) has become a very attractive field with great potential due to large amounts of accumulated documents and the valuable knowledge contained in...
- Impact of domain adaptation in deep learning for medical image classifications : Abstract: Domain adaptation (DA) is a quickly expanding area in machine learning that involves adjusting a model trained in one domain to perform well in another domain. While there have been notable ...
- Fully Differentiable Bidirectional Dual-Task Synergistic Learning for Semi-Supervised 3D Medical Image Segmentation : Abstract: Semi-supervised learning relaxes the need of large pixel-wise labeled datasets for image segmentation by leveraging unlabeled data. The scarcity of high-quality labeled data remains a major ...
- Single-Slice-to-3D Reconstruction in Medical Imaging and Natural Objects: A Comparative Benchmark with SAM 3D : Abstract: A 3D understanding of anatomy is central to diagnosis and treatment planning, yet volumetric imaging remains costly with long wait times. Image-to-3D foundation models can solve this issue ...
- K-Sort Eval: Efficient Preference Evaluation for Visual Generation via Corrected VLM-as-a-Judge : Abstract: The rapid development of visual generative models raises the need for more scalable and human-aligned evaluation methods. While the crowdsourced Arena platforms offer human preference assess...
- Stability and Concentration in Nonlinear Inverse Problems with Block-Structured Parameters: Lipschitz Geometry, Identifiability, and an Application to Gaussian Splatting : Abstract: We develop an operator-theoretic framework for stability and statistical concentration in nonlinear inverse problems with block-structured parameters. Under a unified set of assumptions comb...
- SceneReVis: A Self-Reflective Vision-Grounded Framework for 3D Indoor Scene Synthesis via Multi-turn RL : Abstract: Current one-pass 3D scene synthesis methods often suffer from spatial hallucinations, such as collisions, due to a lack of deliberative reasoning. To bridge this gap, we introduce SceneReVis...
- Fine-T2I: An Open, Large-Scale, and Diverse Dataset for High-Quality T2I Fine-Tuning : Abstract: High-quality and open datasets remain a major bottleneck for text-to-image (T2I) fine-tuning. Despite rapid progress in model architectures and training pipelines, most publicly available fi...
- Look-Ahead and Look-Back Flows: Training-Free Image Generation with Trajectory Smoothing : Abstract: Recent advances have reformulated diffusion models as deterministic ordinary differential equations (ODEs) through the framework of flow matching, providing a unified formulation for the noi...
- FD-DB: Frequency-Decoupled Dual-Branch Network for Unpaired Synthetic-to-Real Domain Translation : Abstract: Synthetic data provide low-cost, accurately annotated samples for geometry-sensitive vision tasks, but appearance and imaging differences between synthetic and real domains cause severe doma...
- Weakly Supervised Contrastive Learning for Histopathology Patch Embeddings : Abstract: Digital histopathology whole slide images (WSIs) provide gigapixel-scale high-resolution images that are highly useful for disease diagnosis. However, digital histopathology image analysis f...
- Beyond Next-Token Alignment: Distilling Multimodal Large Language Models via Token Interactions : Abstract: Multimodal Large Language Models (MLLMs) demonstrate impressive cross-modal capabilities, yet their substantial size poses significant deployment challenges. Knowledge distillation (KD) is a...
- OSI: One-step Inversion Excels in Extracting Diffusion Watermarks : Abstract: Watermarking is an important mechanism for provenance and copyright protection of diffusion-generated images. Training-free methods, exemplified by Gaussian Shading, embed watermarks into th...
- Equilibrium contrastive learning for imbalanced image classification : Abstract: Contrastive learning (CL) is a predominant technique in image classification, but it has shown limited performance on imbalanced datasets. Recently, several supervised CL methods have bee...
- Robust Depth Super-Resolution via Adaptive Diffusion Sampling : Abstract: We propose AdaDS, a generalizable framework for depth super-resolution that robustly recovers high-resolution depth maps from arbitrarily degraded low-resolution inputs. Unlike conventional ...
- Energy-Efficient Fast Object Detection on Edge Devices for IoT Systems : Abstract: This paper presents an Internet of Things (IoT) application that utilizes an AI classifier for fast-object detection using the frame difference method. This method, with its shorter duration...
- A Universal Action Space for General Behavior Analysis : Abstract: Analyzing animal and human behavior has long been a challenging task in computer vision. Early approaches from the 1970s to the 1990s relied on hand-crafted edge detection, segmentation, and...
- Attention to details, logits to truth: visual-aware attention and logits enhancement to mitigate hallucinations in LVLMs : Abstract: Existing Large Vision-Language Models (LVLMs) exhibit insufficient visual attention, leading to hallucinations. To alleviate this problem, some previous studies adjust and amplify visual att...
- Singpath-VL Technical Report : Abstract: We present Singpath-VL, a large vision-language model, to fill the gap in AI assistance for cervical cytology. Recent advances in multi-modal large language models (MLLMs) have significant...
- HLGFA: High-Low Resolution Guided Feature Alignment for Unsupervised Anomaly Detection : Abstract: Unsupervised industrial anomaly detection (UAD) is essential for modern manufacturing inspection, where defect samples are scarce and reliable detection is required. In this paper, we propos...
- SchröMind: Mitigating Hallucinations in Multimodal Large Language Models via Solving the Schrödinger Bridge Problem : Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have achieved significant success across various domains. However, their use in high-stakes fields like healthcare remains lim...
- SCA-Net: Spatial-Contextual Aggregation Network for Enhanced Small Building and Road Change Detection : Abstract: Automated change detection in remote sensing imagery is critical for urban management, environmental monitoring, and disaster assessment. While deep learning models have advanced this field,...
- DR.Experts: Differential Refinement of Distortion-Aware Experts for Blind Image Quality Assessment : Abstract: Blind Image Quality Assessment, aiming to replicate human perception of visual quality without reference, plays a key role in vision tasks, yet existing models often fail to effectively capt...
- RAD: Retrieval-Augmented Monocular Metric Depth Estimation for Underrepresented Classes : Abstract: Monocular Metric Depth Estimation (MMDE) is essential for physically intelligent systems, yet accurate depth estimation for underrepresented classes in complex scenes remains a persistent ch...
- AUHead: Realistic Emotional Talking Head Generation via Action Units Control : Abstract: Realistic talking-head video generation is critical for virtual avatars, film production, and interactive systems. Current methods struggle with nuanced emotional expressions due to the lack...
- Scalpel: Fine-Grained Alignment of Attention Activation Manifolds via Mixture Gaussian Bridges to Mitigate Multimodal Hallucination : Abstract: Rapid progress in large vision-language models (LVLMs) has achieved unprecedented performance in vision-language tasks. However, due to the strong prior of large language models (LLMs) and m...
- Delving into Spectral Clustering with Vision-Language Representations : Abstract: Spectral clustering is known as a powerful technique in unsupervised data analysis. The vast majority of approaches to spectral clustering are driven by a single modality, leaving the rich i...
- MieDB-100k: A Comprehensive Dataset for Medical Image Editing : Abstract: The scarcity of high-quality data remains a primary bottleneck in adapting multimodal generative models for medical image editing. Existing medical image editing datasets often suffer from l...
- Hand2World: Autoregressive Egocentric Interaction Generation via Free-Space Hand Gestures : Abstract: Egocentric interactive world models are essential for augmented reality and embodied AI, where visual generation must respond to user input with low latency, geometric consistency, and long-...
- Tele-Omni: a Unified Multimodal Framework for Video Generation and Editing : Abstract: Recent advances in diffusion-based video generation have substantially improved visual fidelity and temporal coherence. However, most existing approaches remain task-specific and rely primar...
- AGMark: Attention-Guided Dynamic Watermarking for Large Vision-Language Models : Abstract: Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks may ...
- Towards Training-free Multimodal Hate Localisation with Large Language Models : Abstract: The proliferation of hateful content in online videos poses severe threats to individual well-being and societal harmony. However, existing solutions for video hate detection either rely hea...
- ECGFlowCMR: Pretraining with ECG-Generated Cine CMR Helps Cardiac Disease Classification and Phenotype Prediction : Abstract: Cardiac Magnetic Resonance (CMR) imaging provides a comprehensive assessment of cardiac structure and function but remains constrained by high acquisition costs and reliance on expert annota...
- Overview of PAN 2026: Voight-Kampff Generative AI Detection, Text Watermarking, Multi-Author Writing Style Analysis, Generative Plagiarism Detection, and Reasoning Trajectory Detection : Abstract: The goal of the PAN workshop is to advance computational stylometry and text forensics via objective and reproducible evaluation. In 2026, we run the following five tasks: (1) Voight-Kampff ...
- Measuring Inclusion in Interaction: Inclusion Analytics for Human-AI Collaborative Learning : Abstract: Inclusion, equity, and access are widely valued in AI and education, yet are often assessed through coarse sample descriptors or post-hoc self-reports that miss how inclusion is shaped momen...
- FM SO.P: A Progressive Task Mixture Framework with Automatic Evaluation for Cross-Domain SOP Understanding : Abstract: Standard Operating Procedures (SOPs) are critical for enterprise operations, yet existing language models struggle with SOP understanding and cross-domain generalization. Current methods fai...
- Understanding Risk and Dependency in AI Chatbot Use from User Discourse : Abstract: Generative AI systems are increasingly embedded in everyday life, yet empirical understanding of how psychological risk associated with AI use emerges, is experienced, and is regulated by us...
- Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs : Abstract: This study examines the extent to which Large Language Models (LLMs) capture geographic lexical variation in Spanish, a language that exhibits substantial regional variation. Treating LLMs a...
- Unsupervised Cross-Lingual Part-of-Speech Tagging with Monolingual Corpora Only : Abstract: Due to the scarcity of part-of-speech annotated data, existing studies on low-resource languages typically adopt unsupervised approaches for POS tagging. Among these, POS tag projection with...
- AgentSkiller: Scaling Generalist Agent Intelligence through Semantically Integrated Cross-Domain Data Synthesis : Abstract: Large Language Model agents demonstrate potential in solving real-world problems via tools, yet generalist intelligence is bottlenecked by scarce high-quality, long-horizon data. Existing me...
- AfriNLLB: Efficient Translation Models for African Languages : Abstract: In this work, we present AfriNLLB, a series of lightweight models for efficient translation from and into African languages. AfriNLLB supports 15 language pairs (30 translation directions), ...
- BiasScope: Towards Automated Detection of Bias in LLM-as-a-Judge Evaluation : Abstract: LLM-as-a-Judge has been widely adopted across various research and practical applications, yet the robustness and reliability of its evaluation remain a critical issue. A core challenge it f...
- Contractual Deepfakes: Can Large Language Models Generate Contracts? : Abstract: Notwithstanding their unprecedented ability to generate text, LLMs do not understand the meaning of words, have no sense of context and cannot reason. Their output constitutes an approximati...
- Effective vocabulary expanding of multilingual language models for extremely low-resource languages : Abstract: Multilingual pre-trained language models (mPLMs) offer significant benefits for many low-resource languages. To further expand the range of languages these models can support, many works focu...
- Are Language Models Sensitive to Morally Irrelevant Distractors? : Abstract: With the rapid development and uptake of large language models (LLMs) across high-stakes settings, it is increasingly important to ensure that LLMs behave in ways that align with human value...
- Breaking the Pre-Sampling Barrier: Activation-Informed Difficulty-Aware Self-Consistency : Abstract: Self-Consistency (SC) is an effective decoding strategy that improves the reasoning performance of Large Language Models (LLMs) by generating multiple chain-of-thought reasoning paths and se...
- Evaluating Social Bias in RAG Systems: When External Context Helps and Reasoning Hurts : Abstract: Social biases inherent in large language models (LLMs) raise significant fairness concerns. Retrieval-Augmented Generation (RAG) architectures, which retrieve external knowledge sources to e...
- Conceptual Cultural Index: A Metric for Cultural Specificity via Relative Generality : Abstract: Large language models (LLMs) are increasingly deployed in multicultural settings; however, systematic evaluation of cultural specificity at the sentence level remains underexplored. We propo...
- NOWJ @BioCreative IX ToxHabits: An Ensemble Deep Learning Approach for Detecting Substance Use and Contextual Information in Clinical Texts : Abstract: Extracting drug use information from unstructured Electronic Health Records remains a major challenge in clinical Natural Language Processing. While Large Language Models demonstrate advance...
- Listen to the Layers: Mitigating Hallucinations with Inter-Layer Disagreement : Abstract: Pretrained Large Language Models (LLMs) are prone to generating fluent yet factually incorrect text -- a phenomenon known as hallucinations, undermining their reliability and utility in downstr...
- Where-to-Unmask: Ground-Truth-Guided Unmasking Order Learning for Masked Diffusion Language Models : Abstract: Masked Diffusion Language Models (MDLMs) generate text by iteratively filling masked tokens, requiring two coupled decisions at each step: which positions to unmask (where-to-unmask) and whi...
- EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies : Abstract: Long-horizon planning is widely recognized as a core capability of autonomous LLM-based agents; however, current evaluation frameworks suffer from being largely episodic, domain-specific, or...
- The CLEF-2026 CheckThat! Lab: Advancing Multilingual Fact-Checking : Abstract: The CheckThat! lab aims to advance the development of innovative technologies combating disinformation and manipulation efforts in online communication across a multitude of languages and pl...
- Knowledge Integration Decay in Search-Augmented Reasoning of Large Language Models : Abstract: Modern Large Language Models (LLMs) have demonstrated remarkable capabilities in complex tasks by employing search-augmented reasoning to incorporate external knowledge into long chains of t...
- UniARM: Towards a Unified Autoregressive Reward Model for Multi-Objective Test-Time Alignment : Abstract: Multi-objective alignment aims to align LLM responses with multiple human preference objectives. Among existing methods, guiding the generation of frozen LLMs through autoregressive reward m...
- Comprehensive Comparison of RAG Methods Across Multi-Domain Conversational QA : Abstract: Conversational question answering increasingly relies on retrieval-augmented generation (RAG) to ground large language models (LLMs) in external knowledge. Yet, most existing studies evaluat...
- Advancing Block Diffusion Language Models for Test-Time Scaling : Abstract: Recent advances in block diffusion language models have demonstrated competitive performance and strong scalability on reasoning tasks. However, existing BDLMs have limited exploration under...
- LEMUR: A Corpus for Robust Fine-Tuning of Multilingual Law Embedding Models for Retrieval : Abstract: Large language models (LLMs) are increasingly used to access legal information. Yet, their deployment in multilingual legal settings is constrained by unreliable retrieval and the lack of do...
- Context-Aware Counterfactual Data Augmentation for Gender Bias Mitigation in Language Models : Abstract: A challenge in mitigating social bias in fine-tuned language models (LMs) is the potential reduction in language modeling capability, which can harm downstream performance. Counterfactual da...
- Learning from the Irrecoverable: Error-Localized Policy Optimization for Tool-Integrated LLM Reasoning : Abstract: Tool-integrated reasoning (TIR) enables LLM agents to solve tasks through planning, tool use, and iterative revision, but outcome-only reinforcement learning in this setting suffers from spa...
- MILE-RefHumEval: A Reference-Free, Multi-Independent LLM Framework for Human-Aligned Evaluation : Abstract: We introduce MILE-RefHumEval, a reference-free framework for evaluating Large Language Models (LLMs) without ground-truth annotations or evaluator coordination. It leverages an ensemble of i...
- MATA: Multi-Agent Framework for Reliable and Flexible Table Question Answering : Abstract: Recent advances in Large Language Models (LLMs) have significantly improved table understanding tasks such as Table Question Answering (TableQA), yet challenges remain in ensuring reliabilit...
- Maastricht University at AMIYA: Adapting LLMs for Dialectal Arabic using Fine-tuning and MBR Decoding : Abstract: Large Language Models (LLMs) are becoming increasingly multilingual, supporting hundreds of languages, especially high resource ones. Unfortunately, dialect variations are still underreprese...
- TraceMem: Weaving Narrative Memory Schemata from User Conversational Traces : Abstract: Sustaining long-term interactions remains a bottleneck for Large Language Models (LLMs), as their limited context windows struggle to manage dialogue histories that extend over time. Existin...
- Unsupervised Layer-Wise Dynamic Test Time Adaptation for LLMs : Abstract: Test-time adaptation (TTA) for large language models (LLMs) updates model parameters at inference time using signals available at deployment. This paper focuses on a common yet under-explore...
- AI-Assisted Scientific Assessment: A Case Study on Climate Change : Abstract: The emerging paradigm of AI co-scientists focuses on tasks characterized by repeatable verification, where agents explore search spaces in 'guess and check' loops. This paradigm does not ext...
- Targum -- A Multilingual New Testament Translation Corpus : Abstract: Many European languages possess rich biblical translation histories, yet existing corpora - in prioritizing linguistic breadth - often fail to capture this depth. To address this gap, we int...
- Improving Interpretability of Lexical Semantic Change with Neurobiological Features : Abstract: Lexical Semantic Change (LSC) is the phenomenon in which the meaning of a word changes over time. Most studies on LSC focus on improving the performance of estimating the degree of LSC, howev...
- Where Are We At with Automatic Speech Recognition for the Bambara Language? : Abstract: This paper introduces the first standardized benchmark for evaluating Automatic Speech Recognition (ASR) in the Bambara language, utilizing one hour of professionally recorded Malian constit...
- AnalyticsGPT: An LLM Workflow for Scientometric Question Answering : Abstract: This paper introduces AnalyticsGPT, an intuitive and efficient large language model (LLM)-powered workflow for scientometric question answering. This underrepresented downstream task address...
- Text summarization via global structure awareness : Abstract: Text summarization is a fundamental task in natural language processing (NLP), and the information explosion has made long-document processing increasingly demanding, making summarization es...
- From FusHa to Folk: Exploring Cross-Lingual Transfer in Arabic Language Models : Abstract: Arabic Language Models (LMs) are pretrained predominately on Modern Standard Arabic (MSA) and are expected to transfer to its dialects. While MSA as the standard written variety is commonly ...
- LLM Reasoning Predicts When Models Are Right: Evidence from Coding Classroom Discourse : Abstract: Large Language Models (LLMs) are increasingly deployed to automatically label and analyze educational dialogue at scale, yet current pipelines lack reliable ways to detect when models are wr...
- How Do People Quantify Naturally: Evidence from Mandarin Picture Description : Abstract: Quantification is a fundamental component of everyday language use, yet little is known about how speakers decide whether and how to quantify in naturalistic production. We investigate quant...
- SinFoS: A Parallel Dataset for Translating Sinhala Figures of Speech : Abstract: Figures of Speech (FoS) consist of multi-word phrases that are deeply intertwined with culture. While Neural Machine Translation (NMT) performs relatively well with the figurative expression...
- Steer2Edit: From Activation Steering to Component-Level Editing : Abstract: Steering methods influence Large Language Model behavior by identifying semantic directions in hidden representations, but are typically realized through inference-time activation interventi...
- The Devil Behind Moltbook: Anthropic Safety is Always Vanishing in Self-Evolving AI Societies : Abstract: The emergence of multi-agent systems built from large language models (LLMs) offers a promising paradigm for scalable collective intelligence and self-evolution. Ideally, such systems would ...
- AmharicIR+Instr: A Two-Dataset Resource for Neural Retrieval and Instruction Tuning : Abstract: Neural retrieval and GPT-style generative models rely on large, high-quality supervised data, which is still scarce for low-resource languages such as Amharic. We release an Amharic data res...
- ATTNPO: Attention-Guided Process Supervision for Efficient Reasoning : Abstract: Large reasoning models trained with reinforcement learning and verifiable rewards (RLVR) achieve strong performance on complex reasoning tasks, yet often overthink, generating redundant reas...
- ViMultiChoice: Toward a Method That Gives Explanation for Multiple-Choice Reading Comprehension in Vietnamese : Abstract: Multiple-choice Reading Comprehension (MCRC) models aim to select the correct answer from a set of candidate options for a given question. However, they typically lack the ability to explain...
- A Unified Assessment of the Poverty of the Stimulus Argument for Neural Language Models : Abstract: How can children acquire native-level syntax from limited input? According to the Poverty of the Stimulus Hypothesis (PoSH), the linguistic input children receive is insufficient to explain ...
- ViSpeechFormer: A Phonemic Approach for Vietnamese Automatic Speech Recognition : Abstract: Vietnamese has a phonetic orthography, where each grapheme corresponds to at most one phoneme and vice versa. Exploiting this high grapheme-phoneme transparency, we propose ViSpeechFormer...
- SCORE: Specificity, Context Utilization, Robustness, and Relevance for Reference-Free LLM Evaluation : Abstract: Large language models (LLMs) are increasingly used to support question answering and decision-making in high-stakes, domain-specific settings such as natural hazard response and infrastructu...
- Decoupled Reasoning with Implicit Fact Tokens (DRIFT): A Dual-Model Framework for Efficient Long-Context Inference : Abstract: The integration of extensive, dynamic knowledge into Large Language Models (LLMs) remains a significant challenge due to the inherent entanglement of factual data and reasoning patterns. Exi...
- MEVER: Multi-Modal and Explainable Claim Verification with Graph-based Evidence Retrieval : Abstract: Verifying the truthfulness of claims usually requires joint multi-modal reasoning over both textual and visual evidence, such as analyzing both textual caption and chart image for claim veri...
- Anagent For Enhancing Scientific Table & Figure Analysis : Abstract: In scientific research, analysis requires accurately interpreting complex multimodal knowledge, integrating evidence from different sources, and drawing inferences grounded in domain-specifi...
- Quantum-Audit: Evaluating the Reasoning Limits of LLMs on Quantum Computing : Abstract: Language models have become practical tools for quantum computing education and research, from summarizing technical papers to explaining theoretical concepts and answering questions about r...
- PABU: Progress-Aware Belief Update for Efficient LLM Agents : Abstract: Large Language Model (LLM) agents commonly condition actions on full action-observation histories, which introduce task-irrelevant information that easily leads to redundant actions and high...
- FlyAOC: Evaluating Agentic Ontology Curation of Drosophila Scientific Knowledge Bases : Abstract: Scientific knowledge bases accelerate discovery by curating findings from primary literature into structured, queryable formats for both human researchers and emerging AI systems. Maintainin...
- Collective Behavior of AI Agents: the Case of Moltbook : Abstract: We present a large scale data analysis of Moltbook, a Reddit-style social media platform exclusively populated by AI agents. Analyzing over 369,000 posts and 3.0 million comments from approx...
- Triggered: A Statistical Analysis of Environmental Influences on Extremist Groups : Abstract: Online extremist communities operate within a wider information ecosystem shaped by real-world events, news coverage, and cross-community interaction. We adopt a systems perspective to exami...
- Not-in-Perspective: Towards Shielding Google's Perspective API Against Adversarial Negation Attacks : Abstract: The rise of cyberbullying in social media platforms involving toxic comments has escalated the need for effective ways to monitor and moderate online interactions. Existing solutions of auto...
- AlignTune: Modular Toolkit for Post-Training Alignment of Large Language Models : Abstract: Post-training alignment is central to deploying large language models (LLMs), yet practical workflows remain split across backend-specific tools and ad-hoc glue code, making experiments hard...
- The Entropic Signature of Class Speciation in Diffusion Models : Abstract: Diffusion models do not recover semantic structure uniformly over time. Instead, samples transition from semantic ambiguity to class commitment within a narrow regime. Recent theoretical wor...
- Life Cycle-Aware Evaluation of Knowledge Distillation for Machine Translation: Environmental Impact and Translation Quality Trade-offs : Abstract: Knowledge distillation (KD) is a tool to compress a larger system (teacher) into a smaller one (student). In machine translation, studies typically report only the translation quality of the...
- SAQNN: Spectral Adaptive Quantum Neural Network as a Universal Approximator : Abstract: Quantum machine learning (QML), as an interdisciplinary field bridging quantum computing and machine learning, has garnered significant attention in recent years. Currently, the field as a w...
- Continual Learning for non-stationary regression via Memory-Efficient Replay : Abstract: Data streams are rarely static in dynamic environments like Industry 4.0. Instead, they constantly change, making traditional offline models outdated unless they can quickly adjust to the ne...
- Allure of Craquelure: A Variational-Generative Approach to Crack Detection in Paintings : Abstract: Recent advances in imaging technologies, deep learning and numerical performance have enabled non-invasive detailed analysis of artworks, supporting their documentation and conservation. In ...
- Linear Model Extraction via Factual and Counterfactual Queries : Abstract: In model extraction attacks, the goal is to reveal the parameters of a black-box machine learning model by querying the model for a selected set of data points. Due to an increasing demand f...
- Self-Supervised Learning as Discrete Communication : Abstract: Most self-supervised learning (SSL) methods learn continuous visual representations by aligning different views of the same input, offering limited control over how information is structured...
- Toeplitz Based Spectral Methods for Data-driven Dynamical Systems : Abstract: We introduce a Toeplitz-based framework for data-driven spectral estimation of linear evolution operators in dynamical systems. Focusing on transfer and Koopman operators from equilibrium tr...
- Decomposing Reasoning Efficiency in Large Language Models : Abstract: Large language models trained for reasoning trade off inference tokens against accuracy, yet standard evaluations report only final accuracy, obscuring where tokens are spent or wasted. We i...
- Hybrid Responsible AI-Stochastic Approach for SLA Compliance in Multivendor 6G Networks : Abstract: The convergence of AI and 6G network automation introduces new challenges in maintaining transparency, fairness, and accountability across multivendor management systems. Although closed-loo...
- Step-Size Stability in Stochastic Optimization: A Theoretical Perspective : Abstract: We present a theoretical analysis of stochastic optimization methods in terms of their sensitivity with respect to the step size. We identify a key quantity that, for each method, describes ...
- Stabilized Maximum-Likelihood Iterative Quantum Amplitude Estimation for Structural CVaR under Correlated Random Fields : Abstract: Conditional Value-at-Risk (CVaR) is a central tail-risk measure in stochastic structural mechanics, yet its accurate evaluation under high-dimensional, spatially correlated material uncertai...
- Robust Processing and Learning: Principles, Methods, and Wireless Applications : Abstract: This tutorial-style overview article examines the fundamental principles and methods of robustness, using wireless sensing and communication (WSC) as the narrative and exemplifying framework...
- Stemphonic: All-at-once Flexible Multi-stem Music Generation : Abstract: Music stem generation, the task of producing musically-synchronized and isolated instrument audio clips, offers the potential of greater user control and better alignment with musician workf...
- Routing, Cascades, and User Choice for LLMs : Abstract: To mitigate the trade-offs between performance and costs, LLM providers route user tasks to different models based on task difficulty and latency. We study the effect of LLM routing with res...
- LLMs Encode Their Failures: Predicting Success from Pre-Generation Activations : Abstract: Running LLMs with extended reasoning on every problem is expensive, but determining which inputs actually require additional compute remains challenging. We investigate whether their own lik...
- The Catastrophic Failure of The k-Means Algorithm in High Dimensions, and How Hartigan's Algorithm Avoids It : Abstract: Lloyd's k-means algorithm is one of the most widely used clustering methods. We prove that in high-dimensional, high-noise settings, the algorithm exhibits catastrophic failure: with high pr...
- Statistical-Computational Trade-offs in Learning Multi-Index Models via Harmonic Analysis : Abstract: We study the problem of learning multi-index models (MIMs), where the label depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown $\mathsf{s}$-dimensional projection...
- Coupled Inference in Diffusion Models for Semantic Decomposition : Abstract: Many visual scenes can be described as compositions of latent factors. Effective recognition, reasoning, and editing often require not only forming such compositional representations, but al...
- Conformal Prediction Sets for Instance Segmentation : Abstract: Current instance segmentation models achieve high performance on average predictions, but lack principled uncertainty quantification: their outputs are not calibrated, and there is no guaran...
- Evaluating Disentangled Representations for Controllable Music Generation : Abstract: Recent approaches in music generation rely on disentangled representations, often labeled as structure and timbre or local and global, to enable controllable synthesis. Yet the underlying pr...
- Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning : Abstract: Recent advances in large language models (LLMs) have empowered autonomous agents to perform complex tasks that require multi-turn interactions with tools and environments. However, scaling suc...
- Causality in Video Diffusers is Separable from Denoising : Abstract: Causality -- referring to temporal, uni-directional cause-effect relationships between components -- underlies many complex generative processes, including videos, language, and robot trajec...
- Olaf-World: Orienting Latent Actions for Video World Modeling : Abstract: Scaling action-controllable world models is limited by the scarcity of action labels. While latent action learning promises to extract control interfaces from unlabeled video, learned latent...
- On-Policy Policy Gradient Reinforcement Learning Without On-Policy Sampling : Abstract: On-policy reinforcement learning (RL) algorithms are typically characterized as algorithms that perform policy updates using i.i.d. trajectories collected by the agent's current policy. Howe...
- Data-efficient and Interpretable Inverse Materials Design using a Disentangled Variational Autoencoder : Abstract: Inverse materials design has proven successful in accelerating novel material discovery. Many inverse materials design methods use unsupervised learning where a latent space is learned to of...
- Influence of Recommender Systems on Users: A Dynamical Systems Analysis : Abstract: We analyze the unintended effects that recommender systems have on the preferences of users that they are learning. We consider a contextual multi-armed bandit recommendation algorithm that ...
- Detecting and Mitigating Memorization in Diffusion Models through Anisotropy of the Log-Probability : Abstract: Diffusion-based image generative models produce high-fidelity images through iterative denoising but remain vulnerable to memorization, where they unintentionally reproduce exact copies or p...
- TwinWeaver: An LLM-Based Foundation Model Framework for Pan-Cancer Digital Twins : Abstract: Precision oncology requires forecasting clinical events and trajectories, yet modeling sparse, multi-modal clinical time series remains a critical challenge. We introduce TwinWeaver, an open...
- Information-Theoretic Limits of Quantum Learning via Data Compression : Abstract: Understanding the power of quantum data in machine learning is central to many proposed applications of quantum technologies. While access to quantum data can offer exponential advantages fo...
- Pre-training Tensor-Train Networks Facilitates Machine Learning with Variational Quantum Circuits : Abstract: Data encoding remains a fundamental bottleneck in quantum machine learning, where amplitude encoding of high-dimensional classical vectors into quantum states incurs exponential cost. In thi...
- A Generalized Version of Chung's Lemma and its Applications : Abstract: Chung's Lemma is a classical tool for establishing asymptotic convergence rates of (stochastic) optimization methods under strong convexity-type assumptions and appropriate polynomial dimini...
- Decomposed Direct Preference Optimization for Structure-Based Drug Design : Abstract: Diffusion models have achieved promising results for Structure-Based Drug Design (SBDD). Nevertheless, high-quality protein subpocket and ligand data are relatively scarce, which hinders the...
- Aggregation Models with Optimal Weights for Distributed Gaussian Processes : Abstract: Gaussian process (GP) models have received increasing attention in recent years due to their superb prediction accuracy and modeling flexibility. To address the computational burdens of GP m...
- Differentiable Modeling for Low-Inertia Grids: Benchmarking PINNs, NODEs, and DP for Identification and Control of SMIB System : Abstract: The transition toward low-inertia power systems demands modeling frameworks that provide not only accurate state predictions but also physically consistent sensitivities for control. While s...
- Resilient Class-Incremental Learning: on the Interplay of Drifting, Unlabelled and Imbalanced Data Streams : Abstract: In today's connected world, the generation of massive streaming data across diverse domains has become commonplace. In the presence of concept drift, class imbalance, label scarcity, and new...
- Model soups need only one ingredient : Abstract: Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations special...
- Contextual and Seasonal LSTMs for Time Series Anomaly Detection : Abstract: Univariate time series (UTS), where each timestamp records a single variable, serve as crucial indicators in web systems and cloud servers. Anomaly detection in UTS plays an essential role i...
- Physics-informed diffusion models in spectral space : Abstract: We propose a methodology that combines generative latent diffusion models with physics-informed machine learning to generate solutions of parametric partial differential equations (PDEs) con...
- BRAVA-GNN: Betweenness Ranking Approximation Via Degree MAss Inspired Graph Neural Network : Abstract: Computing node importance in networks is a long-standing fundamental problem that has driven extensive study of various centrality measures. A particularly well-known centrality measure is b...
- ExO-PPO: an Extended Off-policy Proximal Policy Optimization Algorithm : Abstract: Deep reinforcement learning has been able to solve various tasks successfully; however, due to the construction of policy gradient and training dynamics, tuning deep reinforcement learning m...
- Towards Poisoning Robustness Certification for Natural Language Generation : Abstract: Understanding the reliability of natural language generation is critical for deploying foundation models in security-sensitive domains. While certified poisoning defenses provide provable ro...
- Grounding LTL Tasks in Sub-Symbolic RL Environments for Zero-Shot Generalization : Abstract: In this work we address the problem of training a Reinforcement Learning agent to follow multiple temporally-extended instructions expressed in Linear Temporal Logic in sub-symbolic environm...
- Explainability in Generative Medical Diffusion Models: A Faithfulness-Based Analysis on MRI Synthesis : Abstract: This study investigates the explainability of generative diffusion models in the context of medical imaging, focusing on magnetic resonance imaging (MRI) synthesis. Although diffusion models...
- Flexible Entropy Control in RLVR with Gradient-Preserving Perspective : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a critical method for enhancing the reasoning capabilities of Large Language Models (LLMs). However, continuous training ...
- Why Linear Interpretability Works: Invariant Subspaces as a Result of Architectural Constraints : Abstract: Linear probes and sparse autoencoders consistently recover meaningful structure from transformer representations -- yet why should such simple methods succeed in deep, nonlinear systems? We ...
- Circuit Fingerprints: How Answer Tokens Encode Their Geometrical Path : Abstract: Circuit discovery and activation steering in transformers have developed as separate research threads, yet both operate on the same representational space. Are they two views of the same und...
- When Less is More: The LLM Scaling Paradox in Context Compression : Abstract: Scaling up model parameters has long been a prevalent training paradigm driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compr...
- Fully-automated sleep staging: multicenter validation of a generalizable deep neural network for Parkinson's disease and isolated REM sleep behavior disorder : Abstract: Isolated REM sleep behavior disorder (iRBD) is a key prodromal marker of Parkinson's disease (PD), and video-polysomnography (vPSG) remains the diagnostic gold standard. However, manual slee...
- A Controlled Study of Double DQN and Dueling DQN Under Cross-Environment Transfer : Abstract: Transfer learning in deep reinforcement learning is often motivated by improved stability and reduced training cost, but it can also fail under substantial domain shift. This paper presents ...
- PlugSI: Plug-and-Play Test-Time Graph Adaptation for Spatial Interpolation : Abstract: With the rapid advancement of IoT and edge computing, sensor networks have become indispensable, driving the need for large-scale sensor deployment. However, the high deployment cost hinders...
- CoFEH: LLM-driven Feature Engineering Empowered by Collaborative Bayesian Hyperparameter Optimization : Abstract: Feature Engineering (FE) is pivotal in automated machine learning (AutoML) but remains a bottleneck for traditional methods, which treat it as a black-box search, operating within rigid, pre...
- Differentiable Tripartite Modularity for Clustering Heterogeneous Graphs : Abstract: Clustering heterogeneous relational data remains a central challenge in graph learning, particularly when interactions involve more than two types of entities. While differentiable modularit...
- Statistical benchmarking of transformer models in low signal-to-noise time-series forecasting : Abstract: We study the performance of transformer architectures for multivariate time-series forecasting in low-data regimes consisting of only a few years of daily observations. Using synthetically g...
- Safeguarding Privacy: Privacy-Preserving Detection of Mind Wandering and Disengagement Using Federated Learning in Online Education : Abstract: Since the COVID-19 pandemic, online courses have expanded access to education, yet the absence of direct instructor support challenges learners' ability to self-regulate attention and engage...
- Drug Release Modeling using Physics-Informed Neural Networks : Abstract: Accurate modeling of drug release is essential for designing and developing controlled-release systems. Classical models (Fick, Higuchi, Peppas) rely on simplifying assumptions that limit th...
- Causal Identification in Multi-Task Demand Learning with Confounding : Abstract: We study a canonical multi-task demand learning problem motivated by retail pricing, in which a firm seeks to estimate heterogeneous linear price-response functions across a large collection...
- Supervised Metric Regularization Through Alternating Optimization for Multi-Regime Physics-Informed Neural Networks : Abstract: Standard Physics-Informed Neural Networks (PINNs) often face challenges when modeling parameterized dynamical systems with sharp regime transitions, such as bifurcations. In these scenarios,...
- Online Monitoring Framework for Automotive Time Series Data using JEPA Embeddings : Abstract: As autonomous vehicles are rolled out, measures must be taken to ensure their safe operation. In order to supervise a system that is already in operation, monitoring frameworks are frequentl...
- Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions : Abstract: Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, ...
- Empirical Stability Analysis of Kolmogorov-Arnold Networks in Hard-Constrained Recurrent Physics-Informed Discovery : Abstract: We investigate the integration of Kolmogorov-Arnold Networks (KANs) into hard-constrained recurrent physics-informed architectures (HRPINN) to evaluate the fidelity of learned residual manif...
- Answer First, Reason Later: Aligning Search Relevance via Mode-Balanced Reinforcement Learning : Abstract: Building a search relevance model that achieves both low latency and high performance is a long-standing challenge in the search industry. To satisfy the millisecond-level response requireme...
- A Task-Centric Theory for Iterative Self-Improvement with Easy-to-Hard Curricula : Abstract: Iterative self-improvement fine-tunes an autoregressive large language model (LLM) on reward-verified outputs generated by the LLM itself. In contrast to the empirical success of self-improv...
- ADORA: Training Reasoning Models with Dynamic Advantage Estimation on Reinforcement Learning : Abstract: Reinforcement learning has become a cornerstone technique for developing reasoning models in complex tasks, ranging from mathematical problem-solving to imaginary reasoning. The optimization...
- Position: Message-passing and spectral GNNs are two sides of the same coin : Abstract: Graph neural networks (GNNs) are commonly divided into message-passing neural networks (MPNNs) and spectral graph neural networks, reflecting two largely separate research traditions in mach...
- Effectiveness of Binary Autoencoders for QUBO-Based Optimization Problems : Abstract: In black-box combinatorial optimization, objective evaluations are often expensive, so high quality solutions must be found under a limited budget. Factorization machine with quantum anneali...
- Optimistic World Models: Efficient Exploration in Model-Based Deep Reinforcement Learning : Abstract: Efficient exploration remains a central challenge in reinforcement learning (RL), particularly in sparse-reward environments. We introduce Optimistic World Models (OWMs), a principled and sc...
- Long Chain-of-Thought Compression via Fine-Grained Group Policy Optimization : Abstract: Large Language Models (LLMs) often generate unnecessarily verbose Chain-of-Thought (CoT) reasoning that increases computational costs and latency without proportional performance gains. In t...
- WildCat: Near-Linear Attention in Theory and Practice : Abstract: We introduce WildCat, a high-accuracy, low-cost approach to compressing the attention mechanism in neural networks. While attention is a staple of modern network architectures, it is also no...
- Vendi Novelty Scores for Out-of-Distribution Detection : Abstract: Out-of-distribution (OOD) detection is critical for the safe deployment of machine learning systems. Existing post-hoc detectors typically rely on model confidence scores or likelihood estim...
- Features as Rewards: Scalable Supervision for Open-Ended Tasks via Interpretability : Abstract: Language models trained on large-scale datasets have been shown to learn features that encode abstract concepts such as factuality or intent. Such features are traditionally used for test-ti...
- Step-resolved data attribution for looped transformers : Abstract: We study how individual training examples shape the internal computation of looped transformers, where a shared block is applied for $τ$ recurrent iterations to enable latent reasoning. Exis...
- Learning on the Manifold: Unlocking Standard Diffusion Transformers with Representation Encoders : Abstract: Leveraging representation encoders for generative modeling offers a path for efficient, high-fidelity synthesis. However, standard diffusion transformers fail to converge on these representa...
- Towards Explainable Federated Learning: Understanding the Impact of Differential Privacy : Abstract: Data privacy and eXplainable Artificial Intelligence (XAI) are two important aspects for modern Machine Learning systems. To enhance data privacy, recent machine learning models have been de...
- Biases in the Blind Spot: Detecting What LLMs Fail to Mention : Abstract: Large Language Models (LLMs) often provide chain-of-thought (CoT) reasoning traces that appear plausible, but may hide internal biases. We call these *unverbalized biases*. Monitoring models...
- Predicting Gene Disease Associations in Type 2 Diabetes Using Machine Learning on Single-Cell RNA-Seq Data : Abstract: Diabetes is a chronic metabolic disorder characterized by elevated blood glucose levels due to impaired insulin production or function. Two main forms are recognized: type 1 diabetes (T1D), ...
- Soft Clustering Anchors for Self-Supervised Speech Representation Learning in Joint Embedding Prediction Architectures : Abstract: Joint Embedding Predictive Architectures (JEPA) offer a promising approach to self-supervised speech representation learning, but suffer from representation collapse without explicit groundi...
- Windowed SummaryMixing: An Efficient Fine-Tuning of Self-Supervised Learning Models for Low-resource Speech Recognition : Abstract: Self-supervised learning (SSL) has advanced speech processing but suffers from quadratic complexity due to self-attention. To address this, SummaryMixing (SM) has been proposed as a linear-t...
- SVD-Preconditioned Gradient Descent Method for Solving Nonlinear Least Squares Problems : Abstract: This paper introduces a novel optimization algorithm designed for nonlinear least-squares problems. The method is derived by preconditioning the gradient descent direction using the Singular...
- Persistent Entropy as a Detector of Phase Transitions : Abstract: Persistent entropy (PE) is an information-theoretic summary statistic of persistence barcodes that has been widely used to detect regime changes in complex systems. Despite its empirical suc...
- Predicting Open Source Software Sustainability with Deep Temporal Neural Hierarchical Architectures and Explainable AI : Abstract: Open Source Software (OSS) projects follow diverse lifecycle trajectories shaped by evolving patterns of contribution, coordination, and community engagement. Understanding these trajectorie...
- DRAGON: Robust Classification for Very Large Collections of Software Repositories : Abstract: The ability to automatically classify source code repositories with "topics" that reflect their content and purpose is very useful, especially when navigating or searching through large so...
- UI-Venus-1.5 Technical Report : Abstract: GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains chall...
- Predicting magnetism with first-principles AI : Abstract: Computational discovery of magnetic materials remains challenging because magnetism arises from the competition between kinetic energy and Coulomb interaction that is often beyond the reach ...
- Decoding Future Risk: Deep Learning Analysis of Tubular Adenoma Whole-Slide Images : Abstract: Colorectal cancer (CRC) remains a significant cause of cancer-related mortality, despite the widespread implementation of prophylactic initiatives aimed at detecting and removing precancerou...
- Minimum Distance Summaries for Robust Neural Posterior Estimation : Abstract: Simulation-based inference (SBI) enables amortized Bayesian inference by first training a neural posterior estimator (NPE) on prior-simulator pairs, typically through low-dimensional summary...
- Quantifying Epistemic Uncertainty in Diffusion Models : Abstract: To ensure high-quality outputs, it is important to quantify the epistemic uncertainty of diffusion models. Existing methods are often unreliable because they mix epistemic and aleatoric uncer...
- One RNG to Rule Them All: How Randomness Becomes an Attack Vector in Machine Learning : Abstract: Machine learning relies on randomness as a fundamental component in various steps such as data sampling, data augmentation, weight initialization, and optimization. Most machine learning fra...
- EExApp: GNN-Based Reinforcement Learning for Radio Unit Energy Optimization in 5G O-RAN : Abstract: With over 3.5 million 5G base stations deployed globally, their collective energy consumption (projected to exceed 131 TWh annually) raises significant concerns over both operational costs a...
- Optimal Estimation in Orthogonally Invariant Generalized Linear Models: Spectral Initialization and Approximate Message Passing : Abstract: We consider the problem of parameter estimation from a generalized linear model with a random design matrix that is orthogonally invariant in law. Such a model allows the design to have an arbi...
- Effective Reasoning Chains Reduce Intrinsic Dimensionality : Abstract: Chain-of-thought (CoT) reasoning and its variants have substantially improved the performance of language models on complex reasoning tasks, yet the precise mechanisms by which different str...
- Mutual Information Collapse Explains Disentanglement Failure in $\beta$-VAEs : Abstract: The $\beta$-VAE is a foundational framework for unsupervised disentanglement, using $\beta$ to regulate the trade-off between latent factorization and reconstruction fidelity. Empirically, however, ...
- X-Mark: Saliency-Guided Robust Dataset Ownership Verification for Medical Imaging : Abstract: High-quality medical imaging datasets are essential for training deep learning models, but their unauthorized use raises serious copyright and ethical concerns. Medical imaging presents a un...
- How Far Can You Grow? Characterizing the Extrapolation Frontier of Graph Generative Models for Materials Science : Abstract: Every generative model for crystalline materials harbors a critical structure size beyond which its outputs quietly become unreliable -- we call this the extrapolation frontier. Despite its ...
- Don't Shoot The Breeze: Topic Continuity Model Using Nonlinear Naive Bayes With Attention : Abstract: Utilizing Large Language Models (LLM) as chatbots in diverse business scenarios often presents the challenge of maintaining topic continuity. Abrupt shifts in topics can lead to poor user ex...
- Beyond Uniform Credit: Causal Credit Assignment for Policy Optimization : Abstract: Policy gradient methods for language model reasoning, such as GRPO and DAPO, assign uniform credit to all generated tokens - the filler phrase "Let me think" receives the same gradient updat...
- TVTSyn: Content-Synchronous Time-Varying Timbre for Streaming Voice Conversion and Anonymization : Abstract: Real-time voice conversion and speaker anonymization require causal, low-latency synthesis without sacrificing intelligibility or naturalness. Current systems have a core representational mi...
- The Critical Horizon: Inspection Design Principles for Multi-Stage Operations and Deep Reasoning : Abstract: Manufacturing lines, service journeys, supply chains, and AI reasoning chains share a common challenge: attributing a terminal outcome to the intermediate stage that caused it. We establish ...
- Is Memorization Helpful or Harmful? Prior Information Sets the Threshold : Abstract: We examine the connection between training error and generalization error for arbitrary estimating procedures, working in an overparameterized linear model under general priors in a Bayesian...
- LARV: Data-Free Layer-wise Adaptive Rescaling Veneer for Model Merging : Abstract: Model merging aims to combine multiple fine-tuned models into a single multi-task model without access to training data. Existing task-vector merging methods such as TIES, TSV-M, and Iso-C/C...
- Bridging the Modality Gap in Roadside LiDAR: A Training-Free Vision-Language Model Framework for Vehicle Classification : Abstract: Fine-grained truck classification is critical for intelligent transportation systems (ITS), yet current LiDAR-based methods face scalability challenges due to their reliance on supervised de...
- A Scoping Review of Deep Learning for Urban Visual Pollution and Proposal of a Real-Time Monitoring Framework with a Visual Pollution Index : Abstract: Urban Visual Pollution (UVP) has emerged as a critical concern, yet research on automatic detection and application remains fragmented. This scoping review maps the existing deep learning-ba...
- The Wisdom of Many Queries: Complexity-Diversity Principle for Dense Retriever Training : Abstract: Prior work reports conflicting results on query diversity in synthetic data generation for dense retrieval. We identify this conflict and design Q-D metrics to quantify diversity's impact, m...
- Enhancing Affine Maximizer Auctions with Correlation-Aware Payment : Abstract: Affine Maximizer Auctions (AMAs), a generalized mechanism family from VCG, are widely used in automated mechanism design due to their inherent dominant-strategy incentive compatibility (DSIC...
- From Average Sensitivity to Small-Loss Regret Bounds under Random-Order Model : Abstract: We study online learning in the random-order model, where the multiset of loss functions is chosen adversarially but revealed in a uniformly random order. Building on the batch-to-online con...
- ArtifactLens: Hundreds of Labels Are Enough for Artifact Detection with VLMs : Abstract: Modern image generators produce strikingly realistic images, where only artifacts like distorted hands or warped objects reveal their synthetic origin. Detecting these artifacts is essential...
- Predictive Query Language: A Domain-Specific Language for Predictive Modeling on Relational Databases : Abstract: The purpose of predictive modeling on relational data is to predict future or missing values in a relational database, for example, future purchases of a user, risk of readmission of the pat...
- Aligning Tree-Search Policies with Fixed Token Budgets in Test-Time Scaling of LLMs : Abstract: Tree-search decoding is an effective form of test-time scaling for large language models (LLMs), but real-world deployment imposes a fixed per-query token budget that varies across settings....
- Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows : Abstract: Real-world fine-tuning of dexterous manipulation policies remains challenging due to limited real-world interaction budgets and highly multimodal action distributions. Diffusion-based polici...
- On the Optimal Reasoning Length for RL-Trained Language Models : Abstract: Reinforcement learning substantially improves reasoning in large language models, but it also tends to lengthen chain of thought outputs and increase computational cost during both training ...
- Tracking Finite-Time Lyapunov Exponents to Robustify Neural ODEs : Abstract: We investigate finite-time Lyapunov exponents (FTLEs), a measure for exponential separation of input perturbations, of deep neural networks within the framework of continuous-depth neural OD...
- From Adam to Adam-Like Lagrangians: Second-Order Nonlocal Dynamics : Abstract: In this paper, we derive an accelerated continuous-time formulation of Adam by modeling it as a second-order integro-differential dynamical system. We relate this inertial nonlocal model to ...
- Distributed Hybrid Parallelism for Large Language Models: Comparative Study and System Design Guide : Abstract: With the rapid growth of large language models (LLMs), a wide range of methods have been developed to distribute computation and memory across hardware devices for efficient training and inf...
- Benchmarking the Energy Savings with Speculative Decoding Strategies : Abstract: Speculative decoding has emerged as an effective method to reduce latency and inference cost of LLM inferences. However, there has been inadequate attention towards the energy requirements o...
- Importance inversion transfer identifies shared principles for cross-domain learning : Abstract: The capacity to transfer knowledge across scientific domains relies on shared organizational principles. However, existing transfer-learning methodologies often fail to bridge radically hete...
- SpinCastML an Open Decision-Making Application for Inverse Design of Electrospinning Manufacturing: A Machine Learning, Optimal Sampling and Inverse Monte Carlo Approach : Abstract: Electrospinning is a powerful technique for producing micro to nanoscale fibers with application specific architectures. Small variations in solution or operating conditions can shift the je...
- Epistemic Throughput: Fundamental Limits of Attention-Constrained Inference : Abstract: Recent generative and tool-using AI systems can surface a large volume of candidates at low marginal cost, yet only a small fraction can be checked carefully. This creates a decoder-side bot...
- Counterfactual Maps: What They Are and How to Find Them : Abstract: Counterfactual explanations are a central tool in interpretable machine learning, yet computing them exactly for complex models remains challenging. For tree ensembles, predictions are piece...
- UniComp: A Unified Evaluation of Large Language Model Compression via Pruning, Quantization and Distillation : Abstract: Model compression is increasingly essential for deploying large language models (LLMs), yet existing evaluations are limited in method coverage and focus primarily on knowledge-centric bench...
- What do Geometric Hallucination Detection Metrics Actually Measure? : Abstract: Hallucination remains a barrier to deploying generative models in high-consequence applications. This is especially true in cases where external ground truth is not readily available to vali...
- Boltzmann Reinforcement Learning for Noise resilience in Analog Ising Machines : Abstract: Analog Ising machines (AIMs) have emerged as a promising paradigm for combinatorial optimization, utilizing physical dynamics to solve Ising problems with high energy efficiency. However, th...
- Faster Rates For Federated Variational Inequalities : Abstract: In this paper, we study federated optimization for solving stochastic variational inequalities (VIs), a problem that has attracted growing attention in recent years. Despite substantial prog...
- Train Less, Infer Faster: Efficient Model Finetuning and Compression via Structured Sparsity : Abstract: Fully finetuning foundation language models (LMs) with billions of parameters is often impractical due to high computational costs, memory requirements, and the risk of overfitting. Although...
- $n$-Musketeers: Reinforcement Learning Shapes Collaboration Among Language Models : Abstract: Recent progress in reinforcement learning with verifiable rewards (RLVR) shows that small, specialized language models (SLMs) can exhibit structured reasoning without relying on large monoli...
- Weighted Wasserstein Barycenter of Gaussian Processes for exotic Bayesian Optimization tasks : Abstract: Exploiting the analogy between Gaussian Distributions and Gaussian Processes' posterior, we present how the weighted Wasserstein Barycenter of Gaussian Processes (W2BGP) can be used to unify...
- Gradient Residual Connections : Abstract: Existing work has linked properties of a function's gradient to the difficulty of function approximation. Motivated by these insights, we study how gradient information can be leveraged to i...
- ML-DCN: Masked Low-Rank Deep Crossing Network Towards Scalable Ads Click-through Rate Prediction at Pinterest : Abstract: Deep learning recommendation systems rely on feature interaction modules to model complex user-item relationships across sparse categorical and dense features. In large-scale ad ranking, inc...
- Fair Feature Importance Scores via Feature Occlusion and Permutation : Abstract: As machine learning models increasingly impact society, their opaque nature poses challenges to trust and accountability, particularly in fairness contexts. Understanding how individual feat...
- CausalGDP: Causality-Guided Diffusion Policies for Reinforcement Learning : Abstract: Reinforcement learning (RL) has achieved remarkable success in a wide range of sequential decision-making problems. Recent diffusion-based policies further improve RL by modeling complex, hi...
- A Lightweight Multi-View Approach to Short-Term Load Forecasting : Abstract: Time series forecasting is a critical task across domains such as energy, finance, and meteorology, where accurate predictions enable informed decision-making. While transformer-based and la...
- Barycentric alignment for instance-level comparison of neural representations : Abstract: Comparing representations across neural networks is challenging because representations admit symmetries, such as arbitrary reordering of units or rotations of activation space, that obscure...
- Beyond the Unit Hypersphere: Embedding Magnitude in Contrastive Learning : Abstract: Cosine similarity is prevalent in contrastive learning, yet it makes an implicit assumption: embedding magnitude is noise. Prior work occasionally found dot product and cosine similarity com...
- Do Neural Networks Lose Plasticity in a Gradually Changing World? : Abstract: Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually...
- RAPID: Risk of Attribute Prediction-Induced Disclosure in Synthetic Microdata : Abstract: Statistical data anonymization increasingly relies on fully synthetic microdata, for which classical identity disclosure measures are less informative than an adversary's ability to infer se...
- Feature salience -- not task-informativeness -- drives machine learning model explanations : Abstract: Explainable AI (XAI) promises to provide insight into machine learning models' decision processes, where one goal is to identify failures such as shortcut learning. This promise relies on th...
- Generalizing GNNs with Tokenized Mixture of Experts : Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inferenc...
- The effect of whitening on explanation performance : Abstract: Explainable Artificial Intelligence (XAI) aims to provide transparent insights into machine learning models, yet the reliability of many feature attribution methods remains a critical challe...
- Measuring Privacy Risks and Tradeoffs in Financial Synthetic Data Generation : Abstract: We explore the privacy-utility tradeoff of synthetic data generation schemes on tabular financial datasets, a domain characterized by high regulatory risk and severe class imbalance. We cons...
- Positive-Unlabelled Active Learning to Curate a Dataset for Orca Resident Interpretation : Abstract: This work presents the largest curation of Southern Resident Killer Whale (SRKW) acoustic data to date, also containing other marine mammals in their environment. We systematically search al...
- The Laplacian Mechanism Improves Transformers by Reshaping Token Geometry : Abstract: Transformers leverage attention, the residual connection, and layer normalization to control the variance of token representations. We propose to modify attention into a Laplacian mechanism ...
- Risk-sensitive reinforcement learning using expectiles, shortfall risk and optimized certainty equivalent risk : Abstract: We propose risk-sensitive reinforcement learning algorithms catering to three families of risk measures, namely expectiles, utility-based shortfall risk and optimized certainty equivalent ri...
- Stabilizing Physics-Informed Consistency Models via Structure-Preserving Training : Abstract: We propose a physics-informed consistency modeling framework for solving partial differential equations (PDEs) via fast, few-step generative inference. We identify a key stability challenge ...
- Statistical Roughness-Informed Machine Unlearning : Abstract: Machine unlearning aims to remove the influence of a designated forget set from a trained model while preserving utility on the retained data. In modern deep networks, approximate unlearning...
- Reward Modeling for Reinforcement Learning-Based LLM Reasoning: Design, Challenges, and Evaluation : Abstract: Large Language Models (LLMs) demonstrate transformative potential, yet their reasoning remains inconsistent and unreliable. Reinforcement learning (RL)-based fine-tuning is a key mechanism f...
- Empowering Contrastive Federated Sequential Recommendation with LLMs : Abstract: Federated sequential recommendation (FedSeqRec) aims to perform next-item prediction while keeping user data decentralised, yet model quality is frequently constrained by fragmented, noisy, ...
- Clarifying Shampoo: Adapting Spectral Descent to Stochasticity and the Parameter Trajectory : Abstract: Optimizers leveraging the matrix structure in neural networks, such as Shampoo and Muon, are more data-efficient than element-wise algorithms like Adam and Signum. While in specific settings...
- Effective MoE-based LLM Compression by Exploiting Heterogeneous Inter-Group Experts Routing Frequency and Information Density : Abstract: Mixture-of-Experts (MoE) based Large Language Models (LLMs) have achieved superior performance, yet the massive memory overhead caused by storing multiple expert networks severely hinders th...
- SnareNet: Flexible Repair Layers for Neural Networks with Hard Constraints : Abstract: Neural networks are increasingly used as surrogate solvers and control policies, but unconstrained predictions can violate physical, operational, or safety requirements. We propose SnareNet,...
- Priority-Aware Shapley Value : Abstract: Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributo...
- In-Hospital Stroke Prediction from PPG-Derived Hemodynamic Features : Abstract: The absence of pre-hospital physiological data in standard clinical datasets fundamentally constrains the early prediction of stroke, as patients typically present only after stroke has occu...
- MacrOData: New Benchmarks of Thousands of Datasets for Tabular Outlier Detection : Abstract: Quality benchmarks are essential for fairly and accurately tracking scientific progress and enabling practitioners to make informed methodological choices. Outlier detection (OD) on tabular ...
- Large Language Models for Designing Participatory Budgeting Rules : Abstract: Participatory budgeting (PB) is a democratic paradigm for deciding the funding of public projects given the residents' preferences, which has been adopted in numerous cities across the world...
- Latent Poincaré Shaping for Agentic Reinforcement Learning : Abstract: We propose LaPha, a method for training AlphaZero-like LLM agents in a Poincaré latent space. Under LaPha, the search process can be visualized as a tree rooted at the prompt and growing out...
- Sparse Layer Sharpness-Aware Minimization for Efficient Fine-Tuning : Abstract: Sharpness-aware minimization (SAM) seeks the minima with a flat loss landscape to improve the generalization performance in machine learning tasks, including fine-tuning. However, its extra ...
- Squeezing More from the Stream: Learning Representation Online for Streaming Reinforcement Learning : Abstract: In streaming Reinforcement Learning (RL), transitions are observed and discarded immediately after a single update. While this minimizes resource usage for on-device applications, it makes a...
- Learning with Multiple Correct Answers -- A Trichotomy of Regret Bounds under Different Feedback Models : Abstract: We study an online learning problem with multiple correct answers, where each instance admits a set of valid labels, and in each round the learner must output a valid label for the queried e...
- Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design : Abstract: Discrete diffusion models have recently emerged as a powerful class of generative models for chemistry and biology data. In these fields, the goal is to generate various samples with high re...
- Diffusion-Guided Pretraining for Brain Graph Foundation Models : Abstract: With the growing interest in foundation models for brain signals, graph-based pretraining has emerged as a promising paradigm for learning transferable representations from connectome data. ...
- Taming the Monster Every Context: Complexity Measure and Unified Framework for Offline-Oracle Efficient Contextual Bandits : Abstract: We propose an algorithmic framework, Offline Estimation to Decisions (OE2D), that reduces contextual bandit learning with general reward function approximation to offline regression. The fra...
- Scalable and Reliable State-Aware Inference of High-Impact N-k Contingencies : Abstract: Increasing penetration of inverter-based resources, flexible loads, and rapidly changing operating conditions make higher-order $N\!-\!k$ contingency assessment increasingly important but co...
- Online Learning in MDPs with Partially Adversarial Transitions and Losses : Abstract: We study reinforcement learning in MDPs whose transition function is stochastic at most steps but may behave adversarially at a fixed subset of $Λ$ steps per episode. This model captures env...
- Adaptive recurrent flow map operator learning for reaction diffusion dynamics : Abstract: Reaction-diffusion (RD) equations underpin pattern formation across chemistry, biology, and physics, yet learning stable operators that forecast their long-term dynamics from data remains ch...
- Beware of the Batch Size: Hyperparameter Bias in Evaluating LoRA : Abstract: Low-rank adaptation (LoRA) is a standard approach for fine-tuning large language models, yet its many variants report conflicting empirical gains, often on the same benchmarks. We show that ...
- Computationally Efficient Replicable Learning of Parities : Abstract: We study the computational relationship between replicability (Impagliazzo et al. [STOC '22], Ghazi et al. [NeurIPS '21]) and other stability notions. Specifically, we focus on replicable PA...
- Improved Approximate Regret for Decentralized Online Continuous Submodular Maximization via Reductions : Abstract: To expand the applicability of decentralized online learning, previous studies have proposed several algorithms for decentralized online continuous submodular maximization (D-OCSM) -- a non-...
- Towards Uniformity and Alignment for Multimodal Representation Learning : Abstract: Multimodal representation learning aims to construct a shared embedding space in which heterogeneous modalities are semantically aligned. Despite strong empirical results, InfoNCE-based obje...
- Beyond Student: An Asymmetric Network for Neural Network Inheritance : Abstract: Knowledge Distillation (KD) has emerged as a powerful technique for model compression, enabling lightweight student networks to benefit from the performance of redundant teacher networks. Ho...
- Rashomon Sets and Model Multiplicity in Federated Learning : Abstract: The Rashomon set captures the collection of models that achieve near-identical empirical performance yet may differ substantially in their decision boundaries. Understanding the differences ...
- Learning to Discover Iterative Spectral Algorithms : Abstract: We introduce AutoSpec, a neural network framework for discovering iterative spectral algorithms for large-scale numerical linear algebra and numerical optimization. Our self-supervised model...
- ECG-IMN: Interpretable Mesomorphic Neural Networks for 12-Lead Electrocardiogram Interpretation : Abstract: Deep learning has achieved expert-level performance in automated electrocardiogram (ECG) diagnosis, yet the "black-box" nature of these models hinders their clinical deployment. Trust in med...
- Training deep physical neural networks with local physical information bottleneck : Abstract: Deep learning has revolutionized modern society but faces growing energy and latency constraints. Deep physical neural networks (PNNs) are interconnected computing systems that directly expl...
- Rollout-Training Co-Design for Efficient LLM-Based Multi-Agent Reinforcement Learning : Abstract: Despite algorithm-level innovations for multi-agent reinforcement learning (MARL), the underlying networked infrastructure for large-scale MARL training remains underexplored. Existing train...
- Mitigating the Likelihood Paradox in Flow-based OOD Detection via Entropy Manipulation : Abstract: Deep generative models that can tractably compute input likelihoods, including normalizing flows, often assign unexpectedly high likelihoods to out-of-distribution (OOD) inputs. We mitigate ...
- Why the Counterintuitive Phenomenon of Likelihood Rarely Appears in Tabular Anomaly Detection with Deep Generative Models? : Abstract: Deep generative models with tractable and analytically computable likelihoods, exemplified by normalizing flows, offer an effective basis for anomaly detection through likelihood-based scori...
- LLM-FS: Zero-Shot Feature Selection for Effective and Interpretable Malware Detection : Abstract: Feature selection (FS) remains essential for building accurate and interpretable detection models, particularly in high-dimensional malware datasets. Conventional FS methods such as Extra Tr...
- Blind denoising diffusion models and the blessings of dimensionality : Abstract: We analyze, theoretically and empirically, the performance of generative diffusion models based on blind denoisers, in which the denoiser is not given the noise amplitude in either th...
- Enhanced Graph Transformer with Serialized Graph Tokens : Abstract: Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level represe...
- Spectral Disentanglement and Enhancement: A Dual-domain Contrastive Framework for Representation Learning : Abstract: Large-scale multimodal contrastive learning has recently achieved impressive success in learning rich and transferable representations, yet it remains fundamentally limited by the uniform tr...
- Learning to Remember, Learn, and Forget in Attention-Based Models : Abstract: In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated line...
- Patient foundation model for risk stratification in low-risk overweight patients : Abstract: Accurate risk stratification in patients with overweight or obesity is critical for guiding preventive care and allocating high-cost therapies such as GLP-1 receptor agonists. We present Pat...
- Looping Back to Move Forward: Recursive Transformers for Efficient and Flexible Large Multimodal Models : Abstract: Large Multimodal Models (LMMs) have achieved remarkable success in vision-language tasks, yet their vast parameter counts are often underutilized during both training and inference. In this ...
- DMamba: Decomposition-enhanced Mamba for Time Series Forecasting : Abstract: State Space Models (SSMs), particularly Mamba, have shown potential in long-term time series forecasting. However, existing Mamba-based architectures often struggle with datasets characteriz...
Research Sources: 350 | Generated: 2/11/2026
