AI RESEARCH PAPERS & ACADEMIC SOURCES
- OS-Marathon: Benchmarking Computer-Use Agents on Long-Horizon Repetitive Tasks : Abstract: Long-horizon, repetitive workflows are common in professional settings, such as processing expense reports from receipts and entering student grades from exam papers. These tasks are often t...
- MPF-Net: Exposing High-Fidelity AI-Generated Video Forgeries via Hierarchical Manifold Deviation and Micro-Temporal Fluctuations : Abstract: With the rapid advancement of video generation models such as Veo and Wan, the visual quality of synthetic content has reached a level where macro-level semantic errors and temporal inconsis...
- Past- and Future-Informed KV Cache Policy with Salience Estimation in Autoregressive Video Diffusion : Abstract: Video generation is pivotal to digital media creation, and recent advances in autoregressive video generation have markedly enhanced the efficiency of real-time video synthesis. However, exi...
- Beyond Global Alignment: Fine-Grained Motion-Language Retrieval via Pyramidal Shapley-Taylor Learning : Abstract: As a foundational task in human-centric cross-modal intelligence, motion-language retrieval aims to bridge the semantic gap between natural language and human motion, enabling intuitive moti...
- VideoAesBench: Benchmarking the Video Aesthetics Perception Capabilities of Large Multimodal Models : Abstract: Large multimodal models (LMMs) have demonstrated outstanding capabilities in various visual perception tasks, which has in turn made the evaluation of LMMs significant. However, the capabili...
- Deep learning enables urban change profiling through alignment of historical maps : Abstract: Prior to modern Earth observation technologies, historical maps provide a unique record of long-term urban transformation and offer a lens on the evolving identity of cities. However, extrac...
- LoopViT: Scaling Visual ARC with Looped Transformers : Abstract: Recent advances in visual reasoning have leveraged vision transformers to tackle the ARC-AGI benchmark. However, we argue that the feed-forward architecture, where computational depth is str...
- Reg4Pru: Regularisation Through Random Token Routing for Token Pruning : Abstract: Transformers are widely adopted in modern vision models due to their strong ability to scale with dataset size and generalisability. However, this comes with a major drawback: computation sc...
- Lung Nodule Image Synthesis Driven by Two-Stage Generative Adversarial Networks : Abstract: The limited sample size and insufficient diversity of lung nodule CT datasets severely restrict the performance and generalization ability of detection models. Existing methods generate imag...
- CIEC: Coupling Implicit and Explicit Cues for Multimodal Weakly Supervised Manipulation Localization : Abstract: To mitigate the threat of misinformation, multimodal manipulation localization has garnered growing attention. Considering that current methods rely on costly and time-consuming fine-grained an...
- Learning Topology-Aware Implicit Field for Unified Pulmonary Tree Modeling with Incomplete Topological Supervision : Abstract: Pulmonary trees extracted from CT images frequently exhibit topological incompleteness, such as missing or disconnected branches, which substantially degrades downstream anatomical analysis ...
- SSI-DM: Singularity Skipping Inversion of Diffusion Models : Abstract: Inverting real images into the noise space is essential for editing tasks using diffusion models, yet existing methods produce non-Gaussian noise with poor editability due to the inaccuracy ...
- MAIN-VLA: Modeling Abstraction of Intention and eNvironment for Vision-Language-Action Models : Abstract: Despite significant progress in Visual-Language-Action (VLA), in highly complex and dynamic environments that involve real-time unpredictable interactions (such as 3D open worlds and large-s...
- Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation : Abstract: To achieve real-time interactive video generation, current methods distill pretrained bidirectional video diffusion models into few-step autoregressive (AR) models, facing an architectural g...
- LangMap: A Hierarchical Benchmark for Open-Vocabulary Goal Navigation : Abstract: The relationships between objects and language are fundamental to meaningful communication between humans and AI, and to practically useful embodied intelligence. We introduce HieraNav, a mu...
- MIRROR: Manifold Ideal Reference ReconstructOR for Generalizable AI-Generated Image Detection : Abstract: High-fidelity generative models have narrowed the perceptual gap between synthetic and real images, posing serious threats to media security. Most existing AI-generated image (AIGI) detector...
- Evaluating OCR Performance for Assistive Technology: Effects of Walking Speed, Camera Placement, and Camera Type : Abstract: Optical character recognition (OCR), which converts printed or handwritten text into machine-readable form, is widely used in assistive technology for people with blindness and low vision. Y...
- Show, Don't Tell: Morphing Latent Reasoning into Image Generation : Abstract: Text-to-image (T2I) generation has achieved remarkable progress, yet existing methods often lack the ability to dynamically reason and refine during generation--a hallmark of human creativit...
- LiFlow: Flow Matching for 3D LiDAR Scene Completion : Abstract: In autonomous driving scenarios, the collected LiDAR point clouds can be challenged by occlusion and long-range sparsity, limiting the perception of autonomous driving systems. Scene complet...
- Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation : Abstract: Occupancy prediction provides critical geometric and semantic understanding for robotics but faces efficiency-accuracy trade-offs. Current dense methods suffer computational waste on empty v...
- LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization : Abstract: We present LongVPO, a novel two-stage Direct Preference Optimization framework that enables short-context vision-language models to robustly understand ultra-long videos without any long-vid...
- Uncertainty-Aware Image Classification In Biomedical Imaging Using Spectral-normalized Neural Gaussian Processes : Abstract: Accurate histopathologic interpretation is key for clinical decision-making; however, current deep learning models for digital pathology are often overconfident and poorly calibrated in out-...
- Unified Personalized Reward Model for Vision Generation : Abstract: Recent advancements in multimodal reward models (RMs) have significantly propelled the development of visual generation. Existing frameworks typically adopt Bradley-Terry-style preference mo...
- Superman: Unifying Skeleton and Vision for Human Motion Perception and Generation : Abstract: Human motion analysis tasks, such as temporal 3D pose estimation, motion prediction, and motion in-betweening, play an essential role in computer vision. However, current paradigms suffer fr...
- Catalyst: Out-of-Distribution Detection via Elastic Scaling : Abstract: Out-of-distribution (OOD) detection is critical for the safe deployment of deep neural networks. State-of-the-art post-hoc methods typically derive OOD scores from the output logits or penul...
- SelvaMask: Segmenting Trees in Tropical Forests and Beyond : Abstract: Tropical forests harbor most of the planet's tree biodiversity and are critical to global ecological balance. Canopy trees in particular play a disproportionate role in carbon storage and fu...
- Visual Affect Analysis: Predicting Emotions of Image Viewers with Vision-Language Models : Abstract: Vision-language models (VLMs) show promise as tools for inferring affect from visual stimuli at scale; it is not yet clear how closely their outputs align with human affective ratings. We be...
- Toward a Unified Semantic Loss Model for Deep JSCC-based Transmission of EO Imagery : Abstract: Modern Earth Observation (EO) systems increasingly rely on high-resolution imagery to support critical applications such as environmental monitoring, disaster response, and land-use analysis...
- SurfelSoup: Learned Point Cloud Geometry Compression With a Probablistic SurfelTree Representation : Abstract: This paper presents SurfelSoup, an end-to-end learned surface-based framework for point cloud geometry compression, with surface-structured primitives for representation. It proposes a proba...
- A Renderer-Enabled Framework for Computing Parameter Estimation Lower Bounds in Plenoptic Imaging Systems : Abstract: This work focuses on assessing the information-theoretic limits of scene parameter estimation in plenoptic imaging systems. A general framework to compute lower bounds on the parameter estim...
- Advanced Geometric Correction Algorithms for 3D Medical Reconstruction: Comparison of Computed Tomography and Macroscopic Imaging : Abstract: This paper introduces a hybrid two-stage registration framework for reconstructing three-dimensional (3D) kidney anatomy from macroscopic slices, using CT-derived models as the geometric ref...
- Benchmarking Vanilla GAN, DCGAN, and WGAN Architectures for MRI Reconstruction: A Quantitative Analysis : Abstract: Magnetic Resonance Imaging (MRI) is a crucial imaging modality for viewing internal body structures. This research work analyses the performance of popular GAN models for accurate and precis...
- Dual Quaternion SE(3) Synchronization with Recovery Guarantees : Abstract: Synchronization over the special Euclidean group SE(3) aims to recover absolute poses from noisy pairwise relative transformations and is a core primitive in robotics and 3D vision. Standard...
- A 30-item Test for Assessing Chinese Character Amnesia in Child Handwriters : Abstract: Handwriting literacy is an important skill for learning and communication in school-age children. In the digital age, handwriting has been largely replaced by typing, leading to a decline in...
- Recent Advances of End-to-End Video Coding Technologies for AVS Standard Development : Abstract: Video coding standards are essential to enable the interoperability and widespread adoption of efficient video compression technologies. In pursuit of greater video compression efficiency, t...
- APEX: A Decoupled Memory-based Explorer for Asynchronous Aerial Object Goal Navigation : Abstract: Aerial Object Goal Navigation, a challenging frontier in Embodied AI, requires an Unmanned Aerial Vehicle (UAV) agent to autonomously explore, reason, and identify a specific target using on...
- Can Vision-Language Models Handle Long-Context Code? An Empirical Study on Visual Compression : Abstract: Large Language Models (LLMs) struggle with long-context code due to window limitations. Existing textual code compression methods mitigate this via selective filtering but often disrupt depe...
- SyNeT: Synthetic Negatives for Traversability Learning : Abstract: Reliable traversability estimation is crucial for autonomous robots to navigate complex outdoor environments safely. Existing self-supervised learning frameworks primarily rely on positive a...
- KAN We Flow? Advancing Robotic Manipulation with 3D Flow Matching via KAN & RWKV : Abstract: Diffusion-based visuomotor policies excel at modeling action distributions but are inference-inefficient, since recursively denoising from noise to policy requires many steps and heavy UNet ...
- Seeing, Hearing, and Knowing Together: Multimodal Strategies in Deepfake Videos Detection : Abstract: As deepfake videos become increasingly difficult for people to recognise, understanding the strategies humans use is key to designing effective media literacy interventions. We conducted a s...
- A texture-based framework for foundational ultrasound models : Abstract: Ultrasound is the most widely used medical imaging modality, yet the images it produces are fundamentally unique, arising from tissue-dependent scattering, reflection, and speed-of-sound var...
- TreeLoc: 6-DoF LiDAR Global Localization in Forests via Inter-Tree Geometric Matching : Abstract: Reliable localization is crucial for navigation in forests, where GPS is often degraded and LiDAR measurements are repetitive, occluded, and structurally complex. These conditions weaken the...
- UniDWM: Towards a Unified Driving World Model via Multifaceted Representation Learning : Abstract: Achieving reliable and efficient planning in complex driving environments requires a model that can reason over the scene's geometry, appearance, and dynamics. We present UniDWM, a unified d...
- Visible Light Positioning With Lamé Curve LEDs: A Generic Approach for Camera Pose Estimation : Abstract: Camera-based visible light positioning (VLP) is a promising technique for accurate and low-cost indoor camera pose estimation (CPE). To reduce the number of required light-emitting diodes (L...
- Hyperspectral Image Fusion with Spectral-Band and Fusion-Scale Agnosticism : Abstract: Current deep learning models for Multispectral and Hyperspectral Image Fusion (MS/HS fusion) are typically designed for fixed spectral bands and spatial scales, which limits their transferab...
- Multi-Task Learning for Robot Perception with Imbalanced Data : Abstract: Multi-task problem solving has been shown to improve the accuracy of the individual tasks, which is an important feature for robots, as they have limited resources. However, when the number...
- LIEREx: Language-Image Embeddings for Robotic Exploration : Abstract: Semantic maps allow a robot to reason about its surroundings to fulfill tasks such as navigating known environments, finding specific objects, and exploring unmapped areas. Traditional mappi...
- FD-VLA: Force-Distilled Vision-Language-Action Model for Contact-Rich Manipulation : Abstract: Force sensing is a crucial modality for Vision-Language-Action (VLA) frameworks, as it enables fine-grained perception and dexterous manipulation in contact-rich tasks. We present Force-Dist...
- RANKVIDEO: Reasoning Reranking for Text-to-Video Retrieval : Abstract: Reranking is a critical component of modern retrieval systems, which typically pair an efficient first-stage retriever with a more expressive model to refine results. While large reasoning m...
- Towards Artwork Explanation in Large-scale Vision Language Models : Abstract: Large-scale Vision-Language Models (LVLMs) output text from images and instructions, demonstrating capabilities in text generation and comprehension. However, it has not been clarified to wh...
- MCTR: Multi Camera Tracking Transformer : Abstract: Multi-camera tracking plays a pivotal role in various real-world applications. While end-to-end methods have gained significant interest in single-camera tracking, multi-camera tracking rema...
- EgoFSD: Ego-Centric Fully Sparse Paradigm with Uncertainty Denoising and Iterative Refinement for Efficient End-to-End Self-Driving : Abstract: Current End-to-End Autonomous Driving (E2E-AD) methods resort to unifying modular designs for various tasks (e.g. perception, prediction and planning). Although optimized with a fully differ...
- Advances in Photoacoustic Imaging Reconstruction and Quantitative Analysis for Biomedical Applications : Abstract: Photoacoustic imaging (PAI) represents an innovative biomedical imaging modality that harnesses the advantages of optical resolution and acoustic penetration depth while ensuring enhanced sa...
- Edge Weight Prediction For Category-Agnostic Pose Estimation : Abstract: Category-Agnostic Pose Estimation (CAPE) localizes keypoints across diverse object categories with a single model, using one or a few annotated support images. Recent works have shown that u...
- Feat2GS: Probing Visual Foundation Models with Gaussian Splatting : Abstract: Given that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? With the dif...
- ReasonVQA: A Multi-hop Reasoning Benchmark with Structural Knowledge for Visual Question Answering : Abstract: In this paper, we propose a new dataset, ReasonVQA, for the Visual Question Answering (VQA) task. Our dataset is automatically integrated with structured encyclopedic knowledge and construct...
- Reinforcement Learning Meets Masked Generative Models: Mask-GRPO for Text-to-Image Generation : Abstract: Reinforcement learning (RL) has garnered increasing attention in text-to-image (T2I) generation. However, most existing RL approaches are tailored to either diffusion models or autoregressiv...
- UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy : Abstract: Computational replication of Chinese calligraphy remains challenging. Existing methods falter, either creating high-quality isolated characters while ignoring page-level aesthetics like liga...
- PISA: Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers : Abstract: Diffusion Transformers are fundamental for video and image generation, but their efficiency is bottlenecked by the quadratic complexity of attention. While block sparse attention accelerates...
- MedAD-R1: Eliciting Consistent Reasoning in Interpretible Medical Anomaly Detection via Consistency-Reinforced Policy Optimization : Abstract: Medical Anomaly Detection (MedAD) presents a significant opportunity to enhance diagnostic accuracy using Large Multimodal Models (LMMs) to interpret and answer questions based on medical im...
- Differential Vector Erasure: Unified Training-Free Concept Erasure for Flow Matching Models : Abstract: Text-to-image diffusion models have demonstrated remarkable capabilities in generating high-quality images, yet their tendency to reproduce undesirable concepts, such as NSFW content, copyri...
- PandaPose: 3D Human Pose Lifting from a Single Image via Propagating 2D Pose Prior to 3D Anchor Space : Abstract: 3D human pose lifting from a single RGB image is a challenging task in 3D vision. Existing methods typically establish a direct joint-to-joint mapping from 2D to 3D poses based on 2D feature...
- Robust Harmful Meme Detection under Missing Modalities via Shared Representation Learning : Abstract: Internet memes are powerful tools for communication, capable of spreading political, psychological, and sociocultural ideas. However, they can be harmful and can be used to disseminate hate ...
- LightCity: An Urban Dataset for Outdoor Inverse Rendering and Reconstruction under Multi-illumination Conditions : Abstract: Inverse rendering in urban scenes is pivotal for applications like autonomous driving and digital twins. Yet, it faces significant challenges due to complex illumination conditions, includin...
- Koo-Fu CLIP: Closed-Form Adaptation of Vision-Language Models via Fukunaga-Koontz Linear Discriminant Analysis : Abstract: Visual-language models such as CLIP provide powerful general-purpose representations, but their raw embeddings are not optimized for supervised classification, often exhibiting limited class...
- Improving Robustness of Vision-Language-Action Models by Restoring Corrupted Visual Inputs : Abstract: Vision-Language-Action (VLA) models have emerged as a dominant paradigm for generalist robotic manipulation, unifying perception and control within a single end-to-end architecture. However,...
- EEmo-Logic: A Unified Dataset and Multi-Stage Framework for Comprehensive Image-Evoked Emotion Assessment : Abstract: Understanding the multi-dimensional attributes and intensity nuances of image-evoked emotions is pivotal for advancing machine empathy and empowering diverse human-computer interaction appli...
- EMFormer: Efficient Multi-Scale Transformer for Accumulative Context Weather Forecasting : Abstract: Long-term weather forecasting is critical for socioeconomic planning and disaster preparedness. While recent approaches employ finetuning to extend prediction horizons, they remain constrain...
- Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis : Abstract: Developing 3D vision-language models with robust clinical reasoning remains a challenge due to the inherent complexity of volumetric medical imaging, the tendency of models to overfit superf...
- Boosting Point-supervised Temporal Action Localization via Text Refinement and Alignment : Abstract: Recently, point-supervised temporal action localization has gained significant attention for its effective balance between labeling costs and localization accuracy. However, current methods ...
- OASIS-DC: Generalizable Depth Completion via Output-level Alignment of Sparse-Integrated Monocular Pseudo Depth : Abstract: Recent monocular foundation models excel at zero-shot depth estimation, yet their outputs are inherently relative rather than metric, limiting direct use in robotics and autonomous driving. ...
- Q-DiT4SR: Exploration of Detail-Preserving Diffusion Transformer Quantization for Real-World Image Super-Resolution : Abstract: Recently, Diffusion Transformers (DiTs) have emerged in Real-World Image Super-Resolution (Real-ISR) to generate high-quality textures, yet their heavy inference burden hinders real-world de...
- TF-Lane: Traffic Flow Module for Robust Lane Perception : Abstract: Autonomous driving systems require robust lane perception capabilities, yet existing vision-based detection methods suffer significant performance degradation when visual sensors provide ins...
- DSFC-Net: A Dual-Encoder Spatial and Frequency Co-Awareness Network for Rural Road Extraction : Abstract: Accurate extraction of rural roads from high-resolution remote sensing imagery is essential for infrastructure planning and sustainable development. However, this task presents unique challe...
- Who Transfers Safety? Identifying and Targeting Cross-Lingual Shared Safety Neurons : Abstract: Multilingual safety remains significantly imbalanced, leaving non-high-resource (NHR) languages vulnerable compared to robust high-resource (HR) ones. Moreover, the neural mechanisms driving...
- Interacted Planes Reveal 3D Line Mapping : Abstract: 3D line mapping from multi-view RGB images provides a compact and structured visual representation of scenes. We study the problem from a physical and topological perspective: a 3D line most...
- Interaction-Consistent Object Removal via MLLM-Based Reasoning : Abstract: Image-based object removal often erases only the named target, leaving behind interaction evidence that renders the result semantically inconsistent. We formalize this problem as Interaction...
- ReDiStory: Region-Disentangled Diffusion for Consistent Visual Story Generation : Abstract: Generating coherent visual stories requires maintaining subject identity across multiple images while preserving frame-specific semantics. Recent training-free methods concatenate identity a...
- StoryState: Agent-Based State Control for Consistent and Editable Storybooks : Abstract: Large multimodal models have enabled one-click storybook generation, where users provide a short description and receive a multi-page illustrated story. However, the underlying story state, ...
- DeCorStory: Gram-Schmidt Prompt Embedding Decorrelation for Consistent Storytelling : Abstract: Maintaining visual and semantic consistency across frames is a key challenge in text-to-image storytelling. Existing training-free methods, such as One-Prompt-One-Story, concatenate all prom...
- FlowCast: Trajectory Forecasting for Scalable Zero-Cost Speculative Flow Matching : Abstract: Flow Matching (FM) has recently emerged as a powerful approach for high-quality visual generation. However, its prohibitively slow inference due to a large number of denoising steps limits...
- What Does Vision Tool-Use Reinforcement Learning Really Learn? Disentangling Tool-Induced and Intrinsic Effects for Crop-and-Zoom : Abstract: Vision tool-use reinforcement learning (RL) can equip vision-language models with visual operators such as crop-and-zoom and achieves strong performance gains, yet it remains unclear whether...
- MTC-VAE: Multi-Level Temporal Compression with Content Awareness : Abstract: Latent Video Diffusion Models (LVDMs) rely on Variational Autoencoders (VAEs) to compress videos into compact latent representations. For continuous Variational Autoencoders (VAEs), achievin...
- Adaptive Visual Autoregressive Acceleration via Dual-Linkage Entropy Analysis : Abstract: Visual AutoRegressive modeling (VAR) suffers from substantial computational cost due to the massive token count involved. Failing to account for the continuous evolution of modeling dynamics...
- T2M Mamba: Motion Periodicity-Saliency Coupling Approach for Stable Text-Driven Motion Generation : Abstract: Text-to-motion generation, which converts motion language descriptions into coherent 3D human motion sequences, has attracted increasing attention in fields such as avatar animation and hum...
- Exposing and Defending the Achilles' Heel of Video Mixture-of-Experts : Abstract: Mixture-of-Experts (MoE) has demonstrated strong performance in video understanding tasks, yet its adversarial robustness remains underexplored. Existing attack methods often treat MoE as a ...
- Stronger Semantic Encoders Can Harm Relighting Performance: Probing Visual Priors via Augmented Latent Intrinsics : Abstract: Image-to-image relighting requires representations that disentangle scene properties from illumination. Recent methods rely on latent intrinsic representations but remain under-constrained a...
- BioTamperNet: Affinity-Guided State-Space Model Detecting Tampered Biomedical Images : Abstract: We propose BioTamperNet, a novel framework for detecting duplicated regions in tampered biomedical images, leveraging affinity-guided attention inspired by State Space Model (SSM) approximat...
- Preserving Localized Patch Semantics in VLMs : Abstract: Logit Lens has been proposed for visualizing tokens that contribute most to LLM answers. Recently, Logit Lens was also shown to be applicable in autoregressive Vision-Language Models (VLMs),...
- FSCA-Net: Feature-Separated Cross-Attention Network for Robust Multi-Dataset Training : Abstract: Crowd counting plays a vital role in public safety, traffic regulation, and smart city management. However, despite the impressive progress achieved by CNN- and Transformer-based models, the...
- Combined Flicker-banding and Moire Removal for Screen-Captured Images : Abstract: Capturing display screens with mobile devices has become increasingly common, yet the resulting images often suffer from severe degradations caused by the coexistence of moiré patterns and f...
- One-Step Diffusion for Perceptual Image Compression : Abstract: Diffusion-based image compression methods have achieved notable progress, delivering high perceptual quality at low bitrates. However, their practical deployment is hindered by significant i...
- SGHA-Attack: Semantic-Guided Hierarchical Alignment for Transferable Targeted Attacks on Vision-Language Models : Abstract: Large vision-language models (VLMs) are vulnerable to transfer-based adversarial perturbations, enabling attackers to optimize on surrogate models and manipulate black-box VLM outputs. Prior...
- HandMCM: Multi-modal Point Cloud-based Correspondence State Space Model for 3D Hand Pose Estimation : Abstract: 3D hand pose estimation that involves accurate estimation of 3D human hand keypoint locations is crucial for many human-computer interaction applications such as augmented reality. However, ...
- Know Your Step: Faster and Better Alignment for Flow Matching Models via Step-aware Advantages : Abstract: Recent advances in flow matching models, particularly with reinforcement learning (RL), have significantly enhanced human preference alignment in few-step text-to-image generators. However, ...
- Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework : Abstract: Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and quadratic computational complexity of Tra...
- UV-M3TL: A Unified and Versatile Multimodal Multi-Task Learning Framework for Assistive Driving Perception : Abstract: Advanced Driver Assistance Systems (ADAS) need to understand human driver behavior while perceiving their navigation context, but jointly learning these heterogeneous tasks would cause inter...
- Token Pruning for In-Context Generation in Diffusion Transformers : Abstract: In-context generation significantly enhances Diffusion Transformers (DiTs) by enabling controllable image-to-image generation through reference examples. However, the resulting input concate...
- Omni-Judge: Can Omni-LLMs Serve as Human-Aligned Judges for Text-Conditioned Audio-Video Generation? : Abstract: State-of-the-art text-to-video generation models such as Sora 2 and Veo 3 can now produce high-fidelity videos with synchronized audio directly from a textual prompt, marking a new milestone...
- PISCES: Annotation-free Text-to-Video Post-Training via Optimal Transport-Aligned Rewards : Abstract: Text-to-video (T2V) generation aims to synthesize videos with high visual quality and temporal consistency that are semantically aligned with input text. Reward-based post-training has emerg...
- Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks : Abstract: World models have emerged as a critical frontier in AI research, aiming to enhance large models by infusing them with physical dynamics and world knowledge. The core objective is to enable a...
- Federated Vision Transformer with Adaptive Focal Loss for Medical Image Classification : Abstract: While deep learning models like Vision Transformer (ViT) have achieved significant advances, they typically require large datasets. With data privacy regulations, access to many original dat...
- ReCALL: Recalibrating Capability Degradation for MLLM-based Composed Image Retrieval : Abstract: Composed Image Retrieval (CIR) aims to retrieve target images based on a hybrid query comprising a reference image and a modification text. Early dual-tower Vision-Language Models (VLMs) str...
- From Frames to Sequences: Temporally Consistent Human-Centric Dense Prediction : Abstract: In this work, we focus on the challenge of temporally consistent human-centric dense prediction across video sequences. Existing models achieve strong per-frame accuracy but often flicker un...
- Moonworks Lunara Aesthetic II: An Image Variation Dataset : Abstract: We introduce Lunara Aesthetic II, a publicly released, ethically sourced image dataset designed to support controlled evaluation and learning of contextual consistency in modern image genera...
- VRGaussianAvatar: Integrating 3D Gaussian Avatars into VR : Abstract: We present VRGaussianAvatar, an integrated system that enables real-time full-body 3D Gaussian Splatting (3DGS) avatars in virtual reality using only head-mounted display (HMD) tracking sign...
- SMTrack: State-Aware Mamba for Efficient Temporal Modeling in Visual Tracking : Abstract: Visual tracking aims to automatically estimate the state of a target object in a video sequence, which is challenging especially in dynamic scenarios. Thus, numerous methods are proposed to ...
- FastPhysGS: Accelerating Physics-based Dynamic 3DGS Simulation via Interior Completion and Adaptive Optimization : Abstract: Extending 3D Gaussian Splatting (3DGS) to 4D physical simulation remains challenging. Based on the Material Point Method (MPM), existing methods either rely on manual parameter tuning or dis...
- DenVisCoM: Dense Vision Correspondence Mamba for Efficient and Real-time Optical Flow and Stereo Estimation : Abstract: In this work, we propose a novel Mamba block DenVisCoM, as well as a novel hybrid architecture specifically tailored for accurate and real-time estimation of optical flow and disparity estim...
- Simplicity Prevails: The Emergence of Generalizable AIGI Detection in Visual Foundation Models : Abstract: While specialized detectors for AI-Generated Images (AIGI) achieve near-perfect accuracy on curated benchmarks, they suffer from a dramatic performance collapse in realistic, in-the-wild sce...
- Tail-Aware Post-Training Quantization for 3D Geometry Models : Abstract: The burgeoning complexity and scale of 3D geometry models pose significant challenges for deployment on resource-constrained platforms. While Post-Training Quantization (PTQ) enables efficie...
- ObjEmbed: Towards Universal Multimodal Object Embeddings : Abstract: Aligning objects with corresponding textual descriptions is a fundamental challenge and a realistic requirement in vision-language understanding. While recent multimodal embedding models exc...
- Spot-Wise Smart Parking: An Edge-Enabled Architecture with YOLOv11 and Digital Twin Integration : Abstract: Smart parking systems help reduce congestion and minimize users' search time, thereby contributing to smart city adoption and enhancing urban mobility. In previous works, we presented a syst...
- Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation : Abstract: While text-to-image generation has achieved unprecedented fidelity, the vast majority of existing models function fundamentally as static text-to-pixel decoders. Consequently, they often fai...
- MagicFuse: Single Image Fusion for Visual and Semantic Reinforcement : Abstract: This paper focuses on a highly practical scenario: how to continue benefiting from the advantages of multi-modal image fusion under harsh conditions when only visible imaging sensors are ava...
- GDPR-Compliant Person Recognition in Industrial Environments Using MEMS-LiDAR and Hybrid Data : Abstract: The reliable detection of unauthorized individuals in safety-critical industrial indoor spaces is crucial to avoid plant shutdowns, property damage, and personal hazards. Conventional vision...
- DDP-WM: Disentangled Dynamics Prediction for Efficient World Models : Abstract: World models are essential for autonomous robotic planning. However, the substantial computational overhead of existing dense Transformer-based models significantly hinders real-time deployme...
- Automated Discontinuity Set Characterisation in Enclosed Rock Face Point Clouds Using Single-Shot Filtering and Cyclic Orientation Transformation : Abstract: Characterisation of structural discontinuity sets in exposed rock faces of underground mine cavities is essential for assessing rock-mass stability, excavation safety, and operational effici...
- FlowBypass: Rectified Flow Trajectory Bypass for Training-Free Image Editing : Abstract: Training-free image editing has attracted increasing attention for its efficiency and independence from training data. However, existing approaches predominantly rely on inversion-reconstruc...
- LDRNet: Large Deformation Registration Model for Chest CT Registration : Abstract: Most deep-learning-based medical image registration algorithms focus on brain image registration tasks. Compared with brain registration, chest CT registration has larger deformati...
- GPD: Guided Progressive Distillation for Fast and High-Quality Video Generation : Abstract: Diffusion models have achieved remarkable success in video generation; however, the high computational cost of the denoising process remains a major bottleneck. Existing approaches have show...
- Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies : Abstract: Multimodal Large Language Models (MLLMs) have shown remarkable proficiency on general-purpose vision-language benchmarks, reaching or even exceeding human-level performance. However, these e...
- Efficient Cross-Country Data Acquisition Strategy for ADAS via Street-View Imagery : Abstract: Deploying ADAS and ADS across countries remains challenging due to differences in legislation, traffic infrastructure, and visual conventions, which introduce domain shifts that degrade perc...
- SPIRIT: Adapting Vision Foundation Models for Unified Single- and Multi-Frame Infrared Small Target Detection : Abstract: Infrared small target detection (IRSTD) is crucial for surveillance and early-warning, with deployments spanning both single-frame analysis and video-mode tracking. A practical solution shou...
- WS-IMUBench: Can Weakly Supervised Methods from Audio, Image, and Video Be Adapted for IMU-based Temporal Action Localization? : Abstract: IMU-based Human Activity Recognition (HAR) has enabled a wide range of ubiquitous computing applications, yet its dominant clip classification paradigm cannot capture the rich temporal struc...
- How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing : Abstract: Recent generative models have achieved remarkable progress in image editing. However, existing systems and benchmarks remain largely text-guided. In contrast, human communication is inherent...
- Fact or Fake? Assessing the Role of Deepfake Detectors in Multimodal Misinformation Detection : Abstract: In multimodal misinformation, deception usually arises not just from pixel-level manipulations in an image, but from the semantic and contextual claim jointly expressed by the image-text pai...
- Trust but Verify: Adaptive Conditioning for Reference-Based Diffusion Super-Resolution via Implicit Reference Correlation Modeling : Abstract: Recent works have explored reference-based super-resolution (RefSR) to mitigate hallucinations in diffusion-based image restoration. A key challenge is that real-world degradations make corr...
- ProxyImg: Towards Highly-Controllable Image Representation via Hierarchical Disentangled Proxy Embedding : Abstract: Prevailing image representation methods, including explicit representations such as raster images and Gaussian primitives, as well as implicit representations such as latent images, either s...
- Q Cache: Visual Attention is Valuable in Less than Half of Decode Layers for Multimodal Large Language Model : Abstract: Multimodal large language models (MLLMs) are plagued by exorbitant inference costs attributable to the profusion of visual tokens within the vision encoder. The redundant visual tokens engen...
- Enabling Progressive Whole-slide Image Analysis with Multi-scale Pyramidal Network : Abstract: Multiple-instance Learning (MIL) is commonly used to undertake computational pathology (CPath) tasks, and the use of multi-scale patches allows diverse features across scales to be learned. ...
- Beyond Open Vocabulary: Multimodal Prompting for Object Detection in Remote Sensing Images : Abstract: Open-vocabulary object detection in remote sensing commonly relies on text-only prompting to specify target categories, implicitly assuming that inference-time category queries can be reliab...
- Enhancing Multi-Image Understanding through Delimiter Token Scaling : Abstract: Large Vision-Language Models (LVLMs) achieve strong performance on single-image tasks, but their performance declines when multiple images are provided as input. One major reason is the cros...
- Leveraging Latent Vector Prediction for Localized Control in Image Generation via Diffusion Models : Abstract: Diffusion models emerged as a leading approach in text-to-image generation, producing high-quality images from textual descriptions. However, attempting to achieve detailed control to get a ...
- UniDriveDreamer: A Single-Stage Multimodal World Model for Autonomous Driving : Abstract: World models have demonstrated significant promise for data synthesis in autonomous driving. However, existing methods predominantly concentrate on single-modality generation, typically focu...
- UrbanGS: A Scalable and Efficient Architecture for Geometrically Accurate Large-Scene Reconstruction : Abstract: While 3D Gaussian Splatting (3DGS) enables high-quality, real-time rendering for bounded scenes, its extension to large-scale urban environments gives rise to critical challenges in terms of...
- FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space : Abstract: We introduce FSVideo, a fast speed transformer-based image-to-video (I2V) diffusion framework. We build our framework on the following key components: 1.) a new video autoencoder with highly...
- Teacher-Guided Student Self-Knowledge Distillation Using Diffusion Model : Abstract: Existing Knowledge Distillation (KD) methods often align feature information between teacher and student by exploring meaningful feature processing and loss functions. However, due to the di...
- MLV-Edit: Towards Consistent and Highly Efficient Editing for Minute-Level Videos : Abstract: We propose MLV-Edit, a training-free, flow-based framework that addresses the unique challenges of minute-level video editing. While existing techniques excel in short-form video manipulation,...
- Eliminating Registration Bias in Synthetic CT Generation: A Physics-Based Simulation Framework : Abstract: Supervised synthetic CT generation from CBCT requires registered training pairs, yet perfect registration between separately acquired scans remains unattainable. This registration bias propa...
- Efficient UAV trajectory prediction: A multi-modal deep diffusion framework : Abstract: To meet the requirements for managing unauthorized UAVs in the low-altitude economy, a multi-modal UAV trajectory prediction method based on the fusion of LiDAR and millimeter-wave radar inf...
- Robustness of Presentation Attack Detection in Remote Identity Validation Scenarios : Abstract: Presentation attack detection (PAD) subsystems are an important part of effective and user-friendly remote identity validation (RIV) systems. However, ensuring robust performance across dive...
- From Manual Observation to Automated Monitoring: Space Allowance Effects on Play Behaviour in Group-Housed Dairy Calves : Abstract: Play behaviour serves as a positive welfare indicator in dairy calves, yet the influence of space allowance under commercial conditions remains poorly characterized, particularly at intermed...
- AI-Driven Three-Dimensional Reconstruction and Quantitative Analysis for Burn Injury Assessment : Abstract: Accurate, reproducible burn assessment is critical for treatment planning, healing monitoring, and medico-legal documentation, yet conventional visual inspection and 2D photography are subje...
- Context-Aware Autoencoders for Anomaly Detection in Maritime Surveillance : Abstract: The detection of anomalies is crucial to ensuring the safety and security of maritime vessel traffic surveillance. Although autoencoders are popular for anomaly detection, their effectivenes...
- D3R-Net: Dual-Domain Denoising Reconstruction Network for Robust Industrial Anomaly Detection : Abstract: Unsupervised anomaly detection (UAD) is a key ingredient of automated visual inspection in modern manufacturing. The reconstruction-based methods appeal because they have basic architectural...
- PovNet+: A Deep Learning Architecture for Socially Assistive Robots to Learn and Assist with Multiple Activities of Daily Living : Abstract: A significant barrier to the long-term deployment of autonomous socially assistive robots is their inability to both perceive and assist with multiple activities of daily living (ADLs). In t...
- Shedding the Facades, Connecting the Domains: Detecting Shifting Multimodal Hate Video with Test-Time Adaptation : Abstract: Hate Video Detection (HVD) is crucial for online ecosystems. Existing methods assume identical distributions between training (source) and inference (target) data. However, hateful content o...
- LLaVA-FA: Learning Fourier Approximation for Compressing Large Multimodal Models : Abstract: Large multimodal models (LMMs) have achieved impressive performance on various vision-language tasks, but their substantial computational and memory costs hinder their practical deployment. ...
- DensiThAI, A Multi-View Deep Learning Framework for Breast Density Estimation using Infrared Images : Abstract: Breast tissue density is a key biomarker of breast cancer risk and a major factor affecting mammographic sensitivity. However, density assessment currently relies almost exclusively on X-ray...
- SDCM: Simulated Densifying and Compensatory Modeling Fusion for Radar-Vision 3-D Object Detection in Internet of Vehicles : Abstract: 3-D object detection based on 4-D radar-vision is an important part in Internet of Vehicles (IoV). However, there are two challenges which need to be faced. First, the 4-D radar point clouds...
- Deep Learning Pose Estimation for Multi-Label Recognition of Combined Hyperkinetic Movement Disorders : Abstract: Hyperkinetic movement disorders (HMDs) such as dystonia, tremor, chorea, myoclonus, and tics are disabling motor manifestations across childhood and adulthood. Their fluctuating, intermitten...
- YOLOE-26: Integrating YOLO26 with YOLOE for Real-Time Open-Vocabulary Instance Segmentation : Abstract: This paper presents YOLOE-26, a unified framework that integrates the deployment-optimized YOLO26 (or YOLOv26) architecture with the open-vocabulary learning paradigm of YOLOE for real-time o...
- Intra-Class Subdivision for Pixel Contrastive Learning: Application to Semi-supervised Cardiac Image Segmentation : Abstract: We propose an intra-class subdivision pixel contrastive learning (SPCL) framework for cardiac image segmentation to address representation contamination at boundaries. The novel concept ``Un...
- Deep Learning Based CNN Model for Automated Detection of Pneumonia from Chest XRay Images : Abstract: Pneumonia has been one of the major causes of morbidities and mortality in the world and the prevalence of this disease is disproportionately high among the pediatric and elderly populations...
- Development of a Cacao Disease Identification and Management App Using Deep Learning : Abstract: Smallholder cacao producers often rely on outdated farming techniques and face significant challenges from pests and diseases, unlike larger plantations with more resources and expertise. In...
- World-Shaper: A Unified Framework for 360° Panoramic Editing : Abstract: Being able to edit panoramic images is crucial for creating realistic 360° visual experiences. However, existing perspective-based image editing methods fail to model the spatial structure o...
- Computer Vision and Its Relationship to Cognitive Science: A perspective from Bayes Decision Theory : Abstract: This document presents an introduction to computer vision, and its relationship to Cognitive Science, from the perspective of Bayes Decision Theory (Berger 1985). Computer vision is a vast a...
- On the Assessment of Sensitivity of Autonomous Vehicle Perception : Abstract: The viability of automated driving is heavily dependent on the performance of perception systems to provide real-time accurate and reliable information for robust decision-making and maneuve...
- MASC: Metal-Aware Sampling and Correction via Reinforcement Learning for Accelerated MRI : Abstract: Metal implants in MRI cause severe artifacts that degrade image quality and hinder clinical diagnosis. Traditional approaches address metal artifact reduction (MAR) and accelerated MRI acqui...
- ReLAPSe: Reinforcement-Learning-trained Adversarial Prompt Search for Erased concepts in unlearned diffusion models : Abstract: Machine unlearning is a key defense mechanism for removing unauthorized concepts from text-to-image diffusion models, yet recent evidence shows that latent visual information often persists ...
- Deep Learning-Based Object Detection for Autonomous Vehicles: A Comparative Study of One-Stage and Two-Stage Detectors on Basic Traffic Objects : Abstract: Object detection is a crucial component in autonomous vehicle systems. It enables the vehicle to perceive and understand its environment by identifying and locating various objects around it...
- Robust automatic brain vessel segmentation in 3D CTA scans using dynamic 4D-CTA data : Abstract: In this study, we develop a novel methodology for annotating the brain vasculature using dynamic 4D-CTA head scans. By using multiple time points from dynamic CTA acquisitions, we subtract b...
- Modeling Art Evaluations from Comparative Judgments: A Deep Learning Approach to Predicting Aesthetic Preferences : Abstract: Modeling human aesthetic judgments in visual art presents significant challenges due to individual preference variability and the high cost of obtaining labeled data. To reduce cost of acqui...
- Model Optimization for Multi-Camera 3D Detection and Tracking : Abstract: Outside-in multi-camera perception is increasingly important in indoor environments, where networks of static cameras must support multi-target tracking under occlusion and heterogeneous vie...
- PSGS: Text-driven Panorama Sliding Scene Generation via Gaussian Splatting : Abstract: Generating realistic 3D scenes from text is crucial for immersive applications like VR, AR, and gaming. While text-driven approaches promise efficiency, existing methods suffer from limited ...
- ZS-TreeSeg: A Zero-Shot Framework for Tree Crown Instance Segmentation : Abstract: Individual tree crown segmentation is an important task in remote sensing for forest biomass estimation and ecological monitoring. However, accurate delineation in dense, overlapping canopie...
- GTATrack: Winner Solution to SoccerTrack 2025 with Deep-EIoU and Global Tracklet Association : Abstract: Multi-object tracking (MOT) in sports is highly challenging due to irregular player motion, uniform appearances, and frequent occlusions. These difficulties are further exacerbated by the ge...
- Refining Strokes by Learning Offset Attributes between Strokes for Flexible Sketch Edit at Stroke-Level : Abstract: Sketch edit at stroke-level aims to transplant source strokes onto a target sketch via stroke expansion or replacement, while preserving semantic consistency and visual fidelity with the tar...
- HSSDCT: Factorized Spatial-Spectral Correlation for Hyperspectral Image Fusion : Abstract: Hyperspectral image (HSI) fusion aims to reconstruct a high-resolution HSI (HR-HSI) by combining the rich spectral information of a low-resolution HSI (LR-HSI) with the fine spatial details ...
- RGBX-R1: Visual Modality Chain-of-Thought Guided Reinforcement Learning for Multimodal Grounding : Abstract: Multimodal Large Language Models (MLLM) are primarily pre-trained on the RGB modality, thereby limiting their performance on other modalities, such as infrared, depth, and event data, which ...
- DuoGen: Towards General Purpose Interleaved Multimodal Generation : Abstract: Interleaved multimodal generation enables capabilities beyond unimodal generation models, such as step-by-step instructional guides, visual planning, and generating visual drafts for reasoni...
- SPARK: Stochastic Propagation via Affinity-guided Random walK for training-free unsupervised segmentation : Abstract: We argue that existing training-free segmentation methods rely on an implicit and limiting assumption, that segmentation is a spectral graph partitioning problem over diffusion-derived affin...
- MRAD: Zero-Shot Anomaly Detection with Memory-Driven Retrieval : Abstract: Zero-shot anomaly detection (ZSAD) often leverages pretrained vision or vision-language models, but many existing methods use prompt learning or complex modeling to fit the data distribution...
- SAGE: Accelerating Vision-Language Models via Entropy-Guided Adaptive Speculative Decoding : Abstract: Speculative decoding has emerged as a promising approach to accelerate inference in vision-language models (VLMs) by enabling parallel verification of multiple draft tokens. However, existin...
- Enhancing Open-Vocabulary Object Detection through Multi-Level Fine-Grained Visual-Language Alignment : Abstract: Traditional object detection systems are typically constrained to predefined categories, limiting their applicability in dynamic environments. In contrast, open-vocabulary object detection (...
- SADER: Structure-Aware Diffusion Framework with DEterministic Resampling for Multi-Temporal Remote Sensing Cloud Removal : Abstract: Cloud contamination severely degrades the usability of remote sensing imagery and poses a fundamental challenge for downstream Earth observation tasks. Recently, diffusion-based models have ...
- GLAD: Generative Language-Assisted Visual Tracking for Low-Semantic Templates : Abstract: Vision-language tracking has gained increasing attention in many scenarios. This task simultaneously deals with visual and linguistic information to localize objects in videos. Despite its g...
- Bridging Degradation Discrimination and Generation for Universal Image Restoration : Abstract: Universal image restoration is a critical task in low-level vision, requiring the model to remove various degradations from low-quality images to produce clean images with rich detail. The c...
- Tune-Your-Style: Intensity-tunable 3D Style Transfer with Gaussian Splatting : Abstract: 3D style transfer refers to the artistic stylization of 3D assets based on reference style images. Recently, 3DGS-based stylization methods have drawn considerable attention, primarily due t...
- Towards Interpretable Hallucination Analysis and Mitigation in LVLMs via Contrastive Neuron Steering : Abstract: LVLMs achieve remarkable multimodal understanding and generation but remain susceptible to hallucinations. Existing mitigation methods predominantly focus on output-level adjustments, leavin...
- FaceSnap: Enhanced ID-fidelity Network for Tuning-free Portrait Customization : Abstract: Benefiting from the significant advancements in text-to-image diffusion models, research in personalized image generation, particularly customized portrait generation, has also made great st...
- VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning : Abstract: Scene understanding and reasoning has been a fundamental problem in 3D computer vision, requiring models to identify objects, their properties, and spatial or comparative relationships among...
- Diff-PC: Identity-preserving and 3D-aware Controllable Diffusion for Zero-shot Portrait Customization : Abstract: Portrait customization (PC) has recently garnered significant attention due to its potential applications. However, existing PC methods lack precise identity (ID) preservation and face contr...
- A Hybrid Mamba-SAM Architecture for Efficient 3D Medical Image Segmentation : Abstract: Accurate segmentation of 3D medical images such as MRI and CT is essential for clinical diagnosis and treatment planning. Foundation models like the Segment Anything Model (SAM) provide powe...
- Schrödinger-Inspired Time-Evolution for 4D Deformation Forecasting : Abstract: Spatiotemporal forecasting of complex three-dimensional phenomena (4D: 3D + time) is fundamental to applications in medical imaging, fluid and material dynamics, and geophysics. In contrast ...
- HPC: Hierarchical Point-based Latent Representation for Streaming Dynamic Gaussian Splatting Compression : Abstract: While dynamic Gaussian Splatting has driven significant advances in free-viewpoint video, maintaining its rendering quality with a small memory footprint for efficient streaming transmission...
- Video Understanding: Through A Temporal Lens : Abstract: This thesis explores the central question of how to leverage temporal relations among video elements to advance video understanding. Addressing the limitations of existing methods, the work ...
- V2X-DSC: Multi-Agent Collaborative Perception with Distributed Source Coding Guided Communication : Abstract: Collaborative perception improves 3D understanding by fusing multi-agent observations, yet intermediate-feature sharing faces strict bandwidth constraints as dense BEV features saturate V2X ...
- JoyAvatar: Unlocking Highly Expressive Avatars via Harmonized Text-Audio Conditioning : Abstract: Existing video avatar models have demonstrated impressive capabilities in scenarios such as talking, public speaking, and singing. However, the majority of these methods exhibit limited alig...
- StomataSeg: Semi-Supervised Instance Segmentation for Sorghum Stomatal Components : Abstract: Sorghum is a globally important cereal grown widely in water-limited and stress-prone regions. Its strong drought tolerance makes it a priority crop for climate-resilient agriculture. Improv...
- Supervised makeup transfer with a curated dataset: Decoupling identity and makeup features for enhanced transformation : Abstract: Diffusion models have recently shown strong progress in generative tasks, offering a more stable alternative to GAN-based approaches for makeup transfer. Existing methods often suffer from l...
- Diffusion-Driven Inter-Outer Surface Separation for Point Clouds with Open Boundaries : Abstract: We propose a diffusion-based algorithm for separating the inter and outer layer surfaces from double-layered point clouds, particularly those exhibiting the "double surface artifact" caused ...
- HSI-VAR: Rethinking Hyperspectral Restoration through Spatial-Spectral Visual Autoregression : Abstract: Hyperspectral images (HSIs) capture richer spatial-spectral information beyond RGB, yet real-world HSIs often suffer from a composite mix of degradations, such as noise, blur, and missing ba...
- DVLA-RL: Dual-Level Vision-Language Alignment with Reinforcement Learning Gating for Few-Shot Learning : Abstract: Few-shot learning (FSL) aims to generalize to novel categories with only a few samples. Recent approaches incorporate large language models (LLMs) to enrich visual representations with seman...
- Any3D-VLA: Enhancing VLA Robustness via Diverse Point Clouds : Abstract: Existing Vision-Language-Action (VLA) models typically take 2D images as visual input, which limits their spatial understanding in complex scenes. How can we incorporate 3D information to en...
- VVLoc: Prior-free 3-DoF Vehicle Visual Localization : Abstract: Localization is a critical technology in autonomous driving, encompassing both topological localization, which identifies the most similar map keyframe to the current observation, and metric...
- Generating a Paracosm for Training-Free Zero-Shot Composed Image Retrieval : Abstract: Composed Image Retrieval (CIR) is the task of retrieving a target image from a database using a multimodal query, which consists of a reference image and a modification text. The text specif...
- Edge-Native Generative De-identification: Inversion-Free Flow for Privacy-Preserving Federated Skin Image Analysis : Abstract: The deployment of Federated Learning (FL) for clinical dermatology is hindered by the competing requirements of protecting patient privacy and preserving diagnostic features. Traditional de-...
- TransNormal: Dense Visual Semantics for Diffusion-based Transparent Object Normal Estimation : Abstract: Monocular normal estimation for transparent objects is critical for laboratory automation, yet it remains challenging due to complex light refraction and reflection. These optical properties...
- Invariance on Manifolds: Understanding Robust Visual Representations for Place Recognition : Abstract: Visual Place Recognition (VPR) demands representations robust to drastic environmental and viewpoint shifts. Current aggregation paradigms, however, either rely on data-hungry supervision or...
- Distill3R: A Pipeline for Democratizing 3D Foundation Models on Commodity Hardware : Abstract: While multi-view 3D reconstruction has shifted toward large-scale foundation models capable of inferring globally consistent geometry, their reliance on massive computational clusters for tr...
- OCTOPUS: Enhancing the Spatial-Awareness of Vision SSMs with Multi-Dimensional Scans and Traversal Selection : Abstract: State space models (SSMs) have recently emerged as an alternative to transformers due to their unique ability of modeling global relationships in text with linear complexity. However, their ...
- ConsensusDrop: Fusing Visual and Cross-Modal Saliency for Efficient Vision Language Models : Abstract: Vision-Language Models (VLMs) are expensive because the LLM processes hundreds of largely redundant visual tokens. Existing token reduction methods typically exploit either vision-e...
- Data Augmentation for High-Fidelity Generation of CAR-T/NK Immunological Synapse Images : Abstract: Chimeric antigen receptor (CAR)-T and NK cell immunotherapies have transformed cancer treatment, and recent studies suggest that the quality of the CAR-T/NK cell immunological synapse (IS) m...
- Unveiling the Cognitive Compass: Theory-of-Mind-Guided Multimodal Emotion Reasoning : Abstract: Despite rapid progress in multimodal large language models (MLLMs), their capability for deep emotional understanding remains limited. We argue that genuine affective intelligence requires e...
- VAMOS-OCTA: Vessel-Aware Multi-Axis Orthogonal Supervision for Inpainting Motion-Corrupted OCT Angiography Volumes : Abstract: Handheld Optical Coherence Tomography Angiography (OCTA) enables noninvasive retinal imaging in uncooperative or pediatric subjects, but is highly susceptible to motion artifacts that severe...
- SRVAU-R1: Enhancing Video Anomaly Understanding via Reflection-Aware Learning : Abstract: Multi-modal large language models (MLLMs) have demonstrated significant progress in reasoning capabilities and shown promising effectiveness in video anomaly understanding (VAU) tasks. Howev...
- LocalScore: Local Density-Aware Similarity Scoring for Biometrics : Abstract: Open-set biometrics faces challenges with probe subjects who may not be enrolled in the gallery, as traditional biometric systems struggle to detect these non-mated probes. Despite the growi...
- Effectiveness of Automatically Curated Dataset in Thyroid Nodules Classification Algorithms Using Deep Learning : Abstract: The diagnosis of thyroid nodule cancers commonly utilizes ultrasound images. Several studies showed that deep learning algorithms designed to classify benign and malignant thyroid nodules co...
- GMAC: Global Multi-View Constraint for Automatic Multi-Camera Extrinsic Calibration : Abstract: Automatic calibration of multi-camera systems, namely the accurate estimation of spatial extrinsic parameters, is fundamental for 3D reconstruction, panoramic perception, and multi-view data...
- FUSE-Flow: Scalable Real-Time Multi-View Point Cloud Reconstruction Using Confidence : Abstract: Real-time multi-view point cloud reconstruction is a core problem in 3D vision and immersive perception, with wide applications in VR, AR, robotic navigation, digital twins, and computer int...
- From Videos to Conversations: Egocentric Instructions for Task Assistance : Abstract: Many everyday tasks, ranging from appliance repair and cooking to car maintenance, require expert knowledge, particularly for complex, multi-step procedures. Despite growing interest in AI a...
- ReLayout: Versatile and Structure-Preserving Design Layout Editing via Relation-Aware Design Reconstruction : Abstract: Automated redesign without manual adjustments marks a key step forward in the design workflow. In this work, we focus on a foundational redesign task termed design layout editing, which seek...
- Baseline Method of the Foundation Model Challenge for Ultrasound Image Analysis : Abstract: Ultrasound (US) imaging exhibits substantial heterogeneity across anatomical structures and acquisition protocols, posing significant challenges to the development of generalizable analysis ...
- Radioactive 3D Gaussian Ray Tracing for Tomographic Reconstruction : Abstract: 3D Gaussian Splatting (3DGS) has recently emerged in computer vision as a promising rendering technique. By adapting the principles of Elliptical Weighted Average (EWA) splatting to a modern...
- DRFormer: A Dual-Regularized Bidirectional Transformer for Person Re-identification : Abstract: Both fine-grained discriminative details and global semantic features can contribute to solving person re-identification challenges, such as occlusion and pose variations. Vision foundation ...
- Read As Human: Compressing Context via Parallelizable Close Reading and Skimming : Abstract: Large Language Models (LLMs) demonstrate exceptional capability across diverse tasks. However, their deployment in long-context scenarios is hindered by two challenges: computational ineffic...
- PretrainRL: Alleviating Factuality Hallucination of Large Language Models at the Beginning : Abstract: Large language models (LLMs), despite their powerful capabilities, suffer from factual hallucinations where they generate verifiable falsehoods. We identify a root of this issue: the imbalan...
- GuideWeb: A Benchmark for Automatic In-App Guide Generation on Real-World Web UIs : Abstract: Digital Adoption Platforms (DAPs) provide web-based overlays that deliver operation guidance and contextual hints to help users navigate complex websites. Although modern DAP tools enable non-...
- From Code-Centric to Concept-Centric: Teaching NLP with LLM-Assisted "Vibe Coding" : Abstract: The rapid advancement of Large Language Models (LLMs) presents both challenges and opportunities for Natural Language Processing (NLP) education. This paper introduces "Vibe Coding," a ped...
- Orthogonal Hierarchical Decomposition for Structure-Aware Table Understanding with Large Language Models : Abstract: Complex tables with multi-level headers, merged cells and heterogeneous layouts pose persistent challenges for LLMs in both understanding and reasoning. Existing approaches typically rely on...
- Beyond Local Edits: Embedding-Virtualized Knowledge for Broader Evaluation and Preservation of Model Editing : Abstract: Knowledge editing methods for large language models are commonly evaluated using predefined benchmarks that assess edited facts together with a limited set of related or neighboring knowledg...
- S3-CoT: Self-Sampled Succinct Reasoning Enables Efficient Chain-of-Thought LLMs : Abstract: Large language models (LLMs) equipped with chain-of-thought (CoT) achieve strong performance and offer a window into LLM behavior. However, recent evidence suggests that improvements in CoT ...
- From Latent Signals to Reflection Behavior: Tracing Meta-Cognitive Activation Trajectory in R1-Style LLMs : Abstract: R1-style LLMs have attracted growing attention for their capacity for self-reflection, yet the internal mechanisms underlying such behavior remain unclear. To bridge this gap, we anchor on t...
- NEAT: Neuron-Based Early Exit for Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) often suffer from overthinking, a phenomenon in which redundant reasoning steps are generated after a correct solution has already been reached. Existing...
- WildGraphBench: Benchmarking GraphRAG with Wild-Source Corpora : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) organizes external knowledge as a hierarchical graph, enabling efficient retrieval and aggregation of scattered evidence across multiple...
- Closing the Loop: Universal Repository Representation with RPG-Encoder : Abstract: Current repository agents encounter a reasoning disconnect due to fragmented representations, as existing methods rely on isolated API documentation or dependency graphs that lack semantic d...
- Dicta-LM 3.0: Advancing The Frontier of Hebrew Sovereign LLMs : Abstract: Open-weight LLMs have been released by frontier labs; however, sovereign Large Language Models (for languages other than English) remain low in supply yet high in demand. Training large lang...
- Out of the Memory Barrier: A Highly Memory Efficient Training System for LLMs with Million-Token Contexts : Abstract: Training Large Language Models (LLMs) on long contexts is severely constrained by prohibitive GPU memory overhead, not training time. The primary culprits are the activations, whose memory f...
- There Is More to Refusal in Large Language Models than a Single Direction : Abstract: Prior work argues that refusal in large language models is mediated by a single activation-space direction, enabling effective steering and ablation. We show that this account is incomplete....
- Quantifying the Gap between Understanding and Generation within Unified Multimodal Models : Abstract: Recent advances in unified multimodal models (UMM) have demonstrated remarkable progress in both understanding and generation tasks. However, whether these two capabilities are genuinely ali...
- Focus-dLLM: Accelerating Long-Context Diffusion LLM Inference via Confidence-Guided Context Focusing : Abstract: Diffusion Large Language Models (dLLMs) deliver strong long-context processing capability in a non-autoregressive decoding paradigm. However, the considerable computational cost of bidirecti...
- D-CORE: Incentivizing Task Decomposition in Large Reasoning Models for Complex Tool Use : Abstract: Effective tool use and reasoning are essential capabilities for large reasoning models (LRMs) to address complex real-world problems. Through empirical analysis, we identify that current LRM...
- AR-MAP: Are Autoregressive Large Language Models Implicit Teachers for Diffusion Large Language Models? : Abstract: Diffusion Large Language Models (DLLMs) have emerged as a powerful alternative to autoregressive models, enabling parallel token generation across multiple positions. However, preference ali...
- Evaluating Metalinguistic Knowledge in Large Language Models across the World's Languages : Abstract: Large language models (LLMs) are routinely evaluated on language use tasks, yet their knowledge of linguistic structure remains poorly understood. Existing linguistic benchmarks typically fo...
- Sinhala Physical Common Sense Reasoning Dataset for Global PIQA : Abstract: This paper presents the first-ever Sinhala physical common sense reasoning dataset created as part of Global PIQA. It contains 110 human-created and verified data samples, where each sample ...
- Am I More Pointwise or Pairwise? Revealing Position Bias in Rubric-Based LLM-as-a-Judge : Abstract: Large language models (LLMs) are now widely used to evaluate the quality of text, a field commonly referred to as LLM-as-a-judge. While prior works mainly focus on point-wise and pair-wise e...
- Using Correspondence Patterns to Identify Irregular Words in Cognate sets Through Leave-One-Out Validation : Abstract: Regular sound correspondences constitute the principal evidence in historical language comparison. Despite the heuristic focus on regularity, it is often more an intuitive judgement than a q...
- dziribot: RAG-based intelligent conversational agent for Algerian Arabic dialect : Abstract: The rapid digitalization of customer service has intensified the demand for conversational agents capable of providing accurate and natural interactions. In the Algerian context, this is com...
- Kimi K2.5: Visual Agentic Intelligence : Abstract: We introduce Kimi K2.5, an open-source multimodal agentic model designed to advance general agentic intelligence. K2.5 emphasizes the joint optimization of text and vision so that two modali...
- Cross-Lingual Stability of LLM Judges Under Controlled Generation: Evidence from Finno-Ugric Languages : Abstract: Cross-lingual evaluation of large language models (LLMs) typically conflates two sources of variance: genuine model performance differences and measurement instability. We investigate evalua...
- The Shape of Beliefs: Geometry, Dynamics, and Interventions along Representation Manifolds of Language Models' Posteriors : Abstract: Large language models (LLMs) represent prompt-conditioned beliefs (posteriors over answers and claims), but we lack a mechanistic account of how these beliefs are encoded in representation s...
- Language Steering for Multilingual In-Context Learning : Abstract: While multilingual large language models have gained widespread adoption, their performance on non-English languages remains substantially inferior to English. This disparity is particularly...
- Automated Multiple Mini Interview (MMI) Scoring : Abstract: Assessing soft skills such as empathy, ethical judgment, and communication is essential in competitive selection processes, yet human scoring is often inconsistent and biased. While Large La...
- Proof-RM: A Scalable and Generalizable Reward Model for Math Proof : Abstract: While Large Language Models (LLMs) have demonstrated strong math reasoning abilities through Reinforcement Learning with *Verifiable Rewards* (RLVR), many advanced mathematical problems are ...
- ROG: Retrieval-Augmented LLM Reasoning for Complex First-Order Queries over Knowledge Graphs : Abstract: Answering first-order logic (FOL) queries over incomplete knowledge graphs (KGs) is difficult, especially for complex query structures that compose projection, intersection, union, and negat...
- Large Language Models for Mental Health: A Multilingual Evaluation : Abstract: Large Language Models (LLMs) have remarkable capabilities across NLP tasks. However, their performance in multilingual contexts, especially within the mental health domain, has not been thor...
- From Directions to Regions: Decomposing Activations in Language Models via Local Geometry : Abstract: Activation decomposition methods in language models are tightly coupled to geometric assumptions on how concepts are realized in activation space. Existing approaches search for individual g...
- Indications of Belief-Guided Agency and Meta-Cognitive Monitoring in Large Language Models : Abstract: Rapid advancements in large language models (LLMs) have sparked the question whether these models possess some form of consciousness. To tackle this challenge, Butlin et al. (2023) introduce...
- Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability : Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often...
- SpeechLess: Micro-utterance with Personalized Spatial Memory-aware Assistant in Everyday Augmented Reality : Abstract: Speaking aloud to a wearable AR assistant in public can be socially awkward, and re-articulating the same requests every day creates unnecessary effort. We present SpeechLess, a wearable AR ...
- Adapting Where It Matters: Depth-Aware Adaptation for Efficient Multilingual Speech Recognition in Low-Resource Languages : Abstract: Recent speech foundation models excel at multilingual automatic speech recognition (ASR) for high-resource languages, but adapting them to low-resource languages remains challenging due to d...
- A-MapReduce: Executing Wide Search via Agentic MapReduce : Abstract: Contemporary large language model (LLM)-based multi-agent systems exhibit systematic advantages in deep research tasks, which emphasize iterative, vertically structured information seeking. ...
- Causally Disentangled Contrastive Learning for Multilingual Speaker Embeddings : Abstract: Self-supervised speaker embeddings are widely used in speaker verification systems, but prior work has shown that they often encode sensitive demographic attributes, raising fairness and pri...
- From Lengthy to Lucid: A Systematic Literature Review on NLP Techniques for Taming Long Sentences : Abstract: Long sentences have been a persistent issue in written communication for many years since they make it challenging for readers to grasp the main points or follow the initial intention of the...
- ALiiCE: Evaluating Positional Fine-grained Citation Generation : Abstract: Large Language Models (LLMs) can enhance their credibility and verifiability by generating text with citations. However, existing research on citation generation is predominantly limited to sent...
- Paraphrase Types Elicit Prompt Engineering Capabilities : Abstract: Much of the success of modern language models depends on finding a suitable prompt to instruct the model. Until now, it has been largely unknown how variations in the linguistic expression o...
- LFQA-E: Carefully Benchmarking Long-form QA Evaluation : Abstract: Long-Form Question Answering (LFQA) involves generating comprehensive, paragraph-level responses to open-ended questions, which poses a significant challenge for evaluation due to the richne...
- Leveraging LLMs for Translating and Classifying Mental Health Data : Abstract: Large language models (LLMs) are increasingly used in medical fields. In mental health support, the early identification of linguistic markers associated with mental health conditions can pr...
- Evolutionary Pre-Prompt Optimization for Mathematical Reasoning : Abstract: Recent advancements have highlighted that large language models (LLMs), when given a small set of task-specific examples, demonstrate remarkable proficiency, a capability that extends to com...
- FreeChunker: A Cross-Granularity Chunking Framework : Abstract: Chunking strategies significantly impact the effectiveness of Retrieval-Augmented Generation (RAG) systems. Existing methods operate within fixed-granularity paradigms that rely on static bo...
- PIRA: Preference-Oriented Instruction-Tuned Reward Models with Dual Aggregation : Abstract: Reward models are pivotal for aligning Large Language Models (LLMs) with human preferences. Existing approaches face two key limitations: Discriminative reward models require large-scale ann...
- SAPO: Self-Adaptive Process Optimization Makes Small Reasoners Stronger : Abstract: Existing self-evolution methods overlook the influence of fine-grained reasoning steps, which leads to the reasoner-verifier gap. The computational inefficiency of Monte Carlo (MC) process s...
- CE-RM: A Pointwise Generative Reward Model Optimized via Two-Stage Rollout and Unified Criteria : Abstract: Automatic evaluation is crucial yet challenging for open-ended natural language generation, especially when rule-based metrics are infeasible. Compared with traditional methods, the recent L...
- Do Whitepaper Claims Predict Market Behavior? Evidence from Cryptocurrency Factor Analysis : Abstract: Cryptocurrency projects articulate value propositions through whitepapers, making claims about functionality and technical capabilities. This study investigates whether these narratives alig...
- The augmented NLP bound for maximum-entropy remote sampling : Abstract: The maximum-entropy remote sampling problem (MERSP) is to select a subset of s random variables from a set of n random variables, so as to maximize the information concerning a set of target...
- A Diffusive Classification Loss for Learning Energy-based Generative Models : Abstract: Score-based generative models have recently achieved remarkable success. While they are usually parameterized by the score, an alternative way is to use a series of time-dependent energy-bas...
- Faithful-Patchscopes: Understanding and Mitigating Model Bias in Hidden Representations Explanation of Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated strong capabilities for hidden representation interpretation through Patchscopes, a framework that uses LLMs themselves to generate human-reada...
- MiNER: A Two-Stage Pipeline for Metadata Extraction from Municipal Meeting Minutes : Abstract: Municipal meeting minutes are official documents of local governance, exhibiting heterogeneous formats and writing styles. Effective information retrieval (IR) requires identifying metadata ...
- DETOUR: An Interactive Benchmark for Dual-Agent Search and Reasoning : Abstract: When recalling information in conversation, people often arrive at the recollection after multiple turns. However, existing benchmarks for evaluating agent capabilities in such tip-of-the-to...
- DecompressionLM: Deterministic, Diagnostic, and Zero-Shot Concept Graph Extraction from Language Models : Abstract: Existing knowledge probing methods rely on pre-defined queries, limiting extraction to known concepts. We introduce DecompressionLM, a stateless framework for zero-shot concept graph extract...
- Clause-Internal or Clause-External? Testing Turkish Reflexive Binding in Adapted versus Chain of Thought Large Language Models : Abstract: This study evaluates whether state-of-the-art large language models capture the binding relations of Turkish reflexive pronouns. We construct a balanced set of 100 sentences that pit local a...
- Segment-Level Attribution for Selective Learning of Long Reasoning Traces : Abstract: Large Reasoning Models (LRMs) achieve strong reasoning performance by generating long chains of thought (CoTs), yet only a small fraction of these traces meaningfully contributes to answer p...
- What Matters to an LLM? Behavioral and Computational Evidences from Summarization : Abstract: Large Language Models (LLMs) are now state-of-the-art at summarization, yet the internal notion of importance that drives their information selections remains hidden. We propose to investiga...
- Intention-Adaptive LLM Fine-Tuning for Text Revision Generation : Abstract: Large Language Models (LLMs) have achieved impressive capabilities in various context-based text generation tasks, such as summarization and reasoning; however, their applications in intenti...
- From Knowledge to Inference: Scaling Laws of Specialized Reasoning on GlobalHealthAtlas : Abstract: Public health reasoning requires population-level inference grounded in scientific evidence, expert consensus, and safety constraints. However, it remains underexplored as a structured machi...
- Reasoning by Commented Code for Table Question Answering : Abstract: Table Question Answering (TableQA) poses a significant challenge for large language models (LLMs) because conventional linearization of tables often disrupts the two-dimensional relationship...
- A Hierarchical and Attentional Analysis of Argument Structure Constructions in BERT Using Naturalistic Corpora : Abstract: This study investigates how the Bidirectional Encoder Representations from Transformers model processes four fundamental Argument Structure Constructions. We employ a multi-dimensional analy...
- The French Drama Revolution: Political Economy and Literary Production, 1700-1900 : Abstract: This paper investigates the changing nature of French drama between 1700-1900 using Latent Dirichlet Allocation and Jensen-Shannon Divergence. Results indicate that the topical distribution ...
- Kanade: A Simple Disentangled Tokenizer for Spoken Language Modeling : Abstract: A good language model starts with a good tokenizer. Tokenization is especially important for speech modeling, which must handle continuous signals that mix linguistic and non-linguistic info...
- Lookahead-then-Verify: Reliable Constrained Decoding for Diffusion LLMs under Context-Free Grammars : Abstract: Diffusion Large Language Models (dLLMs) have demonstrated promising generative capabilities and are increasingly used to produce formal languages defined by context-free grammars, such as so...
- Transformer-Based Model for Multilingual Hope Speech Detection : Abstract: This paper describes a system that has been submitted to the "PolyHope-M" at RANLP2025. In this work various transformers have been implemented and evaluated for hope speech detection for En...
- Formal Semantic Control over Language Models : Abstract: This thesis advances semantic representation learning to render language representations or models more semantically and geometrically interpretable, and to enable localised, quasi-symbolic,...
- LegalOne: A Family of Foundation Models for Reliable Legal Reasoning : Abstract: While Large Language Models (LLMs) have demonstrated impressive general capabilities, their direct application in the legal domain is often hindered by a lack of precise domain knowledge and...
- CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs : Abstract: User modeling characterizes individuals through their preferences and behavioral patterns to enable personalized simulation and generation with Large Language Models (LLMs) in contemporary a...
- Temporal Leakage in Search-Engine Date-Filtered Web Retrieval: A Case Study from Retrospective Forecasting : Abstract: Search-engine date filters are widely used to enforce pre-cutoff retrieval in retrospective evaluations of search-augmented forecasters. We show this approach is unreliable: auditing Google ...
- APR: Penalizing Structural Redundancy in Large Reasoning Models via Anchor-based Process Rewards : Abstract: Test-Time Scaling (TTS) has significantly enhanced the capabilities of Large Reasoning Models (LRMs) but introduces a critical side-effect known as Overthinking. We conduct a preliminary stu...
- WordCraft: Scaffolding the Keyword Method for L2 Vocabulary Learning with Multimodal LLMs : Abstract: Applying the keyword method for vocabulary memorization remains a significant challenge for L1 Chinese-L2 English learners. They frequently struggle to generate phonologically appropriate ke...
- Reasoning as State Transition: A Representational Analysis of Reasoning Evolution in Large Language Models : Abstract: Large Language Models have achieved remarkable performance on reasoning tasks, motivating research into how this ability evolves during training. Prior work has primarily analyzed this evolu...
- HyLRA: Hybrid Layer Reuse Attention for Efficient Long-Context Inference : Abstract: Long-context inference in Large Language Models (LLMs) is bottlenecked by the quadratic computation complexity of attention and the substantial memory footprint of Key-Value (KV) caches. Whi...
- Omni-RRM: Advancing Omni Reward Modeling via Automatic Rubric-Grounded Preference Synthesis : Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities, yet their performance is often capped by the coarse nature of existing alignment techniques. A critical bottlenec...
- Unifying Adversarial Robustness and Training Across Text Scoring Models : Abstract: Research on adversarial robustness in language models is currently fragmented across applications and attacks, obscuring shared vulnerabilities. In this work, we propose unifying the study o...
- ILSIC: Corpora for Identifying Indian Legal Statutes from Queries by Laypeople : Abstract: Legal Statute Identification (LSI) for a given situation is one of the most fundamental tasks in Legal NLP. This task has traditionally been modeled using facts from court judgments as input...
- Verification Required: The Impact of Information Credibility on AI Persuasion : Abstract: Agents powered by large language models (LLMs) are increasingly deployed in settings where communication shapes high-stakes decisions, making a principled understanding of strategic communic...
- Sparse Reward Subsystem in Large Language Models : Abstract: In this paper, we identify a sparse reward subsystem within the hidden states of Large Language Models (LLMs), drawing an analogy to the biological reward subsystem in the human brain. We de...
- Reliable Use of Lemmas via Eligibility Reasoning and Section-Aware Reinforcement Learning : Abstract: Recent large language models (LLMs) perform strongly on mathematical benchmarks yet often misapply lemmas, importing conclusions without validating assumptions. We formalize lemma-judging ...
- Distilling Token-Trained Models into Byte-Level Models : Abstract: Byte Language Models (BLMs) have emerged as a promising direction for scaling language models beyond tokenization. However, existing BLMs typically require training from scratch on trillions...
- Large Language Models as Students Who Think Aloud: Overly Coherent, Verbose, and Confident : Abstract: Large language models (LLMs) are increasingly embedded in AI-based tutoring systems. Can they faithfully model novice reasoning and metacognitive judgments? Existing evaluations emphasize pr...
- Bias in the Ear of the Listener: Assessing Sensitivity in Audio Language Models Across Linguistic, Demographic, and Positional Variations : Abstract: This work presents the first systematic investigation of speech bias in multilingual MLLMs. We construct and release the BiasInEar dataset, a speech-augmented benchmark based on Global MMLU ...
- Exploring Knowledge Purification in Multi-Teacher Knowledge Distillation for LLMs : Abstract: Knowledge distillation has emerged as a pivotal technique for transferring knowledge from stronger large language models (LLMs) to smaller, more efficient models. However, traditional distil...
- What If We Allocate Test-Time Compute Adaptively? : Abstract: Test-time compute scaling allocates inference computation uniformly, uses fixed sampling strategies, and applies verification only for reranking. In contrast, we propose a verifier-guided ad...
- Logic-Oriented Retriever Enhancement via Contrastive Learning : Abstract: Large language models (LLMs) struggle in knowledge-intensive tasks, as retrievers often overfit to surface similarity and fail on queries involving complex logical relations. The capacity fo...
- Tendem: A Hybrid AI+Human Platform : Abstract: Tendem is a hybrid system where AI handles structured, repeatable work and Human Experts step in when the models fail or to verify results. Each result undergoes a comprehensive quality revi...
- Don't Judge a Book by its Cover: Testing LLMs' Robustness Under Logical Obfuscation : Abstract: Tasks such as solving arithmetic equations, evaluating truth tables, and completing syllogisms are handled well by large language models (LLMs) in their standard form, but they often fail wh...
- Beyond Training for Cultural Awareness: The Role of Dataset Linguistic Structure in Large Language Models : Abstract: The global deployment of large language models (LLMs) has raised concerns about cultural misalignment, yet the linguistic properties of fine-tuning datasets used for cultural adaptation rema...
- Typologically-Informed Candidate Reranking for LLM-based Translation into Low-Resource Languages : Abstract: Large language models trained predominantly on high-resource languages exhibit systematic biases toward dominant typological patterns, leading to structural non-conformance when translating ...
- PedagoSense: A Pedology Grounded LLM System for Pedagogical Strategy Detection and Contextual Response Generation in Learning Dialogues : Abstract: This paper addresses the challenge of improving interaction quality in dialogue based learning by detecting and recommending effective pedagogical strategies in tutor student conversations. ...
- EmoAra: Emotion-Preserving English Speech Transcription and Cross-Lingual Translation with Arabic Text-to-Speech : Abstract: This work presents EmoAra, an end-to-end emotion-preserving pipeline for cross-lingual spoken communication, motivated by banking customer service where emotional context affects service qua...
- Bridging Lexical Ambiguity and Vision: A Mini Review on Visual Word Sense Disambiguation : Abstract: This paper offers a mini review of Visual Word Sense Disambiguation (VWSD), which is a multimodal extension of traditional Word Sense Disambiguation (WSD). VWSD helps tackle lexical ambiguit...
- ASTER: Agentic Scaling with Tool-integrated Extended Reasoning : Abstract: Reinforcement learning (RL) has emerged as a dominant paradigm for eliciting long-horizon reasoning in Large Language Models (LLMs). However, scaling Tool-Integrated Reasoning (TIR) via RL r...
- Chronos: Learning Temporal Dynamics of Reasoning Chains for Test-Time Scaling : Abstract: Test-Time Scaling (TTS) has emerged as an effective paradigm for improving the reasoning performance of large language models (LLMs). However, existing methods, most notably majority votin...
- Inferential Question Answering : Abstract: Despite extensive research on a wide range of question answering (QA) systems, most existing work focuses on answer containment, i.e., assuming that answers can be directly extracted and/or g...
- Minimizing Mismatch Risk: A Prototype-Based Routing Framework for Zero-shot LLM-generated Text Detection : Abstract: Zero-shot methods detect LLM-generated text by computing statistical signatures using a surrogate model. Existing approaches typically employ a fixed surrogate for all inputs regardless of t...
- Large-Scale Terminal Agentic Trajectory Generation from Dockerized Environments : Abstract: Training agentic models for terminal-based tasks critically depends on high-quality terminal trajectories that capture realistic long-horizon interactions across diverse domains. However, co...
- PARSE: An Open-Domain Reasoning Question Answering Benchmark for Persian : Abstract: Reasoning-focused Question Answering (QA) has advanced rapidly with Large Language Models (LLMs), yet high-quality benchmarks for low-resource languages remain scarce. Persian, spoken by rou...
- DreamOn: Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas : Abstract: Diffusion Language Models (DLMs) present a compelling alternative to autoregressive models, offering flexible, any-order infilling without specialized prompting design. However, their practi...
- Balancing Understanding and Generation in Discrete Diffusion Models : Abstract: In discrete generative modeling, two dominant paradigms demonstrate divergent capabilities: Masked Diffusion Language Models (MDLM) excel at semantic understanding and zero-shot generalizati...
- On the Power of (Approximate) Reward Models for Inference-Time Scaling : Abstract: Inference-time scaling has recently emerged as a powerful paradigm for improving the reasoning capability of large language models. Among various approaches, Sequential Monte Carlo (SMC) has...
- Rethinking Selective Knowledge Distillation : Abstract: Growing efforts to improve knowledge distillation (KD) in large language models (LLMs) replace dense teacher supervision with selective distillation, which uses a subset of token positions, ...
- Understanding QA generation: Extracting Parametric and Contextual Knowledge with CQA for Low Resource Bangla Language : Abstract: Question-Answering (QA) models for low-resource languages like Bangla face challenges due to limited annotated data and linguistic complexity. A key issue is determining whether models rely ...
- ConPress: Learning Efficient Reasoning from Multi-Question Contextual Pressure : Abstract: Large reasoning models (LRMs) typically solve reasoning-intensive tasks by generating long chain-of-thought (CoT) traces, leading to substantial inference overhead. We identify a reproducibl...
- Ebisu: Benchmarking Large Language Models in Japanese Finance : Abstract: Japanese finance combines agglutinative, head-final linguistic structure, mixed writing systems, and high-context communication norms that rely on indirect expression and implicit commitment...
- Argument Rarity-based Originality Assessment for AI-Assisted Writing : Abstract: As Large Language Models (LLMs) have become capable of effortlessly generating high-quality text, traditional quality-focused writing assessment is losing its significance. If the essential ...
- FS-Researcher: Test-Time Scaling for Long-Horizon Research Tasks with File-System-Based Agents : Abstract: Deep research is emerging as a representative long-horizon task for large language model (LLM) agents. However, long trajectories in deep research often exceed model context limits, compress...
- LLM-based Embeddings: Attention Values Encode Sentence Semantics Better Than Hidden States : Abstract: Sentence representations are foundational to many Natural Language Processing (NLP) applications. While recent methods leverage Large Language Models (LLMs) to derive sentence representation...
- Wiki Live Challenge: Challenging Deep Research Agents with Expert-Level Wikipedia Articles : Abstract: Deep Research Agents (DRAs) have demonstrated remarkable capabilities in autonomous information retrieval and report generation, showing great potential to assist humans in complex research ...
- The Art of Socratic Inquiry: A Framework for Proactive Template-Guided Therapeutic Conversation Generation : Abstract: Proactive questioning, where therapists deliberately initiate structured, cognition-guiding inquiries, is a cornerstone of cognitive behavioral therapy (CBT). Yet, current psychological larg...
- SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia : Abstract: Culturally aware safeguards are crucial for AI alignment in real-world settings, where safety extends beyond common sense and encompasses diverse local values, norms, and region-specific reg...
- A2Eval: Agentic and Automated Evaluation for Embodied Brain : Abstract: Current embodied VLM evaluation relies on static, expert-defined, manually annotated benchmarks that exhibit severe redundancy and coverage imbalance. This labor-intensive paradigm drains co...
- Steering Vector Fields for Context-Aware Inference-Time Control in Large Language Models : Abstract: Steering vectors (SVs) offer a lightweight way to control large language models (LLMs) at inference time by shifting hidden activations, providing a practical middle ground between prompting...
- Scaling Search-Augmented LLM Reasoning via Adaptive Information Control : Abstract: Search-augmented reasoning agents interleave multi-step reasoning with external information retrieval, but uncontrolled retrieval often leads to redundant evidence, context saturation, and u...
- ARTIS: Agentic Risk-Aware Test-Time Scaling via Iterative Simulation : Abstract: Current test-time scaling (TTS) techniques enhance large language model (LLM) performance by allocating additional computation at inference time, yet they remain insufficient for agentic set...
- MedAraBench: Large-Scale Arabic Medical Question Answering Dataset and Benchmark : Abstract: Arabic remains one of the most underrepresented languages in natural language processing research, particularly in medical applications, due to the limited availability of open-source data a...
- Mechanistic Indicators of Steering Effectiveness in Large Language Models : Abstract: Activation-based steering enables Large Language Models (LLMs) to exhibit targeted behaviors by intervening on intermediate activations without retraining. Despite its widespread use, the me...
- COMI: Coarse-to-fine Context Compression via Marginal Information Gain : Abstract: Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse tasks. However, their deployment in long context scenarios remains hindered by computational inefficien...
- WorldCup Sampling for Multi-bit LLM Watermarking : Abstract: As large language models (LLMs) generate increasingly human-like text, watermarking offers a promising solution for reliable attribution beyond mere detection. While multi-bit watermarking e...
- Data Distribution Matters: A Data-Centric Perspective on Context Compression for Large Language Model : Abstract: The deployment of Large Language Models (LLMs) in long-context scenarios is hindered by computational inefficiency and significant information redundancy. Although recent advancements have w...
- CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding : Abstract: Large Language Models (LLMs) have achieved remarkable success in source code understanding, yet as software systems grow in scale, computational efficiency has become a critical bottleneck. ...
- AXE: Low-Cost Cross-Domain Web Structured Information Extraction : Abstract: Extracting structured data from the web is often a trade-off between the brittle nature of manual heuristics and the prohibitive cost of Large Language Models. We introduce AXE (Adaptive X-P...
- Finance-Grounded Optimization For Algorithmic Trading : Abstract: Deep Learning is evolving fast and integrates into various domains. Finance is a challenging field for deep learning, especially in the case of interpretable artificial intelligence (AI). Al...
- DAG: A Dual Correlation Network for Time Series Forecasting with Exogenous Variables : Abstract: Time series forecasting is essential in various domains. Compared to relying solely on endogenous variables (i.e., target variables), considering exogenous variables (i.e., covariates) provi...
- Prediction Markets with Intermittent Contributions : Abstract: Although both data availability and the demand for accurate forecasts are increasing, collaboration between stakeholders is often constrained by data ownership and competitive interests. In ...
- Window-Diffusion: Accelerating Diffusion Language Model Inference with Windowed Token Pruning and Caching : Abstract: Diffusion language models (DLMs) generate text through iterative denoising, but inference requires full-sequence attention at every iteration, resulting in substantial redundant computation ...
- GraphAllocBench: A Flexible Benchmark for Preference-Conditioned Multi-Objective Policy Learning : Abstract: Preference-Conditioned Policy Learning (PCPL) in Multi-Objective Reinforcement Learning (MORL) aims to approximate diverse Pareto-optimal solutions by conditioning policies on user-specified...
- Less is More: Clustered Cross-Covariance Control for Offline RL : Abstract: A fundamental challenge in offline reinforcement learning is distributional shift. Scarce data or datasets dominated by out-of-distribution (OOD) areas exacerbate this issue. Our theoretical...
- Accurate Network Traffic Matrix Prediction via LEAD: a Large Language Model-Enhanced Adapter-Based Conditional Diffusion Model : Abstract: Driven by the evolution toward 6G and AI-native edge intelligence, network operations increasingly require predictive and risk-aware adaptation under stringent computation and latency constr...
- Sampling-Free Privacy Accounting for Matrix Mechanisms under Random Allocation : Abstract: We study privacy amplification for differentially private model training with matrix factorization under random allocation (also known as the balls-in-bins model). Recent work by Choquette-C...
- Hardware-Triggered Backdoors : Abstract: Machine learning models are routinely deployed on a wide range of computing hardware. Although such hardware is typically expected to produce identical results, differences in its design can...
- VC Theory for Inventory Policies : Abstract: There has been growing interest in applying reinforcement learning (RL) to inventory management, either by optimizing over temporal transitions or by learning directly from full historical d...
- Efficient Transformer Encoders for Mask2Former-style models : Abstract: Vision transformer based models bring significant improvements for image segmentation tasks. Although these architectures offer powerful capabilities irrespective of specific segmentation ta...
- Future frame prediction in chest and liver cine MRI using the PCA respiratory motion model: comparing transformers and dynamically trained recurrent neural networks : Abstract: Respiratory motion complicates accurate irradiation of thoraco-abdominal tumors in radiotherapy, as treatment-system latency entails target-location uncertainties. This work addresses frame ...
- Polyak's Heavy Ball Method Achieves Accelerated Local Rate of Convergence under Polyak-Lojasiewicz Inequality : Abstract: In this work, we analyze the convergence of Polyak's heavy ball method in both continuous and discrete time for non-convex $C^4$-objective functions satisfying the Polyak-Lojasiewicz inequal...
- Diffusion-based Layer-wise Semantic Reconstruction for Unsupervised Out-of-Distribution Detection : Abstract: Unsupervised out-of-distribution (OOD) detection aims to identify out-of-domain data by learning only from unlabeled In-Distribution (ID) training samples, which is crucial for developing a ...
- Graph Max Shift: A Hill-Climbing Method for Graph Clustering : Abstract: We present a method for graph clustering that is analogous to gradient ascent methods previously proposed for clustering points in space. The algorithm, which can be viewed as a max-degree h...
- Quantum Re-Uploading for Calorimetry: Optimized Architectures with Extended Expressivity : Abstract: Near-term quantum machine learning must balance expressivity, optimization, and hardware constraints. We study quantum re-uploading units (QRUs) as compact circuits and compare them, at matc...
- When Pattern-by-Pattern Works: Theoretical and Empirical Insights for Logistic Models with Missing Values : Abstract: Predicting with missing inputs challenges even parametric models, as parameter estimation alone is insufficient for prediction on incomplete data. While several works study prediction in lin...
- Spatio-Temporal Transformers for Long-Term NDVI Forecasting : Abstract: Long-term satellite image time series (SITS) analysis in heterogeneous landscapes faces significant challenges, particularly in Mediterranean regions where complex spatial patterns, seasonal...
- Sentence Curve Language Models : Abstract: Language models (LMs) are a central component of modern AI systems, and diffusion-based language models (DLMs) have recently emerged as a competitive alternative. Both paradigms rely on word...
- Learning Sequential Decisions from Multiple Sources via Group-Robust Markov Decision Processes : Abstract: We often collect data from multiple sites (e.g., hospitals) that share common structure but also exhibit heterogeneity. This paper aims to learn robust sequential decision-making policies fr...
- RIR-Former: Coordinate-Guided Transformer for Continuous Reconstruction of Room Impulse Responses : Abstract: Room impulse responses (RIRs) are essential for many acoustic signal processing tasks, yet measuring them densely across space is often impractical. In this work, we propose RIR-Former, a gr...
- Transformers as Measure-Theoretic Associative Memory: A Statistical Perspective and Minimax Optimality : Abstract: Transformers excel through content-addressable retrieval and the ability to exploit contexts of, in principle, unbounded length. We recast associative memory at the level of probability meas...
- Grappa: Gradient-Only Communication for Scalable Graph Neural Network Training : Abstract: Cross-partition edges dominate the cost of distributed GNN training: fetching remote features and activations per iteration overwhelms the network as graphs deepen and partition counts grow....
- Propagating the prior from far to near offset: A self-supervised diffusion framework for progressively recovering near-offsets of towed-streamer data : Abstract: In marine towed-streamer seismic acquisition, the nearest hydrophone is often two hundred meters away from the source, resulting in missing near-offset traces, which degrades critical processi...
- Privacy Amplification by Missing Data : Abstract: Privacy preservation is a fundamental requirement in many high-stakes domains such as medicine and finance, where sensitive personal data must be analyzed without compromising individual con...
- FluxNet: Learning Capacity-Constrained Local Transport Operators for Conservative and Bounded PDE Surrogates : Abstract: Autoregressive learning of time-stepping operators offers an effective approach to data-driven PDE simulation on grids. For conservation laws, however, long-horizon rollouts are often destab...
- SpikingGamma: Surrogate-Gradient Free and Temporally Precise Online Training of Spiking Neural Networks with Smoothed Delays : Abstract: Neuromorphic hardware implementations of Spiking Neural Networks (SNNs) promise energy-efficient, low-latency AI through sparse, event-driven computation. Yet, training SNNs under fine tempo...
- Stochastic Interpolants in Hilbert Spaces : Abstract: Although diffusion models have successfully extended to function-valued data, stochastic interpolants -- which offer a flexible way to bridge arbitrary distributions -- remain limited to fin...
- Position: The Need for Ultrafast Training : Abstract: Domain-specialized FPGAs have delivered unprecedented performance for low-latency inference across scientific and industrial workloads, yet nearly all existing accelerators assume static mod...
- Scale-covariant spiking wavelets : Abstract: We establish a theoretical connection between wavelet transforms and spiking neural networks through scale-space theory. We rely on the scale-covariant guarantees in the leaky integrate-and-...
- Adaptive Quality-Diversity Trade-offs for Large-Scale Batch Recommendation : Abstract: A core research question in recommender systems is to propose batches of highly relevant and diverse items, that is, items personalized to the user's preferences, but which also might get th...
- Hippasus: Effective and Efficient Automatic Feature Augmentation for Machine Learning Tasks on Relational Data : Abstract: Machine learning models depend critically on feature quality, yet useful features are often scattered across multiple relational tables. Feature augmentation enriches a base table by discove...
- Twinning Complex Networked Systems: Data-Driven Calibration of the mABCD Synthetic Graph Generator : Abstract: The increasing availability of relational data has contributed to a growing reliance on network-based representations of complex systems. Over time, these models have evolved to capture more...
- Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks : Abstract: Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeti...
- Think Dense, Not Long: Dynamic Decoupled Conditional Advantage for Efficient Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) can elicit strong multi-step reasoning, yet it often encourages overly verbose traces. Moreover, naive length penalties in group-relativ...
- Training-free score-based diffusion for parameter-dependent stochastic dynamical systems : Abstract: Simulating parameter-dependent stochastic differential equations (SDEs) presents significant computational challenges, as separate high-fidelity simulations are typically required for each p...
- Enhancing Diffusion-Based Quantitatively Controllable Image Generation via Matrix-Form EDM and Adaptive Vicinal Training : Abstract: Continuous Conditional Diffusion Model (CCDM) is a diffusion-based framework designed to generate high-quality images conditioned on continuous regression labels. Although CCDM has demonstra...
- Learning Beyond the Gaussian Data: Learning Dynamics of Neural Networks on an Expressive and Cumulant-Controllable Data Model : Abstract: We study the effect of high-order statistics of data on the learning dynamics of neural networks (NNs) by using a moment-controllable non-Gaussian data model. Considering the expressivity of...
- Real-Time 2D LiDAR Object Detection Using Three-Frame RGB Scan Encoding : Abstract: Indoor service robots need perception that is robust, more privacy-friendly than RGB video, and feasible on embedded hardware. We present a camera-free 2D LiDAR object detection pipeline tha...
- PCA of probability measures: Sparse and Dense sampling regimes : Abstract: A common approach to perform PCA on probability measures is to embed them into a Hilbert space where standard functional PCA techniques apply. While convergence rates for estimating the embe...
- Online Fine-Tuning of Pretrained Controllers for Autonomous Driving via Real-Time Recurrent RL : Abstract: Deploying pretrained policies in real-world applications presents substantial challenges that fundamentally limit the practical applicability of learning-based control systems. When autonomo...
- Well-Posed KL-Regularized Control via Wasserstein and Kalman-Wasserstein KL Divergences : Abstract: Kullback-Leibler divergence (KL) regularization is widely used in reinforcement learning, but it becomes infinite under support mismatch and can degenerate in low-noise limits. Utilizing a u...
- Hierarchical Federated Learning with SignSGD: A Highly Communication-Efficient Approach : Abstract: Hierarchical federated learning (HFL) has emerged as a key architecture for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before rea...
- NAB: Neural Adaptive Binning for Sparse-View CT reconstruction : Abstract: Computed Tomography (CT) plays a vital role in inspecting the internal structures of industrial objects. Furthermore, achieving high-quality CT reconstruction from sparse views is essential ...
- Transfer Learning Through Conditional Quantile Matching : Abstract: We introduce a transfer learning framework for regression that leverages heterogeneous source domains to improve predictive performance in a data-scarce target domain. Our approach learns a ...
- Personalized Image Generation via Human-in-the-loop Bayesian Optimization : Abstract: Imagine Alice has a specific image $x^\ast$ in her mind, say, the view of the street in which she grew up during her childhood. To generate that exact image, she guides a generative model wi...
- PRISM: Performer RS-IMLE for Single-pass Multisensory Imitation Learning : Abstract: Robotic imitation learning typically requires models that capture multimodal action distributions while operating at real-time control rates and accommodating multiple sensing modalities. Al...
- Provably Data-driven Multiple Hyper-parameter Tuning with Structured Loss Function : Abstract: Data-driven algorithm design automates hyperparameter tuning, but its statistical foundations remain limited because model performance can depend on hyperparameters in implicit and highly no...
- Masked Autoencoders as Universal Speech Enhancer : Abstract: Supervised speech enhancement methods have been very successful. However, in practical scenarios, there is a lack of clean speech, and self-supervised learning-based (SSL) speech enhancement...
- Misconception Diagnosis From Student-Tutor Dialogue: Generate, Retrieve, Rerank : Abstract: Timely and accurate identification of student misconceptions is key to improving learning outcomes and pre-empting the compounding of student errors. However, this task is highly dependent o...
- Full-Batch Gradient Descent Outperforms One-Pass SGD: Sample Complexity Separation in Single-Index Learning : Abstract: It is folklore that reusing training data more than once can improve the statistical efficiency of gradient-based learning. However, beyond linear regression, the theoretical advantage of fu...
- Energy-Efficient Neuromorphic Computing for Edge AI: A Framework with Adaptive Spiking Neural Networks and Hardware-Aware Optimization : Abstract: Edge AI applications increasingly require ultra-low-power, low-latency inference. Neuromorphic computing based on event-driven spiking neural networks (SNNs) offers an attractive path, but p...
- Age-Aware Edge-Blind Federated Learning via Over-the-Air Aggregation : Abstract: We study federated learning (FL) over wireless fading channels where multiple devices simultaneously send their model updates. We propose an efficient age-aware edge-blind over-the-air...
- HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos : Abstract: Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic i...
- The Function Representation of Artificial Neural Network : Abstract: This paper expresses the structure of artificial neural network (ANN) as a functional form, using the activation integral concept derived from the activation function. In this way, the struc...
- Parameter-efficient Multi-Task and Multi-Domain Learning using Factorized Tensor Networks : Abstract: Multi-task and multi-domain learning methods seek to learn multiple tasks/domains, jointly or one after another, using a single unified network. The primary challenge and opportunity lie in ...
- Dual-Phase Continual Learning: Supervised Adaptation Meets Unsupervised Retention : Abstract: Foundational Vision-Language Models (VLMs) excel across diverse tasks, but adapting them to new domains without forgetting prior knowledge remains a critical challenge. Continual Learning (C...
- Retrospective Feature Estimation for Continual Learning : Abstract: The intrinsic capability to continuously learn a changing data stream is a desideratum of deep neural networks (DNNs). However, current DNNs suffer from catastrophic forgetting, which interf...
- UniGAP: A Universal and Adaptive Graph Upsampling Approach to Mitigate Over-Smoothing in Node Classification Tasks : Abstract: In the graph domain, deep graph networks based on Message Passing Neural Networks (MPNNs) or Graph Transformers often cause over-smoothing of node features, limiting their expressive capacit...
- End-to-End Conformal Calibration for Optimization Under Uncertainty : Abstract: Machine learning can significantly improve performance for decision-making under uncertainty across a wide range of domains. However, ensuring robustness guarantees requires well-calibrated ...
- Individual Regret in Cooperative Stochastic Multi-Armed Bandits : Abstract: We study the regret in stochastic Multi-Armed Bandits (MAB) with multiple agents that communicate over an arbitrary connected communication graph. We analyze a variant of Cooperative Succes...
- TRACE: Grounding Time Series in Context for Multimodal Embedding and Retrieval : Abstract: The ubiquity of dynamic data in domains such as weather, healthcare, and energy underscores a growing need for effective interpretation and retrieval of time-series data. These data are inhe...
- Brazilian Portuguese Image Captioning with Transformers: A Study on Cross-Native-Translated Dataset : Abstract: Image captioning (IC) refers to the automatic generation of natural language descriptions for images, with applications ranging from social media content generation to assisting individuals ...
- 3DGS$^2$-TR: Scalable Second-Order Trust-Region Method for 3D Gaussian Splatting : Abstract: We propose 3DGS$^2$-TR, a second-order optimizer for accelerating the scene training problem in 3D Gaussian Splatting (3DGS). Unlike existing second-order approaches that rely on explicit or ...
- Reinforcement Learning for Control Systems with Time Delays: A Comprehensive Survey : Abstract: In the last decade, Reinforcement Learning (RL) has achieved remarkable success in the control and decision-making of complex dynamical systems. However, most RL algorithms rely on the Marko...
- Alignment of Diffusion Model and Flow Matching for Text-to-Image Generation : Abstract: Diffusion models and flow matching have demonstrated remarkable success in text-to-image generation. While many existing alignment methods primarily focus on fine-tuning pre-trained generati...
- Toward Autonomous Laboratory Safety Monitoring with Vision Language Models: Learning to See Hazards Through Scene Structure : Abstract: Laboratories are prone to severe injuries from minor unsafe actions, yet continuous safety monitoring -- beyond mandatory pre-lab safety training -- is limited by human availability. Vision ...
- Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits : Abstract: We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits ha...
- Topological Residual Asymmetry for Bivariate Causal Direction : Abstract: Inferring causal direction from purely observational bivariate data is fragile: many methods commit to a direction even in ambiguous or near non-identifiable regimes. We propose Topological ...
- Exact Instance Compression for Convex Empirical Risk Minimization via Color Refinement : Abstract: Empirical risk minimization (ERM) can be computationally expensive, with standard solvers scaling poorly even in the convex setting. We propose a novel lossless compression framework for con...
- DISK: Dynamic Inference SKipping for World Models : Abstract: We present DISK, a training-free adaptive inference method for autoregressive world models. DISK coordinates two coupled diffusion transformers for video and ego-trajectory via dual-branch c...
- Stabilizing Fixed-Point Iteration for Markov Chain Poisson Equations : Abstract: Poisson equations underpin average-reward reinforcement learning, but beyond ergodicity they can be ill-posed, meaning that solutions are non-unique and standard fixed point iterations can o...
- Reinforcement Learning-assisted Constraint Relaxation for Constrained Expensive Optimization : Abstract: Constraint handling plays a key role in solving realistic complex optimization problems. Though intensively discussed in the last few decades, existing constraint handling techniques predomi...
- Surrogate Ensemble in Expensive Multi-Objective Optimization via Deep Q-Learning : Abstract: Surrogate-assisted Evolutionary Algorithms (SAEAs) have shown promising robustness in solving expensive optimization problems. A key aspect that impacts SAEAs' effectiveness is surrogate mod...
- NPNet: A Non-Parametric Network with Adaptive Gaussian-Fourier Positional Encoding for 3D Classification and Segmentation : Abstract: We present NPNet, a fully non-parametric approach for 3D point-cloud classification and part segmentation. NPNet contains no learned weights; instead, it builds point features using determin...
- From Pixels to Facts (Pix2Fact): Benchmarking Multi-Hop Reasoning for Fine-Grained Visual Fact Checking : Abstract: Despite progress on general tasks, VLMs struggle with challenges demanding both detailed visual grounding and deliberate knowledge-based reasoning, a synergy not captured by existing benchma...
- Sampling from multi-modal distributions on Riemannian manifolds with training-free stochastic interpolants : Abstract: In this paper, we propose a general methodology for sampling from un-normalized densities defined on Riemannian manifolds, with a particular focus on multi-modal targets that remain challeng...
- Non-Clashing Teaching in Graphs: Algorithms, Complexity, and Bounds : Abstract: Kirkpatrick et al. [ALT 2019] and Fallat et al. [JMLR 2023] introduced non-clashing teaching and proved that it is the most efficient batch machine teaching model satisfying the collusion-av...
- Audio-to-Image Bird Species Retrieval without Audio-Image Pairs via Text Distillation : Abstract: Audio-to-image retrieval offers an interpretable alternative to audio-only classification for bioacoustic species recognition, but learning aligned audio-image representations is challenging...
- Cross-Modal Binary Attention: An Energy-Efficient Fusion Framework for Audio-Visual Learning : Abstract: Effective multimodal fusion requires mechanisms that can capture complex cross-modal dependencies while remaining computationally scalable for real-world deployment. Existing audio-visual fu...
- Emergence of Distortions in High-Dimensional Guided Diffusion Models : Abstract: Classifier-free guidance (CFG) is the de facto standard for conditional sampling in diffusion models, yet it often leads to a loss of diversity in generated samples. We formalize this phenom...
- A New Workflow for Materials Discovery Bridging the Gap Between Experimental Databases and Graph Neural Networks : Abstract: Incorporating Machine Learning (ML) into material property prediction has become a crucial step in accelerating materials discovery. A key challenge is the severe lack of training data, as m...
- Communications-Incentivized Collaborative Reasoning in NetGPT through Agentic Reinforcement Learning : Abstract: The evolution of next-Generation (xG) wireless networks marks a paradigm shift from connectivity-centric architectures to Artificial Intelligence (AI)-native designs that tightly integrate d...
- Learning in Bayesian Stackelberg Games With Unknown Follower's Types : Abstract: We study online learning in Bayesian Stackelberg games, where a leader repeatedly interacts with a follower whose unknown private type is independently drawn at each round from an unknown pr...
- Zero-Flow Encoders : Abstract: Flow-based methods have achieved significant success in various generative modeling tasks, capturing nuanced details within complex data distributions. However, few existing works have explo...
- Hessian Spectral Analysis at Foundation Model Scale : Abstract: Accurate Hessian spectra of foundation models have remained out of reach, leading most prior work to rely on small models or strong structural approximations. We show that faithful spectral ...
- Safety-Efficacy Trade Off: Robustness against Data-Poisoning : Abstract: Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation based defences. We show that this behaviour is not incidental, but arises...
- Harmful Overfitting in Sobolev Spaces : Abstract: Motivated by recent work on benign overfitting in overparameterized machine learning, we study the generalization behavior of functions in Sobolev spaces $W^{k, p}(\mathbb{R}^d)$ that perfec...
- Score-based Metropolis-Hastings for Fractional Langevin Algorithms : Abstract: Sampling from heavy-tailed and multimodal distributions is challenging when neither the target density nor the proposal density can be evaluated, as in $α$-stable Lévy-driven fractional Lang...
- Multivariate Time Series Data Imputation via Distributionally Robust Regularization : Abstract: Multivariate time series (MTS) imputation is often compromised by mismatch between observed and true data distributions -- a bias exacerbated by non-stationarity and systematic missingness. ...
- Sublinear Time Quantum Algorithm for Attention Approximation : Abstract: Given the query, key and value matrices $Q, K, V\in \mathbb{R}^{n\times d}$, the attention module is defined as $\mathrm{Att}(Q, K, V)=D^{-1}AV$ where $A=\exp(QK^\top/\sqrt{d})$ with $\exp(\...
- RoDiF: Robust Direct Fine-Tuning of Diffusion Policies with Corrupted Human Feedback : Abstract: Diffusion policies are a powerful paradigm for robotic control, but fine-tuning them with human preferences is fundamentally challenged by the multi-step structure of the denoising process. ...
- On the Convergence of Jacobian-Free Backpropagation for Optimal Control Problems with Implicit Hamiltonians : Abstract: Optimal feedback control with implicit Hamiltonians poses a fundamental challenge for learning-based value function methods due to the absence of closed-form optimal control laws. Recent wor...
- Improving Minimax Estimation Rates for Contaminated Mixture of Multinomial Logistic Experts via Expert Heterogeneity : Abstract: Contaminated mixture of experts (MoE) is motivated by transfer learning methods where a pre-trained model, acting as a frozen expert, is integrated with an adapter model, functioning as a tr...
- Hybrid Topological and Deep Feature Fusion for Accurate MRI-Based Alzheimer's Disease Severity Classification : Abstract: Early and accurate diagnosis of Alzheimer's disease (AD) remains a critical challenge in neuroimaging-based clinical decision support systems. In this work, we propose a novel hybrid deep le...
- Trust in One Round: Confidence Estimation for Large Language Models via Structural Signals : Abstract: Large language models (LLMs) are increasingly deployed in domains where errors carry high social, scientific, or safety costs. Yet standard confidence estimators, such as token likelihood, s...
- Optimal Decision-Making Based on Prediction Sets : Abstract: Prediction sets can wrap around any ML model to cover unknown test outcomes with a guaranteed probability. Yet, it remains unclear how to use them optimally for downstream decision-making. H...
- CortiNet: A Physics-Perception Hybrid Cortical-Inspired Dual-Stream Network for Gallbladder Disease Diagnosis from Ultrasound : Abstract: Ultrasound imaging is the primary diagnostic modality for detecting Gallbladder diseases due to its non-invasive nature, affordability, and wide accessibility. However, the low resolution an...
- The Stacked Autoencoder Evolution Hypothesis : Abstract: This study introduces a novel theoretical framework, the Stacked Autoencoder Evolution Hypothesis, which proposes that biological evolutionary systems operate through multi-layered self-enco...
- The Quantum Learning Menagerie (A survey on Quantum learning for Classical concepts) : Abstract: This paper surveys various results in the field of Quantum Learning theory, specifically focusing on learning quantum-encoded classical concepts in the Probably Approximately Correct (PAC) f...
- PDE-Constrained Optimization for Neural Image Segmentation with Physics Priors : Abstract: Segmentation of microscopy images constitutes an ill-posed inverse problem due to measurement noise, weak object boundaries, and limited labeled data. Although deep neural networks provide f...
- Long-range Modeling and Processing of Multimodal Event Sequences : Abstract: Temporal point processes (TPPs) have emerged as powerful tools for modeling asynchronous event sequences. While recent advances have extended TPPs to handle textual information, existing app...
- Robust Machine Learning Framework for Reliable Discovery of High-Performance Half-Heusler Thermoelectrics : Abstract: Machine learning (ML) can facilitate efficient thermoelectric (TE) material discovery essential to address the environmental crisis. However, ML models often suffer from poor experimental ge...
- Equivalence of Privacy and Stability with Generalization Guarantees in Quantum Learning : Abstract: We present a unified information-theoretic framework to analyze the generalization performance of differentially private (DP) quantum learning algorithms. By leveraging the connection betwee...
- Refining Context-Entangled Content Segmentation via Curriculum Selection and Anti-Curriculum Promotion : Abstract: Biological learning proceeds from easy to difficult tasks, gradually reinforcing perception and robustness. Inspired by this principle, we address Context-Entangled Content Segmentation (CEC...
- Attention Sink Forges Native MoE in Attention Layers: Sink-Aware Training to Address Head Collapse : Abstract: Large Language Models (LLMs) often assign disproportionate attention to the first token, a phenomenon known as the attention sink. Several recent approaches aim to address this issue, includ...
- CRAFT: Calibrated Reasoning with Answer-Faithful Traces via Reinforcement Learning for Multi-Hop Question Answering : Abstract: Retrieval-augmented generation (RAG) is widely used to ground Large Language Models (LLMs) for multi-hop question answering. Recent work has mainly focused on improving answer accuracy via fine-...
- Robust Sublinear Convergence Rates for Iterative Bregman Projections : Abstract: Entropic regularization provides a simple way to approximate linear programs whose constraints split into two (or more) tractable blocks. The resulting objectives are amenable to cyclic Kull...
- WAKESET: A Large-Scale, High-Reynolds Number Flow Dataset for Machine Learning of Turbulent Wake Dynamics : Abstract: Machine learning (ML) offers transformative potential for computational fluid dynamics (CFD), promising to accelerate simulations, improve turbulence modelling, and enable real-time flow pre...
- PromptRL: Prompt Matters in RL for Flow-Based Image Generation : Abstract: Flow matching models (FMs) have revolutionized text-to-image (T2I) generation, with reinforcement learning (RL) serving as a critical post-training strategy for alignment with reward objecti...
- The Enhanced Physics-Informed Kolmogorov-Arnold Networks: Applications of Newton's Laws in Financial Deep Reinforcement Learning (RL) Algorithms : Abstract: Deep Reinforcement Learning (DRL), a subset of machine learning focused on sequential decision-making, has emerged as a powerful approach for tackling financial trading problems. In finance,...
- SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling : Abstract: This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative ...
- Nonlinear model reduction for transport-dominated problems : Abstract: This article surveys nonlinear model reduction methods that remain effective in regimes where linear reduced-space approximations are intrinsically inefficient, such as transport-dominated p...
- Online Social Welfare Function-based Resource Allocation : Abstract: In many real-world settings, a centralized decision-maker must repeatedly allocate finite resources to a population over multiple time steps. Individuals who receive a resource derive some s...
- Importance Weighted Variational Inference without the Reparameterization Trick : Abstract: Importance weighted variational inference (VI) approximates densities known up to a normalizing constant by optimizing bounds that tighten with the number of Monte Carlo samples $N$. Standar...
- Where to Attend: A Principled Vision-Centric Position Encoding with Parabolas : Abstract: We propose Parabolic Position Encoding (PaPE), a parabola-based position encoding for vision modalities in attention-based architectures. Given a set of vision tokens, such as images, point c...
- Robust Generalization with Adaptive Optimal Transport Priors for Decision-Focused Learning : Abstract: Few-shot learning requires models to generalize under limited supervision while remaining robust to distribution shifts. Existing Sinkhorn Distributionally Robust Optimization (DRO) methods ...
- Non-Uniform Noise-to-Signal Ratio in the REINFORCE Policy-Gradient Estimator : Abstract: Policy-gradient methods are widely used in reinforcement learning, yet training often becomes unstable or slows down as learning progresses. We study this phenomenon through the noise-to-sig...
- Rethinking Multinomial Logistic Mixture of Experts with Sigmoid Gating Function : Abstract: The sigmoid gate in mixture-of-experts (MoE) models has been empirically shown to outperform the softmax gate across several tasks, ranging from approximating feed-forward networks to langua...
- Density-Informed Pseudo-Counts for Calibrated Evidential Deep Learning : Abstract: Evidential Deep Learning (EDL) is a popular framework for uncertainty-aware classification that models predictive uncertainty via Dirichlet distributions parameterized by neural networks. De...
- Alternating Reinforcement Learning for Rubric-Based Reward Modeling in Non-Verifiable LLM Post-Training : Abstract: Standard reward models typically predict scalar scores that fail to capture the multifaceted nature of response quality in non-verifiable domains, such as creative writing or open-ended inst...
- RAPT: Model-Predictive Out-of-Distribution Detection and Failure Diagnosis for Sim-to-Real Humanoid Robots : Abstract: Deploying learned control policies on humanoid robots is challenging: policies that appear robust in simulation can execute confidently in out-of-distribution (OOD) states after Sim-to-Real ...
- Making Bias Non-Predictive: Training Robust LLM Judges via Reinforcement Learning : Abstract: Large language models (LLMs) increasingly serve as automated judges, yet they remain susceptible to cognitive biases -- often altering their reasoning when faced with spurious prompt-level c...
- Rotation-free Online Handwritten Character Recognition Using Linear Recurrent Units : Abstract: Online handwritten character recognition leverages stroke order and dynamic features, which generally provide higher accuracy and robustness compared with offline recognition. However, in pr...
- Genus-0 Surface Parameterization using Spherical Beltrami Differentials : Abstract: Spherical surface parameterization is a fundamental tool in geometry processing and imaging science. For a genus-0 closed surface, many efficient algorithms can map the surface to the sphere...
- Expected Harm: Rethinking Safety Evaluation of (Mis)Aligned LLMs : Abstract: Current evaluations of LLM safety predominantly rely on severity-based taxonomies to assess the harmfulness of malicious queries. We argue that this formulation requires re-examination as it...
- Inference-Aware Meta-Alignment of LLMs via Non-Linear GRPO : Abstract: Aligning large language models (LLMs) to diverse human preferences is fundamentally challenging since criteria can often conflict with each other. Inference-time alignment methods have recen...
- Minimax optimal differentially private synthetic data for smooth queries : Abstract: Differentially private synthetic data enables the sharing and analysis of sensitive datasets while providing rigorous privacy guarantees for individual contributors. A central challenge is t...
- Efficient Softmax Reformulation for Homomorphic Encryption via Moment Generating Function : Abstract: Homomorphic encryption (HE) is a prominent framework for privacy-preserving machine learning, enabling inference directly on encrypted data. However, evaluating softmax, a core component of ...
- Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models : Abstract: Large Reasoning Models (LRMs) have recently achieved strong mathematical and code reasoning performance through Reinforcement Learning (RL) post-training. However, we show that modern reason...
- Cross-Domain Fake News Detection on Unseen Domains via LLM-Based Domain-Aware User Modeling : Abstract: Cross-domain fake news detection (CD-FND) transfers knowledge from a source domain to a target domain and is crucial for real-world fake news mitigation. This task becomes particularly impor...
- ST-BCP: Tightening Coverage Bound for Backward Conformal Prediction via Non-Conformity Score Transformation : Abstract: Conformal Prediction (CP) provides a statistical framework for uncertainty quantification that constructs prediction sets with coverage guarantees. While CP yields uncontrolled prediction se...
- Physics-Informed Chebyshev Polynomial Neural Operator for Parametric Partial Differential Equations : Abstract: Neural operators have emerged as powerful deep learning frameworks for approximating solution operators of parameterized partial differential equations (PDE). However, current methods predom...
- Enhancing Automated Essay Scoring with Three Techniques: Two-Stage Fine-Tuning, Score Alignment, and Self-Training : Abstract: Automated Essay Scoring (AES) plays a crucial role in education by providing scalable and efficient assessment tools. However, in real-world settings, the extreme scarcity of labeled data se...
- Zero2Text: Zero-Training Cross-Domain Inversion Attacks on Textual Embeddings : Abstract: The proliferation of retrieval-augmented generation (RAG) has established vector databases as critical infrastructure, yet they introduce severe privacy risks via embedding inversion attacks...
- Cost-Aware Bayesian Optimization for Prototyping Interactive Devices : Abstract: Deciding which idea is worth prototyping is a central concern in iterative design. A prototype should be produced when the expected improvement is high and the cost is low. However, this is ...
- On Stability and Robustness of Diffusion Posterior Sampling for Bayesian Inverse Problems : Abstract: Diffusion models have recently emerged as powerful learned priors for Bayesian inverse problems (BIPs). Diffusion-based solvers rely on a presumed likelihood for the observations in BIPs to ...
- Dissecting Outlier Dynamics in LLM NVFP4 Pretraining : Abstract: Training large language models using 4-bit arithmetic enhances throughput and memory efficiency. Yet, the limited dynamic range of FP4 increases sensitivity to outliers. While NVFP4 mitigate...
- Learning to Route and Schedule LLMs from User Retrials via Contextual Queueing Bandits : Abstract: Explosive demands for LLMs often cause user queries to accumulate in server queues, requiring efficient routing (query-LLM matching) and scheduling (query prioritization) mechanisms. Several...
- BAPS: A Fine-Grained Low-Precision Scheme for Softmax in Attention via Block-Aware Precision reScaling : Abstract: As the performance gains from accelerating quantized matrix multiplication plateau, the softmax operation becomes the critical bottleneck in Transformer inference. This bottleneck stems from...
- Calibrating Adaptive Smoothing Methods for Freeway Traffic Reconstruction : Abstract: The adaptive smoothing method (ASM) is a widely used approach for traffic state reconstruction. This article presents a Python implementation of ASM, featuring end-to-end calibration using r...
- AICD Bench: A Challenging Benchmark for AI-Generated Code Detection : Abstract: Large language models (LLMs) are increasingly capable of generating functional source code, raising concerns about authorship, accountability, and security. While detecting AI-generated code...
- Learning Half-Spaces from Perturbed Contrastive Examples : Abstract: We study learning under a two-step contrastive example oracle, as introduced by Mansouri et al. (2025), where each queried (or sampled) labeled example is paired with an additional contrast...
- Active learning from positive and unlabeled examples : Abstract: Learning from positive and unlabeled data (PU learning) is a weakly supervised variant of binary classification in which the learner receives labels only for (some) positively labeled instan...
- Efficient Swap Regret Minimization in Combinatorial Bandits : Abstract: This paper addresses the problem of designing efficient no-swap regret algorithms for combinatorial bandits, where the number of actions $N$ is exponentially large in the dimensionality of t...
- No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs : Abstract: This work stems from prior complementary observations on the dynamics of Chain-of-Thought (CoT): Large Language Models (LLMs) are shown to perform latent planning of subsequent reasoning prior to CoT em...
- An Empirical Study of World Model Quantization : Abstract: World models learn an internal representation of environment dynamics, enabling agents to simulate and reason about future states within a compact latent space for tasks such as planning, pr...
- The Maximum von Neumann Entropy Principle: Theory and Applications in Machine Learning : Abstract: Von Neumann entropy (VNE) is a fundamental quantity in quantum information theory and has recently been adopted in machine learning as a spectral measure of diversity for kernel matrices and...
- Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization : Abstract: Adaptive Rounding has emerged as an alternative to round-to-nearest (RTN) for post-training quantization by enabling cross-element error cancellation. Yet, dense and element-wise rounding ma...
- Efficient Neural Controlled Differential Equations via Attentive Kernel Smoothing : Abstract: Neural Controlled Differential Equations (Neural CDEs) provide a powerful continuous-time framework for sequence modeling, yet the roughness of the driving control path often restricts their...
- Generating Causal Temporal Interaction Graphs for Counterfactual Validation of Temporal Link Prediction : Abstract: Temporal link prediction (TLP) models are commonly evaluated based on predictive accuracy, yet such evaluations do not assess whether these models capture the causal mechanisms that govern t...
- Interpretable Tabular Foundation Models via In-Context Kernel Regression : Abstract: Tabular foundation models like TabPFN and TabICL achieve state-of-the-art performance through in-context learning, yet their architectures remain fundamentally opaque. We introduce KernelICL...
- Co-RedTeam: Orchestrated Security Discovery and Exploitation with LLM Agents : Abstract: Large language models (LLMs) have shown promise in assisting cybersecurity tasks, yet existing approaches struggle with automatic vulnerability discovery and exploitation due to limited inte...
- Generalized Optimal Classification Trees: A Mixed-Integer Programming Approach : Abstract: Global optimization of decision trees is a long-standing challenge in combinatorial optimization, yet such models play an important role in interpretable machine learning. Although the probl...
- STILL: Selecting Tokens for Intra-Layer Hybrid Attention to Linearize LLMs : Abstract: Linearizing pretrained large language models (LLMs) primarily relies on intra-layer hybrid attention mechanisms to alleviate the quadratic complexity of standard softmax attention. Existing ...
- ECHO-2: A Large Scale Distributed Rollout Framework for Cost-efficient Reinforcement Learning : Abstract: Reinforcement learning (RL) is a critical stage in post-training large language models (LLMs), involving repeated interaction between rollout generation, reward evaluation, and centralized l...
- Fat-Cat: Document-Driven Metacognitive Multi-Agent System for Complex Reasoning : Abstract: The effectiveness of LLM-based agents is often limited not by model capacity alone, but by how efficiently contextual information is utilized at runtime. Existing agent frameworks rely on ri...
- Scientific Theory of a Black-Box: A Life Cycle-Scale XAI Framework Based on Constructive Empiricism : Abstract: Explainable AI (XAI) offers a growing number of algorithms that aim to answer specific questions about black-box models. What is missing is a principled way to consolidate explanatory inform...
- Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts : Abstract: We study the problem of monitoring model performance in dynamic environments where labeled data are limited. To this end, we propose prediction-powered risk monitoring (PPRM), a semi-supervi...
- Interpretability in Deep Time Series Models Demands Semantic Alignment : Abstract: Deep time series models continue to improve predictive performance, yet their deployment remains limited by their black-box nature. In response, existing interpretability approaches in the f...
- Variational Entropic Optimal Transport : Abstract: Entropic optimal transport (EOT) in continuous spaces with quadratic cost is a classical tool for solving the domain translation problem. In practice, recent approaches optimize a weak dual ...
- Learning While Staying Curious: Entropy-Preserving Supervised Fine-Tuning via Adaptive Self-Distillation for Large Reasoning Models : Abstract: The standard post-training recipe for large reasoning models, supervised fine-tuning followed by reinforcement learning (SFT-then-RL), may limit the benefits of the RL stage: while SFT imita...
- Alignment-Aware Model Adaptation via Feedback-Guided Optimization : Abstract: Fine-tuning is the primary mechanism for adapting foundation models to downstream tasks; however, standard approaches largely optimize task objectives in isolation and do not account for sec...
- Segment to Focus: Guiding Latent Action Models in the Presence of Distractors : Abstract: Latent Action Models (LAMs) learn to extract action-relevant representations solely from raw observations, enabling reinforcement learning from unlabelled videos and significantly scaling av...
- Learning Markov Decision Processes under Fully Bandit Feedback : Abstract: A standard assumption in Reinforcement Learning is that the agent observes every visited state-action pair in the associated Markov Decision Process (MDP), along with the per-step rewards. S...
- Unlocking the Duality between Flow and Field Matching : Abstract: Conditional Flow Matching (CFM) unifies conventional generative paradigms such as diffusion models and flow matching. Interaction Field Matching (IFM) is a newer framework that generalizes E...
- HopFormer: Sparse Graph Transformers with Explicit Receptive Field Control : Abstract: Graph Transformers typically rely on explicit positional or structural encodings and dense global attention to incorporate graph topology. In this work, we show that neither is essential. We...
- MoLF: Mixture-of-Latent-Flow for Pan-Cancer Spatial Gene Expression Prediction from Histology : Abstract: Inferring spatial transcriptomics (ST) from histology enables scalable histogenomic profiling, yet current methods are largely restricted to single-tissue models. This fragmentation fails to...
- Choice-Model-Assisted Q-learning for Delayed-Feedback Revenue Management : Abstract: We study reinforcement learning for revenue management with delayed feedback, where a substantial fraction of value is determined by customer cancellations and modifications observed days af...
- Statistical Learning Theory in Lean 4: Empirical Processes from Scratch : Abstract: We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implements the missing ...
- EvalQReason: A Framework for Step-Level Reasoning Evaluation in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in critical applications requiring reliable reasoning, yet their internal reasoning processes remain difficult to evaluate systematical...
- C-kNN-LSH: A Nearest-Neighbor Algorithm for Sequential Counterfactual Inference : Abstract: Estimating causal effects from longitudinal trajectories is central to understanding the progression of complex conditions and optimizing clinical decision-making, such as comorbidities and ...
- Self-Supervised Learning from Structural Invariance : Abstract: Joint-embedding self-supervised learning (SSL), the key paradigm for unsupervised representation learning from visual data, learns from invariances between semantically-related data pairs. W...
- SLIME: Stabilized Likelihood Implicit Margin Enforcement for Preference Optimization : Abstract: Direct preference optimization methods have emerged as a computationally efficient alternative to Reinforcement Learning from Human Feedback (RLHF) for aligning Large Language Models (LLMs)....
- Transformers learn factored representations : Abstract: Transformers pretrained via next token prediction learn to factor their world into parts, representing these factors in orthogonal subspaces of the residual stream. We formalize two represen...
- An Empirical Study on Noisy Data and LLM Pretraining Loss Divergence : Abstract: Large-scale pretraining datasets drive the success of large language models (LLMs). However, these web-scale corpora inevitably contain large amounts of noisy data due to unregulated web con...
- Active Transfer Bagging: A New Approach for Accelerated Active Learning Acquisition of Data by Combined Transfer Learning and Bagging Based Models : Abstract: Modern machine learning has achieved remarkable success on many problems, but this success often depends on the existence of large, labeled datasets. While active learning can dramatically r...
- Trust Region Continual Learning as an Implicit Meta-Learner : Abstract: Continual learning aims to acquire tasks sequentially without catastrophic forgetting, yet standard strategies face a core tradeoff: regularization-based methods (e.g., EWC) can overconstrai...
- Repurposing Protein Language Models for Latent Flow-Based Fitness Optimization : Abstract: Protein fitness optimization is challenged by a vast combinatorial landscape where high-fitness variants are extremely sparse. Many current methods either underperform or require computation...
- Embedding Perturbation may Better Reflect the Uncertainty in LLM Reasoning : Abstract: Large Language Models (LLMs) have achieved significant breakthroughs across diverse domains; however, they can still produce unreliable or misleading outputs. For responsible LLM application...
- Maximizing Reliability with Bayesian Optimization : Abstract: Bayesian optimization (BO) is a popular, sample-efficient technique for expensive, black-box optimization. One such problem arising in manufacturing is that of maximizing the reliability, or...
- Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE : Abstract: Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-...
- Finite-Sample Wasserstein Error Bounds and Concentration Inequalities for Nonlinear Stochastic Approximation : Abstract: This paper derives non-asymptotic error bounds for nonlinear stochastic approximation algorithms in the Wasserstein-$p$ distance. To obtain explicit finite-sample guarantees for the last ite...
- Conflict-Aware Client Selection for Multi-Server Federated Learning : Abstract: Federated learning (FL) has emerged as a promising distributed machine learning (ML) paradigm that enables collaborative model training across clients without exposing raw data, thereby preserving us...
- SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning : Abstract: Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains ...
- Expanding the Capabilities of Reinforcement Learning via Text Feedback : Abstract: The success of RL for LLM post-training stems from an unreasonably uninformative source: a single bit of information per rollout as binary reward or preference label. At the other extreme, d...
- RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System : Abstract: We propose RLAnything, a reinforcement learning framework that dynamically forges environment, policy, and reward models through closed-loop optimization, amplifying learning signals and str...
- MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training : Abstract: Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statis...
- C$^2$-Cite: Contextual-Aware Citation Generation for Attributed Large Language Models : Abstract: The attribution technique enhances the credibility of LLMs by adding citations to the generated sentences, enabling users to trace back to the original sources and verify the reliability of ...
- Linear-PAL: A Lightweight Ranker for Mitigating Shortcut Learning in Personalized, High-Bias Tabular Ranking : Abstract: In e-commerce ranking, implicit user feedback is systematically confounded by Position Bias -- the strong propensity of users to interact with top-ranked items regardless of relevance. While...
- Asynchronous MultiAgent Reinforcement Learning for 5G Routing under Side Constraints : Abstract: Networks in the current 5G and beyond systems increasingly carry heterogeneous traffic with diverse quality-of-service constraints, making real-time routing decisions both complex and time-c...
- Exploring the Interpretability of Forecasting Models for Energy Balancing Market : Abstract: The balancing market in the energy sector plays a critical role in physically and financially balancing the supply and demand. Modeling dynamics in the balancing market can provide valuable ...
- Comparison of Multiple Classifiers for Android Malware Detection with Emphasis on Feature Insights Using CICMalDroid 2020 Dataset : Abstract: Accurate Android malware detection was critical for protecting users at scale. Signature scanners lagged behind fast release cycles on public app stores. We aimed to build a trustworthy dete...
- IntentCoding: Amplifying User Intent in Code Generation : Abstract: Large Language Models (LLMs) have shown strong capabilities in code generation, but their adherence to fine-grained user intent with multiple constraints remains a significant challenge. Our...
- On finite-dimensional encoding/decoding theorems for neural operators : Abstract: Recently, versions of neural networks with infinite-dimensional affine operators inside the computational units ("neural operator" networks) have been applied to learn solutions to differe...
- FoundationalASSIST: An Educational Dataset for Foundational Knowledge Tracing and Pedagogical Grounding of LLMs : Abstract: Can Large Language Models understand how students learn? As LLMs are deployed for adaptive testing and personalized tutoring, this question becomes urgent -- yet we cannot answer it with exi...
- Test-Time Adaptation for Non-stationary Time Series: From Synthetic Regime Shifts to Financial Markets : Abstract: Time series encountered in practice are rarely stationary. When the data distribution changes, a forecasting model trained on past observations can lose accuracy. We study a small-footprint ...
- Repair Brain Damage: Real-Numbered Error Correction Code for Neural Network : Abstract: We consider a neural network (NN) that may experience memory faults and computational errors. In this paper, we propose a novel real-number-based error correction code (ECC) capable of detec...
- The GT-Score: A Robust Objective Function for Reducing Overfitting in Data-Driven Trading Strategies : Abstract: Overfitting remains a critical challenge in data-driven financial modeling, where machine learning (ML) systems learn spurious patterns in historical prices and fail out of sample and in dep...
- Observing Health Outcomes Using Remote Sensing Imagery and Geo-Context Guided Visual Transformer : Abstract: Visual transformers have driven major progress in remote sensing image analysis, particularly in object detection and segmentation. Recent vision-language and multimodal models further exten...
- Event Driven Clustering Algorithm : Abstract: This paper introduces a novel asynchronous, event-driven algorithm for real-time detection of small event clusters in event camera data. Like other hierarchical agglomerative clustering algo...
- Comparison of Image Processing Models in Quark Gluon Jet Classification : Abstract: We present a comprehensive comparison of convolutional and transformer-based models for distinguishing quark and gluon jets using simulated jet images from Pythia 8. By encoding jet substruc...
- Early warning prediction: Onsager-Machlup vs Schrödinger : Abstract: Predicting critical transitions in complex systems, such as epileptic seizures in the brain, represents a major challenge in scientific research. The high-dimensional characteristics and hid...
- Uncertainty-Aware Multimodal Learning via Conformal Shapley Intervals : Abstract: Multimodal learning combines information from multiple data modalities to improve predictive performance. However, modalities often contribute unequally and in a data-dependent way, making i...
- Neuron Block Dynamics for XOR Classification with Zero-Margin : Abstract: The ability of neural networks to learn useful features through stochastic gradient descent (SGD) is a cornerstone of their success. Most theoretical analyses focus on regression or on class...
- RPP: A Certified Poisoned-Sample Detection Framework for Backdoor Attacks under Dataset Imbalance : Abstract: Deep neural networks are highly susceptible to backdoor attacks, yet most defense methods to date rely on balanced data, overlooking the pervasive class imbalance in real-world scenarios tha...
- CAPA: Contribution-Aware Pruning and FFN Approximation for Efficient Large Vision-Language Models : Abstract: Efficient inference in Large Vision-Language Models is constrained by the high cost of processing thousands of visual tokens, yet it remains unclear which tokens and computations can be safe...
- Benchmarking Uncertainty Calibration in Large Language Model Long-Form Question Answering : Abstract: Large Language Models (LLMs) are commonly used in Question Answering (QA) settings, increasingly in the natural sciences if not science at large. Reliable Uncertainty Quantification (UQ) is ...
- Parametrization of subgrid scales in long-term simulations of the shallow-water equations using machine learning and convex limiting : Abstract: We present a method for parametrizing sub-grid processes in the Shallow Water equations. We define coarse variables and local spatial averages and use a feed-forward neural network to learn ...
- Modeling Image-Caption Rating from Comparative Judgments : Abstract: Rating the accuracy of captions in describing images is time-consuming and subjective for humans. In contrast, it is often easier for people to compare two captions and decide which one bett...
- Singular Bayesian Neural Networks : Abstract: Bayesian neural networks promise calibrated uncertainty but require $O(mn)$ parameters for standard mean-field Gaussian posteriors. We argue this cost is often unnecessary, particularly when...
- WinFLoRA: Incentivizing Client-Adaptive Aggregation in Federated LoRA under Privacy Heterogeneity : Abstract: Large Language Models (LLMs) increasingly underpin intelligent web applications, from chatbots to search and recommendation, where efficient specialization is essential. Low-Rank Adaptation ...
- Tangent Space Fine-Tuning for Directional Preference Alignment in Large Language Models : Abstract: Our goal is to enable large language models (LLMs) to balance multiple human preference dimensions, such as helpfulness, safety, and verbosity, through principled and controllable alignment....
- TRACE: Scalable Amortized Causal Discovery from Single Sequences via Autoregressive Density Estimation : Abstract: We study causal discovery from a single observed sequence of discrete events generated by a stochastic process, as encountered in vehicle logs, manufacturing systems, or patient trajectories...
- A Unified Matrix-Spectral Framework for Stability and Interpretability in Deep Learning : Abstract: We develop a unified matrix-spectral framework for analyzing stability and interpretability in deep neural networks. Representing networks as data-dependent products of linear operators reve...
- Self-Generative Adversarial Fine-Tuning for Large Language Models : Abstract: Fine-tuning large language models (LLMs) for alignment typically relies on supervised fine-tuning or reinforcement learning from human feedback, both limited by the cost and scarcity of high...
- Key Principles of Graph Machine Learning: Representation, Robustness, and Generalization : Abstract: Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations from structured data. Despite their growing popularity and success across various applications, GNNs ...
- Generalized Radius and Integrated Codebook Transforms for Differentiable Vector Quantization : Abstract: Vector quantization (VQ) underpins modern generative and representation models by turning continuous latents into discrete tokens. Yet hard nearest-neighbor assignments are non-differentiabl...
- PolicyFlow: Policy Optimization with Continuous Normalizing Flow in Reinforcement Learning : Abstract: Among on-policy reinforcement learning algorithms, Proximal Policy Optimization (PPO) is widely favored for its simplicity, numerical stability, and strong empirical performance...
- Multi-Fidelity Physics-Informed Neural Networks with Bayesian Uncertainty Quantification and Adaptive Residual Learning for Efficient Solution of Parametric Partial Differential Equations : Abstract: Physics-informed neural networks (PINNs) have emerged as a powerful paradigm for solving partial differential equations (PDEs) by embedding physical laws directly into neural network trainin...
- Rethinking the Flow-Based Gradual Domain Adaption: A Semi-Dual Optimal Transport Perspective : Abstract: Gradual domain adaptation (GDA) aims to mitigate domain shift by progressively adapting models from the source domain to the target domain via intermediate domains. However, real intermediat...
- Analyzing and Improving Diffusion Models for Time-Series Data Imputation: A Proximal Recursion Perspective : Abstract: Diffusion models (DMs) have shown promise for Time-Series Data Imputation (TSDI); however, their performance remains inconsistent in complex scenarios. We attribute this to two primary obsta...
- Unraveling the Hidden Dynamical Structure in Recurrent Neural Policies : Abstract: Recurrent neural policies are widely used in partially observable control and meta-RL tasks. Their abilities to maintain internal memory and adapt quickly to unseen scenarios have offered th...
- SimpleGPT: Improving GPT via A Simple Normalization Strategy : Abstract: In this work, we revisit Transformer optimization through the lens of second-order geometry and establish a direct connection between architectural design, activation scale, the Hessian matr...
- MiTA Attention: Efficient Fast-Weight Scaling via a Mixture of Top-$k$ Activations : Abstract: The attention operator in Transformers can be viewed as a two-layer fast-weight MLP, whose weights are dynamically instantiated from input tokens and whose width equals sequence length $N$. ...
- BicKD: Bilateral Contrastive Knowledge Distillation : Abstract: Knowledge distillation (KD) is a machine learning framework that transfers knowledge from a teacher model to a student model. The vanilla KD proposed by Hinton et al. has been the dominant a...
- Diving into Kronecker Adapters: Component Design Matters : Abstract: Kronecker adapters have emerged as a promising approach for fine-tuning large-scale models, enabling high-rank updates through tunable component structures. However, existing work largely tr...
- Mixture-of-World Models: Scaling Multi-Task Reinforcement Learning with Modular Latent Dynamics : Abstract: A fundamental challenge in multi-task reinforcement learning (MTRL) is achieving sample efficiency in visual domains where tasks exhibit substantial heterogeneity in both observations and dy...
- From Intents to Actions: Agentic AI in Autonomous Networks : Abstract: Telecommunication networks are increasingly expected to operate autonomously while supporting heterogeneous services with diverse and often conflicting intents -- that is, performance object...
- Richer Bayesian Last Layers with Subsampled NTK Features : Abstract: Bayesian Last Layers (BLLs) provide a convenient and computationally efficient way to estimate uncertainty in neural networks. However, they underestimate epistemic uncertainty because they ...
- EDIS: Diagnosing LLM Reasoning via Entropy Dynamics : Abstract: Entropy-based confidence signals are increasingly leveraged to improve reasoning in large language models (LLMs), yet existing approaches treat confidence as a static quantity -- typically a...
- Gradient-Aligned Calibration for Post-Training Quantization of Diffusion Models : Abstract: Diffusion models have shown remarkable performance in image synthesis by progressively estimating a smooth transition from a Gaussian distribution of noise to a real image. Unfortunately, th...
- The BoBW Algorithms for Heavy-Tailed MDPs : Abstract: We investigate episodic Markov Decision Processes with heavy-tailed feedback (HTMDPs). Existing approaches for HTMDPs are conservative in stochastic environments and lack adaptivity in adver...
- Imperfect Influence, Preserved Rankings: A Theory of TRAK for Data Attribution : Abstract: Data attribution, tracing a model's prediction back to specific training data, is an important tool for interpreting sophisticated AI models. The widely used TRAK algorithm addresses this ch...
- PolySAE: Modeling Feature Interactions in Sparse Autoencoders via Polynomial Decoding : Abstract: Sparse autoencoders (SAEs) have emerged as a promising method for interpreting neural network representations by decomposing activations into sparse combinations of dictionary atoms. However...
- High-accuracy sampling for diffusion models and log-concave distributions : Abstract: We present algorithms for diffusion model sampling which obtain $δ$-error in $\mathrm{polylog}(1/δ)$ steps, given access to $\widetilde O(δ)$-accurate score estimates in $L^2$. This is an ex...
- Finding Differentially Private Second Order Stationary Points in Stochastic Minimax Optimization : Abstract: We provide the first study of the problem of finding differentially private (DP) second-order stationary points (SOSP) in stochastic (non-convex) minimax optimization. Existing literature ei...
- Your Self-Play Algorithm is Secretly an Adversarial Imitator: Understanding LLM Self-Play through the Lens of Imitation Learning : Abstract: Self-play post-training methods have emerged as an effective approach for fine-tuning large language models, turning a weak language model into a strong one without preference data....
- SNIP: An Adaptive Mixed Precision Framework for Subbyte Large Language Model Training : Abstract: Training large language models (LLMs) efficiently while preserving model quality poses significant challenges, particularly with subbyte precision supported by state-of-the-art GPUs. Current...
- Improve the Trade-off Between Watermark Strength and Speculative Sampling Efficiency for Language Models : Abstract: Watermarking is a principled approach for tracing the provenance of large language model (LLM) outputs, but its deployment in practice is hindered by inference inefficiency. Speculative samp...
- Phase Transitions for Feature Learning in Neural Networks : Abstract: According to a popular viewpoint, neural networks learn from data by first identifying low-dimensional representations, and subsequently fitting the best model in this space. Recent works pr...
- Theoretical Analysis of Measure Consistency Regularization for Partially Observed Data : Abstract: The problem of corrupted data, missing features, or missing modalities continues to plague the modern machine learning landscape. To address this issue, a class of regularization methods tha...
- A Meta-Knowledge-Augmented LLM Framework for Hyperparameter Optimization in Time-Series Forecasting : Abstract: Hyperparameter optimization (HPO) plays a central role in the performance of deep learning models, yet remains computationally expensive and difficult to interpret, particularly for time-ser...
- Provable Cooperative Multi-Agent Exploration for Reward-Free MDPs : Abstract: We study cooperative multi-agent reinforcement learning in the setting of reward-free exploration, where multiple agents jointly explore an unknown MDP in order to learn its dynamics (withou...
- Modeling Topological Impact on Node Attribute Distributions in Attributed Graphs : Abstract: We investigate how the topology of attributed graphs influences the distribution of node attributes. This work offers a novel perspective by treating topology and attributes as structurally ...
- Rectified LpJEPA: Joint-Embedding Predictive Architectures with Sparse and Maximum-Entropy Representations : Abstract: Joint-Embedding Predictive Architectures (JEPA) learn view-invariant representations and admit projection-based distribution matching for collapse prevention. Existing approaches regularize ...
- A Statistical Theory of Gated Attention through the Lens of Hierarchical Mixture of Experts : Abstract: Self-attention has greatly contributed to the success of the widely used Transformer architecture by enabling learning from data with long-range dependencies. In an effort to improve perform...
- Predicting and improving test-time scaling laws via reward tail-guided search : Abstract: Test-time scaling has emerged as a critical avenue for enhancing the reasoning capabilities of Large Language Models (LLMs). Though the straightforward "best-of-$N$" (BoN) strategy has al...
- Multi-Scale Wavelet Transformers for Operator Learning of Dynamical Systems : Abstract: Recent years have seen a surge in data-driven surrogates for dynamical systems that can be orders of magnitude faster than numerical solvers. However, many machine learning-based models such...
- Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum : Abstract: We establish an optimal sample complexity of $O(ε^{-2})$ for obtaining an $ε$-optimal global policy using a single-timescale actor-critic (AC) algorithm in infinite-horizon discounted Markov...
- Enhancing Generalization in Evolutionary Feature Construction for Symbolic Regression through Vicinal Jensen Gap Minimization : Abstract: Genetic programming-based feature construction has achieved significant success in recent years as an automated machine learning technique to enhance learning performance. However, overfitti...
- When Is Rank-1 Enough? Geometry-Guided Initialization for Parameter-Efficient Fine-Tuning : Abstract: Parameter-efficient fine-tuning (PEFT) is a standard way to adapt multimodal large language models, yet extremely low-rank settings -- especially rank-1 LoRA -- are often unstable. We show t...
- The Inlet Rank Collapse in Implicit Neural Representations: Diagnosis and Unified Remedy : Abstract: Implicit Neural Representations (INRs) have revolutionized continuous signal modeling, yet they struggle to recover fine-grained details within finite training budgets. While empirical techn...
- How Implicit Bias Accumulates and Propagates in LLM Long-term Memory : Abstract: Long-term memory mechanisms enable Large Language Models (LLMs) to maintain continuity and personalization across extended interaction lifecycles, but they also introduce new and underexplor...
- Local Exponential Stability of Mean-Field Langevin Descent-Ascent in Wasserstein Space : Abstract: We study the mean-field Langevin descent-ascent (MFL-DA), a coupled optimization dynamics on the space of probability measures for entropically regularized two-player zero-sum games. Althoug...
- Nearly Optimal Active Preference Learning and Its Application to LLM Alignment : Abstract: Aligning large language models (LLMs) depends on high-quality datasets of human preference labels, which are costly to collect. Although active learning has been studied to improve sample ef...
- A Lightweight Sparse Interaction Network for Time Series Forecasting : Abstract: Recent work shows that linear models can outperform several transformer models in long-term time-series forecasting (TSF). However, instead of explicitly performing temporal interaction thro...
- Universal Redundancies in Time Series Foundation Models : Abstract: Time Series Foundation Models (TSFMs) leverage extensive pretraining to accurately predict unseen time series during inference, without the need for task-specific fine-tuning. Through large-...
- What Do Agents Learn from Trajectory-SFT: Semantics or Interfaces? : Abstract: Large language models are increasingly evaluated as interactive agents, yet standard agent benchmarks conflate two qualitatively distinct sources of success: semantic tool-use and interface-...
- AdaptNC: Adaptive Nonconformity Scores for Uncertainty-Aware Autonomous Systems in Dynamic Environments : Abstract: Rigorous uncertainty quantification is essential for the safe deployment of autonomous systems in unconstrained environments. Conformal Prediction (CP) provides a distribution-free framework...
- COMET: Codebook-based Online-adaptive Multi-scale Embedding for Time-series Anomaly Detection : Abstract: Time series anomaly detection is a critical task across various industrial domains. However, capturing temporal dependencies and multivariate correlations within patch-level representation l...
- Chance-Constrained Inference for Hallucination Risk Control in Large Language Models : Abstract: Large language models generate outputs stochastically and may produce fluent but invalid responses, including factual hallucinations. Existing mitigation strategies reduce average error rate...
- Quantifying Epistemic Predictive Uncertainty in Conformal Prediction : Abstract: We study the problem of quantifying epistemic predictive uncertainty (EPU) -- that is, uncertainty faced at prediction time due to the existence of multiple plausible predictive models -- wi...
- Finite and Corruption-Robust Regret Bounds in Online Inverse Linear Optimization under M-Convex Action Sets : Abstract: We study online inverse linear optimization, also known as contextual recommendation, where a learner sequentially infers an agent's hidden objective vector from observed optimal actions ove...
- AGT$^{AO}$: Robust and Stabilized LLM Unlearning via Adversarial Gating Training with Adaptive Orthogonality : Abstract: While Large Language Models (LLMs) have achieved remarkable capabilities, they unintentionally memorize sensitive data, posing critical privacy and security risks. Machine unlearning is pivo...
- Revisiting Generalization Measures Beyond IID: An Empirical Study under Distributional Shift : Abstract: Generalization remains a central yet unresolved challenge in deep learning, particularly the ability to predict a model's performance beyond its training distribution using quantities availa...
- MSign: An Optimizer Preventing Training Instability in Large Language Models via Stable Rank Restoration : Abstract: Training instability remains a critical challenge in large language model (LLM) pretraining, often manifesting as sudden gradient explosions that waste significant computational resources. W...
- Position: The Inevitable End of One-Architecture-Fits-All-Domains in Time Series Forecasting : Abstract: Recent work has questioned the effectiveness and robustness of neural network architectures for time series forecasting tasks. We summarize these concerns and thoroughly analyze their inherent...
- MGKAN: Predicting Asymmetric Drug-Drug Interactions via a Multimodal Graph Kolmogorov-Arnold Network : Abstract: Predicting drug-drug interactions (DDIs) is essential for safe pharmacological treatments. Previous graph neural network (GNN) models leverage molecular structures and interaction networks b...
- Position: Beyond Model-Centric Prediction -- Agentic Time Series Forecasting : Abstract: Time series forecasting has traditionally been formulated as a model-centric, static, and single-pass prediction problem that maps historical observations to future values. While this paradi...
- Grad2Reward: From Sparse Judgment to Dense Rewards for Improving Open-Ended LLM Reasoning : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has catalyzed significant breakthroughs in complex LLM reasoning within verifiable domains, such as mathematics and programming. Recent ...
- Hyperbolic Graph Neural Networks Under the Microscope: The Role of Geometry-Task Alignment : Abstract: Many complex networks exhibit hyperbolic structural properties, making hyperbolic space a natural candidate for representing hierarchical and tree-like graphs with low distortion. Based on t...
- Prism: Efficient Test-Time Scaling via Hierarchical Search and Self-Verification for Discrete Diffusion Language Models : Abstract: Inference-time compute has re-emerged as a practical way to improve LLM reasoning. Most test-time scaling (TTS) algorithms rely on autoregressive decoding, which is ill-suited to discrete di...
- No Generation without Representation: Efficient Causal Protein Language Models Enable Zero-Shot Fitness Estimation : Abstract: Protein language models (PLMs) face a fundamental divide: masked language models (MLMs) excel at fitness prediction while causal models enable generation, forcing practitioners to maintain s...
- Self-Rewarding Sequential Monte Carlo for Masked Diffusion Language Models : Abstract: This work presents self-rewarding sequential Monte Carlo (SMC), an inference-time scaling algorithm enabling effective sampling of masked diffusion language models (MDLMs). Our algorithm ste...
- FUPareto: Bridging the Forgetting-Utility Gap in Federated Unlearning via Pareto Augmented Optimization : Abstract: Federated Unlearning (FU) aims to efficiently remove the influence of specific client data from a federated model while preserving utility for the remaining clients. However, three key chall...
- Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning : Abstract: A/B testing has become a gold standard for modern technological companies to conduct policy evaluation. Yet, its application to time series experiments, where policies are sequentially assig...
- Autocorrelated Optimize-via-Estimate: Predict-then-Optimize versus Finite-sample Optimal : Abstract: Models that directly optimize for out-of-sample performance in the finite-sample regime have emerged as a promising alternative to traditional estimate-then-optimize approaches in data-drive...
- Internal Flow Signatures for Self-Checking and Refinement in LLMs : Abstract: Large language models can generate fluent answers that are unfaithful to the provided context, while many safeguards rely on external verification or a separate judge after generation. We in...
- Observation-dependent Bayesian active learning via input-warped Gaussian processes : Abstract: Bayesian active learning relies on the precise quantification of predictive uncertainty to explore unknown function landscapes. While Gaussian process surrogates are the standard for such ta...
- Data- and Variance-dependent Regret Bounds for Online Tabular MDPs : Abstract: This work studies online episodic tabular Markov decision processes (MDPs) with known transitions and develops best-of-both-worlds algorithms that achieve refined data-dependent regret bound...
- Towards Long-Horizon Interpretability: Efficient and Faithful Multi-Token Attribution for Reasoning LLMs : Abstract: Token attribution methods provide intuitive explanations for language model outputs by identifying causally important input tokens. However, as modern LLMs increasingly rely on extended reas...
- Embedding Learning on Multiplex Networks for Link Prediction : Abstract: Over the past years, embedding learning on networks has shown tremendous results in link prediction tasks for complex systems, with a wide range of real-life applications. Learning a represe...
- Bayesian Integration of Nonlinear Incomplete Clinical Data : Abstract: Multimodal clinical data are characterized by high dimensionality, heterogeneous representations, and structured missingness, posing significant challenges for predictive modeling, data inte...
- Boundary-Constrained Diffusion Models for Floorplan Generation: Balancing Realism and Diversity : Abstract: Diffusion models have become widely popular for automated floorplan generation, producing highly realistic layouts conditioned on user-defined constraints. However, optimizing for perceptual...
- Deep Multivariate Models with Parametric Conditionals : Abstract: We consider deep multivariate models for heterogeneous collections of random variables. In the context of computer vision, such collections may e.g. consist of images, segmentations, image a...
- Grounding Generated Videos in Feasible Plans via World Models : Abstract: Large-scale video generative models have shown emerging capabilities as zero-shot visual planners, yet video-generated plans often violate temporal consistency and physical constraints, lead...
- Self-Consolidation for Self-Evolving Agents : Abstract: While large language model (LLM) agents have demonstrated impressive problem-solving capabilities, they typically operate as static systems, lacking the ability to evolve through lifelong in...
- Logic-Guided Vector Fields for Constrained Generative Modeling : Abstract: Neuro-symbolic systems aim to combine the expressive structure of symbolic logic with the flexibility of neural learning; yet, generative models typically lack mechanisms to enforce declarat...
- SNAP: A Self-Consistent Agreement Principle with Application to Robust Computation : Abstract: We introduce SNAP (Self-coNsistent Agreement Principle), a self-supervised framework for robust computation based on mutual agreement. Based on an Agreement-Reliability Hypothesis, SNAP assig...
- Robust Domain Generalization under Divergent Marginal and Conditional Distributions : Abstract: Domain generalization (DG) aims to learn predictive models that can generalize to unseen domains. Most existing DG approaches focus on learning domain-invariant representations under the ass...
- DASH: Faster Shampoo via Batched Block Preconditioning and Efficient Inverse-Root Solvers : Abstract: Shampoo is one of the leading approximate second-order optimizers: a variant of it has won the MLCommons AlgoPerf competition, and it has been shown to produce models with lower activation o...
- MATRIX: A Multimodal Benchmark and Post-Training Framework for Materials Science : Abstract: Scientific reasoning in materials science requires integrating multimodal experimental evidence with underlying physical theory. Existing benchmarks make it difficult to assess whether incor...
- RePaint-Enhanced Conditional Diffusion Model for Parametric Engineering Designs under Performance and Parameter Constraints : Abstract: This paper presents a RePaint-enhanced framework that integrates a pre-trained performance-guided denoising diffusion probabilistic model (DDPM) for performance- and parameter-constraint eng...
- A Fragile Guardrail: Diffusion LLM's Safety Blessing and Its Failure Mode : Abstract: Diffusion large language models (D-LLMs) offer an alternative to autoregressive LLMs (AR-LLMs) and have demonstrated advantages in generation efficiency. Beyond the utility benefits, we argu...
- Localized, High-resolution Geographic Representations with Slepian Functions : Abstract: Geographic data is fundamentally local. Disease outbreaks cluster in population centers, ecological patterns emerge along coastlines, and economic activity concentrates within country border...
- MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers : Abstract: Understanding how transformer components operate in LLMs is important, as it is at the core of recent technological advances in artificial intelligence. In this work, we revisit the challeng...
- DROGO: Default Representation Objective via Graph Optimization in Reinforcement Learning : Abstract: In computational reinforcement learning, the default representation (DR) and its principal eigenvector have been shown to be effective for a wide variety of applications, including reward sh...
- Fed-Listing: Federated Label Distribution Inference in Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have been intensively studied for their expressive representation and learning performance on graph-structured data, enabling effective modeling of complex relat...
- Federated-inspired Single-cell Batch Integration in Latent Space : Abstract: Advances in single-cell RNA sequencing enable the rapid generation of massive, high-dimensional datasets, yet the accumulation of data across experiments introduces batch effects that obscur...
- Open Materials Generation with Inference-Time Reinforcement Learning : Abstract: Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties int...
- Towards Building Non-Fine-Tunable Foundation Models : Abstract: Open-sourcing foundation models (FMs) enables broad reuse but also exposes model trainers to economic and safety risks from unrestricted downstream fine-tuning. We address this problem by bu...
- Stabilizing Decentralized Federated Fine-Tuning via Topology-Aware Alternating LoRA : Abstract: Decentralized federated learning (DFL), a serverless variant of federated learning, poses unique challenges for parameter-efficient fine-tuning due to the factorized structure of low-rank ad...
- FedMOA: Federated GRPO for Personalized Reasoning LLMs under Heterogeneous Rewards : Abstract: Group Relative Policy Optimization (GRPO) has recently emerged as an effective approach for improving the reasoning capabilities of large language models through online multi-objective reinf...
- Search Inspired Exploration in Reinforcement Learning : Abstract: Exploration in environments with sparse rewards remains a fundamental challenge in reinforcement learning (RL). Existing approaches such as curriculum learning and Go-Explore often rely on h...
- Parallel Stochastic Gradient-Based Planning for World Models : Abstract: World models simulate environment dynamics from raw sensory inputs like video. However, using them for planning can be challenging due to the vast and unstructured search space. We propose a...
- Diffusion LMs Can Approximate Optimal Infilling Lengths Implicitly : Abstract: Diffusion language models (DLMs) provide a bidirectional generation framework naturally suited for infilling, yet their performance is constrained by the pre-specified infilling length. In t...
- AREAL-DTA: Dynamic Tree Attention for Efficient Reinforcement Learning of Large Language Models : Abstract: Reinforcement learning (RL) based post-training for large language models (LLMs) is computationally expensive, as it generates many rollout sequences that could frequently share long token p...
- OD-DEAL: Dynamic Expert-Guided Adversarial Learning with Online Decomposition for Scalable Capacitated Vehicle Routing : Abstract: Solving large-scale capacitated vehicle routing problems (CVRP) is hindered by the high complexity of heuristics and the limited generalization of neural solvers on massive graphs. We propos...
- Partition of Unity Neural Networks for Interpretable Classification with Explicit Class Regions : Abstract: Despite their empirical success, neural network classifiers remain difficult to interpret. In softmax-based models, class regions are defined implicitly as solutions to systems of inequaliti...
- Minerva: Reinforcement Learning with Verifiable Rewards for Cyber Threat Intelligence LLMs : Abstract: Cyber threat intelligence (CTI) analysts routinely convert noisy, unstructured security artifacts into standardized, automation-ready representations. Although large language models (LLMs) s...
- NEST: Nested Event Stream Transformer for Sequences of Multisets : Abstract: Event stream data often exhibit hierarchical structure in which multiple events co-occur, resulting in a sequence of multisets (i.e., bags of events). In electronic health records (EHRs), fo...
- AIRE-Prune: Asymptotic Impulse-Response Energy for State Pruning in State Space Models : Abstract: State space models (SSMs) often sacrifice capacity, search space, or stability to offset the memory and compute costs of large state dimensions. We introduce a structured post-training pruni...
- Invertible Memory Flow Networks : Abstract: Long sequence neural memory remains a challenging problem. RNNs and their variants suffer from vanishing gradients, and Transformers suffer from quadratic scaling. Furthermore, compressing l...
- OpenDDI: A Comprehensive Benchmark for DDI Prediction : Abstract: Drug-Drug Interactions (DDIs) significantly influence therapeutic efficacy and patient safety. As experimental discovery is resource-intensive and time-consuming, efficient computational met...
- One Loss to Rule Them All: Marked Time-to-Event for Structured EHR Foundation Models : Abstract: Clinical events captured in Electronic Health Records (EHR) are irregularly sampled and may consist of a mixture of discrete events and numerical measurements, such as laboratory values or t...
- Depth, Not Data: An Analysis of Hessian Spectral Bifurcation : Abstract: The eigenvalue distribution of the Hessian matrix plays a crucial role in understanding the optimization landscape of deep neural networks. Prior work has attributed the well-documented "bu...
- Beyond the Node: Clade-level Selection for Efficient MCTS in Automatic Heuristic Design : Abstract: While Monte Carlo Tree Search (MCTS) shows promise in Large Language Model (LLM)-based Automatic Heuristic Design (AHD), it suffers from a critical over-exploitation tendency under the limit...
- Forget by Uncertainty: Orthogonal Entropy Unlearning for Quantized Neural Networks : Abstract: The deployment of quantized neural networks on edge devices, combined with privacy regulations like GDPR, creates an urgent need for machine unlearning in quantized models. However, existing...
- When Classes Evolve: A Benchmark and Framework for Stage-Aware Class-Incremental Learning : Abstract: Class-Incremental Learning (CIL) aims to sequentially learn new classes while mitigating catastrophic forgetting of previously learned knowledge. Conventional CIL approaches implicitly assum...
- Sparsity-Aware Unlearning for Large Language Models : Abstract: Large Language Models (LLMs) inevitably memorize sensitive information during training, posing significant privacy risks. Machine unlearning has emerged as a promising solution to selectivel...
- Bridging Time and Frequency: A Joint Modeling Framework for Irregular Multivariate Time Series Forecasting : Abstract: Irregular multivariate time series forecasting (IMTSF) is challenging due to non-uniform sampling and variable asynchronicity. These irregularities violate the equidistant assumptions of sta...
- Safe Langevin Soft Actor Critic : Abstract: Balancing reward and safety in constrained reinforcement learning remains challenging due to poor generalization from sharp value minima and inadequate handling of heavy-tailed risk distribu...
- SEER: Transformer-based Robust Time Series Forecasting via Automated Patch Enhancement and Replacement : Abstract: Time series forecasting is important in many fields that require accurate predictions for decision-making. Patching techniques, commonly used and effective in time series modeling, help capt...
- Kernelized Edge Attention: Addressing Semantic Attention Blurring in Temporal Graph Neural Networks : Abstract: Temporal Graph Neural Networks (TGNNs) aim to capture the evolving structure and timing of interactions in dynamic graphs. Although many models incorporate time through encodings or architec...
- Direct Preference Optimization with Rating Information: Practical Algorithms and Provable Gains : Abstract: The class of direct preference optimization (DPO) algorithms has emerged as a promising approach for solving the alignment problem in foundation models. These algorithms work with very limit...
- Actor-Dual-Critic Dynamics for Zero-sum and Identical-Interest Stochastic Games : Abstract: We propose a novel independent and payoff-based learning framework for stochastic games that is model-free, game-agnostic, and gradient-free. The learning dynamics follow a best-response-typ...
- Equilibrium of Feasible Zone and Uncertain Model in Safe Exploration : Abstract: Ensuring the safety of environmental exploration is a critical problem in reinforcement learning (RL). While limiting exploration to a feasible zone has become widely accepted as a way to en...
- Combinatorial Bandit Bayesian Optimization for Tensor Outputs : Abstract: Bayesian optimization (BO) has been widely used to optimize expensive and black-box functions across various domains. Existing BO methods have not addressed tensor-output functions. To fill ...
- CoRe-Fed: Bridging Collaborative and Representation Fairness via Federated Embedding Distillation : Abstract: With the proliferation of distributed data sources, Federated Learning (FL) has emerged as a key approach to enable collaborative intelligence through decentralized model training while pres...
- PHAT: Modeling Period Heterogeneity for Multivariate Time Series Forecasting : Abstract: While existing multivariate time series forecasting models have advanced significantly in modeling periodicity, they largely neglect the periodic heterogeneity common in real-world data, whe...
- Riemannian Flow Matching for Disentangled Graph Domain Adaptation : Abstract: Graph Domain Adaptation (GDA) typically uses adversarial learning to align graph embeddings in Euclidean space. However, this paradigm suffers from two critical challenges: Structural Degene...
- Three-Way Emotion Classification of EEG-based Signals using Machine Learning : Abstract: Electroencephalography (EEG) is a widely used technique for measuring brain activity. EEG-based signals can reveal a person's emotional state, as they directly reflect activity in different b...
- Strong Linear Baselines Strike Back: Closed-Form Linear Models as Gaussian Process Conditional Density Estimators for TSAD : Abstract: Research in time series anomaly detection (TSAD) has largely focused on developing increasingly sophisticated, hard-to-train, and expensive-to-infer neural architectures. We revisit this par...
- Provably Protecting Fine-Tuned LLMs from Training Data Extraction : Abstract: Fine-tuning large language models (LLMs) on sensitive datasets raises privacy concerns, as training data extraction (TDE) attacks can expose highly confidential information. Existing defense...
- Topology and Geometry of the Learning Space of ReLU Networks: Connectivity and Singularities : Abstract: Understanding the properties of the parameter space in feed-forward ReLU networks is critical for effectively analyzing and guiding training dynamics. After initialization, training under gr...
- LocalV: Exploiting Information Locality for IP-level Verilog Generation : Abstract: The generation of Register-Transfer Level (RTL) code is a crucial yet labor-intensive step in digital hardware design, traditionally requiring engineers to manually translate complex specifi...
- Federated Learning at the Forefront of Fairness: A Multifaceted Perspective : Abstract: Fairness in Federated Learning (FL) is emerging as a critical factor driven by heterogeneous clients' constraints and balanced model performance across various scenarios. In this survey, we ...
- Spectral Imbalance Causes Forgetting in Low-Rank Continual Adaptation : Abstract: Parameter-efficient continual learning aims to adapt pre-trained models to sequential tasks without forgetting previously acquired knowledge. Most existing approaches treat continual learnin...
- Provable Model Provenance Set for Large Language Models : Abstract: The growing prevalence of unauthorized model usage and misattribution has increased the need for reliable model provenance analysis. However, existing methods largely rely on heuristic finge...
- A novel VAE-DML fusion framework for casual analysis of greenwashing in the mining industry : Abstract: Against the backdrop of the global green transition and "dual carbon" goals, mining industry chain enterprises are pivotal entities in terms of resource consumption and environmental impact....
- Stable Time Series Prediction of Enterprise Carbon Emissions Based on Causal Inference : Abstract: Against the backdrop of ongoing carbon peaking and carbon neutrality goals, accurate prediction of enterprise carbon emission trends constitutes an essential foundation for energy structure ...
- Fast Non-Episodic Finite-Horizon RL with K-Step Lookahead Thresholding : Abstract: Online reinforcement learning in non-episodic, finite-horizon MDPs remains underexplored and is challenged by the need to estimate returns to a fixed terminal time. Existing infinite-horizon...
- Sporadic Gradient Tracking over Directed Graphs: A Theoretical Perspective on Decentralized Federated Learning : Abstract: Decentralized Federated Learning (DFL) enables clients with local data to collaborate in a peer-to-peer manner to train a generalized model. In this paper, we unify two branches of work that...
- Mobile Exergames: Activity Recognition Based on Smartphone Sensors : Abstract: Smartphone sensors can be extremely useful in providing information on the activities and behaviors of persons. Human activity recognition is increasingly used for games, medical, or surveil...
- Over-Alignment vs Over-Fitting: The Role of Feature Learning Strength in Generalization : Abstract: Feature learning strength (FLS), i.e., the inverse of the effective output scaling of a model, plays a critical role in shaping the optimization dynamics of neural nets. While its impact has...
- Investigating the Robustness of Subtask Distillation under Spurious Correlation : Abstract: Subtask distillation is an emerging paradigm in which compact, specialized models are extracted from large, general-purpose 'foundation models' for deployment in environments with limited re...
- Learning Heat-based Equations in Self-similar variables : Abstract: We study solution learning for heat-based equations in self-similar variables (SSV). We develop an SSV training framework compatible with standard neural-operator training. We instantiate th...
- Dynamic Expert Sharing: Decoupling Memory from Parallelism in Mixture-of-Experts Diffusion LLMs : Abstract: Among parallel decoding paradigms, diffusion large language models (dLLMs) have emerged as a promising candidate that balances generation quality and throughput. However, their integration w...
- Test-time Generalization for Physics through Neural Operator Splitting : Abstract: Neural operators have shown promise in learning solution maps of partial differential equations (PDEs), but they often struggle to generalize when test inputs lie outside the training distri...
- Reliability-Aware Determinantal Point Processes for Robust Informative Data Selection in Large Language Models : Abstract: Informative data selection is a key requirement for large language models (LLMs) to minimize the amount of data required for fine-tuning, network distillation, and token pruning, enabling fa...
- Domain-Adaptive and Scalable Dense Retrieval for Content-Based Recommendation : Abstract: E-commerce recommendation and search commonly rely on sparse keyword matching (e.g., BM25), which breaks down under vocabulary mismatch when user intent has limited lexical overlap with prod...
- PyGALAX: An Open-Source Python Toolkit for Advanced Explainable Geospatial Machine Learning : Abstract: PyGALAX is a Python package for geospatial analysis that integrates automated machine learning (AutoML) and explainable artificial intelligence (XAI) techniques to analyze spatial heterogene...
- Efficient Deep Learning for Medical Imaging: Bridging the Gap Between High-Performance AI and Clinical Deployment : Abstract: Deep learning has revolutionized medical image analysis, playing a vital role in modern clinical applications. However, the deployment of large-scale models in real-world clinical settings r...
- Early Classification of Time Series in Non-Stationary Cost Regimes : Abstract: Early Classification of Time Series (ECTS) addresses decision-making problems in which predictions must be made as early as possible while maintaining high accuracy. Most existing ECTS metho...
- Beyond What Seems Necessary: Hidden Gains from Scaling Training-Time Reasoning Length under Outcome Supervision : Abstract: Training LLMs to think and reason for longer has become a key ingredient in building state-of-the-art models that can solve complex problems previously out of reach. Recent efforts pursue th...
- SALAAD: Sparse And Low-Rank Adaptation via ADMM : Abstract: Modern large language models are increasingly deployed under compute and memory constraints, making flexible control of model capacity a central challenge. While sparse and low-rank structur...
- Dynamic Prior Thompson Sampling for Cold-Start Exploration in Recommender Systems : Abstract: Cold-start exploration is a core challenge in large-scale recommender systems: new or data-sparse items must receive traffic to estimate value, but over-exploration harms users and wastes im...
- Optimal Budgeted Adaptation of Large Language Models : Abstract: The trade-off between labeled data availability and downstream accuracy remains a central challenge in fine-tuning large language models (LLMs). We propose a principled framework for \emph{b...
- SAGE: Agentic Framework for Interpretable and Clinically Translatable Computational Pathology Biomarker Discovery : Abstract: Despite significant progress in computational pathology, many AI models remain black-box and difficult to interpret, posing a major barrier to clinical adoption due to limited transparency a...
- From drift to adaptation to the failed ml model: Transfer Learning in Industrial MLOps : Abstract: Although model adaptation to the production environment is critical for reliable Machine Learning Operations (MLOps), less attention is paid to developing a systematic framework for updating the ML models ...
- Probing the Knowledge Boundary: An Interactive Agentic Framework for Deep Knowledge Extraction : Abstract: Large Language Models (LLMs) can be seen as compressed knowledge bases, but it remains unclear what knowledge they truly contain and how far their knowledge boundaries extend. Existing bench...
- On the Spectral Flattening of Quantized Embeddings : Abstract: Training Large Language Models (LLMs) at ultra-low precision is critically impeded by instability rooted in the conflict between discrete quantization constraints and the intrinsic heavy-tai...
- Forest-Guided Semantic Transport for Label-Supervised Manifold Alignment : Abstract: Label-supervised manifold alignment bridges the gap between unsupervised and correspondence-based paradigms by leveraging shared label information to align multimodal datasets. Still, most e...
- Scalable Random Wavelet Features: Efficient Non-Stationary Kernel Approximation with Convergence Guarantees : Abstract: Modeling non-stationary processes, where statistical properties vary across the input domain, is a critical challenge in machine learning; yet most scalable methods rely on a simplifying ass...
- Predicting Anemia Among Under-Five Children in Nepal Using Machine Learning and Deep Learning : Abstract: Childhood anemia remains a major public health challenge in Nepal and is associated with impaired growth, cognition, and increased morbidity. Using World Health Organization hemoglobin thres...
- SFMP: Fine-Grained, Hardware-Friendly and Search-Free Mixed-Precision Quantization for Large Language Models : Abstract: Mixed-precision quantization is a promising approach for compressing large language models under tight memory budgets. However, existing mixed-precision methods typically suffer from one of ...
- SwiftRepertoire: Few-Shot Immune-Signature Synthesis via Dynamic Kernel Codes : Abstract: Repertoire-level analysis of T cell receptors offers a biologically grounded signal for disease detection and immune monitoring, yet practical deployment is impeded by label sparsity, cohort...
- LRAgent: Efficient KV Cache Sharing for Multi-LoRA LLM Agents : Abstract: Role specialization in multi-LLM agent systems is often realized via multi-LoRA, where agents share a pretrained backbone and differ only through lightweight adapters. Despite sharing base m...
- On the Expressive Power of Permutation-Equivariant Weight-Space Networks : Abstract: Weight-space learning studies neural architectures that operate directly on the parameters of other neural networks. Motivated by the growing availability of pretrained models, recent work h...
- Single-Edge Node Injection Threats to GNN-Based Security Monitoring in Industrial Graph Systems : Abstract: Graph neural networks (GNNs) are increasingly adopted in industrial graph-based monitoring systems (e.g., Industrial internet of things (IIoT) device graphs, power-grid topology models, and ...
- ChronoSpike: An Adaptive Spiking Graph Neural Network for Dynamic Graphs : Abstract: Dynamic graph representation learning requires capturing both structural relationships and temporal evolution, yet existing approaches face a fundamental trade-off: attention-based methods a...
- Introduction to Automated Negotiation : Abstract: This book is an introductory textbook targeted towards computer science students who are completely new to the topic of automated negotiation. It does not require any prerequisite knowledge,...
- FastSLM: Hierarchical Frame Q-Former for Effective Speech Modality Adaptation : Abstract: Although Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision, language, and video understanding tasks, scaling them to long-form speech remains a cri...
- Membership Inference Attacks Against Fine-tuned Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) represent a promising alternative to autoregressive language models, using bidirectional masked token prediction. Yet their susceptibility to privacy leakage...
- How AI Impacts Skill Formation : Abstract: AI assistance produces significant productivity gains across professional domains, particularly for novice workers. Yet how this assistance affects the development of skills required to effe...
- Music Plagiarism Detection: Problem Formulation and a Segment-based Solution : Abstract: Recently, the problem of music plagiarism has emerged as an even more pressing social issue. As music information retrieval research advances, there is a growing effort to address issues rel...
- HER: Human-like Reasoning and Reinforcement Learning for LLM Role-playing : Abstract: LLM role-playing, i.e., using LLMs to simulate specific personas, has emerged as a key capability in various applications, such as companionship, content creation, and digital games. While c...
- TraceRouter: Robust Safety for Large Foundation Models via Path-Level Intervention : Abstract: Despite their capabilities, large foundation models (LFMs) remain susceptible to adversarial manipulation. Current defenses predominantly rely on the "locality hypothesis", suppressing isola...
- Latent Adversarial Regularization for Offline Preference Optimization : Abstract: Learning from human feedback typically relies on preference optimization that constrains policy updates through token-level regularization. However, preference optimization for language mode...
- Measurement for Opaque Systems: Multi-source Triangulation with Interpretable Machine Learning : Abstract: We propose a measurement framework for difficult-to-access contexts that uses indirect data traces, interpretable machine-learning models, and theory-guided triangulation to fill inaccessibl...
- ELLMPEG: An Edge-based Agentic LLM Video Processing Tool : Abstract: Large language models (LLMs), the foundation of generative AI systems like ChatGPT, are transforming many fields and applications, including multimedia, enabling more advanced content genera...
- RAPTOR-AI for Disaster OODA Loop: Hierarchical Multimodal RAG with Experience-Driven Agentic Decision-Making : Abstract: Effective humanitarian assistance and disaster relief (HADR) requires rapid situational understanding, reliable decision support, and the ability to generalize across diverse and previously ...
- Extending Beacon to Hindi: Cultural Adaptation Drives Cross-Lingual Sycophancy : Abstract: Sycophancy, the tendency of language models to prioritize agreement with user preferences over principled reasoning, has been identified as a persistent alignment failure in English-language...
- Distributional Reinforcement Learning for Condition-Based Maintenance of Multi-Pump Equipment : Abstract: Condition-Based Maintenance (CBM) signifies a paradigm shift from reactive to proactive equipment management strategies in modern industrial systems. Conventional time-based maintenance sche...
- Generative AI-enhanced Probabilistic Multi-Fidelity Surrogate Modeling Via Transfer Learning : Abstract: The performance of machine learning surrogates is critically dependent on data quality and quantity. This presents a major challenge, as high-fidelity (HF) data is often scarce and computati...
- Dimensional Peeking for Low-Variance Gradients in Zeroth-Order Discrete Optimization via Simulation : Abstract: Gradient-based optimization methods are commonly used to identify local optima in high-dimensional spaces. When derivatives cannot be evaluated directly, stochastic estimators can provide ap...
- Automated univariate time series forecasting with regression trees : Abstract: This paper describes a methodology for automated univariate time series forecasting using regression trees and their ensembles: bagging and random forests. The key aspects that are addressed...
- Lossless Embedding Compression via Spherical Coordinates : Abstract: We present a lossless compression method for unit-norm embeddings that achieves 1.5× compression, 25% better than the best prior method. The method exploits that spherical coordinate...
- Why LoRA Resists Label Noise: A Theoretical Framework for Noise-Robust Parameter-Efficient Fine-Tuning : Abstract: Parameter-efficient fine-tuning methods like Low-Rank Adaptation (LoRA) have become the dominant paradigm for adapting large pretrained models. We present a theoretical framework explaining ...
- Trade-offs Between Individual and Group Fairness in Machine Learning: A Comprehensive Review : Abstract: Algorithmic fairness has become a central concern in computational decision-making systems, where ensuring equitable outcomes is essential for both ethical and legal reasons. Two dominant no...
- Gauss-Newton Natural Gradient Descent for Shape Learning : Abstract: We explore the use of the Gauss-Newton method for optimization in shape learning, including implicit neural surfaces and geometry-informed neural networks. The method addresses key challenge...
- THDC: Training Hyperdimensional Computing Models with Backpropagation : Abstract: Hyperdimensional computing (HDC) offers lightweight learning for energy-constrained devices by encoding data into high-dimensional vectors. However, its reliance on ultra-high dimensionality...
- Predicting Mortgage Default with Machine Learning: AutoML, Class Imbalance, and Leakage Control : Abstract: Mortgage default prediction is a core task in financial risk management, and machine learning models are increasingly used to estimate default probabilities and provide interpretable signals...
- ALIGN: Aligned Delegation with Performance Guarantees for Multi-Agent LLM Reasoning : Abstract: LLMs often underperform on complex reasoning tasks when relying on a single generation-and-selection pipeline. Inference-time ensemble methods can improve performance by sampling diverse rea...
- Quantum Model Parallelism for MRI-Based Classification of Alzheimer's Disease Stages : Abstract: With increasing life expectancy, AD has become a major global health concern. While classical AI-based methods have been developed for early diagnosis and stage classification of AD, growing...
- Monte Carlo Tree Search for Execution-Guided Program Repair with Large Language Models : Abstract: Automated program repair with large language models remains challenging at the repository level due to long-horizon reasoning requirements and the limitations of autoregressive decoding. We ...
- On the Relationship Between Representation Geometry and Generalization in Deep Neural Networks : Abstract: We investigate the relationship between representation geometry and neural network performance. Analyzing 52 pretrained ImageNet models across 13 architecture families, we show that effectiv...
- Benford's Law as a Distributional Prior for Post-Training Quantization of Large Language Models : Abstract: The rapid growth of Large Language Models (LLMs) intensifies the need for effective compression, with weight quantization being the most widely adopted technique. Standard uniform quantizers...
- How Understanding Forecast Uncertainty Resolves the Explainability Problem in Machine Learning Models : Abstract: For applications of machine learning in critical decisions, explainability is a primary concern, and often a regulatory requirement. Local linear methods for generating explanations, such as...
- GEPC: Group-Equivariant Posterior Consistency for Out-of-Distribution Detection in Diffusion Models : Abstract: Diffusion models learn a time-indexed score field $\mathbf{s}_θ(\mathbf{x}_t,t)$ that often inherits approximate equivariances (flips, rotations, circular shifts) from in-distribution (ID) d...
- Reducing Memorisation in Generative Models via Riemannian Bayesian Inference : Abstract: Modern generative models can produce realistic samples; however, balancing memorisation and generalisation remains an open problem. We approach this challenge from a Bayesian perspective by ...
- Reducing Class-Wise Performance Disparity via Margin Regularization : Abstract: Deep neural networks often exhibit substantial disparities in class-wise accuracy, even when trained on class-balanced data, posing concerns for reliable deployment. While prior efforts have...
- Dispersion Loss Counteracts Embedding Condensation and Improves Generalization in Small Language Models : Abstract: Large language models (LLMs) achieve remarkable performance through ever-increasing parameter counts, but scaling incurs steep computational costs. To better understand LLM scaling, we study...
- GRIP2: A Robust and Powerful Deep Knockoff Method for Feature Selection : Abstract: Identifying truly predictive covariates while strictly controlling false discoveries remains a fundamental challenge in nonlinear, highly correlated, and low signal-to-noise regimes, where d...
- Green-NAS: A Global-Scale Multi-Objective Neural Architecture Search for Robust and Efficient Edge-Native Weather Forecasting : Abstract: We introduce Green-NAS, a multi-objective NAS (neural architecture search) framework designed for low-resource environments using weather forecasting as a case study. By adhering to 'Green A...
- Generation Order and Parallel Decoding in Masked Diffusion Models: An Information-Theoretic Perspective : Abstract: Masked Diffusion Models (MDMs) significantly accelerate inference by trading off sequential determinism. However, the theoretical mechanisms governing generation order and the risks inherent...
- From Observations to States: Latent Time Series Forecasting : Abstract: Deep learning has achieved strong performance in Time Series Forecasting (TSF). However, we identify a critical representation paradox, termed Latent Chaos: models with accurate predictions ...
- Agentic Framework for Epidemiological Modeling : Abstract: Epidemic modeling is essential for public health planning, yet traditional approaches rely on fixed model classes that require manual redesign as pathogens, policies, and scenario assumption...
- Neural Ising Machines via Unrolling and Zeroth-Order Training : Abstract: We propose a data-driven heuristic for NP-hard Ising and Max-Cut optimization that learns the update rule of an iterative dynamical system. The method learns a shared, node-wise update rule ...
- Harvest: Opportunistic Peer-to-Peer GPU Caching for LLM Inference : Abstract: Large Language Model (LLM) inference is increasingly constrained by GPU memory capacity rather than compute throughput, driven by growing model sizes and the linear growth of the key-value (...
- Prototype-based Explainable Neural Networks with Channel-specific Reasoning for Geospatial Learning Tasks : Abstract: Explainable AI (XAI) is essential for understanding machine learning (ML) decision-making and ensuring model trustworthiness in scientific applications. Prototype-based XAI methods offer an ...
- Efficient and accurate steering of Large Language Models through attention-guided feature learning : Abstract: Steering, or direct manipulation of internal activations to guide LLM responses toward specific semantic concepts, is emerging as a promising avenue for both understanding how semantic conce...
- Adaptive Momentum and Nonlinear Damping for Neural Network Training : Abstract: We propose a continuous-time scheme for large-scale optimization that introduces individual, adaptive momentum coefficients regulated by the kinetic energy of each model parameter. This appr...
- Planning with Language and Generative Models: Toward General Reward-Guided Wireless Network Design : Abstract: Intelligent access point (AP) deployment remains challenging in next-generation wireless networks due to complex indoor geometries and signal propagation. We firstly benchmark general-purpos...
- Leveraging Textual-Cues for Enhancing Multimodal Sentiment Analysis by Object Recognition : Abstract: Multimodal sentiment analysis, which includes both image and text data, presents several challenges due to the dissimilarities in the modalities of text and image, the ambiguity of sentiment...
- Quantum Generator Kernels : Abstract: Quantum kernel methods offer significant theoretical benefits by rendering classically inseparable features separable in quantum space. Yet, the practical application of Quantum Machine Lear...
- Insight Agents: An LLM-Based Multi-Agent System for Data Insights : Abstract: Today, E-commerce sellers face several key challenges, including difficulties in discovering and effectively utilizing available programs and tools, and struggling to understand and utilize ...
- AMA: Adaptive Memory via Multi-Agent Collaboration : Abstract: The rapid evolution of Large Language Model (LLM) agents has necessitated robust memory systems to support cohesive long-term interaction and complex reasoning. Benefiting from the strong ca...
- OpenSec: Measuring Incident Response Agent Calibration Under Adversarial Evidence : Abstract: As large language models improve, so do their offensive applications: frontier agents now generate working exploits for under $50 in compute (Heelan, 2026). Defensive incident response (IR) ...
- ChipBench: A Next-Step Benchmark for Evaluating LLM Performance in AI-Aided Chip Design : Abstract: While Large Language Models (LLMs) show significant potential in hardware engineering, current benchmarks suffer from saturation and limited task diversity, failing to reflect LLMs' performa...
- Language-based Trial and Error Falls Behind in the Era of Experience : Abstract: While Large Language Models (LLMs) excel in language-based agentic tasks, their applicability to unseen, nonlinguistic environments (e.g., symbolic or spatial tasks) remains limited. Previou...
- Bridging Forecast Accuracy and Inventory KPIs: A Simulation-Based Software Framework : Abstract: Efficient management of spare parts inventory is crucial in the automotive aftermarket, where demand is highly intermittent and uncertainty drives substantial cost and service risks. Forecas...
- Optimizing Agentic Workflows using Meta-tools : Abstract: Agentic AI enables LLMs to dynamically reason, plan, and interact with tools to solve complex tasks. However, agentic workflows often require many iterative reasoning steps and tool invocatio...
- Unified Task and Motion Planning using Object-centric Abstractions of Motion Constraints : Abstract: In task and motion planning (TAMP), the ambiguity and underdetermination of abstract descriptions used by task planning methods make it difficult to characterize physical constraints needed ...
- Trajectory Data Management and Mining: A Survey from Deep Learning to the LLM Era : Abstract: Trajectory computing is a pivotal domain encompassing trajectory data management and mining, garnering widespread attention due to its crucial role in various practical applications such as ...
- InterDreamer: Zero-Shot Text to 3D Dynamic Human-Object Interaction : Abstract: Text-conditioned human motion generation has experienced significant advancements with diffusion models trained on extensive motion capture data and corresponding textual annotations. Howeve...
- Conditional diffusion models for downscaling and bias correction of Earth system model precipitation : Abstract: Climate change exacerbates extreme weather events like heavy rainfall and flooding. As these events cause severe socioeconomic damage, accurate high-resolution simulation of precipitation is...
- Invariant Representation Guided Multimodal Sentiment Decoding with Sequential Variation Regularization : Abstract: Achieving consistent sentiment representation across diverse modalities remains a key challenge in multimodal sentiment analysis. However, rapid emotional fluctuations over time often introd...
- Exploiting Latent Linearity in LLMs Improves Explainable Molecular Representation Learning : Abstract: Large language models (LLMs) have demonstrated broad utility across molecular domains, spanning drug discovery and materials design. Analyzing LLMs' latent representations is crucial for elu...
- Estimating Respiratory Effort from Nocturnal Breathing Sounds for Obstructive Sleep Apnoea Screening : Abstract: Obstructive sleep apnoea (OSA) is a prevalent condition with significant health consequences, yet many patients remain undiagnosed due to the complexity and cost of over-night polysomnograph...
- SupCLAP: Controlling Optimization Trajectory Drift in Audio-Text Contrastive Learning with Support Vector Regularization : Abstract: Contrastive language-audio pretraining, which aims to unify multimodal representations in a shared embedding space, serves as a cornerstone for building a wide range of applications, from cr...
- Scalable Spatio-Temporal SE(3) Diffusion for Long-Horizon Protein Dynamics : Abstract: Molecular dynamics (MD) simulations remain the gold standard for studying protein dynamics, but their computational cost limits access to biologically relevant timescales. Recent generative ...
- DCoPilot: Generative AI-Empowered Policy Adaptation for Dynamic Data Center Operations : Abstract: Modern data centers (DCs) hosting artificial intelligence (AI)-dedicated devices operate at high power densities with rapidly varying workloads, making minute-level adaptation essential for ...
- CAM: A Causality-based Analysis Framework for Multi-Agent Code Generation Systems : Abstract: Despite the remarkable success that Multi-Agent Code Generation Systems (MACGS) have achieved, the inherent complexity of multi-agent architectures produces substantial volumes of intermedia...
- EvoMU: Evolutionary Machine Unlearning : Abstract: Machine unlearning aims to unlearn specified training data (e.g. sensitive or copyrighted material). A prominent approach is to fine-tune an existing model with an unlearning loss that retai...
- Learning Generative Selection for Best-of-N : Abstract: Scaling test-time compute via parallel sampling can substantially improve LLM reasoning, but is often limited by Best-of-N selection quality. Generative selection methods, such as GenSelect,...
- Back to the Future: Look-ahead Augmentation and Parallel Self-Refinement for Time Series Forecasting : Abstract: Long-term time series forecasting (LTSF) remains challenging due to the trade-off between parallel efficiency and sequential modeling of temporal coherence. Direct multi-step forecasting (DM...
- ECHO: Entropy-Confidence Hybrid Optimization for Test-Time Reinforcement Learning : Abstract: Test-time reinforcement learning generates multiple candidate answers via repeated rollouts and performs online updates using pseudo-labels constructed by majority voting. To reduce overhead...
- Self-Evolving Coordination Protocol in Multi-Agent AI Systems: An Exploratory Systems Feasibility Study : Abstract: Contemporary multi-agent systems increasingly rely on internal coordination mechanisms to combine, arbitrate, or constrain the outputs of heterogeneous components. In safety-critical and reg...
- SurvKAN: A Fully Parametric Survival Model Based on Kolmogorov-Arnold Networks : Abstract: Accurate prediction of time-to-event outcomes is critical for clinical decision-making, treatment planning, and resource allocation in modern healthcare. While classical survival models such...
- Malware Detection Through Memory Analysis : Abstract: This paper summarizes the research conducted for a malware detection project using the Canadian Institute for Cybersecurity's MalMemAnalysis-2022 dataset. The purpose of the project was to e...
- Vision-DeepResearch Benchmark: Rethinking Visual and Textual Search for Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have advanced VQA and now support Vision-DeepResearch systems that use search engines for complex visual-textual fact-finding. However, evaluating th...
- State Rank Dynamics in Linear Attention LLMs : Abstract: Linear Attention Large Language Models (LLMs) offer a compelling recurrent formulation that compresses context into a fixed-size state matrix, enabling constant-time inference. However, the ...
- Hierarchical Adaptive Eviction for KV Cache Management in Multimodal Language Models : Abstract: The integration of visual information into Large Language Models (LLMs) has enabled Multimodal LLMs (MLLMs), but the quadratic memory and computational costs of Transformer architectures rem...
- Cardinality-Preserving Structured Sparse Graph Transformers for Molecular Property Prediction : Abstract: Drug discovery motivates efficient molecular property prediction under limited labeled data. Chemical space is vast, often estimated at approximately 10^60 drug-like molecules, while only th...
- Towards AI Evaluation in Domain-Specific RAG Systems: The AgriHubi Case Study : Abstract: Large language models show promise for knowledge-intensive domains, yet their use in agriculture is constrained by weak grounding, English-centric training data, and limited real-world evalu...
- Generating Physically Sound Designs from Text and a Set of Physical Constraints : Abstract: We present TIDES, a text informed design approach for generating physically sound designs based on a textual description and a set of physical constraints. TIDES jointly optimizes structural...
- Spectral Superposition: A Theory of Feature Geometry : Abstract: Neural networks represent more features than they have dimensions via superposition, forcing features to share representational space. Current methods decompose activations into sparse linea...
- SEDformer: Event-Synchronous Spiking Transformers for Irregular Telemetry Time Series Forecasting : Abstract: Telemetry streams from large-scale Internet-connected systems (e.g., IoT deployments and online platforms) naturally form an irregular multivariate time series (IMTS) whose accurate forecast...
- Geometry- and Relation-Aware Diffusion for EEG Super-Resolution : Abstract: Recent electroencephalography (EEG) spatial super-resolution (SR) methods, while showing improved quality by either directly predicting missing signals from visible channels or adapting late...
- OmniCode: A Benchmark for Evaluating Software Engineering Agents : Abstract: LLM-powered coding agents are redefining how real-world software is developed. To drive the research towards better coding agents, we require challenging benchmarks that can rigorously evalu...
- Unsupervised Physics-Informed Operator Learning through Multi-Stage Curriculum Training : Abstract: Solving partial differential equations remains a central challenge in scientific machine learning. Neural operators offer a promising route by learning mappings between function spaces and e...
- OpenSeal: Good, Fast, and Cheap Construction of an Open-Source Southeast Asian LLM via Parallel Data : Abstract: Large language models (LLMs) have proven to be effective tools for a wide range of natural language processing (NLP) applications. Although many LLMs are multilingual, most remain English-ce...
- Bridging the Sim-to-Real Gap with multipanda ros2: A Real-Time ROS2 Framework for Multimanual Systems : Abstract: We present multipanda_ros2, a novel open-source ROS2 architecture for multi-robot control of Franka Robotics robots. Leveraging ros2 control, this framework provides native ROS2 interface...
- RACA: Representation-Aware Coverage Criteria for LLM Safety Testing : Abstract: Recent advancements in LLMs have led to significant breakthroughs in various AI applications. However, their sophisticated capabilities also introduce severe safety concerns, particularly th...
- Backpropagation as Physical Relaxation: Exact Gradients in Finite Time : Abstract: Backpropagation, the foundational algorithm for training neural networks, is typically understood as a symbolic computation that recursively applies the chain rule. We show it emerges exactl...
- DFKI-Speech System for WildSpoof Challenge: A robust framework for SASV In-the-Wild : Abstract: This paper presents the DFKI-Speech system developed for the WildSpoof Challenge under the Spoofing aware Automatic Speaker Verification (SASV) track. We propose a robust SASV framework in w...
- An Optimization Method for Autoregressive Time Series Forecasting : Abstract: Current time-series forecasting models are primarily based on transformer-style neural networks. These models achieve long-term forecasting mainly by scaling up the model size rather than th...
- Hallucination or Creativity: How to Evaluate AI-Generated Scientific Stories? : Abstract: Generative AI can turn scientific articles into narratives for diverse audiences, but evaluating these stories remains challenging. Storytelling demands abstraction, simplification, and peda...
- Decoupling Generalizability and Membership Privacy Risks in Neural Networks : Abstract: A deep learning model usually has to sacrifice some utilities when it acquires some other abilities or characteristics. Privacy preservation has such trade-off relationships with utilities. ...
- Advancing General-Purpose Reasoning Models with Modular Gradient Surgery : Abstract: Reinforcement learning (RL) has played a central role in recent advances in large reasoning models (LRMs), yielding strong gains in verifiable and open-ended reasoning. However, training a s...
- Spark: Modular Spiking Neural Networks : Abstract: Nowadays, neural networks act as a synonym for artificial intelligence. Present neural network models, although remarkably powerful, are inefficient both in terms of data and energy. Several...
- FragmentFlow: Scalable Transition State Generation for Large Molecules : Abstract: Transition states (TSs) are central to understanding and quantitatively predicting chemical reactivity and reaction mechanisms. Although traditional TS generation methods are computationally...
- A Large-Scale Dataset for Molecular Structure-Language Description via a Rule-Regularized Method : Abstract: Molecular function is largely determined by structure. Accurately aligning molecular structure with natural language is therefore essential for enabling large language models (LLMs) to reaso...
- TTT-Parkour: Rapid Test-Time Training for Perceptive Robot Parkour : Abstract: Achieving highly dynamic humanoid parkour on unseen, complex terrains remains a challenge in robotics. Although general locomotion policies demonstrate capabilities across broad terrain dist...
- VQ-Style: Disentangling Style and Content in Motion with Residual Quantized Representations : Abstract: Human motion data is inherently rich and complex, containing both semantic content and subtle stylistic features that are challenging to model. We propose a novel method for effective disent...
- Building a Correct-by-Design Lakehouse. Data Contracts, Versioning, and Transactional Pipelines for Humans and Agents : Abstract: Lakehouses are the default cloud platform for analytics and AI, but they become unsafe when untrusted actors concurrently operate on production data: upstream-downstream mismatches surface o...
- Rethinking Generative Recommender Tokenizer: Recsys-Native Encoding and Semantic Quantization Beyond LLMs : Abstract: Semantic ID (SID)-based recommendation is a promising paradigm for scaling sequential recommender systems, but existing methods largely follow a semantic-centric pipeline: item embeddings ar...
- Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics : Abstract: Methods for controlling large language models (LLMs), including local weight fine-tuning, LoRA-based adaptation, and activation-based interventions, are often studied in isolation, obscuring...
- Artificial Intelligence and Symmetries: Learning, Encoding, and Discovering Structure in Physical Data : Abstract: Symmetries play a central role in physics, organizing dynamics, constraining interactions, and determining the effective number of physical degrees of freedom. In parallel, modern artificial...
- Implicit neural representation of textures : Abstract: Implicit neural representation (INR) has proven to be accurate and efficient in various domains. In this work, we explore how different neural networks can be designed as a new texture INR, ...
- SWE-Universe: Scale Real-World Verifiable Environments to Millions : Abstract: We propose SWE-Universe, a scalable and efficient framework for automatically constructing real-world software engineering (SWE) verifiable environments from GitHub pull requests (PRs). To o...
- ReasonCACHE: Teaching LLMs To Reason Without Weight Updates : Abstract: Can Large language models (LLMs) learn to reason without any weight update and only through in-context learning (ICL)? ICL is strikingly sample-efficient, often learning from only a handful ...
- From Sycophancy to Sensemaking: Premise Governance for Human-AI Decision Making : Abstract: As LLMs expand from assistance to decision support, a dangerous pattern emerges: fluent agreement without calibrated judgment. Low-friction assistants can become sycophantic, baking in impli...
- Infinite-World: Scaling Interactive World Models to 1000-Frame Horizons via Pose-Free Hierarchical Memory : Abstract: We propose Infinite-World, a robust interactive world model capable of maintaining coherent visual memory over 1000+ frames in complex real-world environments. While existing world models ca...
- David vs. Goliath: Verifiable Agent-to-Agent Jailbreaking via Reinforcement Learning : Abstract: The evolution of large language models into autonomous agents introduces adversarial failures that exploit legitimate tool privileges, transforming safety evaluation in tool-augmented enviro...
- SoMA: A Real-to-Sim Neural Simulator for Robotic Soft-body Manipulation : Abstract: Simulating deformable objects under rich interactions remains a fundamental challenge for real-to-sim robot manipulation, with dynamics jointly driven by environmental effects and robot acti...
- Didactic to Constructive: Turning Expert Solutions into Learnable Reasoning : Abstract: Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or on the existence of a str...
- ReasonEdit: Editing Vision-Language Models using Human Reasoning : Abstract: Model editing aims to correct errors in large, pretrained models without altering unrelated behaviors. While some recent works have edited vision-language models (VLMs), no existing editors ...
- Poly-attention: a general scheme for higher-order self-attention : Abstract: The self-attention mechanism, at the heart of the Transformer model, is able to effectively model pairwise interactions between tokens. However, numerous recent works have shown that it is u...
- UniReason 1.0: A Unified Reasoning Framework for World Knowledge Aligned Image Generation and Editing : Abstract: Unified multimodal models often struggle with complex synthesis tasks that demand deep reasoning, and typically treat text-to-image generation and image editing as isolated capabilities rath...
- Active Causal Experimentalist (ACE): Learning Intervention Strategies via Direct Preference Optimization : Abstract: Discovering causal relationships requires controlled experiments, but experimentalists face a sequential decision problem: each intervention reveals information that should inform what to tr...
- World-Gymnast: Training Robots with Reinforcement Learning in a World Model : Abstract: Robot learning from interacting with the physical world is fundamentally bottlenecked by the cost of physical interaction. The two alternatives, supervised finetuning (SFT) from expert demon...
- Abstract Activation Spaces for Content-Invariant Reasoning in Large Language Models : Abstract: Large Language Models (LLMs) often struggle with deductive judgment in syllogistic reasoning, systematically conflating semantic plausibility with formal validity, a phenomenon known as conte...
- Multi-head automated segmentation by incorporating detection head into the contextual layer neural network : Abstract: Deep learning based auto segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or hallucinations, in slices lacki...
- MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents : Abstract: Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what t...
- Flow Policy Gradients for Robot Control : Abstract: Likelihood-based policy gradient methods are the dominant approach for training robot control policies from rewards. These methods rely on differentiable action likelihoods, which constrain ...
- RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents : Abstract: LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or mainta...
- PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss : Abstract: Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is chal...
- Reward-free Alignment for Conflicting Objectives : Abstract: Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectiv...
- T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems : Abstract: To address the interpretability challenge in machine learning (ML) systems, counterfactual explanations (CEs) have emerged as a promising solution. CEs are unique as they provide workable su...
- Mastering NIM and Impartial Games with Weak Neural Networks: An AlphaZero-inspired Multi-Frame Approach : Abstract: We introduce a practical circuit-complexity model for fixed-precision neural networks to explain and overcome a persistent learnability barrier in impartial games like NIM. We show that boun...
- Preference-Conditioned Gradient Variations for Multi-Objective Quality-Diversity : Abstract: In a variety of domains, from robotics to finance, Quality-Diversity algorithms have been used to generate collections of both diverse and high-performing solutions. Multi-Objective Quality-...
- Test-Time Scaling in Reasoning Models Is Not Effective for Knowledge-Intensive Tasks Yet : Abstract: Test-time scaling increases inference-time computation by allowing models to generate long reasoning chains, and has improved performance across many domains. However, in this work, we show ...
- Extending RLVR to Open-Ended Tasks via Verifiable Multiple-Choice Reformulation : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated great potential in enhancing the reasoning capabilities of large language models (LLMs). However, its success has thus f...
- SCALER: Synthetic Scalable Adaptive Learning Environment for Reasoning : Abstract: Reinforcement learning (RL) offers a principled way to enhance the reasoning capabilities of large language models, yet its effectiveness hinges on training signals that remain informative a...
- JudgeFlow: Agentic Workflow Optimization via Block Judge : Abstract: Optimizing LLM-based agentic workflows is challenging for scaling AI capabilities. Current methods rely on coarse, end-to-end evaluation signals and lack fine-grained signals on where to ref...
- The Multiple Ticket Hypothesis: Random Sparse Subnetworks Suffice for RLVR : Abstract: The Lottery Ticket Hypothesis demonstrated that sparse subnetworks can match full-model performance, suggesting parameter redundancy. Meanwhile, in Reinforcement Learning with Verifiable Rew...
- Adaptive Rollout Allocation for Online Reinforcement Learning with Verifiable Rewards : Abstract: Sampling efficiency is a key bottleneck in reinforcement learning with verifiable rewards. Existing group-based policy optimization methods, such as GRPO, allocate a fixed number of rollouts...
- Boosting Maximum Entropy Reinforcement Learning via One-Step Flow Matching : Abstract: Diffusion policies are expressive yet incur high inference latency. Flow Matching (FM) enables one-step generation, but integrating it into Maximum Entropy Reinforcement Learning (MaxEnt RL)...
- A Practical Tensor-Network Compression Pipeline for Production-Scale Large Language Models : Abstract: Large language models are limited in deployment by GPU memory and inference latency. We present Minima, a production compression pipeline that learns where and how to structurally compress a...
- AgroFlux: A Spatial-Temporal Benchmark for Carbon and Nitrogen Flux Prediction in Agricultural Ecosystems : Abstract: The agroecosystem, which is heavily influenced by human actions and accounts for a quarter of global greenhouse gas (GHG) emissions, plays a crucial role in mitigating global climate change and se...
- SUSD: Structured Unsupervised Skill Discovery through State Factorization : Abstract: Unsupervised Skill Discovery (USD) aims to autonomously learn a diverse set of skills without relying on extrinsic rewards. One of the most common USD approaches is to maximize the Mutual In...
- Toward Enhancing Representation Learning in Federated Multi-Task Settings : Abstract: Federated multi-task learning (FMTL) seeks to collaboratively train customized models for users with different tasks while preserving data privacy. Most existing approaches assume model cong...
- HuPER: A Human-Inspired Framework for Phonetic Perception : Abstract: We propose HuPER, a human-inspired framework that models phonetic perception as adaptive inference over acoustic-phonetics evidence and linguistic knowledge. With only 100 hours of training ...
- The Effect of Mini-Batch Noise on the Implicit Bias of Adam : Abstract: With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for...
- De Novo Molecular Generation from Mass Spectra via Many-Body Enhanced Diffusion : Abstract: Molecular structure generation from mass spectrometry is fundamental for understanding cellular metabolism and discovering novel compounds. Although tandem mass spectrometry (MS/MS) enables ...
- From Perception to Action: Spatial AI Agents and World Models : Abstract: While large language models have become the prevailing approach for agentic reasoning and planning, their success in symbolic domains does not readily translate to the physical world. Spatia...
- Contribution-aware Token Compression for Efficient Video Understanding via Reinforcement Learning : Abstract: Video large language models have demonstrated remarkable capabilities in video understanding tasks. However, the redundancy of video tokens introduces significant computational overhead duri...
- On the Spatiotemporal Dynamics of Generalization in Neural Networks : Abstract: Why do neural networks fail to generalize addition from 16-digit to 32-digit numbers, while a child who learns the rule can apply it to arbitrarily long sequences? We argue that this failure...
- Efficient Adversarial Attacks on High-dimensional Offline Bandits : Abstract: Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top...
- CoDiQ: Test-Time Scaling for Controllable Difficult Question Generation : Abstract: Large Reasoning Models (LRMs) benefit substantially from training on challenging competition-level questions. However, existing automated question synthesis methods lack precise difficulty c...
- TABX: A High-Throughput Sandbox Battle Simulator for Multi-Agent Reinforcement Learning : Abstract: The design of environments plays a critical role in shaping the development and evaluation of cooperative multi-agent reinforcement learning (MARL) algorithms. While existing benchmarks high...
- ASGMamba: Adaptive Spectral Gating Mamba for Multivariate Time Series Forecasting : Abstract: Long-term multivariate time series forecasting (LTSF) plays a crucial role in various high-performance computing applications, including real-time energy grid management and large-scale traf...
- AI-Assisted Adaptive Rendering for High-Frequency Security Telemetry in Web Interfaces : Abstract: Modern cybersecurity platforms must process and display high-frequency telemetry such as network logs, endpoint events, alerts, and policy changes in real time. Traditional rendering techniq...
- Real-Time Loop Closure Detection in Visual SLAM via NetVLAD and Faiss : Abstract: Loop closure detection (LCD) is a core component of simultaneous localization and mapping (SLAM): it identifies revisited places and enables pose-graph constraints that correct accumulated d...
- Towards Autonomous Instrument Tray Assembly for Sterile Processing Applications : Abstract: The Sterile Processing and Distribution (SPD) department is responsible for cleaning, disinfecting, inspecting, and assembling surgical instruments between surgeries. Manual inspection and p...
- FreshMem: Brain-Inspired Frequency-Space Hybrid Memory for Streaming Video Understanding : Abstract: Transitioning Multimodal Large Language Models (MLLMs) from offline to online streaming video understanding is essential for continuous perception. However, existing methods lack flexible ad...
- The Strategic Foresight of LLMs: Evidence from a Fully Prospective Venture Tournament : Abstract: Can artificial intelligence outperform humans at strategic foresight -- the capacity to form accurate judgments about uncertain, high-stakes outcomes before they unfold? We address this ques...
- Semantic-aware Wasserstein Policy Regularization for Large Language Model Alignment : Abstract: Large language models (LLMs) are commonly aligned with human preferences using reinforcement learning from human feedback (RLHF). In this method, LLM policies are generally optimized through...
- Counting Hypothesis: Potential Mechanism of In-Context Learning : Abstract: In-Context Learning (ICL) indicates that large language models (LLMs) pretrained on a massive amount of data can learn specific tasks from input prompts' examples. ICL is notable for two rea...
- Cross-Modal Alignment and Fusion for RGB-D Transmission-Line Defect Detection : Abstract: Transmission line defect detection remains challenging for automated UAV inspection due to the dominance of small-scale defects, complex backgrounds, and illumination variations. Existing RG...
- Meta Engine: A Unified Semantic Query Engine on Heterogeneous LLM-Based Query Systems : Abstract: With the increasing use of multi-modal data, semantic querying, an important way to access and analyze such data, is in growing demand in data management systems. ...
- Beyond Mode Elicitation: Diversity-Preserving Reinforcement Learning via Latent Diffusion Reasoner : Abstract: Recent reinforcement learning (RL) methods improve LLM reasoning by optimizing discrete Chain-of-Thought (CoT) generation; however, exploration in token space often suffers from diversity co...
- Game of Thought: Robust Information Seeking with Large Language Models Using Game Theory : Abstract: Large Language Models (LLMs) are increasingly deployed in real-world scenarios where they may lack sufficient information to complete a given task. In such settings, the ability to actively ...
- Physics Informed Generative AI Enabling Labour Free Segmentation For Microscopy Analysis : Abstract: Semantic segmentation of microscopy images is a critical task for high-throughput materials characterisation, yet its automation is severely constrained by the prohibitive cost, subjectivity...
- BBPE16: UTF-16-based byte-level byte-pair encoding for improved multilingual speech recognition : Abstract: Multilingual automatic speech recognition (ASR) requires tokenization that efficiently covers many writing systems. Byte-level BPE (BBPE) using UTF-8 is widely adopted for its language-agnos...
- SafePred: A Predictive Guardrail for Computer-Using Agents via World Models : Abstract: With the widespread deployment of Computer-using Agents (CUAs) in complex real-world environments, prevalent long-term risks often lead to severe and irreversible consequences. Most existing...
- Softmax Linear Attention: Reclaiming Global Competition : Abstract: While linear attention reduces the quadratic complexity of standard Transformers to linear time, it often lags behind in expressivity due to the removal of softmax normalization. This omissi...
- Probability-Entropy Calibration: An Elastic Indicator for Adaptive Fine-tuning : Abstract: Token-level reweighting is a simple yet effective mechanism for controlling supervised fine-tuning, but common indicators are largely one-dimensional: the ground-truth probability reflects d...
- Rethinking LoRA for Data Heterogeneous Federated Learning: Subspace and State Alignment : Abstract: Low-Rank Adaptation (LoRA) is widely used for federated fine-tuning. Yet under non-IID settings, it can substantially underperform full-parameter fine-tuning. Through with-high-probability r...
- A Provable Expressiveness Hierarchy in Hybrid Linear-Full Attention : Abstract: Transformers serve as the foundation of most modern large language models. To mitigate the quadratic complexity of standard full attention, various efficient attention mechanisms, such as li...
- Backdoor Sentinel: Detecting and Detoxifying Backdoors in Diffusion Models via Temporal Noise Consistency : Abstract: Diffusion models have been widely deployed in AIGC services; however, their reliance on opaque training data and procedures exposes a broad attack surface for backdoor injection. In practica...
- CoMeT: Collaborative Memory Transformer for Efficient Long Context Modeling : Abstract: The quadratic complexity and indefinitely growing key-value (KV) cache of standard Transformers pose a major barrier to long-context processing. To overcome this, we introduce the Collaborat...
- IRIS: Implicit Reward-Guided Internal Sifting for Mitigating Multimodal Hallucination : Abstract: Hallucination remains a fundamental challenge for Multimodal Large Language Models (MLLMs). While Direct Preference Optimization (DPO) is a key alignment framework, existing approaches often...
- : One LLM Token for Explicit Graph Structural Understanding : Abstract: Large language models show great potential in unstructured data understanding, but still face significant challenges with graphs due to their structural hallucination. Existing approaches ma...
- DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics : Abstract: Data-independent acquisition mass spectrometry (DIA-MS) has established itself as a cornerstone of proteomic profiling and large-scale systems biology, offering unparalleled depth and reprod...
- Stein-Rule Shrinkage for Stochastic Gradient Estimation in High Dimensions : Abstract: Stochastic gradient methods are central to large-scale learning, yet their analysis typically treats mini-batch gradients as unbiased estimators of the population gradient. In high-dimension...
- RedVisor: Reasoning-Aware Prompt Injection Defense via Zero-Copy KV Cache Reuse : Abstract: Large Language Models (LLMs) are increasingly vulnerable to Prompt Injection (PI) attacks, where adversarial instructions hidden within retrieved contexts hijack the model's execution flow. ...
- Fast Autoregressive Video Diffusion and World Models with Temporal Cache Compression and Sparse Attention : Abstract: Autoregressive video diffusion models enable streaming generation, opening the door to long-form synthesis, video world models, and interactive neural game engines. However, their core atten...
- Beyond Precision: Training-Inference Mismatch is an Optimization Problem and Simple LR Scheduling Fixes It : Abstract: Reinforcement Learning (RL) for training Large Language Models is notoriously unstable. While recent studies attribute this to "training-inference mismatch" stemming from inconsistent hybrid...
- DOGMA: Weaving Structural Information into Data-centric Single-cell Transcriptomics Analysis : Abstract: Recently, data-centric AI methodology has been a dominant paradigm in single-cell transcriptomics analysis, which treats data representation rather than model complexity as the fundamental b...
- CloDS: Visual-Only Unsupervised Cloth Dynamics Learning in Unknown Conditions : Abstract: Deep learning has demonstrated remarkable capabilities in simulating complex dynamic systems. However, existing methods require known physical properties as supervision or inputs, limiting t...
- Time2Vec-Integrated Transformer for Robust Gesture Recognition from Low-Density sEMG : Abstract: Accurate and responsive myoelectric prosthesis control typically relies on complex, dense multi-sensor arrays, which limits consumer accessibility. This paper presents a novel, data-efficien...
- GRAB: An LLM-Inspired Sequence-First Click-Through Rate Prediction Modeling Paradigm : Abstract: Traditional Deep Learning Recommendation Models (DLRMs) face increasing bottlenecks in performance and efficiency, often struggling with generalization and long-sequence modeling. Inspired b...
- ES-MemEval: Benchmarking Conversational Agents on Personalized Long-Term Emotional Support : Abstract: Large Language Models (LLMs) have shown strong potential as conversational agents. Yet, their effectiveness remains limited by deficiencies in robust long-term memory, particularly in comple...
- Learning Sparse Visual Representations via Spatial-Semantic Factorization : Abstract: Self-supervised learning (SSL) faces a fundamental conflict between semantic understanding and image reconstruction. High-level semantic SSL (e.g., DINO) relies on global tokens that are for...
- DSXFormer: Dual-Pooling Spectral Squeeze-Expansion and Dynamic Context Attention Transformer for Hyperspectral Image Classification : Abstract: Hyperspectral image classification (HSIC) is a challenging task due to high spectral dimensionality, complex spectral-spatial correlations, and limited labeled training samples. Although tra...
- Reliable Real-Time Value at Risk Estimation via Quantile Regression Forest with Conformal Calibration : Abstract: Rapidly evolving market conditions call for real-time risk monitoring, but its online estimation remains challenging. In this paper, we study the online estimation of one of the most widely ...
- VLM-Guided Experience Replay : Abstract: Recent advances in Large Language Models (LLMs) and Vision-Language Models (VLMs) have enabled powerful semantic and multimodal reasoning capabilities, creating new opportunities to enhance ...
- PIMPC-GNN: Physics-Informed Multi-Phase Consensus Learning for Enhancing Imbalanced Node Classification in Graph Neural Networks : Abstract: Graph neural networks (GNNs) often struggle in class-imbalanced settings, where minority classes are under-represented and predictions are biased toward majorities. We propose PIMPC-...
- COLT: Lightweight Multi-LLM Collaboration through Shared MCTS Reasoning for Model Compilation : Abstract: Model serving costs dominate AI systems, making compiler optimization essential for scalable deployment. Recent works show that a large language model (LLM) can guide compiler search by reas...
- PIMCST: Physics-Informed Multi-Phase Consensus and Spatio-Temporal Few-Shot Learning for Traffic Flow Forecasting : Abstract: Accurate traffic flow prediction remains a fundamental challenge in intelligent transportation systems, particularly in cross-domain, data-scarce scenarios where limited historical data hind...
- T-LLM: Teaching Large Language Models to Forecast Time Series via Temporal Distillation : Abstract: Time series forecasting plays a critical role in decision-making across many real-world applications. Unlike data in vision and language domains, time series data is inherently tied to the e...
- Towards Exploratory and Focused Manipulation with Bimanual Active Perception: A New Problem, Benchmark and Strategy : Abstract: Recently, active vision has reemerged as an important concept for manipulation, since visual occlusion occurs more frequently when main cameras are mounted on the robot heads. We reflect on ...
- Human Society-Inspired Approaches to Agentic AI Security: The 4C Framework : Abstract: AI is moving from domain-specific autonomy in closed, predictable settings to large-language-model-driven agents that plan and act in open, cross-organizational environments. As a result, th...
- Efficient Epistemic Uncertainty Estimation for Large Language Models via Knowledge Distillation : Abstract: Quantifying uncertainty in Large Language Models (LLMs) is essential for mitigating hallucinations and enabling risk-aware deployment in safety-critical tasks. However, estimating Epistemic ...
- Zero-Shot Off-Policy Learning : Abstract: Off-policy learning methods seek to derive an optimal policy directly from a fixed dataset of prior interactions. This objective presents significant challenges, primarily due to the inheren...
- Breaking the Static Graph: Context-Aware Traversal for Robust Retrieval-Augmented Generation : Abstract: Recent advances in Retrieval-Augmented Generation (RAG) have shifted from simple vector similarity to structure-aware approaches like HippoRAG, which leverage Knowledge Graphs (KGs) and Pers...
- Mixture-of-Experts with Intermediate CTC Supervision for Accented Speech Recognition : Abstract: Accented speech remains a persistent challenge for automatic speech recognition (ASR), as most models are trained on data dominated by a few high-resource English varieties, leading to subst...
- Your AI-Generated Image Detector Can Secretly Achieve SOTA Accuracy, If Calibrated : Abstract: Despite being trained on balanced datasets, existing AI-generated image detectors often exhibit systematic bias at test time, frequently misclassifying fake images as real. We hypothesize th...
- IntraSlice: Towards High-Performance Structural Pruning with Block-Intra PCA for LLMs : Abstract: Large Language Models (LLMs) achieve strong performance across diverse tasks but face deployment challenges due to their massive size. Structured pruning offers acceleration benefits but lea...
- FlyPrompt: Brain-Inspired Random-Expanded Routing with Temporal-Ensemble Experts for General Continual Learning : Abstract: General continual learning (GCL) challenges intelligent systems to learn from single-pass, non-stationary data streams without clear task boundaries. While recent advances in continual param...
- SAME: Stabilized Mixture-of-Experts for Multimodal Continual Instruction Tuning : Abstract: Multimodal Large Language Models (MLLMs) achieve strong performance through instruction tuning, but real-world deployment requires them to continually expand their capabilities, making Multi...
- Optimizing Tensor Train Decomposition in DNNs for RISC-V Architectures Using Design Space Exploration and Compiler Optimizations : Abstract: Deep neural networks (DNNs) have become indispensable in many real-life applications such as natural language processing and autonomous systems. However, deploying DNNs on resource-constrained...
- On the Limits of Layer Pruning for Generative Reasoning in LLMs : Abstract: Recent works have shown that layer pruning can compress large language models (LLMs) while retaining strong performance on classification benchmarks with little or no finetuning. However, ex...
- SurfSplat: Conquering Feedforward 2D Gaussian Splatting with Surface Continuity Priors : Abstract: Reconstructing 3D scenes from sparse images remains a challenging task due to the difficulty of recovering accurate geometry and texture without optimization. Recent approaches leverage gene...
- Preserve-Then-Quantize: Balancing Rank Budgets for Quantization Error Reconstruction in LLMs : Abstract: Quantization Error Reconstruction (QER) reduces accuracy loss in Post-Training Quantization (PTQ) by approximating weights as $\mathbf{W} \approx \mathbf{Q} + \mathbf{L}\mathbf{R}$, using a ...
- ClueTracer: Question-to-Vision Clue Tracing for Training-Free Hallucination Suppression in Multimodal Reasoning : Abstract: Large multimodal reasoning models solve challenging visual problems via explicit long-chain inference: they gather visual clues from images and decode clues into textual tokens. Yet this cap...
- Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation : Abstract: Agent memory systems often adopt the standard Retrieval-Augmented Generation (RAG) pipeline, yet its underlying assumptions differ in this setting. RAG targets large, heterogeneous corpora w...
- Rethinking Genomic Modeling Through Optical Character Recognition : Abstract: Recent genomic foundation models largely adopt large language model architectures that treat DNA as a one-dimensional token sequence. However, exhaustive sequential reading is structurally m...
- One Size, Many Fits: Aligning Diverse Group-Wise Click Preferences in Large-Scale Advertising Image Generation : Abstract: Advertising image generation has increasingly focused on online metrics like Click-Through Rate (CTR), yet existing approaches adopt a "one-size-fits-all" strategy that optimizes for overal...
- Bandwidth-Efficient Multi-Agent Communication through Information Bottleneck and Vector Quantization : Abstract: Multi-agent reinforcement learning systems deployed in real-world robotics applications face severe communication constraints that significantly impact coordination effectiveness. We present...
- Auto-Comp: An Automated Pipeline for Scalable Compositional Probing of Contrastive Vision-Language Models : Abstract: Modern Vision-Language Models (VLMs) exhibit a critical flaw in compositional reasoning, often confusing "a red cube and a blue sphere" with "a blue cube and a red sphere". Disentangling the...
- FORLER: Federated Offline Reinforcement Learning with Q-Ensemble and Actor Rectification : Abstract: In Internet-of-Things systems, federated learning has advanced online reinforcement learning (RL) by enabling parallel policy training without sharing raw data. However, interacting with rea...
- FiLoRA: Focus-and-Ignore LoRA for Controllable Feature Reliance : Abstract: Multimodal foundation models integrate heterogeneous signals across modalities, yet it remains poorly understood how their predictions depend on specific internal feature groups and whether ...
- See2Refine: Vision-Language Feedback Improves LLM-Based eHMI Action Designers : Abstract: Automated vehicles lack natural communication channels with other road users, making external Human-Machine Interfaces (eHMIs) essential for conveying intent and maintaining trust in shared ...
- Multi-View Stenosis Classification Leveraging Transformer-Based Multiple-Instance Learning Using Real-World Clinical Data : Abstract: Coronary artery stenosis is a leading cause of cardiovascular disease, diagnosed by analyzing the coronary arteries from multiple angiography views. Although numerous deep-learning models ha...
- LEC-KG: An LLM-Embedding Collaborative Framework for Domain-Specific Knowledge Graph Construction -- A Case Study on SDGs : Abstract: Constructing domain-specific knowledge graphs from unstructured text remains challenging due to heterogeneous entity mentions, long-tail relation distributions, and the absence of standardiz...
- WADEPre: A Wavelet-based Decomposition Model for Extreme Precipitation Nowcasting with Multi-Scale Learning : Abstract: The heavy-tailed nature of precipitation intensity impedes precise precipitation nowcasting. Standard models that optimize pixel-wise losses are prone to regression-to-the-mean bias, which b...
- Probabilistic Performance Guarantees for Multi-Task Reinforcement Learning : Abstract: Multi-task reinforcement learning trains generalist policies that can execute multiple tasks. While recent years have seen significant progress, existing approaches rarely provide formal per...
- The Verification Crisis: Expert Perceptions of GenAI Disinformation and the Case for Reproducible Provenance : Abstract: The growth of Generative Artificial Intelligence (GenAI) has shifted disinformation production from manual fabrication to automated, large-scale manipulation. This article presents findings ...
- Unifying Masked Diffusion Models with Various Generation Orders and Beyond : Abstract: Masked diffusion models (MDMs) are a potential alternative to autoregressive models (ARMs) for language generation, but generation quality depends critically on the generation order. Prior w...
- Toxicity Assessment in Preclinical Histopathology via Class-Aware Mahalanobis Distance for Known and Novel Anomalies : Abstract: Drug-induced toxicity remains a leading cause of failure in preclinical development and early clinical trials. Detecting adverse effects at an early stage is critical to reduce attrition and...
- Two-Stage Grid Optimization for Group-wise Quantization of LLMs : Abstract: Group-wise quantization is an effective strategy for mitigating accuracy degradation in low-bit quantization of large language models (LLMs). Among existing methods, GPTQ has been widely ado...
- Navigating Simply, Aligning Deeply: Winning Solutions for Mouse vs. AI 2025 : Abstract: Visual robustness and neural alignment remain critical challenges in developing artificial agents that can match biological vision systems. We present the winning approaches from Team HCMUS_...
- DISPO: Enhancing Training Efficiency and Stability in Reinforcement Learning for Large Language Model Mathematical Reasoning : Abstract: Reinforcement learning with verifiable rewards has emerged as a promising paradigm for enhancing the reasoning capabilities of large language models, particularly in mathematics. Current appr...
- HERMES: A Holistic End-to-End Risk-Aware Multimodal Embodied System with Vision-Language Models for Long-Tail Autonomous Driving : Abstract: End-to-end autonomous driving models increasingly benefit from large vision-language models for semantic understanding, yet ensuring safe and accurate operation under long-tail conditions r...
- DeALOG: Decentralized Multi-Agents Log-Mediated Reasoning Framework : Abstract: Complex question answering across text, tables and images requires integrating diverse information sources. A framework supporting specialized processing with coordination and interpretabili...
- ESSAM: A Novel Competitive Evolution Strategies Approach to Reinforcement Learning for Memory Efficient LLMs Fine-Tuning : Abstract: Reinforcement learning (RL) has become a key training step for improving mathematical reasoning in large language models (LLMs), but it often has high GPU memory usage, which makes it hard t...
- LASS-ODE: Scaling ODE Computations to Connect Foundation Models with Dynamical Physical Systems : Abstract: Foundation models have transformed language, vision, and time series data analysis, yet progress on dynamic predictions for physical systems remains limited. Given the complexity of physical...
- Multi-Agent Teams Hold Experts Back : Abstract: Multi-agent LLM systems are increasingly deployed as autonomous collaborators, where agents interact freely rather than execute fixed, pre-specified workflows. In such settings, effective co...
- How Does Unfaithful Reasoning Emerge from Autoregressive Training? A Study of Synthetic Experiments : Abstract: Chain-of-thought (CoT) reasoning generated by large language models (LLMs) is often unfaithful: intermediate steps can be logically inconsistent or fail to reflect the causal relationship le...
- Offline Discovery of Interpretable Skills from Multi-Task Trajectories : Abstract: Hierarchical Imitation Learning is a powerful paradigm for acquiring complex robot behaviors from demonstrations. A central challenge, however, lies in discovering reusable skills from long-...
- Inter- and Intra-Subject Variability in EEG: A Systematic Survey : Abstract: Electroencephalography (EEG) underpins neuroscience, clinical neurophysiology, and brain-computer interfaces (BCIs), yet pronounced inter- and intra-subject variability limits reliability, r...
- Calibrating Behavioral Parameters with Large Language Models : Abstract: Behavioral parameters such as loss aversion, herding, and extrapolation are central to asset pricing models but remain difficult to measure reliably. We develop a framework that treats large...
- Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment : Abstract: Query Auto-Completion (QAC) suggests query completions as users type, helping them articulate intent and reach results more efficiently. Existing approaches face fundamental challenges: trad...
- Toward Universal and Transferable Jailbreak Attacks on Vision-Language Models : Abstract: Vision-language models (VLMs) extend large language models (LLMs) with vision encoders, enabling text generation conditioned on both images and text. However, this multimodal integration exp...
- HierCon: Hierarchical Contrastive Attention for Audio Deepfake Detection : Abstract: Audio deepfakes generated by modern TTS and voice conversion systems are increasingly difficult to distinguish from real speech, raising serious risks for security and online trust. While st...
- VEQ: Modality-Adaptive Quantization for MoE Vision-Language Models : Abstract: Mixture-of-Experts (MoE) Vision-Language Models (VLMs) offer remarkable performance but incur prohibitive memory and computational costs, making compression essential. Post-Training Quantizat...
- Adaptive Dual-Weighting Framework for Federated Learning via Out-of-Distribution Detection : Abstract: Federated Learning (FL) enables collaborative model training across large-scale distributed service nodes while preserving data privacy, making it a cornerstone of intelligent service system...
- Superposition unifies power-law training dynamics : Abstract: We investigate the role of feature superposition in the emergence of power-law training dynamics using a teacher-student framework. We first derive an analytic theory for training without su...
- Residual Decoding: Mitigating Hallucinations in Large Vision-Language Models via History-Aware Residual Guidance : Abstract: Large Vision-Language Models (LVLMs) can reason effectively from image-text inputs and perform well in various multimodal tasks. Despite this success, they are affected by language priors an...
- Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning : Abstract: Post-training of reasoning LLMs is a holistic process that typically consists of an offline SFT stage followed by an online reinforcement learning (RL) stage. However, SFT is often optimized...
- TLDiffGAN: A Latent Diffusion-GAN Framework with Temporal Information Fusion for Anomalous Sound Detection : Abstract: Existing generative models for unsupervised anomalous sound detection are limited by their inability to fully capture the complex feature distribution of normal sounds, while the potential o...
- Personality Expression Across Contexts: Linguistic and Behavioral Variation in LLM Agents : Abstract: Large Language Models (LLMs) can be conditioned with explicit personality prompts, yet their behavioral realization often varies depending on context. This study examines how identical perso...
- From Utterance to Vividity: Training Expressive Subtitle Translation LLM via Adaptive Local Preference Optimization : Abstract: The rapid development of Large Language Models (LLMs) has significantly enhanced the general capabilities of machine translation. However, as application scenarios become more complex, the l...
- Vortex Stretching in the Navier-Stokes Equations and Information Dissipation in Diffusion Models: A Reformulation from a Partial Differential Equation Viewpoint : Abstract: We present a new inverse-time formulation of vortex stretching in the Navier-Stokes equations, based on a PDE framework inspired by score-based diffusion models. By absorbing the ill-posed b...
- OLion: Approaching the Hadamard Ideal by Intersecting Spectral and $\ell_{\infty}$ Implicit Biases : Abstract: Many optimizers can be interpreted as steepest-descent methods under norm-induced geometries, and thus inherit corresponding implicit biases. We introduce OLion, which combi...
- SPELL: Synthesis of Programmatic Edits using LLMs : Abstract: Library migration is a common but error-prone task in software development. Developers may need to replace one library with another due to reasons like changing requirements or licensing cha...
- MarkovScale: Towards Optimal Sequential Scaling at Inference Time : Abstract: Sequential scaling is a prominent inference-time scaling paradigm, yet its performance improvements are typically modest and not well understood, largely due to the prevalence of heuristic, ...
- Statistical MIA: Rethinking Membership Inference Attack for Reliable Unlearning Auditing : Abstract: Machine unlearning (MU) is essential for enforcing the right to be forgotten in machine learning systems. A key challenge of MU is how to reliably audit whether a model has truly forgotten s...
- Multi-Horizon Electricity Price Forecasting with Deep Learning in the Australian National Electricity Market : Abstract: Accurate electricity price forecasting (EPF) is essential for operational planning, trading, and flexible asset scheduling in liberalised power systems, yet remains challenging due to volati...
- Semantically Aware UAV Landing Site Assessment from Remote Sensing Imagery via Multimodal Large Language Models : Abstract: Safe UAV emergency landing requires more than just identifying flat terrain; it demands understanding complex semantic risks (e.g., crowds, temporary structures) invisible to traditional geo...
- FedBGS: A Blockchain Approach to Segment Gossip Learning in Decentralized Systems : Abstract: Privacy-Preserving Federated Learning (PPFL) is a decentralized machine learning paradigm that enables multiple participants to collaboratively train a global model without sharing their dat...
- The Gaussian-Head OFL Family: One-Shot Federated Learning from Client Global Statistics : Abstract: Classical Federated Learning relies on a multi-round iterative process of model exchange and aggregation between server and clients, with high communication costs and privacy risks from repe...
- Autoregressive, Yet Revisable: In Decoding Revision for Secure Code Generation : Abstract: Large Language Model (LLM) based code generation is predominantly formulated as a strictly monotonic process, appending tokens linearly to an immutable prefix. This formulation contrasts to ...
- AI Meets Plasticity: A Comprehensive Survey : Abstract: Artificial intelligence (AI) is rapidly emerging as a new paradigm of scientific discovery, namely data-driven science, across nearly all scientific disciplines. In materials science and eng...
- Learning from Anonymized and Incomplete Tabular Data : Abstract: User-driven privacy allows individuals to control whether and at what granularity their data is shared, leading to datasets that mix original, generalized, and missing values within the same...
- Supervised Fine-Tuning Needs to Unlock the Potential of Token Priority : Abstract: The transition from fitting empirical data to achieving true human utility is fundamentally constrained by a granularity mismatch, where fine-grained autoregressive generation is often super...
- Lotus: Efficient LLM Training by Randomized Low-Rank Gradient Projection with Adaptive Subspace Switching : Abstract: Training efficiency in large-scale models is typically assessed through memory consumption, training time, and model performance. Current methods often exhibit trade-offs among these metrics...
- Mechanistic Interpretability of Brain-to-Speech Models Across Speech Modes : Abstract: Brain-to-speech decoding models demonstrate robust performance in vocalized, mimed, and imagined speech; yet, the fundamental mechanisms via which these models capture and transmit informati...
- Sample Efficient Active Algorithms for Offline Reinforcement Learning : Abstract: Offline reinforcement learning (RL) enables policy learning from static data but often suffers from poor coverage of the state-action space and distributional shift problems. This problem ca...
- PACER: Blockwise Pre-verification for Speculative Decoding with Adaptive Length : Abstract: Speculative decoding (SD) is a powerful technique for accelerating the inference process of large language models (LLMs) without sacrificing accuracy. Typically, SD employs a small draft mod...
- Multi-LLM Adaptive Conformal Inference for Reliable LLM Responses : Abstract: Ensuring factuality is essential for the safe use of Large Language Models (LLMs) in high-stakes domains such as medicine and law. Conformal inference provides distribution-free guarantees, ...
- Dispelling the Curse of Singularities in Neural Network Optimizations : Abstract: This work investigates the optimization instability of deep neural networks from a less-explored yet insightful perspective: the emergence and amplification of singularities in the parametri...
- EverMemBench: Benchmarking Long-Term Interactive Memory in Large Language Models : Abstract: Long-term conversational memory is essential for LLM-based assistants, yet existing benchmarks focus on dyadic, single-topic dialogues that fail to capture real-world complexity. We introduc...
- TxRay: Agentic Postmortem of Live Blockchain Attacks : Abstract: Decentralized Finance (DeFi) has turned blockchains into financial infrastructure, allowing anyone to trade, lend, and build protocols without intermediaries, but this openness exposes pools...
- Beyond Pixels: Visual Metaphor Transfer via Schema-Driven Agentic Reasoning : Abstract: A visual metaphor constitutes a high-order form of human creativity, employing cross-domain semantic fusion to transform abstract concepts into impactful visual rhetoric. Despite the remarka...
- Adaptive Quantum-Safe Cryptography for 6G Vehicular Networks via Context-Aware Optimization : Abstract: Powerful quantum computers in the future may be able to break the security used for communication between vehicles and other devices (Vehicle-to-Everything, or V2X). New security methods cal...
- Towards knowledge-based workflows: a semantic approach to atomistic simulations for mechanical and thermodynamic properties : Abstract: Mechanical and thermodynamic properties, including the influence of crystal defects, are critical for evaluating materials in engineering applications. Molecular dynamics simulations provide...
- PaAno: Patch-Based Representation Learning for Time-Series Anomaly Detection : Abstract: Although recent studies on time-series anomaly detection have increasingly adopted ever-larger neural network architectures such as transformers and foundation models, they incur high comput...
- When Domains Interact: Asymmetric and Order-Sensitive Cross-Domain Effects in Reinforcement Learning for Reasoning : Abstract: Group Relative Policy Optimization (GRPO) has become a key technique for improving reasoning abilities in large language models, yet its behavior under different domain sequencing strategies...
- Deep Variational Contrastive Learning for Joint Risk Stratification and Time-to-Event Estimation : Abstract: Survival analysis is essential for clinical decision-making, as it allows practitioners to estimate time-to-event outcomes, stratify patient risk profiles, and guide treatment planning. Deep...
- PolyGen: Fully Synthetic Vision-Language Training via Multi-Generator Ensembles : Abstract: Synthetic data offers a scalable solution for vision-language pre-training, yet current state-of-the-art methods typically rely on scaling up a single generative backbone, which introduces g...
- Context Dependence and Reliability in Autoregressive Language Models : Abstract: Large language models (LLMs) generate outputs by utilizing extensive context, which often includes redundant information from prompts, retrieved passages, and interaction history. In critica...
- "If You're Very Clever, No One Knows You've Used It": The Social Dynamics of Developing Generative AI Literacy in the Workplace : Abstract: Generative AI (GenAI) tools are rapidly transforming knowledge work, making AI literacy a critical priority for organizations. However, research on AI literacy lacks empirical insight into h...
- How well can VLMs rate audio descriptions: A multi-dimensional quantitative assessment framework : Abstract: Digital video is central to communication, education, and entertainment, but without audio description (AD), blind and low-vision audiences are excluded. While crowdsourced platforms and vis...
- An Odd Estimator for Shapley Values : Abstract: The Shapley value is a ubiquitous framework for attribution in machine learning, encompassing feature importance, data valuation, and causal inference. However, its exact computation is gene...
- From Pragmas to Partners: A Symbiotic Evolution of Agentic High-Level Synthesis : Abstract: The rise of large language models has sparked interest in AI-driven hardware design, raising the question: does high-level synthesis (HLS) still matter in the agentic era? We argue that HLS ...
- Semi-supervised CAPP Transformer Learning via Pseudo-labeling : Abstract: High-level Computer-Aided Process Planning (CAPP) generates manufacturing process plans from part specifications. It suffers from limited dataset availability in industry, reducing model gen...
- DCD: Decomposition-based Causal Discovery from Autocorrelated and Non-Stationary Temporal Data : Abstract: Multivariate time series in domains such as finance, climate science, and healthcare often exhibit long-term trends, seasonal patterns, and short-term fluctuations, complicating causal infer...
- CIPHER: Cryptographic Insecurity Profiling via Hybrid Evaluation of Responses : Abstract: Large language models (LLMs) are increasingly used to assist developers with code, yet their implementations of cryptographic functionality often contain exploitable flaws. Minor design choi...
- TQL: Scaling Q-Functions with Transformers by Preventing Attention Collapse : Abstract: Despite scale driving substantial recent advancements in machine learning, reinforcement learning (RL) methods still primarily use small value functions. Naively scaling value functions -- i...
- The Gradient-Causal Gap: Why Gradient Importance Fails on Complex Tasks : Abstract: Removing "important" high-gradient components from a neural network can improve generalization, while removing "unimportant" low-gradient components can destroy it. We demonstrate this par...
- SentiFuse: Deep Multi-model Fusion Framework for Robust Sentiment Extraction : Abstract: Sentiment analysis models exhibit complementary strengths, yet existing approaches lack a unified framework for effective integration. We present SentiFuse, a flexible and model-agnostic fra...
- Cross-Paradigm Evaluation of Gaze-Based Semantic Object Identification for Intelligent Vehicles : Abstract: Understanding where drivers direct their visual attention during driving, as characterized by gaze behavior, is critical for developing next-generation advanced driver-assistance systems and...
- Understanding vision transformer robustness through the lens of out-of-distribution detection : Abstract: Vision transformers have shown remarkable performance in vision tasks, but enabling them for accessible and real-time use is still challenging. Quantization reduces memory and inference cost...
- P-EAGLE: Parallel-Drafting EAGLE with Scalable Training : Abstract: Reasoning LLMs produce longer outputs, requiring speculative decoding drafters trained on extended sequences. Parallel drafting - predicting multiple tokens per forward pass - offers latency...
- Rod Flow: A Continuous-Time Model for Gradient Descent at the Edge of Stability : Abstract: How can we understand gradient-based training over non-convex landscapes? The edge of stability phenomenon, introduced in Cohen et al. (2021), indicates that the answer is not so simple: nam...
- Community-Level Modeling of Gyral Folding Patterns for Robust and Anatomically Informed Individualized Brain Mapping : Abstract: Cortical folding exhibits substantial inter-individual variability while preserving stable anatomical landmarks that enable fine-scale characterization of cortical organization. Among these,...
- Causal Preference Elicitation : Abstract: We propose causal preference elicitation, a Bayesian framework for expert-in-the-loop causal discovery that actively queries local edge relations to concentrate a posterior over directed acy...
- OpInf-LLM: Parametric PDE Solving with LLMs via Operator Inference : Abstract: Solving diverse partial differential equations (PDEs) is fundamental in science and engineering. Large language models (LLMs) have demonstrated strong capabilities in code generation, symbol...
- Draw2Learn: A Human-AI Collaborative Tool for Drawing-Based Science Learning : Abstract: Drawing supports learning by externalizing mental models, but providing timely feedback at scale remains challenging. We present Draw2Learn, a system that explores how AI can act as a suppor...
- Governance at the Edge of Architecture: Regulating NeuroAI and Neuromorphic Systems : Abstract: Current AI governance frameworks, including regulatory benchmarks for accuracy, latency, and energy efficiency, are built for static, centrally trained artificial neural networks on von Neum...
- Harnessing Flexible Spatial and Temporal Data Center Workloads for Grid Regulation Services : Abstract: Data centers (DCs) are increasingly recognized as flexible loads that can support grid frequency regulation. Yet, most existing methods treat workload scheduling and regulation capacity bidd...
- MarkCleaner: High-Fidelity Watermark Removal via Imperceptible Micro-Geometric Perturbation : Abstract: Semantic watermarks exhibit strong robustness against conventional image-space attacks. In this work, we show that such robustness does not survive under micro-geometric perturbations: spati...
- White-Box Neural Ensemble for Vehicular Plasticity: Quantifying the Efficiency Cost of Symbolic Auditability in Adaptive NMPC : Abstract: We present a white-box adaptive NMPC architecture that resolves vehicular plasticity (adaptation to varying operating regimes without retraining) by arbitrating among frozen, regime-specific...
- You Need an Encoder for Native Position-Independent Caching : Abstract: The Key-Value (KV) cache of Large Language Models (LLMs) is prefix-based, making it highly inefficient for processing contexts retrieved in arbitrary order. Position-Independent Caching (PIC...
- A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning : Abstract: Reinforcement learning (RL) is a dominant paradigm for improving the reasoning abilities of large language models, yet its effectiveness varies across tasks and compute budgets. We propose a...
- Toward a Machine Bertin: Why Visualization Needs Design Principles for Machine Cognition : Abstract: Visualization's design knowledge -- effectiveness rankings, encoding guidelines, color models, preattentive processing rules -- derives from six decades of psychophysical studies of human visio...
- Making Avatars Interact: Towards Text-Driven Human-Object Interaction for Controllable Talking Avatars : Abstract: Generating talking avatars is a fundamental task in video generation. Although existing methods can generate full-body talking avatars with simple human motion, extending this task to ground...
- Toward Cognitive Supersensing in Multimodal Large Language Model : Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable success in open-vocabulary perceptual tasks, yet their ability to solve complex cognitive problems remains limited, especial...
- Plain Transformers are Surprisingly Powerful Link Predictors : Abstract: Link prediction is a core challenge in graph machine learning, demanding models that capture rich and complex topological dependencies. While Graph Neural Networks (GNNs) are the standard so...
- InfoTok: Regulating Information Flow for Capacity-Constrained Shared Visual Tokenization in Unified MLLMs : Abstract: Unified multimodal large language models (MLLMs) integrate image understanding and generation in a single framework, with the visual tokenizer acting as the sole interface that maps visual i...
- Multimodal UNcommonsense: From Odd to Ordinary and Ordinary to Odd : Abstract: Commonsense reasoning in multimodal contexts remains a foundational challenge in artificial intelligence. We introduce Multimodal UNcommonsense (MUN), a benchmark designed to evaluate models'...
- DREAMS: A Social Exchange Theory-Informed Modeling of Misinformation Engagement on Social Media : Abstract: Social media engagement prediction is a central challenge in computational social science, particularly for understanding how users interact with misinformation. Existing approaches often tr...
- Generative Visual Code Mobile World Models : Abstract: Mobile Graphical User Interface (GUI) World Models (WMs) offer a promising path for improving mobile GUI agent performance at train- and inference-time. However, current approaches face a cr...
- DrawSim-PD: Simulating Student Science Drawings to Support NGSS-Aligned Teacher Diagnostic Reasoning : Abstract: Developing expertise in diagnostic reasoning requires practice with diverse student artifacts, yet privacy regulations prohibit sharing authentic student work for teacher professional develo...
- On the Fragility of AI-Based Channel Decoders under Small Channel Perturbations : Abstract: Recent advances in deep learning have led to AI-based error correction decoders that report empirical performance improvements over traditional belief-propagation (BP) decoding on AWGN chann...
- Provable Defense Framework for LLM Jailbreaks via Noise-Augumented Alignment : Abstract: Large Language Models (LLMs) remain vulnerable to adaptive jailbreaks that easily bypass empirical defenses like GCG. We propose a framework for certifiable robustness that shifts safety gua...
- Spectral Text Fusion: A Frequency-Aware Approach to Multimodal Time-Series Forecasting : Abstract: Multimodal time series forecasting is crucial in real-world applications, where decisions depend on both numerical data and contextual signals. The core challenge is to effectively combine t...
- Detecting AI-Generated Content in Academic Peer Reviews : Abstract: The growing availability of large language models (LLMs) has raised questions about their role in academic peer review. This study examines the temporal emergence of AI-generated content in ...
- In-Run Data Shapley for Adam Optimizer : Abstract: Reliable data attribution is essential for mitigating bias and reducing computational waste in modern machine learning, with the Shapley value serving as the theoretical gold standard. While...
- Bridging the Semantic Chasm: Synergistic Conceptual Anchoring for Generalized Few-Shot and Zero-Shot OOD Perception : Abstract: This manuscript presents a pioneering Synergistic Neural Agents Network (SynerNet) framework designed to mitigate the phenomenon of cross-modal alignment degeneration in Vision-Language Mode...
- Standardized Methods and Recommendations for Green Federated Learning : Abstract: Federated learning (FL) enables collaborative model training over privacy-sensitive, distributed data, but its environmental impact is difficult to compare across studies due to inconsistent...
- When RAG Hurts: Diagnosing and Mitigating Attention Distraction in Retrieval-Augmented LVLMs : Abstract: While Retrieval-Augmented Generation (RAG) is one of the dominant paradigms for enhancing Large Vision-Language Models (LVLMs) on knowledge-based VQA tasks, recent work attributes RAG failur...
- AdaFuse: Adaptive Multimodal Fusion for Lung Cancer Risk Prediction via Reinforcement Learning : Abstract: Multimodal fusion has emerged as a promising paradigm for disease diagnosis and prognosis, integrating complementary information from heterogeneous data sources such as medical images, clini...
- Post-Training Probability Manifold Correction via Structured SVD Pruning and Self-Referential Distillation : Abstract: Large language models are expensive to deploy. We introduce Sparse Knowledge Distillation (SparseKD), a post-training method that compresses transformer models by combining structured SVD pr...
- Generalized Inverses of Matrix Products: From Fundamental Subspaces to Randomized Decompositions : Abstract: We investigate the Moore-Penrose pseudoinverse and generalized inverse of a matrix product $A=CR$ to establish a unifying framework for generalized and randomized matrix inverses. This analy...
- Fast Forward: Accelerating LLM Prefill with Predictive FFN Sparsity : Abstract: The prefill stage of large language model (LLM) inference is a key computational bottleneck for long-context workloads. At short-to-moderate context lengths (1K--16K tokens), Feed-Forward Ne...
- ZEST: Zero-shot Embodied Skill Transfer for Athletic Robot Control : Abstract: Achieving robust, human-like whole-body control on humanoid robots for agile, contact-rich behaviors remains a central challenge, demanding heavy per-skill engineering and a brittle process ...
- A Conditional Companion: Lived Experiences of People with Mental Health Disorders Using LLMs : Abstract: Large Language Models (LLMs) are increasingly used for mental health support, yet little is known about how people with mental health challenges engage with them, how they evaluate their use...
- Variational Approach for Job Shop Scheduling : Abstract: This paper proposes a novel Variational Graph-to-Scheduler (VG2S) framework for solving the Job Shop Scheduling Problem (JSSP), a critical task in manufacturing that directly impacts operati...
- Robustness of AutoML on Dirty Categorical Data : Abstract: The goal of automated machine learning (AutoML) is to reduce trial and error when doing machine learning (ML). Although AutoML methods for classification are able to deal with data imperfect...
- Text is All You Need for Vision-Language Model Jailbreaking : Abstract: Large Vision-Language Models (LVLMs) are increasingly equipped with robust safety safeguards to prevent responses to harmful or disallowed prompts. However, these defenses often focus on ana...
- LLMs as High-Dimensional Nonlinear Autoregressive Models with Attention: Training, Alignment and Inference : Abstract: Large language models (LLMs) based on transformer architectures are typically described through collections of architectural components and training procedures, obscuring their underlying co...
- When Agents "Misremember" Collectively: Exploring the Mandela Effect in LLM-based Multi-Agent Systems : Abstract: Recent advancements in large language models (LLMs) have significantly enhanced the capabilities of collaborative multi-agent systems, enabling them to address complex challenges. However, w...
- LatentTrack: Sequential Weight Generation via Latent Filtering : Abstract: We introduce LatentTrack (LT), a sequential neural architecture for online probabilistic prediction under nonstationary dynamics. LT performs causal Bayesian filtering in a low-dimensional l...
- LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs : Abstract: Transforming a large language model (LLM) into a Vision-Language Model (VLM) can be achieved by mapping the visual tokens from a vision encoder into the embedding space of an LLM. Intriguing...
- PAIR-Former: Budgeted Relational MIL for miRNA Target Prediction : Abstract: Functional miRNA--mRNA targeting is a large-bag prediction problem: each transcript yields a heavy-tailed pool of candidate target sites (CTSs), yet only a pair-level label is observed. We f...
- Words that make SENSE: Sensorimotor Norms in Learned Lexical Token Representations : Abstract: While word embeddings derive meaning from co-occurrence patterns, human language understanding is grounded in sensory and motor experience. We present $\text{SENSE}$ $(\textbf{S}\text{ensori...
- Quantum Phase Recognition via Quantum Attention Mechanism : Abstract: Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large system...
- Quality-Diversity Optimization as Multi-Objective Optimization : Abstract: The Quality-Diversity (QD) optimization aims to discover a collection of high-performing solutions that simultaneously exhibit diverse behaviors within a user-defined behavior space. This pa...
- From Junior to Senior: Allocating Agency and Navigating Professional Growth in Agentic AI-Mediated Software Engineering : Abstract: Juniors enter as AI-natives, seniors adapted mid-career. AI is not just changing how engineers code -- it is reshaping who holds agency across work and professional growth. We contribute junior...
- Culturally-Grounded Governance for Multilingual Language Models: Rights, Data Boundaries, and Accountable AI Design : Abstract: Multilingual large language models (MLLMs) are increasingly deployed across cultural, linguistic, and political contexts, yet existing governance frameworks largely assume English-centric da...
- Sparse Shortcuts: Facilitating Efficient Fusion in Multimodal Large Language Models : Abstract: With the remarkable success of large language models (LLMs) in natural language understanding and generation, multimodal large language models (MLLMs) have rapidly advanced in their ability ...
- Contrastive Learning for Privacy Enhancements in Industrial Internet of Things : Abstract: The Industrial Internet of Things (IIoT) integrates intelligent sensing, communication, and analytics into industrial environments, including manufacturing, energy, and critical infrastructu...
- Physiology as Language: Translating Respiration to Sleep EEG : Abstract: This paper introduces a novel cross-physiology translation task: synthesizing sleep electroencephalography (EEG) from respiration signals. To address the significant complexity gap between t...
- Convergent World Representations and Divergent Tasks : Abstract: While neural representations are central to modern deep learning, the conditions governing their geometry and their roles in downstream adaptability remain poorly understood. We develop a fr...
- Contrastive Domain Generalization for Cross-Instrument Molecular Identification in Mass Spectrometry : Abstract: Identifying molecules from mass spectrometry (MS) data remains a fundamental challenge due to the semantic gap between physical spectral peaks and underlying chemical structures. Existing de...
- Learning to Decode Against Compositional Hallucination in Video Multimodal Large Language Models : Abstract: Current research on video hallucination mitigation primarily focuses on isolated error types, leaving compositional hallucinations, arising from incorrect reasoning over multiple interacting...
- Data Distribution as a Lever for Guiding Optimizers Toward Superior Generalization in LLMs : Abstract: Can modifying the training data distribution guide optimizers toward solutions with improved generalization when training large language models (LLMs)? In this work, we theoretically analyze...
- MAUGen: A Unified Diffusion Approach for Multi-Identity Facial Expression and AU Label Generation : Abstract: The lack of large-scale, demographically diverse face images with precise Action Unit (AU) occurrence and intensity annotations has long been recognized as a fundamental bottleneck in develo...
- RAG-GNN: Integrating Retrieved Knowledge with Graph Neural Networks for Precision Medicine : Abstract: Network topology excels at structural predictions but fails to capture functional semantics encoded in biomedical literature. We present a retrieval-augmented generation (RAG) embedding fram...
- Multimodal Machine Learning for Integrating Heterogeneous Analytical Systems : Abstract: Understanding structure-property relationships in complex materials requires integrating complementary measurements across multiple length scales. Here we propose an interpretable "multimoda...
- Hermes the Polyglot: A Unified Framework to Enhance Expressiveness for Multimodal Interlingual Subtitling : Abstract: Interlingual subtitling, which translates subtitles of visual media into a target language, is essential for entertainment localization but has not yet been explored in machine translation. ...
- Jailbreaking LLMs via Calibration : Abstract: Safety alignment in Large Language Models (LLMs) often creates a systematic discrepancy between a model's aligned output and the underlying pre-aligned data distribution. We propose a framew...
- Rethinking Zero-Shot Time Series Classification: From Task-specific Classifiers to In-Context Inference : Abstract: The zero-shot evaluation of time series foundation models (TSFMs) for classification typically uses a frozen encoder followed by a task-specific classifier. However, this practice violates t...
- MoDEx: Mixture of Depth-specific Experts for Multivariate Long-term Time Series Forecasting : Abstract: Multivariate long-term time series forecasting (LTSF) supports critical applications such as traffic-flow management, solar-power scheduling, and electricity-transformer monitoring. The exis...
- From Associations to Activations: Comparing Behavioral and Hidden-State Semantic Geometry in LLMs : Abstract: We investigate the extent to which an LLM's hidden-state geometry can be recovered from its behavior in psycholinguistic experiments. Across eight instruction-tuned transformer models, we ru...
- Action-Free Offline-to-Online RL via Discretised State Policies : Abstract: Most existing offline RL methods presume the availability of action labels within the dataset, but in many practical scenarios, actions may be missing due to privacy, storage, or sensor limi...
- S$^3$POT: Contrast-Driven Face Occlusion Segmentation via Self-Supervised Prompt Learning : Abstract: Existing face parsing methods usually misclassify occlusions as facial components. This is because occlusion is a high-level concept: it does not refer to a concrete category of object. Thus...
- Non-Contrastive Vision-Language Learning with Predictive Embedding Alignment : Abstract: Vision-language models have transformed multimodal representation learning, yet dominant contrastive approaches like CLIP require large batch sizes, careful negative sampling, and extensive ...
- Can Small Language Models Handle Context-Summarized Multi-Turn Customer-Service QA? A Synthetic Data-Driven Comparative Evaluation : Abstract: Customer-service question answering (QA) systems increasingly rely on conversational language understanding. While Large Language Models (LLMs) achieve strong performance, their high computa...
- Improving Neuropathological Reconstruction Fidelity via AI Slice Imputation : Abstract: Neuropathological analyses benefit from spatially precise volumetric reconstructions that enhance anatomical delineation and improve morphometric accuracy. Our prior work has shown the feasi...
- RecGOAT: Graph Optimal Adaptive Transport for LLM-Enhanced Multimodal Recommendation with Dual Semantic Alignment : Abstract: Multimodal recommendation systems typically integrate user behavior with multimodal data from items, thereby capturing more accurate user preferences. Concurrently, with the rise of large m...
- Forecasting Energy Availability in Local Energy Communities via LSTM Federated Learning : Abstract: Local Energy Communities are emerging as crucial players in the landscape of sustainable development. A significant challenge for these communities is achieving self-sufficiency through effe...
- From Detection to Prevention: Explaining Security-Critical Code to Avoid Vulnerabilities : Abstract: Security vulnerabilities often arise unintentionally during development due to a lack of security expertise and code complexity. Traditional tools, such as static and dynamic analysis, detec...
- Deep Time-series Forecasting Needs Kernelized Moment Balancing : Abstract: Deep time-series forecasting can be formulated as a distribution balancing problem aimed at aligning the distribution of the forecasts and ground truths. According to Imbens' criterion, true...
- Rethinking Hallucinations: Correctness, Consistency, and Prompt Multiplicity : Abstract: Large language models (LLMs) are known to "hallucinate" by generating false or misleading outputs. Hallucinations pose various harms, from erosion of trust to widespread misinformation. Exis...
- Augmenting Clinical Decision-Making with an Interactive and Interpretable AI Copilot: A Real-World User Study with Clinicians in Nephrology and Obstetrics : Abstract: Clinician skepticism toward opaque AI hinders adoption in high-stakes healthcare. We present AICare, an interactive and interpretable AI copilot for collaborative clinical decision-making. B...
- EchoReview: Learning Peer Review from the Echoes of Scientific Citations : Abstract: As the volume of scientific submissions continues to grow rapidly, traditional peer review systems are facing unprecedented scalability pressures, highlighting the urgent need for automated ...
- Pareto-Conditioned Diffusion Models for Offline Multi-Objective Optimization : Abstract: Multi-objective optimization (MOO) arises in many real-world applications where trade-offs between competing objectives must be carefully balanced. In the offline setting, where only a stati...
- ExperienceWeaver: Optimizing Small-sample Experience Learning for LLM-based Clinical Text Improvement : Abstract: Clinical text improvement is vital for healthcare efficiency but remains difficult due to limited high-quality data and the complex constraints of medical documentation. While Large Language...
- SA-VLA: Spatially-Aware Flow-Matching for Vision-Language-Action Reinforcement Learning : Abstract: Vision-Language-Action (VLA) models exhibit strong generalization in robotic manipulation, yet reinforcement learning (RL) fine-tuning often degrades robustness under spatial distribution sh...
- Decouple Searching from Training: Scaling Data Mixing via Model Merging for Large Language Model Pre-training : Abstract: Determining an effective data mixture is a key factor in Large Language Model (LLM) pre-training, where models must balance general competence with proficiency on hard tasks such as math and...
- HyperOffload: Graph-Driven Hierarchical Memory Management for Large Language Models on SuperNode Architectures : Abstract: The rapid evolution of Large Language Models (LLMs) towards long-context reasoning and sparse architectures has pushed memory requirements far beyond the capacity of individual device HBM. W...
- Bypassing Prompt Injection Detectors through Evasive Injections : Abstract: Large language models (LLMs) are increasingly used in interactive and retrieval-augmented systems, but they remain vulnerable to task drift: deviations from a user's intended instruction due...
- GraphNNK -- Graph Classification and Interpretability : Abstract: Graph Neural Networks (GNNs) have become a standard approach for learning from graph-structured data. However, their reliance on parametric classifiers (most often linear softmax layers) lim...
- Evolving Interpretable Constitutions for Multi-Agent Simulation : Abstract: Constitutional AI has focused on single-model alignment using fixed principles. However, multi-agent systems create novel alignment challenges through emergent social dynamics. We present Co...
- Adaptive Ability Decomposing for Unlocking Large Reasoning Model Effective Reinforcement Learning : Abstract: Reinforcement learning with verifiable rewards (RLVR) has shown great potential to enhance the reasoning ability of large language models (LLMs). However, due to the limited amount of inform...
- Evaluating Deep Learning-Based Nerve Segmentation in Brachial Plexus Ultrasound Under Realistic Data Constraints : Abstract: Accurate nerve localization is critical for the success of ultrasound-guided regional anesthesia, yet manual identification remains challenging due to low image contrast, speckle noise, and ...
- BLOCK-EM: Preventing Emergent Misalignment by Blocking Causal Features : Abstract: Emergent misalignment can arise when a language model is fine-tuned on a narrowly scoped supervised objective: the model learns the target behavior, yet also develops undesirable out-of-doma...
- Eliciting Trustworthiness Priors of Large Language Models via Economic Games : Abstract: One critical aspect of building human-centered, trustworthy artificial intelligence (AI) systems is maintaining calibrated trust: appropriate reliance on AI systems outperforms both overtrus...
- Controlling Repetition in Protein Language Models : Abstract: Protein language models (PLMs) have enabled advances in structure prediction and de novo protein design, yet they frequently collapse into pathological repetition during generation. Unlike i...
- Multi-Objective Multi-Fidelity Bayesian Optimization with Causal Priors : Abstract: Multi-fidelity Bayesian optimization (MFBO) accelerates the search for the global optimum of black-box functions by integrating inexpensive, low-fidelity approximations. The central task of ...
- Latent Shadows: The Gaussian-Discrete Duality in Masked Diffusion : Abstract: Masked discrete diffusion is a dominant paradigm for high-quality language modeling where tokens are iteratively corrupted to a mask state, yet its inference efficiency is bottlenecked by th...
- JTok: On Token Embedding as another Axis of Scaling Law via Joint Token Self-modulation : Abstract: LLMs have traditionally scaled along dense dimensions, where performance is coupled with near-linear increases in computational cost. While MoE decouples capacity from compute, it introduces...
- Don't Forget Its Variance! The Minimum Path Variance Principle for Accurate and Stable Score-Based Density Ratio Estimation : Abstract: Score-based methods have emerged as a powerful framework for density ratio estimation (DRE), but they face an important paradox in that, while theoretically path-independent, their practical...
- Exploration of Unary Arithmetic-Based Matrix Multiply Units for Low Precision DL Accelerators : Abstract: General matrix multiplication (GEMM) is a fundamental operation in deep learning (DL). With DL moving increasingly toward low precision, recent works have proposed novel unary GEMM designs a...
- Factuality on Demand: Controlling the Factuality-Informativeness Trade-off in Text Generation : Abstract: Large language models (LLMs) encode knowledge with varying degrees of confidence. When responding to queries, models face an inherent trade-off: they can generate responses that are less inf...
- RMFlow: Refined Mean Flow by a Noise-Injection Step for Multimodal Generation : Abstract: Mean flow (MeanFlow) enables efficient, high-fidelity image generation, yet its single-function evaluation (1-NFE) generation often cannot yield compelling results. We address this issue by ...
- Towards Multiscale Graph-based Protein Learning with Geometric Secondary Structural Motifs : Abstract: Graph neural networks (GNNs) have emerged as powerful tools for learning protein structures by capturing spatial relationships at the residue level. However, existing GNN-based methods often...
- Improving Flow Matching by Aligning Flow Divergence : Abstract: Conditional flow matching (CFM) stands out as an efficient, simulation-free approach for training flow-based generative models, achieving remarkable performance for data generation. However,...
- DIAMOND: Directed Inference for Artifact Mitigation in Flow Matching Models : Abstract: Despite impressive results from recent text-to-image models like FLUX, visual and anatomical artifacts remain a significant hurdle for practical and professional use. Existing methods for ar...
- EffGen: Enabling Small Language Models as Capable Autonomous Agents : Abstract: Most existing language model agentic systems today are built and optimized for large language models (e.g., GPT, Claude, Gemini) via API calls. While powerful, this approach faces several li...
- GAPNet: Plug-in Jointly Learning Task-Specific Graph for Dynamic Stock Relation : Abstract: The advent of the web has led to a paradigm shift in the financial relations, with the real-time dissemination of news, social discourse, and financial filings contributing significantly to ...
- Hallucination is a Consequence of Space-Optimality: A Rate-Distortion Theorem for Membership Testing : Abstract: Large language models often hallucinate with high confidence on "random facts" that lack inferable patterns. We formalize the memorization of such facts as a membership testing problem, unif...
- Do Schwartz Higher-Order Values Help Sentence-Level Human Value Detection? When Hard Gating Hurts : Abstract: Sentence-level human value detection is typically framed as multi-label classification over Schwartz values, but it remains unclear whether Schwartz higher-order (HO) categories provide usab...
- A Baseline Multimodal Approach to Emotion Recognition in Conversations : Abstract: We present a lightweight multimodal baseline for emotion recognition in conversations using the SemEval-2024 Task 3 dataset built from the sitcom Friends. The goal of this report is not to p...
- Continuous-Utility Direct Preference Optimization : Abstract: Large language model reasoning is often treated as a monolithic capability, relying on binary preference supervision that fails to capture partial progress or fine-grained reasoning quality....
- MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers : Abstract: The Model Context Protocol (MCP) is rapidly becoming the standard interface for Large Language Models (LLMs) to discover and invoke external tools. However, existing evaluations often fail t...
- CLAMP: Contrastive Learning for 3D Multi-View Action-Conditioned Robotic Manipulation Pretraining : Abstract: Leveraging pre-trained 2D image representations in behavior cloning policies has achieved great success and has become a standard approach for robotic manipulation. However, such representat...
- Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs : Abstract: LLMs are multilingual by training, yet their lingua franca is often English, reflecting English language dominance in pretraining. Other languages remain in parametric memory but are systema...
- FinEvo: From Isolated Backtests to Ecological Market Games for Multi-Agent Financial Strategy Evolution : Abstract: Conventional financial strategy evaluation relies on isolated backtests in static environments. Such evaluations assess each policy independently, overlook correlations and interactions, and...
- Multimodal Scientific Learning Beyond Diffusions and Flows : Abstract: Scientific machine learning (SciML) increasingly requires models that capture multimodal conditional uncertainty arising from ill-posed inverse problems, multistability, and chaotic dynamics...
- GradingAttack: Attacking Large Language Models Towards Short Answer Grading Ability : Abstract: Large language models (LLMs) have demonstrated remarkable potential for automatic short answer grading (ASAG), significantly boosting student assessment efficiency and scalability in educati...
- MedSpeak: A Knowledge Graph-Aided ASR Error Correction Framework for Spoken Medical QA : Abstract: Spoken question-answering (SQA) systems relying on automatic speech recognition (ASR) often struggle with accurately recognizing medical terminology. To this end, we propose MedSpeak, a nove...
- A longitudinal geospatial multimodal dataset of post-discharge frailty, physiology, mobility, and neighborhoods : Abstract: Frailty in older adults is associated with increased vulnerability to functional decline, reduced mobility, social isolation, and challenges during the transition from hospital to community ...
- Simple Role Assignment is Extraordinarily Effective for Safety Alignment : Abstract: Principle-based alignment often lacks context sensitivity and completeness. Grounded in Theory of Mind, we propose role conditioning as a compact alternative: social roles (e.g., mother, jud...
- SCPL: Enhancing Neural Network Training Throughput with Decoupled Local Losses and Model Parallelism : Abstract: Adopting large-scale AI models in enterprise information systems is often hindered by high training costs and long development cycles, posing a significant managerial challenge. The standard...
- The Impact of Machine Learning Uncertainty on the Robustness of Counterfactual Explanations : Abstract: Counterfactual explanations are widely used to interpret machine learning predictions by identifying minimal changes to input features that would alter a model's decision. However, most exis...
- SPGCL: Effective Graph Contrastive Learning via SVD-Guided Structural Perturbation : Abstract: Graph Neural Networks (GNNs) can be highly sensitive to structural noise, including spurious or missing edges caused by adversarial attacks or non-adversarial imperfections. Existing graph c...
- Responsible Evaluation of AI for Mental Health : Abstract: Although artificial intelligence (AI) shows growing promise for mental health care, current approaches to evaluating AI tools in this domain remain fragmented and poorly aligned with clinica...
- Modality as Heterogeneity: Node Splitting and Graph Rewiring for Multimodal Graph Learning : Abstract: Multimodal graphs are gaining increasing attention due to their rich representational power and wide applicability, yet they introduce substantial challenges arising from severe modality con...
- Adoption and Use of LLMs at an Academic Medical Center : Abstract: While large language models (LLMs) can support clinical documentation needs, standalone tools struggle with "workflow friction" from manual data entry. We developed ChatEHR, a system that en...
- Standards for trustworthy AI in the European Union: technical rationale, structural challenges, and an implementation path : Abstract: This white paper examines the technical foundations of European AI standardization under the AI Act. It explains how harmonized standards enable the presumption of conformity mechanism, desc...
- Design and Empirical Study of a Large Language Model-Based Multi-Agent Investment System for Chinese Public REITs : Abstract: This study addresses the low-volatility Chinese Public Real Estate Investment Trusts (REITs) market, proposing a large language model (LLM)-driven trading framework based on multi-agent coll...
- SPARC-RAG: Adaptive Sequential-Parallel Scaling with Context Management for Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) grounds large language model outputs in external evidence, but remains challenged on multi-hop question answering that requires long reasoning. Recent wo...
- CARE-RFT: Confidence-Anchored Reinforcement Finetuning for Reliable Reasoning in Large Language Models : Abstract: Reinforcement finetuning (RFT) has emerged as a powerful paradigm for unlocking reasoning capabilities in large language models. However, we identify a critical trade-off: while unconstraine...
- Impact of LLMs news Sentiment Analysis on Stock Price Movement Prediction : Abstract: This paper addresses stock price movement prediction by leveraging LLM-based news sentiment analysis. Earlier works have largely focused on proposing and assessing sentiment analysis models ...
- ECCO: Evidence-Driven Causal Reasoning for Compiler Optimization : Abstract: Compiler auto-tuning faces a dichotomy between traditional black-box search methods, which lack semantic guidance, and recent Large Language Model (LLM) approaches, which often suffer from s...
- From Numbers to Prompts: A Cognitive Symbolic Transition Mechanism for Lightweight Time-Series Forecasting : Abstract: Large language models have achieved remarkable success in time series prediction tasks, but their substantial computational and memory requirements limit deployment on lightweight platforms....
- Generative Artificial Intelligence in Small and Medium Enterprises: Navigating its Promises and Challenges : Abstract: The latest technological developments in generative artificial intelligence (GAI) offer powerful capabilities to small and medium enterprises (SMEs), as they facilitate the democratization o...
- Interpreting and Controlling Model Behavior via Constitutions for Atomic Concept Edits : Abstract: We introduce a black-box interpretability framework that learns a verifiable constitution: a natural language summary of how changes to a prompt affect a model's specific behavior, such as i...
- EDU-CIRCUIT-HW: Evaluating Multimodal Large Language Models on Real-World University-Level STEM Student Handwritten Solutions : Abstract: Multimodal Large Language Models (MLLMs) hold significant promise for revolutionizing traditional education and reducing teachers' workload. However, accurately interpreting unconstrained ST...
- Mirage2Matter: A Physically Grounded Gaussian World Model from Video : Abstract: The scalability of embodied intelligence is fundamentally constrained by the scarcity of real-world interaction data. While simulation platforms provide a promising alternative, existing app...
- Frequent Pattern Mining approach to Image Compression : Abstract: The paper focuses on Image Compression, explaining efficient approaches based on Frequent Pattern Mining (FPM). The proposed compression mechanism is based on clustering similar pixels in the...
- Radiomics in Medical Imaging: Methods, Applications, and Challenges : Abstract: Radiomics enables quantitative medical image analysis by converting imaging data into structured, high-dimensional feature representations for predictive modeling. Despite methodological dev...
- Autonomous Multi-Agent AI for High-Throughput Polymer Informatics: From Property Prediction to Generative Design Across Synthetic and Bio-Polymers : Abstract: We present an integrated multiagent AI ecosystem for polymer discovery that unifies high-throughput materials workflows, artificial intelligence, and computational modeling within a single P...
- R3G: A Reasoning-Retrieval-Reranking Framework for Vision-Centric Answer Generation : Abstract: Vision-centric retrieval for VQA requires retrieving images to supply missing visual cues and integrating them into the reasoning process. However, selecting the right images and integrating...
- HYPE-EDIT-1: Benchmark for Measuring Reliability in Frontier Image Editing Models : Abstract: Public demos of image editing models are typically best-case samples; real workflows pay for retries and review time. We introduce HYPE-EDIT-1, a 100-task benchmark of reference-based market...
- SITUATE -- Synthetic Object Counting Dataset for VLM training : Abstract: We present SITUATE, a novel dataset designed for training and evaluating Vision Language Models on counting tasks with spatial constraints. The dataset bridges the gap between simple 2D data...
- 1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization : Abstract: Few-shot learning (FSL) challenges model generalization to novel classes based on just a few shots of labeled examples, a testbed where traditional test-time augmentations fail to be effecti...
- IC-EO: Interpretable Code-based assistant for Earth Observation : Abstract: Despite recent advances in computer vision, Earth Observation (EO) analysis remains difficult to perform for the laymen, requiring expert knowledge and technical capabilities. Furthermore, m...
- VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents : Abstract: In recent years, multimodal image editing models have achieved substantial progress, enabling users to manipulate visual content through natural language in a flexible and interactive manner...
- MiniTensor: A Lightweight, High-Performance Tensor Operations Library : Abstract: We present MiniTensor, an open source tensor operations library that focuses on minimalism, correctness, and performance. MiniTensor exposes a familiar PyTorch-like Python API while it execu...
- PredictionMarketBench: A SWE-bench-Style Framework for Backtesting Trading Agents on Prediction Markets : Abstract: Prediction markets offer a natural testbed for trading agents: contracts have binary payoffs, prices can be interpreted as probabilities, and realized performance depends critically on marke...
- Scalable Analytic Classifiers with Associative Drift Compensation for Class-Incremental Learning of Vision Transformers : Abstract: Class-incremental learning (CIL) with Vision Transformers (ViTs) faces a major computational bottleneck during the classifier reconstruction phase, where most existing methods rely on costly...
- Learning Physics-Grounded 4D Dynamics with Neural Gaussian Force Fields : Abstract: Predicting physical dynamics from raw visual data remains a major challenge in AI. While recent video generation models have achieved impressive visual quality, they still cannot consistentl...
- Reversible Diffusion Decoding for Diffusion Language Models : Abstract: Diffusion language models enable parallel token generation through block-wise decoding, but their irreversible commitments can lead to stagnation, where the reverse diffusion process fails t...
- Investigating the Impact of Histopathological Foundation Models on Regressive Prediction of Homologous Recombination Deficiency : Abstract: Foundation models pretrained on large-scale histopathology data have found great success in various fields of computational pathology, but their impact on regressive biomarker prediction rem...
- Real-Time Human Activity Recognition on Edge Microcontrollers: Dynamic Hierarchical Inference with Multi-Spectral Sensor Fusion : Abstract: The demand for accurate on-device pattern recognition in edge applications is intensifying, yet existing approaches struggle to reconcile accuracy with computational constraints. To address ...
- See Without Decoding: Motion-Vector-Based Tracking in Compressed Video : Abstract: We propose a lightweight compressed-domain tracking model that operates directly on video streams, without requiring full RGB video decoding. Using motion vectors and transform coefficients ...
- ReasoningBomb: A Stealthy Denial-of-Service Attack by Inducing Pathologically Long Reasoning in Large Reasoning Models : Abstract: Large reasoning models (LRMs) extend large language models with explicit multi-step reasoning traces, but this capability introduces a new class of prompt-induced inference-time denial-of-se...
- ProDCARL: Reinforcement Learning-Aligned Diffusion Models for De Novo Antimicrobial Peptide Design : Abstract: Antimicrobial resistance threatens healthcare sustainability and motivates low-cost computational discovery of antimicrobial peptides (AMPs). De novo peptide generation must optimize antimic...
- RAPTOR: Ridge-Adaptive Logistic Probes : Abstract: Probing studies what information is encoded in a frozen LLM's layer representations by training a lightweight predictor on top of them. Beyond analysis, probes are often used operationally i...
- Sheaf Neural Networks and biomedical applications : Abstract: The purpose of this paper is to elucidate the theory and mathematical modelling behind the sheaf neural network (SNN) algorithm and then show how SNN can effectively answer biomedical que...
- Block removal for large language models through constrained binary optimization : Abstract: Compressing resource-intensive large language models by removing whole transformer blocks is a seemingly simple idea, but identifying which blocks to remove constitutes an exponentially diff...
- Why Are AI Agent Involved Pull Requests (Fix-Related) Remain Unmerged? An Empirical Study : Abstract: Autonomous coding agents (e.g., OpenAI Codex, Devin, GitHub Copilot) are increasingly used to generate fix-related pull requests (PRs) in real world software repositories. However, their pra...
- Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints : Abstract: Locally deployed Small Language Models (SLMs) must continually support diverse tasks under strict memory and computation constraints, making selective reliance on cloud Large Language Models...
- Towards Agentic Intelligence for Materials Science : Abstract: The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fin...
- The Blessing of Dimensionality in LLM Fine-tuning: A Variance-Curvature Perspective : Abstract: Weight-perturbation evolution strategies (ES) can fine-tune billion-parameter language models with surprisingly small populations (e.g., $N\!\approx\!30$), contradicting classical zeroth-ord...
- Learning Robust Reasoning through Guided Adversarial Self-Play : Abstract: Reinforcement learning from verifiable rewards (RLVR) produces strong reasoning models, yet they can fail catastrophically when the conditioning context is fallible (e.g., corrupted chain-of...
- The Illusion of Forgetting: Attack Unlearned Diffusion via Initial Latent Variable Optimization : Abstract: Although unlearning-based defenses claim to purge Not-Safe-For-Work (NSFW) concepts from diffusion models (DMs), we reveal that this "forgetting" is largely an illusion. Unlearning partiall...
- Stabilizing Diffusion Posterior Sampling by Noise-Frequency Continuation : Abstract: Diffusion posterior sampling solves inverse problems by combining a pretrained diffusion prior with measurement-consistency guidance, but it often fails to recover fine details because measu...
- Spec-Driven Development: From Code to Contract in the Age of AI Coding Assistants : Abstract: The rise of AI coding assistants has reignited interest in an old idea: what if specifications, not code, were the primary artifact of software development? Spec-driven development (SDD) inver...
- CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning : Abstract: Understanding camera dynamics is a fundamental pillar of video spatial intelligence. However, existing multimodal models predominantly treat this task as a black-box classification, often co...
- EigenAI: Deterministic Inference, Verifiable Results : Abstract: EigenAI is a verifiable AI platform built on top of the EigenLayer restaking ecosystem. At a high level, it combines a deterministic large-language model (LLM) inference engine with a crypto...
- Visible Singularities Guided Correlation Network for Limited-Angle CT Reconstruction : Abstract: Limited-angle computed tomography (LACT) offers the advantages of reduced radiation dose and shortened scanning time. Traditional reconstruction algorithms exhibit various inherent limitatio...
- QUASAR: A Universal Autonomous System for Atomistic Simulation and a Benchmark of Its Capabilities : Abstract: The integration of large language models (LLMs) into materials science offers a transformative opportunity to streamline computational workflows, yet current agentic systems remain constrain...
- LPIPS-AttnWav2Lip: Generic Audio-Driven lip synchronization for Talking Head Generation in the Wild : Abstract: Researchers have shown a growing interest in Audio-driven Talking Head Generation. The primary challenge in talking head generation is achieving audio-visual coherence between the lips and t...
- AI-Generated Image Detectors Overrely on Global Artifacts: Evidence from Inpainting Exchange : Abstract: Modern deep learning-based inpainting enables realistic local image manipulation, raising critical challenges for reliable detection. However, we observe that current detectors primarily rel...
- On the calibration of survival models with competing risks : Abstract: Survival analysis deals with modeling the time until an event occurs, and accurate probability estimates are crucial for decision-making, particularly in the competing-risks setting where mu...
- Rank-and-Reason: Multi-Agent Collaboration Accelerates Zero-Shot Protein Mutation Prediction : Abstract: Zero-shot mutation prediction is vital for low-resource protein engineering, yet existing protein language models (PLMs) often yield statistically confident results that ignore fundamental b...
- SCALED: Surrogate-gradient for Codec-Aware Learning of Downsampling in ABR Streaming : Abstract: The rapid growth in video consumption has introduced significant challenges to modern streaming architectures. Over-the-Top (OTT) video delivery now predominantly relies on Adaptive Bitrate ...
- Vision-Language Model Purified Semi-Supervised Semantic Segmentation for Remote Sensing Images : Abstract: Semi-supervised semantic segmentation (S4) can learn rich visual knowledge from low-cost unlabeled images. However, traditional S4 architectures all face the challenge of low-quality pse...
- Semantic-Aware Advanced Persistent Threat Detection Using Autoencoders on LLM-Encoded System Logs : Abstract: Advanced Persistent Threats (APTs) are among the most challenging cyberattacks to detect. They are carried out by highly skilled attackers who carefully study their targets and operate in a ...
- Analyzing Shapley Additive Explanations to Understand Anomaly Detection Algorithm Behaviors and Their Complementarity : Abstract: Unsupervised anomaly detection is a challenging problem due to the diversity of data distributions and the lack of labels. Ensemble methods are often adopted to mitigate these challenges by ...
- Interpretable Unsupervised Deformable Image Registration via Confidence-bound Multi-Hop Visual Reasoning : Abstract: Unsupervised deformable image registration requires aligning complex anatomical structures without reference labels, making interpretability and reliability critical. Existing deep learning ...
- TessPay: Verify-then-Pay Infrastructure for Trusted Agentic Commerce : Abstract: The global economy is entering the era of Agentic Commerce, where autonomous agents can discover services, negotiate prices, and transact value. However, adoption of agentic commerce fac...
- A Geometric Multimodal Foundation Model Integrating Bp-MRI and Clinical Reports in Prostate Cancer Classification : Abstract: Prostate cancer (PCa) is one of the most common cancers in men worldwide. Bi-parametric MRI (bp-MRI) and clinical variables are crucial for PCa identification and improving treatment decisio...
- Tri-LLM Cooperative Federated Zero-Shot Intrusion Detection with Semantic Disagreement and Trust-Aware Aggregation : Abstract: Federated learning (FL) has become an effective paradigm for privacy-preserving, distributed Intrusion Detection Systems (IDS) in cyber-physical and Internet of Things (IoT) networks, where ...
- MapDream: Task-Driven Map Learning for Vision-Language Navigation : Abstract: Vision-Language Navigation (VLN) requires agents to follow natural language instructions in partially observed 3D environments, motivating map representations that aggregate spatial context ...
- DIVERGE: Diversity-Enhanced RAG for Open-Ended Information Seeking : Abstract: Existing retrieval-augmented generation (RAG) systems are primarily designed under the assumption that each query has a single correct answer. This overlooks common information-seeking scena...
- SANEval: Open-Vocabulary Compositional Benchmarks with Failure-mode Diagnosis : Abstract: The rapid progress of text-to-image (T2I) models has unlocked unprecedented creative potential, yet their ability to faithfully render complex prompts involving multiple objects, attributes,...
- TABES: Trajectory-Aware Backward-on-Entropy Steering for Masked Diffusion Models : Abstract: Masked Diffusion Models (MDMs) have emerged as a promising non-autoregressive paradigm for generative tasks, offering parallel decoding and bidirectional context utilization. However, curren...
- Intelligent Reasoning Cues: A Framework and Case Study of the Roles of AI Information in Complex Decisions : Abstract: Artificial intelligence (AI)-based decision support systems can be highly accurate yet still fail to support users or improve decisions. Existing theories of AI-assisted decision-making focu...
- Subspace Clustering on Incomplete Data with Self-Supervised Contrastive Learning : Abstract: Subspace clustering aims to group data points that lie in a union of low-dimensional subspaces and finds wide application in computer vision, hyperspectral imaging, and recommendation system...
- PLACID: Identity-Preserving Multi-Object Compositing via Video Diffusion with Synthetic Trajectories : Abstract: Recent advances in generative AI have dramatically improved photorealistic image synthesis, yet they fall short for studio-level multi-object compositing. This task demands simultaneous (i) ...
- TokenTrim: Inference-Time Token Pruning for Autoregressive Long Video Generation : Abstract: Auto-regressive video generation enables long video synthesis by iteratively conditioning each new batch of frames on previously generated content. However, recent work has shown that such p...
- VoxServe: Streaming-Centric Serving System for Speech Language Models : Abstract: Deploying modern Speech Language Models (SpeechLMs) in streaming settings requires systems that provide low latency, high throughput, and strong guarantees of streamability. Existing systems...
- Training LLMs with Fault Tolerant HSDP on 100,000 GPUs : Abstract: Large-scale training systems typically use synchronous training, requiring all GPUs to be healthy simultaneously. In our experience training on O(100K) GPUs, synchronous training results in ...
- Sample Complexity Analysis for Constrained Bilevel Reinforcement Learning : Abstract: Several important problem settings within the literature of reinforcement learning (RL), such as meta-learning, hierarchical learning, and RL from human feedback (RL-HF), can be modelled as ...
- TimeBlind: A Spatio-Temporal Compositionality Benchmark for Video LLMs : Abstract: Fine-grained spatio-temporal understanding is essential for video reasoning and embodied AI. Yet, while Multimodal Large Language Models (MLLMs) master static semantics, their grasp of tempo...
- LogicGaze: Benchmarking Causal Consistency in Visual Narratives via Counterfactual Verification : Abstract: While sequential reasoning enhances the capability of Vision-Language Models (VLMs) to execute complex multimodal tasks, their reliability in grounding these reasoning chains within actual v...
- Self-Attention at Constant Cost per Token via Symmetry-Aware Taylor Approximation : Abstract: The most widely used artificial intelligence (AI) models today are Transformers employing self-attention. In its standard form, self-attention incurs costs that increase with context length,...
- Multi-Speaker Conversational Audio Deepfake: Taxonomy, Dataset and Pilot Study : Abstract: The rapid advances in text-to-speech (TTS) technologies have made audio deepfakes increasingly realistic and accessible, raising significant security and trust concerns. While existing resea...
- Semantics-Preserving Evasion of LLM Vulnerability Detectors : Abstract: LLM-based vulnerability detectors are increasingly deployed in security-critical code review, yet their resilience to evasion under behavior-preserving edits remains poorly understood. We ev...
- Opportunistic Promptable Segmentation: Leveraging Routine Radiological Annotations to Guide 3D CT Lesion Segmentation : Abstract: The development of machine learning models for CT imaging depends on the availability of large, high-quality, and diverse annotated datasets. Although large volumes of CT images and reports ...
- Beyond the Loss Curve: Scaling Laws, Active Learning, and the Limits of Learning from Exact Posteriors : Abstract: How close are neural networks to the best they could possibly do? Standard benchmarks cannot answer this because they lack access to the true posterior p(y|x). We use class-conditional norma...
- Optimal Transport-Guided Adversarial Attacks on Graph Neural Network-Based Bot Detection : Abstract: The rise of bot accounts on social media poses significant risks to public discourse. To address this threat, modern bot detectors increasingly rely on Graph Neural Networks (GNNs). However,...
- LingLanMiDian: Systematic Evaluation of LLMs on TCM Knowledge and Clinical Reasoning : Abstract: Large language models (LLMs) are advancing rapidly in medical NLP, yet Traditional Chinese Medicine (TCM) with its distinctive ontology, terminology, and reasoning patterns requires domain-f...
- ORCH: many analyses, one merge-a deterministic multi-agent orchestrator for discrete-choice reasoning with EMA-guided routing : Abstract: Recent advances in large-scale language models (LLMs) have made multi-agent architectures attractive for challenging reasoning tasks. However, many existing systems rely on stochastic routin...
- INDIBATOR: Diverse and Fact-Grounded Individuality for Multi-Agent Debate in Molecular Discovery : Abstract: Multi-agent systems have emerged as a powerful paradigm for automating scientific discovery. To differentiate agent behavior in the multi-agent system, current frameworks typically assign ge...
- Synesthesia of Vehicles: Tactile Data Synthesis from Visual Inputs : Abstract: Autonomous vehicles (AVs) rely on multi-modal fusion for safety, but current visual and optical sensors fail to detect road-induced excitations which are critical for vehicles' dynamic contr...
- ROMA: Recursive Open Meta-Agent Framework for Long-Horizon Multi-Agent Systems : Abstract: Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade perform...
- SOPRAG: Multi-view Graph Experts Retrieval for Industrial Standard Operating Procedures : Abstract: Standard Operating Procedures (SOPs) are essential for ensuring operational safety and consistency in industrial environments. However, retrieving and following these procedures presents uni...
- ProcMEM: Learning Reusable Procedural Memory from Experience via Non-Parametric PPO for LLM Agents : Abstract: LLM-driven agents demonstrate strong performance in sequential decision-making but often rely on on-the-fly reasoning, re-deriving solutions even in recurring scenarios. This insufficient ex...
- Entropy-Guided Data-Efficient Training for Multimodal Reasoning Reward Models : Abstract: Multimodal reward models are crucial for aligning multimodal large language models with human preferences. Recent works have incorporated reasoning capabilities into these models, achieving ...
- Geometric Analysis of Token Selection in Multi-Head Attention : Abstract: We present a geometric framework for analysing multi-head attention in large language models (LLMs). Without altering the mechanism, we view standard attention through a top-N selection lens...
- DomusFM: A Foundation Model for Smart-Home Sensor Data : Abstract: Smart-home sensor data holds significant potential for several applications, including healthcare monitoring and assistive technologies. Existing approaches, however, face critical limitatio...
- Large Language Model and Formal Concept Analysis: a comparative study for Topic Modeling : Abstract: Topic modeling is a research field finding increasing applications, historically ranging from document retrieval to sentiment analysis and text summarization. Large Language Models (LLMs) are curre...
- Small Generalizable Prompt Predictive Models Can Steer Efficient RL Post-Training of Large Reasoning Models : Abstract: Reinforcement learning enhances the reasoning capabilities of large language models but often involves high computational costs due to rollout-intensive optimization. Online prompt selection...
- Evolving from Tool User to Creator via Training-Free Experience Reuse in Multimodal Reasoning : Abstract: Existing Tool-Integrated Reasoning (TIR) models have effectively extended the question-answering capabilities of LLMs by incorporating external tools. However, real-world scenarios present n...
- Emergent Analogical Reasoning in Transformers : Abstract: Analogy is a central faculty of human intelligence, enabling abstract patterns discovered in one domain to be applied to another. Despite its central role in cognition, the mechanisms by whi...
- Thinking Like a Doctor: Conversational Diagnosis through the Exploration of Diagnostic Knowledge Graphs : Abstract: Conversational diagnosis requires multi-turn history-taking, where an agent asks clarifying questions to refine differential diagnoses under incomplete information. Existing approaches often...
- Do I Really Know? Learning Factual Self-Verification for Hallucination Reduction : Abstract: Factual hallucination remains a central challenge for large language models (LLMs). Existing mitigation approaches primarily rely on either external post-hoc verification or mapping uncertai...
- Light Alignment Improves LLM Safety via Model Self-Reflection with a Single Neuron : Abstract: The safety of large language models (LLMs) has increasingly emerged as a fundamental aspect of their development. Existing safety alignment for LLMs is predominantly achieved through post-tr...
- Edit Knowledge, Not Just Facts via Multi-Step Reasoning over Background Stories : Abstract: Enabling artificial intelligence systems, particularly large language models, to integrate new knowledge and flexibly apply it during reasoning remains a central challenge. Existing knowledg...
- Canonical Intermediate Representation for LLM-based optimization problem formulation and code generation : Abstract: Automatically formulating optimization models from natural language descriptions is a growing focus in operations research, yet current LLM-based approaches struggle with the composite const...
- Constrained Process Maps for Multi-Agent Generative AI Workflows : Abstract: Large language model (LLM)-based agents are increasingly used to perform complex, multi-step workflows in regulated settings such as compliance and due diligence. However, many agentic archi...
- Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models : Abstract: The agency expected of Agentic Large Language Models goes beyond answering correctly, requiring autonomy to set goals and decide what to explore. We term this investigatory intelligence, dis...
- Rethinking the Role of Entropy in Optimizing Tool-Use Behaviors for Large Language Model Agents : Abstract: Tool-using agents based on Large Language Models (LLMs) excel in tasks such as mathematical reasoning and multi-hop question answering. However, in long trajectories, agents often trigger ex...
- SIDiffAgent: Self-Improving Diffusion Agent : Abstract: Text-to-image diffusion models have revolutionized generative AI, enabling high-quality and photorealistic image synthesis. However, their practical deployment remains hindered by several li...
- Understanding the Reversal Curse Mitigation in Masked Diffusion Models through Attention and Training Dynamics : Abstract: Autoregressive language models (ARMs) suffer from the reversal curse: after learning that "$A$ is $B$", they often fail on the reverse query "$B$ is $A$". Masked diffusion-based language mod...
- Mitigating Safety Tax via Distribution-Grounded Refinement in Large Reasoning Models : Abstract: Safety alignment incurs a safety tax that perturbs a large reasoning model's (LRM) general reasoning ability. Existing datasets used for safety alignment for an LRM are usually constructed by ...
- Traffic-Aware Navigation in Road Networks : Abstract: This project compares three graph search approaches for the task of traffic-aware navigation in Kingston's road network. These approaches include a single-run multi-query preprocessing algor...
- Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization : Abstract: While large language models (LLMs) have shown strong performance in math and logic reasoning, their ability to handle combinatorial optimization (CO) -- searching high-dimensional solution s...
- TIDE: Trajectory-based Diagnostic Evaluation of Test-Time Improvement in LLM Agents : Abstract: Recent advances in autonomous LLM agents demonstrate their ability to improve performance through iterative interaction with the environment. We define this paradigm as Test-Time Improvement...
- More Than a Quick Glance: Overcoming the Greedy Bias in KV-Cache Compression : Abstract: While Large Language Models (LLMs) can theoretically support extensive context windows, their actual deployment is constrained by the linear growth of Key-Value (KV) cache memory. Prevailing...
- Position: Explaining Behavioral Shifts in Large Language Models Requires a Comparative Approach : Abstract: Large-scale foundation models exhibit behavioral shifts: intervention-induced behavioral changes that appear after scaling, fine-tuning, reinforcement learning or in-context learning. While ...
- Interpreting and Controlling LLM Reasoning through Integrated Policy Gradient : Abstract: Large language models (LLMs) demonstrate strong reasoning abilities in solving complex real-world problems. Yet, the internal mechanisms driving these complex reasoning behaviors remain opaq...
- Context Learning for Multi-Agent Discussion : Abstract: Multi-Agent Discussion (MAD) has garnered increasing attention very recently, where multiple LLM instances collaboratively solve problems via structured discussion. However, we find that cur...
- Live-Evo: Online Evolution of Agentic Memory from Continuous Feedback : Abstract: Large language model (LLM) agents are increasingly equipped with memory, which is stored experience and reusable guidance that can improve task-solving performance. Recent self-evolvi...
- Trust by Design: Skill Profiles for Transparent, Cost-Aware LLM Routing : Abstract: How should Large Language Model (LLM) practitioners select the right model for a task without wasting money? We introduce BELLA (Budget-Efficient LLM Selection via Automated skill-profiling)...
- Structure Enables Effective Self-Localization of Errors in LLMs : Abstract: Self-correction in language models remains elusive. In this work, we explore whether language models can explicitly localize errors in incorrect reasoning, as a path toward building AI syste...
- SafeGround: Know When to Trust GUI Grounding Models via Uncertainty Calibration : Abstract: Graphical User Interface (GUI) grounding aims to translate natural language instructions into executable screen coordinates, enabling automated GUI interaction. Nevertheless, incorrect groun...
- Thinking with Comics: Enhancing Multimodal Reasoning through Structured Visual Storytelling : Abstract: Chain-of-Thought reasoning has driven large language models to extend from thinking with text to thinking with images and videos. However, different modalities still have clear limitations: ...
- Drift-Bench: Diagnosing Cooperative Breakdowns in LLM Agents under Input Faults via Multi-Turn Interaction : Abstract: As Large Language Models transition to autonomous agents, user inputs frequently violate cooperative assumptions (e.g., implicit intent, missing parameters, false presuppositions, or ambiguo...
- MentisOculi: Revealing the Limits of Reasoning with Mental Imagery : Abstract: Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capable of native interleaved genera...
- Avenir-Web: Human-Experience-Imitating Multimodal Web Agents with Mixture of Grounding Experts : Abstract: Despite advances in multimodal large language models, autonomous web agents still struggle to reliably execute long-horizon tasks on complex and dynamic web interfaces. Existing agents often...
- Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge : Abstract: Autoregressive large language models (LLMs) have achieved remarkable success in many complex tasks, yet they can still fail in very simple logical reasoning such as the "reversal curse" -- w...
- AgentRx: Diagnosing AI Agent Failures from Execution Trajectories : Abstract: AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manual...
- Privacy in Practice: Private COVID-19 Detection in X-Ray Images (Extended Version) : Abstract: Machine learning (ML) can help fight pandemics like COVID-19 by enabling rapid screening of large volumes of images. To perform data analysis while maintaining patient privacy, we create ML ...
- Generating Synthetic Health Sensor Data for Privacy-Preserving Wearable Stress Detection : Abstract: Smartwatch health sensor data are increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprise sensitive p...
- Disentangled Interest Network for Out-of-Distribution CTR Prediction : Abstract: Click-through rate (CTR) prediction, which estimates the probability of a user clicking on a given item, is a critical task for online information services. Existing approaches often make st...
- Efficient Multilingual Search Relevance Modeling in E-Commerce via LLM Mixture-of-Experts : Abstract: In e-commerce platforms, search relevance directly influences both user experience and merchant revenue. In multi-country deployments, diverse linguistic, cultural, and product catalog conte...
- PPoGA: Predictive Plan-on-Graph with Action for Knowledge Graph Question Answering : Abstract: Large Language Models (LLMs) augmented with Knowledge Graphs (KGs) have advanced complex question answering, yet they often remain susceptible to failure when their initial high-level reason...
- Unlocking Electronic Health Records: A Hybrid Graph RAG Approach to Safe Clinical AI for Patient QA : Abstract: Electronic health record (EHR) systems present clinicians with vast repositories of clinical information, creating a significant cognitive burden where critical details are easily overlooked...
- ChunkNorris: A High-Performance and Low-Energy Approach to PDF Parsing and Chunking : Abstract: In Retrieval-Augmented Generation applications, the Information Retrieval part is central as it provides the contextual information that enables a Large Language Model to generate an appropr...
- Chained Prompting for Better Systematic Review Search Strategies : Abstract: Systematic reviews require the use of rigorously designed search strategies to ensure both comprehensive retrieval and minimization of bias. Conventional manual approaches, although methodol...
- OGD4All: A Framework for Accessible Interaction with Geospatial Open Government Data Based on Large Language Models : Abstract: We present OGD4All, a transparent, auditable, and reproducible framework based on Large Language Models (LLMs) to enhance citizens' interaction with geospatial Open Government Data (OGD). Th...
- What Artificial Intelligence can do for High-Performance Computing systems? : Abstract: High-performance computing (HPC) centers consume substantial power, incurring environmental and operational costs. This review assesses how artificial intelligence (AI), including machine le...
- G-MemLLM: Gated Latent Memory Augmentation for Long-Context Reasoning in Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, yet they remain constrained by the finite capacity of their context windows and the ...
- PTCBENCH: Benchmarking Contextual Stability of Personality Traits in LLM Systems : Abstract: With the increasing deployment of large language models (LLMs) in affective agents and AI systems, maintaining a consistent and authentic LLM personality becomes critical for user trust and ...
- SafeTalkCoach: Diversity-Driven Multi-Agent Simulation for Parent-Teen Health Conversations : Abstract: The importance of effective parent-child communication about sexual health is widely acknowledged, but real-world data on these conversations is scarce and challenging to collect, due to the...
- AutoBinder Agent: An MCP-Based Agent for End-to-End Protein Binder Design : Abstract: Modern AI technologies for drug discovery are distributed across heterogeneous platforms -- including web applications, desktop environments, and code libraries -- leading to fragmented workflows,...
- Beyond Static Question Banks: Dynamic Knowledge Expansion via LLM-Automated Graph Construction and Adaptive Generation : Abstract: Personalized education systems increasingly rely on structured knowledge representations to support adaptive learning and question generation. However, existing approaches face two fundament...
- Early Warning Signals Appear Long Before Dropping Out: An Idiographic Approach Grounded in Complex Dynamic Systems Theory : Abstract: The ability to sustain engagement and recover from setbacks (i.e., resilience) is fundamental for learning. When resilience weakens, students are at risk of disengagement and may drop out...
- Strategies for Creating Uncertainty in the AI Era to Trigger Students Critical Thinking: Pedagogical Design, Assessment Rubric, and Exam System : Abstract: Generative AI challenges traditional assessments by allowing students to produce correct answers without demonstrating understanding or reasoning. Rather than prohibiting AI, this work argue...
- Representation Learning Enhanced Deep Reinforcement Learning for Optimal Operation of Hydrogen-based Multi-Energy Systems : Abstract: Hydrogen-based multi-energy systems (HMES) have emerged as a promising low-carbon and energy-efficient solution, as they can enable the coordinated operation of electricity, heating and coolin...
- Construct, Align, and Reason: Large Ontology Models for Enterprise Knowledge Management : Abstract: Enterprise-scale knowledge management faces significant challenges in integrating multi-source heterogeneous data and enabling effective semantic reasoning. Traditional knowledge graphs ofte...
- Happy Young Women, Grumpy Old Men? Emotion-Driven Demographic Biases in Synthetic Face Generation : Abstract: Synthetic face generation has rapidly advanced with the emergence of text-to-image (T2I) and of multimodal large language models, enabling high-fidelity image production from natural-languag...
- Synthetic Student Responses: LLM-Extracted Features for IRT Difficulty Parameter Estimation : Abstract: Educational assessment relies heavily on knowing question difficulty, traditionally determined through resource-intensive pre-testing with students. This creates significant barriers for bot...
- LOGOS-CA: A Cellular Automaton Using Natural Language as State and Rule : Abstract: Large Language Models (LLMs), trained solely on massive text data, have achieved high performance on the Winograd Schema Challenge (WSC), a benchmark proposed to measure commonsense knowledg...
- Bitcoin Price Prediction using Machine Learning and Combinatorial Fusion Analysis : Abstract: In this work, we propose to apply a new model fusion and learning paradigm, known as Combinatorial Fusion Analysis (CFA), to the field of Bitcoin price prediction. Price prediction of financ...
- LSSF: Safety Alignment for Large Language Models through Low-Rank Safety Subspace Fusion : Abstract: The safety mechanisms of large language models (LLMs) exhibit notable fragility, as even fine-tuning on datasets without harmful content may still undermine their safety capabilities. Meanwh...
- Enhancing few-shot time series forecasting with LLM-guided diffusion : Abstract: Time series forecasting in specialized domains is often constrained by limited data availability, where conventional models typically require large-scale datasets to effectively capture unde...
- Student Perceptions of Large Language Models Use in Self-Reflection and Design Critique in Architecture Studio : Abstract: This study investigates the integration of Large Language Models (LLMs) into the feedback mechanisms of the architectural design studio, shifting the focus from generative production to refl...
- JSR-GFNet: Jamming-to-Signal Ratio-Aware Dynamic Gating for Interference Classification in future Cognitive Global Navigation Satellite Systems : Abstract: The transition toward cognitive global navigation satellite system (GNSS) receivers requires accurate interference classification to trigger adaptive mitigation strategies. However, conventi...
- When LLMs Imagine People: A Human-Centered Persona Brainstorm Audit for Bias and Fairness in Creative Applications : Abstract: Biased outputs from Large Language Models (LLMs) can reinforce stereotypes and perpetuate inequities in real-world applications, making fairness auditing essential. We introduce the Persona ...
- Lightweight Edge Learning via Dataset Pruning : Abstract: Edge learning facilitates ubiquitous intelligence by enabling model training and adaptation directly on data-generating devices, thereby mitigating privacy risks and communication latency. H...
- Quantum Circuit-Based Learning Models: Bridging Quantum Computing and Machine Learning : Abstract: Machine Learning (ML) has been widely applied across numerous domains due to its ability to automatically identify informative patterns from data for various tasks. The availability of large...
- AI-assisted Protocol Information Extraction For Improved Accuracy and Efficiency in Clinical Trial Workflows : Abstract: Increasing clinical trial protocol complexity, amendments, and challenges around knowledge management create significant burden for trial teams. Structuring protocol content into standard fo...
- How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI : Abstract: Large-scale data has fuelled the success of frontier artificial intelligence (AI) models over the past decade. This expansion has relied on sustained efforts by large technology corporations...
- Explore Brain-Inspired Machine Intelligence for Connecting Dots on Graphs Through Holographic Blueprint of Oscillatory Synchronization : Abstract: Neural coupling in both neuroscience and artificial intelligence emerges as dynamic oscillatory patterns that encode abstract concepts. To this end, we hypothesize that a deeper understandin...
- TextBFGS: Quasi-Newton Optimization for Discrete Executable Text via Gradient-Operator Retrieval : Abstract: Optimizing discrete executable text such as prompts and code has recently been framed as a gradient-based process, effectively translating backpropagation concepts to the semantic space. How...
- Scalable Generative Game Engine: Breaking the Resolution Wall via Hardware-Algorithm Co-Design : Abstract: Real-time generative game engines represent a paradigm shift in interactive simulation, promising to replace traditional graphics pipelines with neural world models. However, existing approa...
- Structured Self-Consistency: A Multi-Task Evaluation of LLMs on VirtualHome : Abstract: Embodied AI requires agents to understand goals, plan actions, and execute tasks in simulated environments. We present a comprehensive evaluation of Large Language Models (LLMs) on the Virtua...
- Inference-Only Prompt Projection for Safe Text-to-Image Generation with TV Guarantees : Abstract: Text-to-Image (T2I) diffusion models enable high-quality open-ended synthesis, but their real-world deployment demands safeguards that suppress unsafe generations without degrading benign pr...
- Predictive Maintenance for Ultrafiltration Membranes Using Explainable Similarity-Based Prognostics : Abstract: In reverse osmosis desalination, ultrafiltration (UF) membranes degrade due to fouling, leading to performance loss and costly downtime. Most plants rely on scheduled preventive maintenance,...
- SEISMO: Increasing Sample Efficiency in Molecular Optimization with a Trajectory-Aware LLM Agent : Abstract: Optimizing the structure of molecules to achieve desired properties is a central bottleneck across the chemical sciences, particularly in the pharmaceutical industry where it underlies the d...
- OpenGuanDan: A Large-Scale Imperfect Information Game Benchmark : Abstract: The advancement of data-driven artificial intelligence (AI), particularly machine learning, heavily depends on large-scale benchmarks. Despite remarkable progress across domains ranging from...
- HumanStudy-Bench: Towards AI Agent Design for Participant Simulation : Abstract: Large language models (LLMs) are increasingly used as simulated participants in social science experiments, but their behavior is often unstable and highly sensitive to design choices. Prior...
- From Prompt to Graph: Comparing LLM-Based Information Extraction Strategies in Domain-Specific Ontology Development : Abstract: Ontologies are essential for structuring domain knowledge, improving accessibility, sharing, and reuse. However, traditional ontology construction relies on manual annotation and conventiona...
- Self-Guard: Defending Large Reasoning Models via enhanced self-reflection : Abstract: The emergence of Large Reasoning Models (LRMs) introduces a new paradigm of explicit reasoning, enabling remarkable advances yet posing unique risks such as reasoning manipulation and inform...
- Physics-informed Diffusion Generation for Geomagnetic Map Interpolation : Abstract: Geomagnetic map interpolation aims to infer unobserved geomagnetic data at spatial points, yielding critical applications in navigation and resource exploration. However, existing methods fo...
- Learning More from Less: Unlocking Internal Representations for Benchmark Compression : Abstract: The prohibitive cost of evaluating Large Language Models (LLMs) necessitates efficient alternatives to full-scale benchmarking. Prevalent approaches address this by identifying a small cores...
- Neuro-symbolic AI for Predictive Maintenance (PdM) -- review and recommendations : Abstract: In this document we perform a systematic review of the state of the art in Predictive Maintenance (PdM) over the last five years in industrial settings such as commercial buildings, pharmaceuti...
- Engineering AI Agents for Clinical Workflows: A Case Study in Architecture, MLOps, and Governance : Abstract: The integration of Artificial Intelligence (AI) into clinical settings presents a software engineering challenge, demanding a shift from isolated models to robust, governable, and reliable s...
- Environment-Aware Adaptive Pruning with Interleaved Inference Orchestration for Vision-Language-Action Models : Abstract: While Vision-Language-Action (VLA) models hold promise in embodied intelligence, their large parameter counts lead to substantial inference latency that hinders real-time manipulation, motiv...
- World Models as an Intermediary between Agents and the Real World : Abstract: Large language model (LLM) agents trained using reinforcement learning have achieved superhuman performance in low-cost environments like games, mathematics, and coding. However, these succes...
- MissMAC-Bench: Building Solid Benchmark for Missing Modality Issue in Robust Multimodal Affective Computing : Abstract: As a knowledge discovery task over heterogeneous data sources, current Multimodal Affective Computing (MAC) heavily relies on the completeness of multiple modalities to accurately understand h...
- Resource-Efficient Reinforcement for Reasoning Large Language Models via Dynamic One-Shot Policy Refinement : Abstract: Large language models (LLMs) have exhibited remarkable performance on complex reasoning tasks, with reinforcement learning under verifiable rewards (RLVR) emerging as a principled framework ...
- Optimizing Agentic Reasoning with Retrieval via Synthetic Semantic Information Gain Reward : Abstract: Agentic reasoning enables large reasoning models (LRMs) to dynamically acquire external knowledge, yet optimizing the retrieval process remains challenging due to the lack of dense, prin...
- Persuasion Propagation in LLM Agents : Abstract: Modern AI agents increasingly combine conversational interaction with autonomous task execution, such as coding and web research, raising a natural question: what happens when an agent engag...
- Position: Human-Centric AI Requires a Minimum Viable Level of Human Understanding : Abstract: AI systems increasingly produce fluent, correct, end-to-end outcomes. Over time, this erodes users' ability to explain, verify, or intervene. We define this divergence as the Capability-Comp...
- Multi-Head Attention Is a Multi-Player Game : Abstract: Modern transformer attention is internally multi-agent -- heads compete and coordinate -- yet we train it as if it were a monolithic optimizer. We formalize this gap: cross-entropy training ...
- Foundation CAN LM: A Pretrained Language Model For Automotive CAN Data : Abstract: The Controller Area Network (CAN) bus provides a rich source of vehicular signals increasingly leveraged for applications in automotive and auto insurance domains, including collision detect...
- Beyond Output Critique: Self-Correction via Task Distillation : Abstract: Large language models (LLMs) have shown promising self-correction abilities, where iterative refinement improves the quality of generated responses. However, most existing approaches operate...
- Synapse Compendium Aware Federated Knowledge Exchange for Tool Routed LLMs : Abstract: Collaborative learning among LLM-based agents under federated learning faces challenges, including communication costs, heterogeneity in data, and tool-usage, limiting their effectiveness. W...
- Supervised sparse auto-encoders as unconstrained feature models for semantic composition : Abstract: Sparse auto-encoders (SAEs) have re-emerged as a prominent method for mechanistic interpretability, yet they face two significant challenges: the non-smoothness of the $L_1$ penalty, which h...
- Learning Abstractions for Hierarchical Planning in Program-Synthesis Agents : Abstract: Humans learn abstractions and use them to plan efficiently to quickly generalize across tasks -- an ability that remains challenging for state-of-the-art large language model (LLM) agents an...
- The Keyhole Effect: Why Chat Interfaces Fail at Data Analysis : Abstract: Chat has become the default interface for AI-assisted data analysis. For multi-step, state-dependent analytical tasks, this is a mistake. Building on Woods' (1984) Keyhole Effect, the cogniti...
- MindGuard: Guardrail Classifiers for Multi-Turn Mental Health Support : Abstract: Large language models are increasingly used for mental health support, yet their conversational coherence alone does not ensure clinical appropriateness. Existing general-purpose safeguards ...
- R-HTN: Rebellious Online HTN Planning for Safety and Game AI : Abstract: We introduce online Hierarchical Task Network (HTN) agents whose behaviors are governed by a set of built-in directives D. Like other agents that are capable of rebellion (i.e., intell...
- Small-Margin Preferences Still Matter-If You Train Them Right : Abstract: Preference optimization methods such as DPO align large language models (LLMs) using paired comparisons, but their effectiveness can be highly sensitive to the quality and difficulty of pref...
- Reasoning and Tool-use Compete in Agentic RL: From Quantifying Interference to Disentangled Tuning : Abstract: Agentic Reinforcement Learning (ARL) focuses on training large language models (LLMs) to interleave reasoning with external tool execution to solve complex tasks. Most existing ARL methods t...
- Error Taxonomy-Guided Prompt Optimization : Abstract: Automatic Prompt Optimization (APO) is a powerful approach for extracting performance from large language models without modifying their weights. Many existing methods rely on trial-and-erro...
- How RLHF Amplifies Sycophancy : Abstract: Large language models often exhibit increased sycophantic behavior after preference-based post-training, showing a stronger tendency to affirm a user's stated or implied belief even when thi...
- HalluHard: A Hard Multi-Turn Hallucination Benchmark : Abstract: Large language models (LLMs) still produce plausible-sounding but ungrounded factual claims, a problem that worsens in multi-turn dialogue as context grows and early errors cascade. We intro...
- Discovering Process-Outcome Credit in Multi-Step LLM Reasoning : Abstract: Reinforcement Learning (RL) serves as a potent paradigm for enhancing reasoning capabilities in Large Language Models (LLMs), yet standard outcome-based approaches often suffer from reward s...
- SetPO: Set-Level Policy Optimization for Diversity-Preserving LLM Reasoning : Abstract: Reinforcement learning with verifiable rewards has shown notable effectiveness in enhancing large language models (LLMs) reasoning performance, especially in mathematics tasks. However, such...
- ConvexBench: Can LLMs Recognize Convex Functions? : Abstract: Convex analysis is a modern branch of mathematics with many applications. As Large Language Models (LLMs) start to automate research-level math and sciences, it is important for LLMs to demo...
- AutoHealth: An Uncertainty-Aware Multi-Agent System for Autonomous Health Data Modeling : Abstract: LLM-based agents have demonstrated strong potential for autonomous machine learning, yet their applicability to health data remains limited. Existing systems often struggle to generalize acr...
- EvoOpt-LLM: Evolving industrial optimization models with large language models : Abstract: Optimization modeling via mixed-integer linear programming (MILP) is fundamental to industrial planning and scheduling, yet translating natural-language requirements into solver-executable m...
- MedBeads: An Agent-Native, Immutable Data Substrate for Trustworthy Medical AI : Abstract: Background: As of 2026, Large Language Models (LLMs) demonstrate expert-level medical knowledge. However, deploying them as autonomous "Clinical Agents" remains limited. Current Electronic M...
- Hard Constraints Meet Soft Generation: Guaranteed Feasibility for LLM-based Combinatorial Optimization : Abstract: Large language models (LLMs) have emerged as promising general-purpose solvers for combinatorial optimization (CO), yet they fundamentally lack mechanisms to guarantee solution feasibility w...
- Probing RLVR training instability through the lens of objective-level hacking : Abstract: Prolonged reinforcement learning with verifiable rewards (RLVR) has been shown to drive continuous improvements in the reasoning capabilities of large language models, but the training is of...
- Transforming Vehicle Diagnostics: A Multimodal Approach to Error Patterns Prediction : Abstract: Accurately diagnosing and predicting vehicle malfunctions is crucial for maintenance and safety in the automotive industry. While modern diagnostic systems primarily rely on sequences of veh...
- Lyapunov Stability-Aware Stackelberg Game for Low-Altitude Economy: A Control-Oriented Pruning-Based DRL Approach : Abstract: With the rapid expansion of the low-altitude economy, Unmanned Aerial Vehicles (UAVs) serve as pivotal aerial base stations supporting diverse services from users, ranging from latency-sensi...
- PersistBench: When Should Long-Term Memories Be Forgotten by LLMs? : Abstract: Conversational assistants are increasingly integrating long-term memory with large language models (LLMs). This persistence of memories, e.g., the user is vegetarian, can enhance personaliza...
- Capabilities and Fundamental Limits of Latent Chain-of-Thought : Abstract: Latent Chain-of-Thought (Latent CoT) models promise efficient reasoning via continuous representations, yet exhibit puzzling performance inconsistencies: excelling at exploration (ProsQA: 97...
- Multi-Agent Causal Reasoning System for Error Pattern Rule Automation in Vehicles : Abstract: Modern vehicles generate thousands of different discrete events known as Diagnostic Trouble Codes (DTCs). Automotive manufacturers use Boolean combinations of these codes, called error patte...
- Do All Individual Layers Help? An Empirical Study of Task-Interfering Layers in Vision-Language Models : Abstract: Current VLMs have demonstrated capabilities across a wide range of multimodal tasks. Typically, in a pretrained VLM, all layers are engaged by default to make predictions on downstream tasks...
- ASP-Bench: From Natural Language to Logic Programs : Abstract: Automating the translation of natural-language specifications into logic programs is a challenging task that affects neurosymbolic engineering. We present ASP-Bench, a benchmark comprising 1...
- A State-Transition Framework for Efficient LLM Reasoning : Abstract: While Long Chain-of-Thought (CoT) reasoning significantly improves Large Language Models (LLMs) performance on complex reasoning tasks, the substantial computational and memory costs of gene...
- Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction : Abstract: The rapid evolution of agentic workflows has demonstrated strong performance of LLM-based agents in addressing complex reasoning tasks. However, existing workflow optimization methods typica...
- Addressing Explainability of Generative AI using SMILE (Statistical Model-agnostic Interpretability with Local Explanations) : Abstract: The rapid advancement of generative artificial intelligence has enabled models capable of producing complex textual and visual outputs; however, their decision-making processes remain largel...
- Not All Preferences Are Created Equal: Stability-Aware and Gradient-Efficient Alignment for Reasoning Models : Abstract: Preference-based alignment is pivotal for training large reasoning models; however, standard methods like Direct Preference Optimization (DPO) typically treat all preference pairs uniformly,...
- FutureMind: Equipping Small Language Models with Strategic Thinking-Pattern Priors via Adaptive Knowledge Distillation : Abstract: Small Language Models (SLMs) are attractive for cost-sensitive and resource-limited settings due to their efficient, low-latency inference. However, they often struggle with complex, knowled...
- Predictive Scheduling for Efficient Inference-Time Reasoning in Large Language Models : Abstract: Large language models (LLMs) achieve state-of-the-art accuracy on complex reasoning tasks by generating multiple chain-of-thought (CoT) traces, but using a fixed token budget per query leads...
- LLM-Driven Ontology Construction for Enterprise Knowledge Graphs : Abstract: Enterprise Knowledge Graphs have become essential for unifying heterogeneous data and enforcing semantic governance. However, the construction of their underlying ontologies remains a resour...
- RE-MCDF: Closed-Loop Multi-Expert LLM Reasoning for Knowledge-Grounded Clinical Diagnosis : Abstract: Electronic medical records (EMRs), particularly in neurology, are inherently heterogeneous, sparse, and noisy, which poses significant challenges for large language models (LLMs) in clinical...
- Model Specific Task Similarity for Vision Language Model Selection via Layer Conductance : Abstract: While open sourced Vision-Language Models (VLMs) have proliferated, selecting the optimal pretrained model for a specific downstream task remains challenging. Exhaustive evaluation is often ...
- Aggregation Queries over Unstructured Text: Benchmark and Agentic Method : Abstract: Aggregation query over free text is a long-standing yet underexplored problem. Unlike ordinary question answering, aggregate queries require exhaustive evidence collection and systems are re...
- Building Better Deception Probes Using Targeted Instruction Pairs : Abstract: Linear probes are a promising approach for monitoring AI systems for deceptive behaviour. Previous work has shown that a linear classifier trained on a contrastive instruction pair and a sim...
- SimGym: Traffic-Grounded Browser Agents for Offline A/B Testing in E-Commerce : Abstract: A/B testing remains the gold standard for evaluating e-commerce UI changes, yet it diverts traffic, takes weeks to reach significance, and risks harming user experience. We introduce SimGym,...
- Agyn: A Multi-Agent System for Team-Based Autonomous Software Engineering : Abstract: Large language models have demonstrated strong capabilities in individual software engineering tasks, yet most autonomous systems still treat issue resolution as a monolithic or pipeline-bas...
- Legal Infrastructure for Transformative AI Governance : Abstract: Most of our AI governance efforts focus on substance: what rules do we want in place? What limits or checks do we want to impose on AI development and deployment? But a key role for law is n...
- Learning to Guide Local Search for MPE Inference in Probabilistic Graphical Models : Abstract: Most Probable Explanation (MPE) inference in Probabilistic Graphical Models (PGMs) is a fundamental yet computationally challenging problem arising in domains such as diagnosis, planning, an...
- Qrita: High-performance Top-k and Top-p Algorithm for GPUs using Pivot-based Truncation and Selection : Abstract: Top-k and Top-p are the dominant truncation operators in the sampling of large language models. Despite their widespread use, implementing them efficiently over large vocabularies remains a ...
- PRISM: Festina Lente Proactivity -- Risk-Sensitive, Uncertainty-Aware Deliberation for Proactive Agents : Abstract: Proactive agents must decide not only what to say but also whether and when to intervene. Many current systems rely on brittle heuristics or indiscriminate long reasoning, which offers littl...
- MAGIC: A Co-Evolving Attacker-Defender Adversarial Game for Robust LLM Safety : Abstract: Ensuring robust safety alignment is crucial for Large Language Models (LLMs), yet existing defenses often lag behind evolving adversarial attacks due to their reliance on static, pre...
- S1-NexusAgent: a Self-Evolving Agent Framework for Multidisciplinary Scientific Research : Abstract: Modern scientific research relies on large-scale data, complex workflows, and specialized tools, which existing LLMs and tool-based agents struggle to handle due to limitations in long-horiz...
- Autonomous Question Formation for Large Language Model-Driven AI Systems : Abstract: Large language model (LLM)-driven AI systems are increasingly important for autonomous decision-making in dynamic and open environments. However, most existing systems rely on predefined tas...
- Reasoning with Autoregressive-Diffusion Collaborative Thoughts : Abstract: Autoregressive and diffusion models represent two complementary generative paradigms. Autoregressive models excel at sequential planning and constraint composition, yet struggle with tasks t...
- ToPT: Task-Oriented Prompt Tuning for Urban Region Representation Learning : Abstract: Learning effective region embeddings from heterogeneous urban data underpins key urban computing tasks (e.g., crime prediction, resource allocation). However, prevailing two-stage methods yi...
- ProjDevBench: Benchmarking AI Coding Agents on End-to-End Project Development : Abstract: Recent coding agents can generate complete codebases from simple prompts, yet existing evaluations focus on issue-level bug fixing and lag behind end-to-end development. We introduce ProjDev...
- FlowSteer: Interactive Agentic Workflow Orchestration via End-to-End Reinforcement Learning : Abstract: In recent years, a variety of powerful agentic workflows have been applied to solve a wide range of human problems. However, existing workflow orchestration still faces key challenges, inclu...
- TRIP-Bench: A Benchmark for Long-Horizon Interactive Agents in Real-World Scenarios : Abstract: As LLM-based agents are deployed in increasingly complex real-world settings, existing benchmarks underrepresent key challenges such as enforcing global constraints, coordinating multi-tool ...
- What LLMs Think When You Don't Tell Them What to Think About? : Abstract: Characterizing the behavior of large language models (LLMs) across diverse settings is critical for reliable monitoring and AI safety. However, most existing analyses rely on topic- or task-...
- Beyond Dense States: Elevating Sparse Transcoders to Active Operators for Latent Reasoning : Abstract: Latent reasoning compresses the chain-of-thought (CoT) into continuous hidden states, yet existing methods rely on dense latent transitions that remain difficult to interpret and control. Me...
- Mitigating loss of control in advanced AI systems through instrumental goal trajectories : Abstract: Researchers at artificial intelligence labs and universities are concerned that highly capable artificial intelligence (AI) systems may erode human control by pursuing instrumental goals. Ex...
- Optimizing Prompts for Large Language Models: A Causal Approach : Abstract: Large Language Models (LLMs) are increasingly embedded in enterprise workflows, yet their performance remains highly sensitive to prompt design. Automatic Prompt Optimization (APO) seeks to ...
- MACD: Model-Aware Contrastive Decoding via Counterfactual Data : Abstract: Video language models (Video-LLMs) are prone to hallucinations, often generating plausible but ungrounded content when visual evidence is weak, ambiguous, or biased. Existing decoding method...
- Controlling Exploration-Exploitation in GFlowNets via Markov Chain Perspectives : Abstract: Generative Flow Network (GFlowNet) objectives implicitly fix an equal mixing of forward and backward policies, potentially constraining the exploration-exploitation trade-off during training...
- Adversarial Reward Auditing for Active Detection and Mitigation of Reward Hacking : Abstract: Reinforcement Learning from Human Feedback (RLHF) remains vulnerable to reward hacking, where models exploit spurious correlations in learned reward models to achieve high scores while viola...
- PRISM: Parametrically Refactoring Inference for Speculative Sampling Draft Models : Abstract: Large Language Models (LLMs), constrained by their auto-regressive nature, suffer from slow decoding. Speculative decoding methods have emerged as a promising solution to accelerate LLM deco...
- Efficient Cross-Architecture Knowledge Transfer for Large-Scale Online User Response Prediction : Abstract: Deploying new architectures in large-scale user response prediction systems incurs high model switching costs due to expensive retraining on massive historical data and performance degradati...
- Scalable and Secure AI Inference in Healthcare: A Comparative Benchmarking of FastAPI and Triton Inference Server on Kubernetes : Abstract: Efficient and scalable deployment of machine learning (ML) models is a prerequisite for modern production environments, particularly within regulated domains such as healthcare and pharmaceu...
- Learning to Price: Interpretable Attribute-Level Models for Dynamic Markets : Abstract: Dynamic pricing in high-dimensional markets poses fundamental challenges of scalability, uncertainty, and interpretability. Existing low-rank bandit formulations learn efficiently but rely o...
- From Gameplay Traces to Game Mechanics: Causal Induction with Large Language Models : Abstract: Deep learning agents can achieve high performance in complex game domains, often without understanding the underlying causal game mechanics. To address this, we investigate Causal Induction: ...
- Complete Identification of Deep ReLU Neural Networks by Many-Valued Logic : Abstract: Deep ReLU neural networks admit nontrivial functional symmetries: vastly different architectures and parameters (weights and biases) can realize the same function. We address the complete id...
- Localizing and Correcting Errors for LLM-based Planners : Abstract: Large language models (LLMs) have demonstrated strong reasoning capabilities on math and coding, but frequently fail on symbolic classical planning tasks. Our studies, as well as prior work,...
- Assessing Domain-Level Susceptibility to Emergent Misalignment from Narrow Finetuning : Abstract: Emergent misalignment poses risks to AI safety as language models are increasingly used for autonomous tasks. In this paper, we present a population of large language models (LLMs) fine-tune...
- Autonomous Data Processing using Meta-Agents : Abstract: Traditional data processing pipelines are typically static and handcrafted for specific tasks, limiting their adaptability to evolving requirements. While general-purpose agents and coding a...
- SayNext-Bench: Why Do LLMs Struggle with Next-Utterance Prediction? : Abstract: We explore the use of large language models (LLMs) for next-utterance prediction in human dialogue. Despite recent advances in LLMs demonstrating their ability to engage in natural conversat...
- MHDash: An Online Platform for Benchmarking Mental Health-Aware AI Assistants : Abstract: Large language models (LLMs) are increasingly applied in mental health support systems, where reliable recognition of high-risk states such as suicidal ideation and self-harm is safety-criti...
- Position: Agentic Evolution is the Path to Evolving LLMs : Abstract: As Large Language Models (LLMs) move from curated training sets into open-ended real-world environments, a fundamental limitation emerges: static training cannot keep pace with continual dep...
- POET: Protocol Optimization via Eligibility Tuning : Abstract: Eligibility criteria (EC) are essential for clinical trial design, yet drafting them remains a time-intensive and cognitively demanding task for clinicians. Existing automated approaches oft...
- KEPO: Knowledge-Enhanced Preference Optimization for Reinforcement Learning with Reasoning : Abstract: Reinforcement learning (RL) has emerged as a promising paradigm for inducing explicit reasoning behaviors in large language and vision-language models. However, reasoning-oriented RL post-tr...
- RobustDebias: Debiasing Language Models using Distributionally Robust Optimization : Abstract: Pretrained language models have been shown to exhibit biases and social stereotypes. Prior work on debiasing these models has largely focused on modifying embedding spaces during pretraining...
- PolarMem: A Training-Free Polarized Latent Graph Memory for Verifiable Multimodal Agents : Abstract: As multimodal agents evolve from passive observers to long-horizon decision-makers, they require memory systems that provide not just information availability but logical verifiability. A fu...
- Do Latent-CoT Models Think Step-by-Step? A Mechanistic Study on Sequential Reasoning Tasks : Abstract: Latent Chain-of-Thought (Latent-CoT) aims to enable step-by-step computation without emitting long rationales, yet its mechanisms remain unclear. We study CODI, a continuous-thought teacher-...
- Cross-Modal Memory Compression for Efficient Multi-Agent Debate : Abstract: Multi-agent debate can improve reasoning quality and reduce hallucinations, but it incurs rapidly growing context as debate rounds and agent count increase. Retaining full textual histories ...
- Benchmarking Agents in Insurance Underwriting Environments : Abstract: As AI agents integrate into enterprise applications, their evaluation demands benchmarks that reflect the complexity of real-world operations. Instead, existing benchmarks overemphasize open...
- Dual Latent Memory for Visual Multi-agent System : Abstract: While Visual Multi-Agent Systems (VMAS) promise to enhance comprehensive abilities through inter-agent collaboration, empirical evidence reveals a counter-intuitive "scaling wall": increasin...
- Replacing Parameters with Preferences: Federated Alignment of Heterogeneous Vision-Language Models : Abstract: Vision-language models (VLMs) have broad potential in privacy-sensitive domains such as healthcare and finance, yet strict data-sharing constraints render centralized training infeasible. Federated learning (FL) mitigates this issue by ...
- PCBSchemaGen: Constraint-Guided Schematic Design via LLM for Printed Circuit Boards (PCB) : Abstract: Printed Circuit Board (PCB) schematic design plays an essential role in all areas of electronic industries. Unlike prior works that focus on digital or analog circuits alone, PCB design must...
- Diagnosing the Reliability of LLM-as-a-Judge via Item Response Theory : Abstract: While LLM-as-a-Judge is widely used in automated evaluation, existing validation practices primarily operate at the level of observed outputs, offering limited insight into whether LLM judge...
- How Far Are LLMs from Professional Poker Players? Revisiting Game-Theoretic Reasoning with Agentic Tool Use : Abstract: As Large Language Models (LLMs) are increasingly applied in high-stakes domains, their ability to reason strategically under uncertainty becomes critical. Poker provides a rigorous testbed, ...
- Uncovering Latent Communication Patterns in Brain Networks via Adaptive Flow Routing : Abstract: Unraveling how macroscopic cognitive phenotypes emerge from microscopic neuronal connectivity remains one of the core pursuits of neuroscience. To this end, researchers typically leverage mu...
- Unmasking Reasoning Processes: A Process-aware Benchmark for Evaluating Structural Mathematical Reasoning in LLMs : Abstract: Recent large language models (LLMs) achieve near-saturation accuracy on many established mathematical reasoning benchmarks, raising concerns about their ability to diagnose genuine reasoning...
- Learning Modal-Mixed Chain-of-Thought Reasoning with Latent Embeddings : Abstract: We study how to extend chain-of-thought (CoT) beyond language to better handle multimodal reasoning. While CoT helps LLMs and VLMs articulate intermediate steps, its text-only form often fai...
- Small Shifts, Large Gains: Unlocking Traditional TSP Heuristic Guided-Sampling via Unsupervised Neural Instance Modification : Abstract: The Traveling Salesman Problem (TSP) is one of the most representative NP-hard problems in route planning and a long-standing benchmark in combinatorial optimization. Traditional heuristic t...
- Exploring Information Seeking Agent Consolidation : Abstract: Information-seeking agents have emerged as a powerful paradigm for solving knowledge-intensive tasks. Existing information-seeking agents are typically specialized for open web, documents, o...
- DockSmith: Scaling Reliable Coding Environments via an Agentic Docker Builder : Abstract: Reliable Docker-based environment construction is a dominant bottleneck for scaling execution-grounded training and evaluation of software engineering agents. We introduce DockSmith, a speci...
Research Sources: 1371 | Generated: 2/3/2026
