AI RESEARCH PAPERS & ACADEMIC SOURCES
- Winner Team Mia at TextVQA Challenge 2021: Vision-and-Language Representation Learning with Pre-trained Sequence-to-Sequence Model : Abstract: TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reas...
- MOVA: Towards Scalable and Synchronized Video-Audio Generation : Abstract: Audio is indispensable for real-world video, yet generation models have largely overlooked audio components. Current approaches to producing audio-visual content often rely on cascaded pipel...
- Omni-Video 2: Scaling MLLM-Conditioned Diffusion for Unified Video Generation and Editing : Abstract: We present Omni-Video 2, a scalable and computationally efficient model that connects pretrained multimodal large-language models (MLLMs) with video diffusion models for unified video genera...
- Any-to-All MRI Synthesis: A Unified Foundation Model for Nasopharyngeal Carcinoma and Its Downstream Applications : Abstract: Magnetic resonance imaging (MRI) is essential for nasopharyngeal carcinoma (NPC) radiotherapy (RT), but practical constraints, such as patient discomfort, long scan times, and high costs oft...
- VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning : Abstract: The growing capability of video generation poses escalating security risks, making reliable detection increasingly essential. In this paper, we introduce VideoVeritas, a framework that integ...
- TiFRe: Text-guided Video Frame Reduction for Efficient Video Multi-modal Large Language Models : Abstract: With the rapid development of Large Language Models (LLMs), Video Multi-Modal Large Language Models (Video MLLMs) have achieved remarkable performance in video-language tasks such as video u...
- Grow with the Flow: 4D Reconstruction of Growing Plants with Gaussian Flow Fields : Abstract: Modeling the time-varying 3D appearance of plants during their growth poses unique challenges: unlike many dynamic scenes, plants generate new geometry over time as they expand, branch, and ...
- Modeling 3D Pedestrian-Vehicle Interactions for Vehicle-Conditioned Pose Forecasting : Abstract: Accurately predicting pedestrian motion is crucial for safe and reliable autonomous driving in complex urban environments. In this work, we present a 3D vehicle-conditioned pedestrian pose f...
- WorldArena: A Unified Benchmark for Evaluating Perception and Functional Utility of Embodied World Models : Abstract: While world models have emerged as a cornerstone of embodied intelligence by enabling agents to reason about environmental dynamics through action-conditioned prediction, their evaluation re...
- Generalizing Sports Feedback Generation by Watching Competitions and Reading Books: A Rock Climbing Case Study : Abstract: While there is rapid progress in video-LLMs with advanced reasoning capabilities, prior work shows that these models struggle on the challenging task of sports feedback generation and requir...
- Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction : Abstract: Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans such as autom...
- WorldCompass: Reinforcement Learning for Long-Horizon World Models : Abstract: This work presents WorldCompass, a novel Reinforcement Learning (RL) post-training framework for the long-horizon, interactive video-based world models, enabling them to explore the world mo...
- Autoregressive Image Generation with Masked Bit Modeling : Abstract: This paper challenges the dominance of continuous pipelines in visual generation. We systematically investigate the performance gap between discrete and continuous methods. Contrary to the b...
- Learning to Anchor Visual Odometry: KAN-Based Pose Regression for Planetary Landing : Abstract: Accurate and real-time 6-DoF localization is mission-critical for autonomous lunar landing, yet existing approaches remain limited: visual odometry (VO) drifts unboundedly, while map-based a...
- FeudalNav: A Simple Framework for Visual Navigation : Abstract: Visual navigation for robotics is inspired by the human ability to navigate environments using visual cues and memory, eliminating the need for detailed maps. In unseen, unmapped, or GPS-den...
- LangGS-SLAM: Real-Time Language-Feature Gaussian Splatting SLAM : Abstract: In this paper, we propose an RGB-D SLAM system that reconstructs a language-aligned dense feature field while sustaining low-latency tracking and mapping. First, we introduce a Top-K Renderin...
- When Simultaneous Localization and Mapping Meets Wireless Communications: A Survey : Abstract: The availability of commercial wireless communication and sensing equipment combined with the advancements in intelligent autonomous systems paves the way towards robust joint communications...
- A Distributed Multi-Modal Sensing Approach for Human Activity Recognition in Real-Time Human-Robot Collaboration : Abstract: Human activity recognition (HAR) is fundamental in human-robot collaboration (HRC), enabling robots to respond to and dynamically adapt to human intentions. This paper introduces a HAR syste...
- Guidestar-Free Adaptive Optics with Asymmetric Apertures : Abstract: This work introduces the first closed-loop adaptive optics (AO) system capable of optically correcting aberrations in real-time without a guidestar or a wavefront sensor. Nearly 40 years ago...
- U-Net Based Image Enhancement for Short-time Muon Scattering Tomography : Abstract: Muon Scattering Tomography (MST) is a promising non-invasive inspection technique, yet the practical application of short-time MST is hindered by poor image quality due to limited muon flux....
- Exploring Polarimetric Properties Preservation during Reconstruction of PolSAR images using Complex-valued Convolutional Neural Networks : Abstract: The inherently complex-valued nature of Polarimetric SAR data necessitates using specialized algorithms capable of directly processing complex-valued representations. However, this aspect re...
- Wavelet-Domain Masked Image Modeling for Color-Consistent HDR Video Reconstruction : Abstract: High Dynamic Range (HDR) video reconstruction aims to recover fine brightness, color, and details from Low Dynamic Range (LDR) videos. However, existing methods often suffer from color inacc...
- Surveillance Facial Image Quality Assessment: A Multi-dimensional Dataset and Lightweight Model : Abstract: Surveillance facial images are often captured under unconstrained conditions, resulting in severe quality degradation due to factors such as low resolution, motion blur, occlusion, and poor ...
- Global Symmetry and Orthogonal Transformations from Geometrical Moment $n$-tuples : Abstract: Detecting symmetry is crucial for effective object grasping for several reasons. Recognizing symmetrical features or axes within an object helps in developing efficient grasp strategies, as ...
- DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation : Abstract: Semi-supervised learning (SSL) has emerged as a critical paradigm for medical image segmentation, mitigating the immense cost of dense annotations. However, prevailing SSL frameworks are fun...
- Research on a Camera Position Measurement Method based on a Parallel Perspective Error Transfer Model : Abstract: Camera pose estimation from sparse correspondences is a fundamental problem in geometric computer vision and remains particularly challenging in near-field scenarios, where strong perspectiv...
- Dynamic Black-hole Emission Tomography with Physics-informed Neural Fields : Abstract: With the success of static black-hole imaging, the next frontier is the dynamic and 3D imaging of black holes. Recovering the dynamic 3D gas near a black hole would reveal previously-unseen ...
- Chamelion: Reliable Change Detection for Long-Term LiDAR Mapping in Transient Environments : Abstract: Online change detection is crucial for mobile robots to efficiently navigate through dynamic environments. Detecting changes in transient settings, such as active construction sites or frequ...
- A Unified Framework for Multimodal Image Reconstruction and Synthesis using Denoising Diffusion Models : Abstract: Image reconstruction and image synthesis are important for handling incomplete multimodal imaging data, but existing methods require various task-specific models, complicating training and d...
- Informative Object-centric Next Best View for Object-aware 3D Gaussian Splatting in Cluttered Scenes : Abstract: In cluttered scenes with inevitable occlusions and incomplete observations, selecting informative viewpoints is essential for building a reliable representation. In this context, 3D Gaussian...
- Reliability-aware Execution Gating for Near-field and Off-axis Vision-guided Robotic Alignment : Abstract: Vision-guided robotic systems are increasingly deployed in precision alignment tasks that require reliable execution under near-field and off-axis configurations. While recent advances in po...
- retinalysis-vascx: An explainable software toolbox for the extraction of retinal vascular biomarkers : Abstract: The automatic extraction of retinal vascular biomarkers from color fundus images (CFI) is essential for large-scale studies of the retinal vasculature. We present VascX, an open-source Pytho...
- Designing Multi-Robot Ground Video Sensemaking with Public Safety Professionals : Abstract: Videos from fleets of ground robots can advance public safety by providing scalable situational awareness and reducing professionals' burden. Yet little is known about how to design and inte...
- Dexterous Manipulation Policies from RGB Human Videos via 4D Hand-Object Trajectory Reconstruction : Abstract: Multi-finger robotic hand manipulation and grasping are challenging due to the high-dimensional action space and the difficulty of acquiring large-scale training data. Existing approaches la...
- $\chi_{0}$: Resource-Aware Robust Manipulation via Taming Distributional Inconsistencies : Abstract: High-reliability long-horizon robotic manipulation has traditionally relied on large-scale data and compute to understand complex real-world dynamics. However, we identify that the primary b...
- Fast Image-based Neural Relighting with Translucency-Reflection Modeling : Abstract: Image-based lighting (IBL) is a widely used technique that renders objects using a high dynamic range image or environment map. However, aggregating the irradiance at the object's surface is...
- VedicTHG: Symbolic Vedic Computation for Low-Resource Talking-Head Generation in Educational Avatars : Abstract: Talking-head avatars are increasingly adopted in educational technology to deliver content with social presence and improved engagement. However, many recent talking-head generation (THG) me...
- Improving Efficiency of Diffusion Models via Multi-Stage Framework and Tailored Multi-Decoder Architectures : Abstract: Diffusion models, emerging as powerful deep generative tools, excel in various applications. They operate through a two-step process: introducing noise into training samples and then employ...
- Determination of efficiency indicators of the stand for intelligent control of manual operations in industrial production : Abstract: Manual operations remain essential in industrial production because of their flexibility and low implementation cost. However, ensuring their quality and monitoring execution in real time re...
- TSJNet: A Multi-modality Target and Semantic Awareness Joint-driven Image Fusion Network : Abstract: This study aims to address the problem of incomplete information in unimodal images for semantic segmentation and object detection tasks. Existing multimodal fusion methods suffer from limit...
- View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV : Abstract: In this paper, we address the challenge of Multi-Object Tracking (MOT) in moving Unmanned Aerial Vehicle (UAV) scenarios, where irregular flight trajectories, such as hovering, turning left/...
- EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning : Abstract: Multi-modal Large Language Models (MLLMs) have advanced greatly in general tasks. However, they still face challenges in geometric reasoning, a task that requires synergistic integration of ...
- "ScatSpotter" -- A Dog Poop Detection Dataset : Abstract: Small, amorphous waste objects such as biological droppings and microtrash can be difficult to see, especially in cluttered scenes, yet they matter for environmental cleanliness, public heal...
- ERVD: An Efficient and Robust ViT-Based Distillation Framework for Remote Sensing Image Retrieval
- ALIGN: Advanced Query Initialization with LiDAR-Image Guidance for Occlusion-Robust 3D Object Detection : Abstract: Recent query-based 3D object detection methods using camera and LiDAR inputs have shown strong performance, but existing query initialization strategies, such as random sampling or BEV heatma...
- Visual Prompt-Agnostic Evolution : Abstract: Visual Prompt Tuning (VPT) adapts a frozen Vision Transformer (ViT) to downstream tasks by inserting a small number of learnable prompt tokens into the token sequence at each layer. However,...
- CAF-Mamba: Mamba-Based Cross-Modal Adaptive Attention Fusion for Multimodal Depression Detection : Abstract: Depression is a prevalent mental health disorder that severely impairs daily functioning and quality of life. While recent deep learning approaches for depression detection have shown promis...
- ReRoPE: Repurposing RoPE for Relative Camera Control : Abstract: Video generation with controllable camera viewpoints is essential for applications such as interactive content creation, gaming, and simulation. Existing methods typically adapt pre-trained ...
- ViT-5: Vision Transformers for The Mid-2020s : Abstract: This work presents a systematic investigation into modernizing Vision Transformer backbones by leveraging architectural advancements from the past five years. While preserving the canonical ...
- Building Damage Detection using Satellite Images and Patch-Based Transformer Methods : Abstract: Rapid building damage assessment is critical for post-disaster response. Damage classification models built on satellite imagery provide a scalable means of obtaining situational awareness. ...
- MambaFusion: Adaptive State-Space Fusion for Multimodal 3D Object Detection : Abstract: Reliable 3D object detection is fundamental to autonomous driving, and multimodal fusion algorithms using cameras and LiDAR remain a persistent challenge. Cameras provide dense visual cues b...
- Fields of The World: A Field Guide for Extracting Agricultural Field Boundaries : Abstract: Field boundary maps are a building block for agricultural data products and support crop monitoring, yield estimation, and disease estimation. This tutorial presents the Fields of The World ...
- DAS-SK: An Adaptive Model Integrating Dual Atrous Separable and Selective Kernel CNN for Agriculture Semantic Segmentation : Abstract: Semantic segmentation in high-resolution agricultural imagery demands models that strike a careful balance between accuracy and computational efficiency to enable deployment in practical sys...
- PEGAsus: 3D Personalization of Geometry and Appearance : Abstract: We present PEGAsus, a new framework capable of generating Personalized 3D shapes by learning shape concepts at both Geometry and Appearance levels. First, we formulate 3D shape personalizati...
- Generative Regression for Left Ventricular Ejection Fraction Estimation from Echocardiography Video : Abstract: Estimating Left Ventricular Ejection Fraction (LVEF) from echocardiograms constitutes an ill-posed inverse problem. Inherent noise, artifacts, and limited viewing angles introduce ambiguity,...
- Geospatial-Reasoning-Driven Vocabulary-Agnostic Remote Sensing Semantic Segmentation : Abstract: Open-vocabulary semantic segmentation has emerged as a promising research direction in remote sensing, enabling the recognition of diverse land-cover types beyond pre-defined category sets. ...
- Chain-of-Caption: Training-free improvement of multimodal large language model on referring expression comprehension : Abstract: Given a textual description, the task of referring expression comprehension (REC) involves the localisation of the referred object in an image. Multimodal large language models (MLLMs) have ...
- Efficient-SAM2: Accelerating SAM2 with Object-Aware Visual Encoding and Memory Retrieval : Abstract: Segment Anything Model 2 (SAM2) shows excellent performance in video object segmentation tasks; however, the heavy computational burden hinders its application in real-time video processing....
- CAE-AV: Improving Audio-Visual Learning via Cross-modal Interactive Enrichment : Abstract: Audio-visual learning suffers from modality misalignment caused by off-screen sources and background clutter, and current methods usually amplify irrelevant regions or moments, leading to un...
- Language-Guided Transformer Tokenizer for Human Motion Generation : Abstract: In this paper, we focus on motion discrete tokenization, which converts raw motion into compact discrete tokens--a process proven crucial for efficient motion generation. In this paradigm, i...
- What, Whether and How? Unveiling Process Reward Models for Thinking with Images Reasoning : Abstract: The rapid advancement of Large Vision Language Models (LVLMs) has demonstrated excellent abilities in various visual tasks. Building upon these developments, the thinking with images paradig...
- E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs : Abstract: E-commerce short videos represent a high-revenue segment of the online video industry characterized by a goal-driven format and dense multi-modal signals. Current models often struggle with ...
- Geometric Image Editing via Effects-Sensitive In-Context Inpainting with Diffusion Transformers : Abstract: Recent advances in diffusion models have significantly improved image editing. However, challenges persist in handling geometric transformations, such as translation, rotation, and scaling, ...
- D$^2$-VR: Degradation-Robust and Distilled Video Restoration with Synergistic Optimization Strategy : Abstract: The integration of diffusion priors with temporal alignment has emerged as a transformative paradigm for video restoration, delivering fantastic perceptual quality, yet the practical deploym...
- RealSynCol: a high-fidelity synthetic colon dataset for 3D reconstruction applications : Abstract: Deep learning has the potential to improve colonoscopy by enabling 3D reconstruction of the colon, providing a comprehensive view of mucosal surfaces and lesions, and facilitating the identi...
- Understanding and Optimizing Attention-Based Sparse Matching for Diverse Local Features : Abstract: We revisit the problem of training attention-based sparse image matching models for various local features. We first identify one critical design choice that has been previously overlooked, ...
- Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition : Abstract: Despite the growing video understanding capabilities of recent Multimodal Large Language Models (MLLMs), existing video benchmarks primarily assess understanding based on models' static, int...
- TriC-Motion: Tri-Domain Causal Modeling Grounded Text-to-Motion Generation : Abstract: Text-to-motion generation, a rapidly evolving field in computer vision, aims to produce realistic and text-aligned motion sequences. Current methods primarily focus on spatial-temporal model...
- Are Vision Foundation Models Foundational for Electron Microscopy Image Segmentation? : Abstract: Although vision foundation models (VFMs) are increasingly reused for biomedical image analysis, it remains unclear whether the latent representations they provide are general enough to suppo...
- GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving : Abstract: Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships r...
- Automatic regularization parameter choice for tomography using a double model approach : Abstract: Image reconstruction in X-ray tomography is an ill-posed inverse problem, particularly with limited available data. Regularization is thus essential, but its effectiveness hinges on the choi...
- Thegra: Graph-based SLAM for Thermal Imagery : Abstract: Thermal imaging provides a practical sensing modality for visual SLAM in visually degraded environments such as low illumination, smoke, or adverse weather. However, thermal imagery often ex...
- TIBR4D: Tracing-Guided Iterative Boundary Refinement for Efficient 4D Gaussian Segmentation : Abstract: Object-level segmentation in dynamic 4D Gaussian scenes remains challenging due to complex motion, occlusions, and ambiguous boundaries. In this paper, we present an efficient learning-free ...
- FLAG-4D: Flow-Guided Local-Global Dual-Deformation Model for 4D Reconstruction : Abstract: We introduce FLAG-4D, a novel framework for generating novel views of dynamic scenes by reconstructing how 3D Gaussian primitives evolve through space and time. Existing methods typically re...
- SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning : Abstract: Photorealistic color retouching plays a vital role in visual content creation, yet manual retouching remains inaccessible to non-experts due to its reliance on specialized expertise. Referen...
- Overview and Comparison of AVS Point Cloud Compression Standard : Abstract: Point cloud is a prevalent 3D data representation format with significant application values in immersive media, autonomous driving, digital heritage protection, etc. However, the large data...
- Inspiration Seeds: Learning Non-Literal Visual Combinations for Generative Exploration : Abstract: While generative models have become powerful tools for image synthesis, they are typically optimized for executing carefully crafted textual prompts, offering limited support for the open-en...
- Improving Reconstruction of Representation Autoencoder : Abstract: Recent work leverages Vision Foundation Models as image encoders to boost the generative performance of latent diffusion models (LDMs), as their semantic feature distributions are easy to le...
- Revisiting [CLS] and Patch Token Interaction in Vision Transformers : Abstract: Vision Transformers have emerged as powerful, scalable and versatile representation learners. To capture both global and local features, a learnable [CLS] class token is typically prepended ...
- Deep Learning-Based Fixation Type Prediction for Quality Assurance in Digital Pathology : Abstract: Accurate annotation of fixation type is a critical step in slide preparation for pathology laboratories. However, this manual process is prone to errors, impacting downstream analyses and ...
- WiFlow: A Lightweight WiFi-based Continuous Human Pose Estimation Network with Spatio-Temporal Feature Decoupling : Abstract: Human pose estimation is fundamental to intelligent perception in the Internet of Things (IoT), enabling applications ranging from smart healthcare to human-computer interaction. While WiFi-...
- A Machine Learning accelerated geophysical fluid solver : Abstract: Machine learning methods have been successful in many areas, like image classification and natural language processing. However, it still needs to be determined how to apply ML to areas with...
- OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence : Abstract: Hypothesis. Artificial general intelligence is, at its core, a compression problem. Effective compression demands resonance: deep learning scales best when its architecture aligns with the f...
- Low-Light Video Enhancement with An Effective Spatial-Temporal Decomposition Paradigm : Abstract: Low-Light Video Enhancement (LLVE) seeks to restore dynamic or static scenes plagued by severe invisibility and noise. In this paper, we present an innovative video decomposition strategy th...
- TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions : Abstract: This paper proposes Omni Dense Captioning, a novel task designed to generate continuous, fine-grained, and structured audio-visual narratives with explicit timestamps. To ensure dense semant...
- Rotated Lights for Consistent and Efficient 2D Gaussians Inverse Rendering : Abstract: Inverse rendering aims to decompose a scene into its geometry, material properties and light conditions under a certain rendering model. It has wide applications like view synthesis, relight...
- FusionEdit: Semantic Fusion and Attention Modulation for Training-Free Image Editing : Abstract: Text-guided image editing aims to modify specific regions according to the target prompt while preserving the identity of the source image. Recent methods exploit explicit binary masks to co...
- SynSacc: A Blender-to-V2E Pipeline for Synthetic Neuromorphic Eye-Movement Data and Sim-to-Real Spiking Model Training : Abstract: The study of eye movements, particularly saccades and fixations, is fundamental to understanding the mechanisms of human cognition and perception. Accurate classification of these movements...
- Closing the Confusion Loop: CLIP-Guided Alignment for Source-Free Domain Adaptation : Abstract: Source-Free Domain Adaptation (SFDA) tackles the problem of adapting a pre-trained source model to an unlabeled target domain without accessing any source data, which is quite suitable for t...
- From Correspondence to Actions: Human-Like Multi-Image Spatial Reasoning in Multi-modal Large Language Models : Abstract: While multimodal large language models (MLLMs) have made substantial progress in single-image spatial reasoning, multi-image spatial reasoning, which requires integration of information from...
- Shifting the Breaking Point of Flow Matching for Multi-Instance Editing : Abstract: Flow matching models have recently emerged as an efficient alternative to diffusion, especially for text-guided image generation and editing, offering faster inference through continuous-tim...
- MVAnimate: Enhancing Character Animation with Multi-View Optimization : Abstract: The demand for realistic and versatile character animation has surged, driven by its wide-ranging applications in various domains. However, the animation generation algorithms modeling human...
- Moving Beyond Functional Connectivity: Time-Series Modeling for fMRI-Based Brain Disorder Classification : Abstract: Functional magnetic resonance imaging (fMRI) enables non-invasive brain disorder classification by capturing blood-oxygen-level-dependent (BOLD) signals. However, most existing methods rely ...
- ALIVE: Animate Your World with Lifelike Audio-Video Generation : Abstract: Video generation is rapidly evolving towards unified audio-video generation. In this paper, we present ALIVE, a generation model that adapts a pretrained Text-to-Video (T2V) model to Sora-st...
- All-Optical Segmentation via Diffractive Neural Networks for Autonomous Driving : Abstract: Semantic segmentation and lane detection are crucial tasks in autonomous driving systems. Conventional approaches predominantly rely on deep neural networks (DNNs), which incur high energy c...
- Rolling Sink: Bridging Limited-Horizon Training and Open-Ended Testing in Autoregressive Video Diffusion : Abstract: Recently, autoregressive (AR) video diffusion models have achieved remarkable performance. However, due to their limited training durations, a train-test gap emerges when testing at longer ho...
- Uncertainty-Aware Counterfactual Traffic Signal Control with Predictive Safety and Starvation-Avoidance Constraints Using Vision-Based Sensing : Abstract: Real-world deployment of adaptive traffic signal control, to date, remains limited due to the uncertainty associated with vision-based perception, implicit safety, and non-interpretable cont...
- Out of the box age estimation through facial imagery: A Comprehensive Benchmark of Vision-Language Models vs. out-of-the-box Traditional Architectures : Abstract: Facial age estimation is critical for content moderation, age verification, and deepfake detection, yet no prior benchmark has systematically compared modern vision-language models (VLMs) ag...
- Back to Physics: Operator-Guided Generative Paths for SMS MRI Reconstruction : Abstract: Simultaneous multi-slice (SMS) imaging with in-plane undersampling enables highly accelerated MRI but yields a strongly coupled inverse problem with deterministic inter-slice interference an...
- Open-Text Aerial Detection: A Unified Framework For Aerial Visual Grounding And Detection : Abstract: Open-Vocabulary Aerial Detection (OVAD) and Remote Sensing Visual Grounding (RSVG) have emerged as two key paradigms for aerial scene understanding. However, each paradigm suffers from inher...
- VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping : Abstract: We present a training-free, plug-and-play method, namely VFace, for high-quality face swapping in videos. It can be seamlessly integrated with image-based face swapping approaches built on d...
- Geometry-Aware Rotary Position Embedding for Consistent Video World Model : Abstract: Predictive world models that simulate future observations under explicit camera control are fundamental to interactive AI. Despite rapid advances, current systems lack spatial persistence: t...
- Recovering 3D Shapes from Ultra-Fast Motion-Blurred Images : Abstract: We consider the problem of 3D shape recovery from ultra-fast motion-blurred images. While 3D reconstruction from static images has been extensively studied, recovering geometry from extreme ...
- Thinking in Structures: Evaluating Spatial Intelligence through Reasoning on Constrained Manifolds : Abstract: Spatial intelligence is crucial for vision-language models (VLMs) in the physical world, yet many benchmarks evaluate largely unconstrained scenes where models can exploit 2D shortcuts. We ...
- WristMIR: Coarse-to-Fine Region-Aware Retrieval of Pediatric Wrist Radiographs with Radiology Report-Driven Learning : Abstract: Retrieving wrist radiographs with analogous fracture patterns is challenging because clinically important cues are subtle, highly localized and often obscured by overlapping anatomy or varia...
- Rethinking Practical and Efficient Quantization Calibration for Vision-Language Models : Abstract: Post-training quantization (PTQ) is a primary approach for deploying large language models without fine-tuning, and the quantized performance is often strongly affected by the calibration in...
- Which private attributes do VLMs agree on and predict well? : Abstract: Visual Language Models (VLMs) are often used for zero-shot detection of visual attributes in the image. We present a zero-shot evaluation of open-source VLMs for privacy-related attribute re...
- Integrating Specialized and Generic Agent Motion Prediction with Dynamic Occupancy Grid Maps : Abstract: Accurate prediction of driving scenes is a challenging task due to uncertainty in sensor data, the complex behaviors of agents, and the possibility of multiple feasible futures. Existing pred...
- One-Shot Crowd Counting With Density Guidance For Scene Adaptation : Abstract: Crowd scenes captured by cameras at different locations vary greatly, and existing crowd models have limited generalization for unseen surveillance scenes. To improve the generalization of t...
- D-ORCA: Dialogue-Centric Optimization for Robust Audio-Visual Captioning : Abstract: Spoken dialogue is a primary source of information in videos; therefore, accurately identifying who spoke what and when is essential for deep video understanding. We introduce D-ORCA, a \tex...
- EasyTune: Efficient Step-Aware Fine-Tuning for Diffusion-Based Motion Generation : Abstract: In recent years, motion generative models have undergone significant advancement, yet pose challenges in aligning with downstream objectives. Recent studies have shown that using differentia...
- FSP-Diff: Full-Spectrum Prior-Enhanced DualDomain Latent Diffusion for Ultra-Low-Dose Spectral CT Reconstruction : Abstract: Spectral computed tomography (CT) with photon-counting detectors holds immense potential for material discrimination and tissue characterization. However, under ultra-low-dose conditions, th...
- Continuity-driven Synergistic Diffusion with Neural Priors for Ultra-Sparse-View CBCT Reconstruction : Abstract: The clinical application of cone-beam computed tomography (CBCT) is constrained by the inherent trade-off between radiation exposure and image quality. Ultra-sparse angular sampling, employe...
- Deepfake Synthesis vs. Detection: An Uneven Contest : Abstract: The rapid advancement of deepfake technology has significantly elevated the realism and accessibility of synthetic media. Emerging techniques, such as diffusion-based models and Neural Radia...
- PhysDrape: Learning Explicit Forces and Collision Constraints for Physically Realistic Garment Draping : Abstract: Deep learning-based garment draping has emerged as a promising alternative to traditional Physics-Based Simulation (PBS), yet robust collision handling remains a critical bottleneck. Most ex...
- Enhanced Mixture 3D CGAN for Completion and Generation of 3D Objects : Abstract: The generation and completion of 3D objects represent a transformative challenge in computer vision. Generative Adversarial Networks (GANs) have recently demonstrated strong potential in syn...
- Vanilla Group Equivariant Vision Transformer: Simple and Effective : Abstract: Incorporating symmetry priors as inductive biases to design equivariant Vision Transformers (ViTs) has emerged as a promising avenue for enhancing their performance. However, existing equiva...
- Automatic register identification for the open web using multilingual deep learning : Abstract: This article presents multilingual deep learning models for identifying web registers -- text varieties such as news reports and discussion forums -- across 16 languages. We introduce the Mu...
- Black Big Boxes: Tracing Adjective Order Preferences in Large Language Models : Abstract: In English and other languages, multiple adjectives in noun phrases follow intricate ordering patterns. These patterns have been widely studied in linguistics and provide a useful test case ...
- Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals : Abstract: Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient when the information required to answer a query is localized within the context. We present ...
- Cross-Modal Retrieval for Motion and Text via DropTriple Loss : Abstract: Cross-modal retrieval of image-text and video-text is a prominent research area in computer vision and natural language processing. However, there has been insufficient attention given to cr...
- Robust and Real-Time Bangladeshi Currency Recognition: A Dual-Stream MobileNet and EfficientNet Approach : Abstract: Accurate currency recognition is essential for assistive technologies, particularly for visually impaired individuals who rely on others to identify banknotes. This dependency puts them at r...
- Gaussian-Constrained LeJEPA Representations for Unsupervised Scene Discovery and Pose Consistency : Abstract: Unsupervised 3D scene reconstruction from unstructured image collections remains a fundamental challenge in computer vision, particularly when images originate from multiple unrelated scenes...
- Deep Learning Based Multi-Level Classification for Aviation Safety : Abstract: Bird strikes pose a significant threat to aviation safety, often resulting in loss of life, severe aircraft damage, and substantial financial costs. Existing bird strike prevention strategie...
- COMBOOD: A Semiparametric Approach for Detecting Out-of-distribution Data for Image Classification : Abstract: Identifying out-of-distribution (OOD) data at inference time is crucial for many machine learning applications, especially for automation. We present a novel unsupervised semi-parametric fra...
- Enhancing IMU-Based Online Handwriting Recognition via Contrastive Learning with Zero Inference Overhead : Abstract: Online handwriting recognition using inertial measurement units opens up handwriting on paper as input for digital devices. Doing it on edge hardware improves privacy and lowers latency, but...
- Toward Accurate and Accessible Markerless Neuronavigation : Abstract: Neuronavigation is widely used in biomedical research and interventions to guide the precise placement of instruments around the head to support procedures such as transcranial magnetic stim...
- RECITYGEN -- Interactive and Generative Participatory Urban Design Tool with Latent Diffusion and Segment Anything : Abstract: Urban design profoundly impacts public spaces and community engagement. Traditional top-down methods often overlook public input, creating a gap in design aspirations and reality. Recent adv...
- From Images to Decisions: Assistive Computer Vision for Non-Metallic Content Estimation in Scrap Metal : Abstract: Scrap quality directly affects energy use, emissions, and safety in steelmaking. Today, the share of non-metallic inclusions (contamination) is judged visually by inspectors - an approach th...
- Exploring Physical Intelligence Emergence via Omni-Modal Architecture and Physical Data Engine : Abstract: Physical understanding remains brittle in omni-modal models because key physical attributes are visually ambiguous and sparsely represented in web-scale data. We present OmniFysics, a compac...
- Contactless estimation of continuum displacement and mechanical compressibility from image series using a deep learning based framework : Abstract: Contactless and non-invasive estimation of mechanical properties of physical media from optical observations is of interest for manifold engineering and biomedical applications, where direct...
- TLC-Plan: A Two-Level Codebook Based Network for End-to-End Vector Floorplan Generation : Abstract: Automated floorplan generation aims to improve design quality, architectural efficiency, and sustainability by jointly modeling global spatial organization and precise geometric detail. Howe...
- Zero-Shot UAV Navigation in Forests via Relightable 3D Gaussian Splatting : Abstract: UAV navigation in unstructured outdoor environments using passive monocular vision is hindered by the substantial visual domain gap between simulation and reality. While 3D Gaussian Splattin...
- Privacy in Image Datasets: A Case Study on Pregnancy Ultrasounds : Abstract: The rise of generative models has led to increased use of large-scale datasets collected from the internet, often with minimal or no data curation. This raises concerns about the inclusion o...
- DuMeta++: Spatiotemporal Dual Meta-Learning for Generalizable Few-Shot Brain Tissue Segmentation Across Diverse Ages : Abstract: Accurate segmentation of brain tissues from MRI scans is critical for neuroscience and clinical applications, but achieving consistent performance across the human lifespan remains challengi...
- Condition Matters in Full-head 3D GANs : Abstract: Conditioning is crucial for stable training of full-head 3D GANs. Without any conditioning signal, the model suffers from severe mode collapse, making it impractical to train. However, a ...
- Understanding Real-World Traffic Safety through RoadSafe365 Benchmark : Abstract: Although recent traffic benchmarks have advanced multimodal data analysis, they generally lack systematic evaluation aligned with official safety standards. To fill this gap, we introduce Ro...
- TwistNet-2D: Learning Second-Order Channel Interactions via Spiral Twisting for Texture Recognition : Abstract: Second-order feature statistics are central to texture recognition, yet current methods face a fundamental tension: bilinear pooling and Gram matrices capture global channel correlations but...
- VideoNeuMat: Neural Material Extraction from Generative Video Models : Abstract: Creating photorealistic materials for 3D rendering requires exceptional artistic skill. Generative models for materials could help, but are currently limited by the lack of high-quality trai...
- Diabetic Retinopathy Lesion Segmentation through Attention Mechanisms : Abstract: Diabetic Retinopathy (DR) is an eye disease which arises due to diabetes mellitus. It might cause vision loss and blindness. To prevent irreversible vision loss, early detection through syst...
- Row-Column Separated Attention Based Low-Light Image/Video Enhancement : Abstract: U-Net structure is widely used for low-light image/video enhancement. The enhanced images result in areas with large local noise and loss of more details without proper guidance for global i...
- Perspective-aware fusion of incomplete depth maps and surface normals for accurate 3D reconstruction : Abstract: We address the problem of reconstructing 3D surfaces from depth and surface normal maps acquired by a sensor system based on a single perspective camera. Depth and normal maps can be obtaine...
- PTB-XL-Image-17K: A Large-Scale Synthetic ECG Image Dataset with Comprehensive Ground Truth for Deep Learning-Based Digitization : Abstract: Electrocardiogram (ECG) digitization, the conversion of paper-based or scanned ECG images back into time-series signals, is critical for leveraging decades of legacy clinical data in modern deep lear...
- SoulX-FlashHead: Oracle-guided Generation of Infinite Real-time Streaming Talking Heads : Abstract: Achieving a balance between high-fidelity visual quality and low-latency streaming remains a formidable challenge in audio-driven portrait generation. Existing large-scale models often suffe...
- SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning : Abstract: Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing eval...
- GlobalWasteData: A Large-Scale, Integrated Dataset for Robust Waste Classification and Environmental Monitoring : Abstract: The growing amount of waste is a problem for the environment that requires efficient sorting techniques for various kinds of waste. An automated waste classification system is used for this ...
- Thermal odometry and dense mapping using learned odometry and Gaussian splatting : Abstract: Thermal infrared sensors, with wavelengths longer than smoke particles, can capture imagery independent of darkness, dust, and smoke. This robustness has made them increasingly valuable for ...
- Learning Brain Representation with Hierarchical Visual Embeddings : Abstract: Decoding visual representations from brain signals has attracted significant attention in both neuroscience and artificial intelligence. However, the degree to which brain signals truly enco...
- IM-Animation: An Implicit Motion Representation for Identity-decoupled Character Animation : Abstract: Recent progress in video diffusion models has markedly advanced character animation, which synthesizes motioned videos by animating a static identity image according to a driving video. Expl...
- Adaptive Image Zoom-in with Bounding Box Transformation for UAV Object Detection : Abstract: Detecting objects from UAV-captured images is challenging due to the small object size. In this work, a simple and efficient adaptive zoom-in framework is explored for object detection on UA...
- CA-YOLO: Cross Attention Empowered YOLO for Biomimetic Localization : Abstract: In modern complex environments, achieving accurate and efficient target localization is essential in numerous fields. However, existing systems often face limitations in both accuracy and th...
- MUFASA: A Multi-Layer Framework for Slot Attention : Abstract: Unsupervised object-centric learning (OCL) decomposes visual scenes into distinct entities. Slot attention is a popular approach that represents individual objects as latent vectors, called ...
- FlexID: Training-Free Flexible Identity Injection via Intent-Aware Modulation for Text-to-Image Generation : Abstract: Personalized text-to-image generation aims to seamlessly integrate specific identities into textual descriptions. However, existing training-free methods often rely on rigid visual feature i...
- SIGMA: Selective-Interleaved Generation with Multi-Attribute Tokens : Abstract: Recent unified models such as Bagel demonstrate that paired image-edit data can effectively align multiple visual tasks within a single diffusion transformer. However, these models remain li...
- Human Identification at a Distance: Challenges, Methods and Results on the Competition HID 2025 : Abstract: Human identification at a distance (HID) is challenging because traditional biometric modalities such as face and fingerprints are often difficult to acquire in real-world scenarios. Gait re...
- Visualizing the Invisible: Enhancing Radiologist Performance in Breast Mammography via Task-Driven Chromatic Encoding : Abstract: Purpose: Mammography screening is less sensitive in dense breasts, where tissue overlap and subtle findings increase perceptual difficulty. We present MammoColor, an end-to-end framework with...
- HistoMet: A Pan-Cancer Deep Learning Framework for Prognostic Prediction of Metastatic Progression and Site Tropism from Primary Tumor Histopathology : Abstract: Metastatic Progression remains the leading cause of cancer-related mortality, yet predicting whether a primary tumor will metastasize and where it will disseminate directly from histopatholo...
- Uncovering Modality Discrepancy and Generalization Illusion for General-Purpose 3D Medical Segmentation : Abstract: While emerging 3D medical foundation models are envisioned as versatile tools that offer general-purpose capabilities, their validation remains largely confined to regional and structural im...
- Influence of Geometry, Class Imbalance and Alignment on Reconstruction Accuracy -- A Micro-CT Phantom-Based Evaluation : Abstract: The accuracy of the 3D models created from medical scans depends on imaging hardware, segmentation methods and mesh processing techniques etc. The effects of geometry type, class imbalance, ...
- Semantic-Deviation-Anchored Multi-Branch Fusion for Unsupervised Anomaly Detection and Localization in Unstructured Conveyor-Belt Coal Scenes : Abstract: Reliable foreign-object anomaly detection and pixel-level localization in conveyor-belt coal scenes are essential for safe and intelligent mining operations. This task is particularly challe...
- A hybrid Kolmogorov-Arnold network for medical image segmentation : Abstract: Medical image segmentation plays a vital role in diagnosis and treatment planning, but remains challenging due to the inherent complexity and variability of medical images, especially in cap...
- Measuring cross-language intelligibility between Romance languages with computational tools : Abstract: We present an analysis of mutual intelligibility in related languages applied to languages in the Romance family. We introduce a novel computational metric for estimating intelligibility ba...
- DLLM Agent: See Farther, Run Faster : Abstract: Diffusion large language models (DLLMs) have emerged as an alternative to autoregressive (AR) decoding with appealing efficiency and modeling properties, yet their implications for agentic m...
- SED-SFT: Selectively Encouraging Diversity in Supervised Fine-Tuning : Abstract: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has emerged as the standard post-training paradigm for large language models (LLMs). However, the conventional SFT proces...
- From Native Memes to Global Moderation: Cross-Cultural Evaluation of Vision-Language Models for Hateful Meme Detection : Abstract: Cultural context profoundly shapes how people interpret online content, yet vision-language models (VLMs) remain predominantly trained through Western or English-centric lenses. This limits ...
- Let's Simplify Step by Step: Guiding LLM Towards Multilingual Unsupervised Proficiency-Controlled Sentence Simplification : Abstract: Large language models demonstrate limited capability in proficiency-controlled sentence simplification, particularly when simplifying across large readability levels. We propose a framework ...
- SciClaimEval: Cross-modal Claim Verification in Scientific Papers : Abstract: We present SciClaimEval, a new scientific dataset for the claim verification task. Unlike existing resources, SciClaimEval features authentic claims, including refuted ones, directly extract...
- Letting Tutor Personas "Speak Up" for LLMs: Learning Steering Vectors from Dialogue via Preference Optimization : Abstract: With the emergence of large language models (LLMs) as a powerful class of generative artificial intelligence (AI), their use in tutoring has become increasingly prominent. Prior works on LLM...
- Blind to the Human Touch: Overlap Bias in LLM-Based Summary Evaluation : Abstract: Large language model (LLM) judges have often been used alongside traditional, algorithm-based metrics for tasks like summarization because they better capture semantic information, are bette...
- SRR-Judge: Step-Level Rating and Refinement for Enhancing Search-Integrated Reasoning in Search Agents : Abstract: Recent deep search agents built on large reasoning models (LRMs) excel at complex question answering by iteratively planning, acting, and gathering evidence, a capability known as search-int...
- Attn-GS: Attention-Guided Context Compression for Efficient Personalized LLMs : Abstract: Personalizing large language models (LLMs) to individual users requires incorporating extensive interaction histories and profiles, but input token constraints make this impractical due to h...
- Thinking Makes LLM Agents Introverted: How Mandatory Thinking Can Backfire in User-Engaged Agents : Abstract: Eliciting reasoning has emerged as a powerful technique for improving the performance of large language models (LLMs) on complex tasks by inducing thinking. However, their effectiveness in r...
- LLMs Know More About Numbers than They Can Say : Abstract: Although state-of-the-art LLMs can solve math problems, we find that they make errors on numerical comparisons with mixed notation: "Which is larger, $5.7 \times 10^2$ or $580$?" This raises...
- Evaluating and Calibrating LLM Confidence on Questions with Multiple Correct Answers : Abstract: Confidence calibration is essential for making large language models (LLMs) reliable, yet existing training-free methods have been primarily studied under single-answer question answering. I...
- Patches of Nonlinearity: Instruction Vectors in Large Language Models : Abstract: Despite the recent success of instruction-tuned language models and their ubiquitous usage, very little is known of how models process instructions internally. In this work, we address this ...
- Cross-Linguistic Persona-Driven Data Synthesis for Robust Multimodal Cognitive Decline Detection : Abstract: Speech-based digital biomarkers represent a scalable, non-invasive frontier for the early identification of Mild Cognitive Impairment (MCI). However, the development of robust diagnostic mod...
- The Judge Who Never Admits: Hidden Shortcuts in LLM-based Evaluation : Abstract: Large language models (LLMs) are increasingly used as automatic judges to evaluate system outputs in tasks such as reasoning, question answering, and creative writing. A faithful judge shoul...
- Diverge to Induce Prompting: Multi-Rationale Induction for Zero-Shot Reasoning : Abstract: To address the instability of unguided reasoning paths in standard Chain-of-Thought prompting, recent methods guide large language models (LLMs) by first eliciting a single reasoning strateg...
- Beyond Raw Detection Scores: Markov-Informed Calibration for Boosting Machine-Generated Text Detection : Abstract: While machine-generated texts (MGTs) offer great convenience, they also pose risks such as disinformation and phishing, highlighting the need for reliable detection. Metric-based methods, wh...
- TDGNet: Hallucination Detection in Diffusion Language Models via Temporal Dynamic Graphs : Abstract: Diffusion language models (D-LLMs) offer parallel denoising and bidirectional context, but hallucination detection for D-LLMs remains underexplored. Prior detectors developed for auto-regres...
- NLP for Local Governance Meeting Records: A Focus Article on Tasks, Datasets, Metrics and Benchmark : Abstract: Local governance meeting records are official documents, in the form of minutes or transcripts, documenting how proposals, discussions, and procedural actions unfold during institutional mee...
- LLMs and people both learn to form conventions -- just not with each other : Abstract: Humans align to one another in conversation -- adopting shared conventions that ease communication. We test whether LLMs form the same kinds of conventions in a multimodal communication game...
- Pretraining with Token-Level Adaptive Latent Chain-of-Thought : Abstract: Scaling large language models by increasing parameters and training data is increasingly constrained by limited high-quality corpora and rising communication costs. This work explores an alt...
- Document Reconstruction Unlocks Scalable Long-Context RLVR : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a prominent paradigm to enhance the capabilities (i.e., long-context) of Large Language Models (LLMs). However, it often reli...
- On convexity and efficiency in semantic systems : Abstract: There are two widely held characterizations of human semantic category systems: (1) they form convex partitions of conceptual spaces, and (2) they are efficient for communication. While prio...
- Language Predicts Identity Fusion Across Cultures and Reveals Divergent Pathways to Violence : Abstract: In light of increasing polarization and political violence, understanding the psychological roots of extremism is increasingly important. Prior research shows that identity fusion predicts w...
- New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR : Abstract: Whether Reinforcement Learning with Verifiable Rewards (RLVR) endows Large Language Models (LLMs) with new capabilities or merely elicits latent traces remains a central debate. In this work...
- Knowledge Augmented Entity and Relation Extraction for Legal Documents with Hypergraph Neural Network : Abstract: With the continuous progress of digitization in Chinese judicial institutions, a substantial amount of electronic legal document information has been accumulated. To unlock its potential val...
- When Does Context Help? Error Dynamics of Contextual Information in Large Language Models : Abstract: Contextual information at inference time, such as demonstrations, retrieved knowledge, or interaction history, can substantially improve large language models (LLMs) without parameter update...
- JUSTICE: Judicial Unified Synthesis Through Intermediate Conclusion Emulation for Automated Judgment Document Generation : Abstract: Automated judgment document generation is a significant yet challenging legal AI task. As the conclusive written instrument issued by a court, a judgment document embodies complex legal reas...
- Improving Data and Reward Design for Scientific Reasoning in Large Language Models : Abstract: Solving open-ended science questions remains challenging for large language models, particularly due to inherently unreliable supervision and evaluation. The bottleneck lies in the data cons...
- An Attention-over-Attention Generative Model for Joint Multiple Intent Detection and Slot Filling : Abstract: In task-oriented dialogue systems, spoken language understanding (SLU) is a critical component, which consists of two sub-tasks, intent detection and slot filling. Most existing methods focu...
- UReason: Benchmarking the Reasoning Paradox in Unified Multimodal Models : Abstract: To elicit capabilities for addressing complex and implicit visual requirements, recent unified multimodal models increasingly adopt chain-of-thought reasoning to guide image generation. Howe...
- WorldTravel: A Realistic Multimodal Travel-Planning Benchmark with Tightly Coupled Constraints : Abstract: Real-world autonomous planning requires coordinating tightly coupled constraints where a single decision dictates the feasibility of all subsequent actions. However, existing benchmarks pred...
- ViGoEmotions: A Benchmark Dataset For Fine-grained Emotion Detection on Vietnamese Texts : Abstract: Emotion classification plays a significant role in emotion prediction and harmful content detection. Recent advancements in NLP, particularly through large language models (LLMs), have great...
- TEAM: Temporal-Spatial Consistency Guided Expert Activation for MoE Diffusion Language Model Acceleration : Abstract: Diffusion large language models (dLLMs) have recently gained significant attention due to their inherent support for parallel decoding. Building on this paradigm, Mixture-of-Experts (MoE) dL...
- Large Language Models and Impossible Language Acquisition: "False Promise" or an Overturn of our Current Perspective towards AI : Abstract: In Chomsky's provocative critique "The False Promise of CHATGPT," Large Language Models (LLMs) are characterized as mere pattern predictors that do not acquire languages via intrinsic causal...
- Characterizing, Evaluating, and Optimizing Complex Reasoning : Abstract: Large Reasoning Models (LRMs) increasingly rely on reasoning traces with complex internal structures. However, existing work lacks a unified answer to three fundamental questions: (1) what d...
- How Do Language Models Understand Tables? A Mechanistic Analysis of Cell Location : Abstract: While Large Language Models (LLMs) are increasingly deployed for table-related tasks, the internal mechanisms enabling them to process linearized two-dimensional structured tables remain opa...
- Beyond Scalar Scores: Reinforcement Learning for Error-Aware Quality Estimation of Machine Translation : Abstract: Quality Estimation (QE) aims to assess the quality of machine translation (MT) outputs without relying on reference translations, making it essential for real-world, large-scale MT evaluatio...
- VocalNet-MDM: Accelerating Streaming Speech LLM via Self-Distilled Masked Diffusion Modeling : Abstract: Recent Speech Large Language Models (LLMs) have achieved impressive capabilities in end-to-end speech interaction. However, the prevailing autoregressive paradigm imposes strict serial const...
- Do Multilingual LLMs have specialized language heads? : Abstract: Multilingual large language models (LLMs) have gained significant popularity for their ability to process and generate text across multiple languages. However, deploying these models in prod...
- Fundamental Reasoning Paradigms Induce Out-of-Domain Generalization in Language Models : Abstract: Deduction, induction, and abduction are fundamental reasoning paradigms, core for human logical thinking. Although improving Large Language Model (LLM) reasoning has attracted significant re...
- Old wine in old glasses: Comparing computational and qualitative methods in identifying incivility on Persian Twitter during the #MahsaAmini movement : Abstract: This paper compares three approaches to detecting incivility in Persian tweets: human qualitative coding, supervised learning with ParsBERT, and large language models (ChatGPT). Using 47,278...
- Challenges in Translating Technical Lectures: Insights from the NPTEL : Abstract: This study examines the practical applications and methodological implications of Machine Translation in Indian Languages, specifically Bangla, Malayalam, and Telugu, within emerging transla...
- Do Images Clarify? A Study on the Effect of Images on Clarifying Questions in Conversational Search : Abstract: Conversational search systems increasingly employ clarifying questions to refine user queries and improve the search experience. Previous studies have demonstrated the usefulness of text-bas...
- FactSim: Fact-Checking for Opinion Summarization : Abstract: We explore the need for more comprehensive and precise evaluation techniques for generative artificial intelligence (GenAI) in text summarization tasks, specifically in the area of opinion s...
- PERSPECTRA: A Scalable and Configurable Pluralist Benchmark of Perspectives from Arguments : Abstract: Pluralism, the capacity to engage with diverse perspectives without collapsing them into a single viewpoint, is critical for developing large language models that faithfully reflect human he...
- Map of Encoders -- Mapping Sentence Encoders using Quantum Relative Entropy : Abstract: We propose a method to compare and visualise sentence encoders at scale by creating a map of encoders where each sentence encoder is represented in relation to the other sentence encoders. S...
- LakeHopper: Cross Data Lakes Column Type Annotation through Model Adaptation : Abstract: Column type annotation is vital for tasks like data cleaning, integration, and visualization. Recent solutions rely on resource-intensive language models fine-tuned on well-annotated columns...
- Large Language Models for Geolocation Extraction in Humanitarian Crisis Response : Abstract: Humanitarian crises demand timely and accurate geographic information to inform effective response efforts. Yet, automated systems that extract locations from text often reproduce existing g...
- Is Reasoning Capability Enough for Safety in Long-Context Language Models? : Abstract: Large language models (LLMs) increasingly combine long-context processing with advanced reasoning, enabling them to retrieve and synthesize information distributed across tens of thousands o...
- GitSearch: Enhancing Community Notes Generation with Gap-Informed Targeted Search : Abstract: Community-based moderation offers a scalable alternative to centralized fact-checking, yet it faces significant structural challenges, and existing AI-based methods fail in "cold start" scen...
- How Should We Model the Probability of a Language? : Abstract: Of the over 7,000 languages spoken in the world, commercial language identification (LID) systems only reliably identify a few hundred in written form. Research-grade systems extend this cov...
- When Actions Go Off-Task: Detecting and Correcting Misaligned Actions in Computer-Use Agents : Abstract: Computer-use agents (CUAs) have made tremendous progress in the past year, yet they still frequently produce misaligned actions that deviate from the user's original intent. Such misaligned ...
- UNIKIE-BENCH: Benchmarking Large Multimodal Models for Key Information Extraction in Visual Documents : Abstract: Key Information Extraction (KIE) from real-world documents remains challenging due to substantial variations in layout structures, visual quality, and task-specific information requirements....
- Comprehensive Evaluation of Large Language Models on Software Engineering Tasks: A Multi-Task Benchmark : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in software engineering, yet comprehensive benchmarks covering diverse SE activities remain limited. We present a multi...
- Massive Sound Embedding Benchmark (MSEB) : Abstract: Audio is a critical component of multimodal perception, and any truly intelligent system must demonstrate a wide range of auditory capabilities. These capabilities include transcription, cla...
- Measuring Complexity at the Requirements Stage: Spectral Metrics as Development Effort Predictors : Abstract: Complexity in engineered systems presents one of the most persistent challenges in modern development since it drives cost overruns, schedule delays, and outright project failures. Yet w...
- Training-Driven Representational Geometry Modularization Predicts Brain Alignment in Language Models : Abstract: How large language models (LLMs) align with the neural representation and computation of human language is a central question in cognitive science. Using representational geometry as a mecha...
- ViCA: Efficient Multimodal LLMs with Vision-Only Cross-Attention : Abstract: Modern multimodal large language models (MLLMs) adopt a unified self-attention design that processes visual and textual tokens at every Transformer layer, incurring substantial computational...
- On Sequence-to-Sequence Models for Automated Log Parsing : Abstract: Log parsing is a critical standard operating procedure in software systems, enabling monitoring, anomaly detection, and failure diagnosis. However, automated log parsing remains challenging ...
- Linguistics and Human Brain: A Perspective of Computational Neuroscience : Abstract: Elucidating the language-brain relationship requires bridging the methodological gap between the abstract theoretical frameworks of linguistics and the empirical neural data of neuroscience....
- Automating Computational Reproducibility in Social Science: Comparing Prompt-Based and Agent-Based Approaches : Abstract: Reproducing computational research is often assumed to be as simple as rerunning the original code with provided data. In practice, missing packages, fragile file paths, version conflicts, o...
- ValueFlow: Measuring the Propagation of Value Perturbations in Multi-Agent LLM Systems : Abstract: Multi-agent large language model (LLM) systems increasingly consist of agents that observe and respond to one another's outputs. While value alignment is typically evaluated for isolated mod...
- Prototype-Based Disentanglement for Controllable Dysarthric Speech Synthesis : Abstract: Dysarthric speech exhibits high variability and limited labeled data, posing major challenges for both automatic speech recognition (ASR) and assistive speech technologies. Existing approach...
- Beyond Transcripts: A Renewed Perspective on Audio Chaptering : Abstract: Audio chaptering, the task of automatically segmenting long-form audio into coherent sections, is increasingly important for navigating podcasts, lectures, and videos. Despite its relevance,...
- Paradox of De-identification: A Critique of HIPAA Safe Harbour in the Age of LLMs : Abstract: Privacy is a human right that sustains patient-provider trust. Clinical notes capture a patient's private vulnerability and individuality, which are used for care coordination and research. ...
- CoinPress: Practical Private Mean and Covariance Estimation : Abstract: We present simple differentially private estimators for the mean and covariance of multivariate sub-Gaussian data that are accurate at small sample sizes. We demonstrate the effectiveness of...
- Estimating the Value of Evidence-Based Decision Making : Abstract: In an era of data abundance, statistical evidence is increasingly critical for business and policy decisions. Yet, organizations lack empirical tools to assess the value of evidence-based de...
- On the Computational Efficiency of Bayesian Additive Regression Trees: An Asymptotic Analysis : Abstract: Bayesian Additive Regression Trees (BART) is a popular Bayesian non-parametric regression model that is commonly used in causal inference and beyond. Its strong predictive performance is sup...
- A High Resolution Urban and Rural Settlement Map of Africa Using Deep Learning and Satellite Imagery : Abstract: Accurate and consistent mapping of urban and rural areas is crucial for sustainable development, spatial planning, and policy design. It is particularly important in simulating the complex i...
- Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time : Abstract: We study the dynamic correlation clustering problem with adaptive edge label flips. In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled eit...
- End to End Collaborative Synthetic Data Generation : Abstract: The success of AI is based on the availability of data to train models. While in some cases a single data custodian may have sufficient data to enable AI, often multiple custodians need to c...
- Optimal Client Sampling in Federated Learning with Client-Level Heterogeneous Differential Privacy : Abstract: Federated Learning with client-level differential privacy (DP) provides a promising framework for collaboratively training models while rigorously protecting clients' privacy. However, class...
- Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model : Abstract: Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction rai...
- Can LLMs Discern the Traits Influencing Your Preferences? Evaluating Personality-Driven Preference Alignment in LLMs : Abstract: User preferences are increasingly used to personalize Large Language Model (LLM) responses, yet how to reliably leverage preference signals for answer generation remains under-explored. In p...
- Equipping LLM with Directional Multi-Talker Speech Understanding Capabilities : Abstract: Recent studies have demonstrated that prompting large language models (LLMs) with audio encodings enables effective speech understanding capabilities. However, most speech LLMs are trained on...
- ViHERMES: A Graph-Grounded Multihop Question Answering Benchmark and System for Vietnamese Healthcare Regulations : Abstract: Question Answering (QA) over regulatory documents is inherently challenging due to the need for multihop reasoning across legally interdependent texts, a requirement that is particularly pro...
- Do Large Language Models Reflect Demographic Pluralism in Safety? : Abstract: Large Language Model (LLM) safety is inherently pluralistic, reflecting variations in moral norms, cultural expectations, and demographic contexts. Yet, existing alignment datasets such as A...
- When the Model Said 'No Comment', We Knew Helpfulness Was Dead, Honesty Was Alive, and Safety Was Terrified : Abstract: Aligning Large Language Models (LLMs) with human values, namely being helpful, harmless, and honest (HHH), is important for safe deployment. Existing works use Supervised Fine-Tuning ...
- Incremental (k, z)-Clustering on Graphs : Abstract: Given a weighted undirected graph, a number of clusters $k$, and an exponent $z$, the goal in the $(k, z)$-clustering problem on graphs is to select $k$ vertices as centers that minimize the...
- DNS: Data-driven Nonlinear Smoother for Complex Model-free Process : Abstract: We propose data-driven nonlinear smoother (DNS) to estimate a hidden state sequence of a complex dynamical process from a noisy, linear measurement sequence. The dynamical process is model-f...
- Constructive conditional normalizing flows : Abstract: Motivated by applications in conditional sampling, given a probability measure $μ$ and a diffeomorphism $φ$, we consider the problem of simultaneously approximating $φ$ and the pushforward $...
- Retrieval Pivot Attacks in Hybrid RAG: Measuring and Mitigating Amplified Leakage from Vector Seeds to Graph Expansion : Abstract: Hybrid Retrieval-Augmented Generation (RAG) pipelines combine vector similarity search with knowledge graph expansion for multi-hop reasoning. We show that this composition introduces a dist...
- Learning to Judge: LLMs Designing and Applying Evaluation Rubrics : Abstract: Large language models (LLMs) are increasingly used as evaluators for natural language generation, applying human-defined rubrics to assess system outputs. However, human rubrics are often st...
- Towards Understanding Multimodal Fine-Tuning: Spatial Features : Abstract: Contemporary Vision-Language Models (VLMs) achieve strong performance on a wide range of tasks by pairing a vision encoder with a pre-trained language model, fine-tuned for visual-text input...
- Welfarist Formulations for Diverse Similarity Search : Abstract: Nearest Neighbor Search (NNS) is a fundamental problem in data structures with wide-ranging applications, such as web search, recommendation systems, and, more recently, retrieval-augmented ...
- Amortising Inference and Meta-Learning Priors in Neural Networks : Abstract: One of the core facets of Bayesianism is in the updating of prior beliefs in light of new evidence -- so how can we maintain a Bayesian approach if we have no prior beliefs in the f...
- Empirically Understanding the Value of Prediction in Allocation : Abstract: Institutions increasingly use prediction to allocate scarce resources. From a design perspective, better predictions compete with other investments, such as expanding capacity or improving t...
- AMEM4Rec: Leveraging Cross-User Similarity for Memory Evolution in Agentic LLM Recommenders : Abstract: Agentic systems powered by Large Language Models (LLMs) have shown strong potential in recommender systems but remain hindered by several challenges. Fine-tuning LLMs is parameter-inefficien...
- Cutting Through the Noise: On-the-fly Outlier Detection for Robust Training of Machine Learning Interatomic Potentials : Abstract: The accuracy of machine learning interatomic potentials suffers from reference data that contains numerical noise. Often originating from unconverged or inconsistent electronic-structure cal...
- Differentiable Logical Programming for Quantum Circuit Discovery and Optimization : Abstract: Designing high-fidelity quantum circuits remains challenging, and current paradigms often depend on heuristic, fixed-ansatz structures or rule-based compilers that can be suboptimal or lack ...
- Contrastive Learning for Diversity-Aware Product Recommendations in Retail : Abstract: Recommender systems often struggle with long-tail distributions and limited item catalog exposure, where a small subset of popular items dominates recommendations. This challenge is especial...
- Winner's Curse Drives False Promises in Data-Driven Decisions: A Case Study in Refugee Matching : Abstract: A major challenge in data-driven decision-making is accurate policy evaluation, i.e., guaranteeing that a learned decision-making policy achieves the promised benefits. A popular strategy is ...
- Analysis of Converged 3D Gaussian Splatting Solutions: Density Effects and Prediction Limit : Abstract: We investigate what structure emerges in 3D Gaussian Splatting (3DGS) solutions from standard multi-view optimization. We term these Rendering-Optimal References (RORs) and analyze their sta...
- AMS-HD: Hyperdimensional Computing for Real-Time and Energy-Efficient Acute Mountain Sickness Detection : Abstract: Altitude sickness is a potentially life-threatening condition that impacts many individuals traveling to elevated altitudes. Timely detection is critical as symptoms can escalate rapidly. Ea...
- Online monotone density estimation and log-optimal calibration : Abstract: We study the problem of online monotone density estimation, where density estimators must be constructed in a predictable manner from sequentially observed data. We propose two online estima...
- Provably robust learning of regression neural networks using $\beta$-divergences : Abstract: Regression neural networks (NNs) are most commonly trained by minimizing the mean squared prediction error, which is highly sensitive to outliers and data contamination. Existing robust trai...
- Learning to Coordinate via Quantum Entanglement in Multi-Agent Reinforcement Learning : Abstract: The inability to communicate poses a major challenge to coordination in multi-agent reinforcement learning (MARL). Prior work has explored correlating local policies via shared randomness, s...
- When do neural ordinary differential equations generalize on complex networks? : Abstract: Neural ordinary differential equations (neural ODEs) can effectively learn dynamical systems from time series data, but their behavior on graph-structured data remains poorly understood, esp...
- Universal Coefficients and Mayer-Vietoris Sequence for Groupoid Homology : Abstract: We study homology of ample groupoids via the compactly supported Moore complex of the nerve. Let $A$ be a topological abelian group. For $n\ge 0$ set $C_n(\mathcal G;A) := C_c(\mathcal G_n,A...
- Contact-Anchored Policies: Contact Conditioning Creates Strong Robot Utility Models : Abstract: The prevalent paradigm in robot learning attempts to generalize across environments, embodiments, and tasks with language prompts at runtime. A fundamental tension limits this approach: lang...
- Geometric Imbalance in Semi-Supervised Node Classification : Abstract: Class imbalance in graph data presents a significant challenge for effective node classification, particularly in semi-supervised scenarios. In this work, we formally introduce the concept o...
- Disentangled Representation Learning for Parametric Partial Differential Equations : Abstract: Neural operators (NOs) excel at learning mappings between function spaces, serving as efficient forward solution approximators for PDE-governed systems. However, as black-box solvers, they o...
- Interpretable Generalized Additive Models for Datasets with Missing Values : Abstract: Many important datasets contain samples that are missing one or more feature values. Maintaining the interpretability of machine learning models in the presence of such missing data is chall...
- Efficient Graph Knowledge Distillation from GNNs to Kolmogorov-Arnold Networks via Self-Attention Dynamic Sampling : Abstract: Recent success of graph neural networks (GNNs) in modeling complex graph-structured data has fueled interest in deploying them on resource-constrained edge devices. However, their substantia...
- Density-Aware Farthest Point Sampling : Abstract: We focus on training machine learning regression models in scenarios where the availability of labeled training data is limited due to computational constraints or high labeling costs. Thus,...
- Learning Self-Correction in Vision-Language Models via Rollout Augmentation : Abstract: Self-correction is essential for solving complex reasoning problems in vision-language models (VLMs). However, existing reinforcement learning (RL) methods struggle to learn it, as effective...
- Estimation of Fish Catch Using Sentinel-2, 3 and XGBoost-Kernel-Based Kernel Ridge Regression : Abstract: Oceanographic factors, such as sea surface temperature and upper-ocean dynamics, have a significant impact on fish distribution. Maintaining fisheries that contribute to global food security...
- Do physics-informed neural networks (PINNs) need to be deep? Shallow PINNs using the Levenberg-Marquardt algorithm : Abstract: This work investigates the use of shallow physics-informed neural networks (PINNs) for solving forward and inverse problems of nonlinear partial differential equations (PDEs). By reformulati...
- Trajectory Stitching for Solving Inverse Problems with Flow-Based Models : Abstract: Flow-based generative models have emerged as powerful priors for solving inverse problems. One option is to directly optimize the initial latent code (noise), such that the flow output solve...
- Robust Policy Optimization to Prevent Catastrophic Forgetting : Abstract: Large language models are commonly trained through multi-stage post-training: first via RLHF, then fine-tuned for other downstream objectives. Yet even small downstream updates can compromis...
- Kirin: Improving ANN efficiency with SNN Hybridization : Abstract: Artificial neural networks (ANNs), particularly large language models (LLMs), demonstrate powerful inference capabilities but consume substantial energy. Conversely, spiking neural networks ...
- FlexMoRE: A Flexible Mixture of Rank-heterogeneous Experts for Efficient Federatedly-trained Large Language Models : Abstract: Recent advances in mixture-of-experts architectures have shown that individual expert models can be trained federatedly, i.e., in isolation from other experts by using a common base model t...
- Bayesian Preference Learning for Test-Time Steerable Reward Models : Abstract: Reward models are central to aligning language models with human preferences via reinforcement learning (RL). As RL is increasingly applied to settings such as verifiable rewards and multi-o...
- Rethinking Graph Generalization through the Lens of Sharpness-Aware Minimization : Abstract: Graph Neural Networks (GNNs) have achieved remarkable success across various graph-based tasks but remain highly sensitive to distribution shifts. In this work, we focus on a prevalent yet u...
- Magnitude Distance: A Geometric Measure of Dataset Similarity : Abstract: Quantifying the distance between datasets is a fundamental question in mathematics and machine learning. We propose magnitude distance, a novel distance metric defined on finite dat...
- Near-optimal Swap Regret Minimization for Convex Losses : Abstract: We give a randomized online algorithm that guarantees near-optimal $\widetilde O(\sqrt T)$ expected swap regret against any sequence of $T$ adaptively chosen Lipschitz convex losses on the u...
- Stress-Testing Alignment Audits With Prompt-Level Strategic Deception : Abstract: Alignment audits aim to robustly identify hidden goals from strategic, situationally aware misaligned models. Despite this threat model, existing auditing methods have not been systematicall...
- Discrete Bridges for Mutual Information Estimation : Abstract: Diffusion bridge models in both continuous and discrete state spaces have recently become powerful tools in the field of generative modeling. In this work, we leverage the discrete state spa...
- GSS: Gated Subspace Steering for Selective Memorization Mitigation in LLMs : Abstract: Large language models (LLMs) can memorize and reproduce training sequences verbatim -- a tendency that undermines both generalization and privacy. Existing mitigation methods apply intervent...
- Positive Distribution Shift as a Framework for Understanding Tractable Learning : Abstract: We study a setting where the goal is to learn a target function f(x) with respect to a target distribution D(x), but training is done on i.i.d. samples from a different training distribution...
- GEMSS: A Variational Bayesian Method for Discovering Multiple Sparse Solutions in Classification and Regression Problems : Abstract: Selecting interpretable feature sets in underdetermined ($n \ll p$) and highly correlated regimes constitutes a fundamental challenge in data science, particularly when analyzing physical me...
- Diffusion-Inspired Reconfiguration of Transformers for Uncertainty Calibration : Abstract: Uncertainty calibration in pre-trained transformers is critical for their reliable deployment in risk-sensitive applications. Yet, most existing pre-trained transformers do not have a princi...
- DynamiQ: Accelerating Gradient Synchronization using Compressed Multi-hop All-reduce : Abstract: Multi-hop all-reduce is the de facto backbone of large model training. As the training scale increases, the network often becomes a bottleneck, motivating reducing the volume of transmitted ...
- Distributionally Robust Optimization via Generative Ambiguity Modeling : Abstract: This paper studies Distributionally Robust Optimization (DRO), a fundamental framework for enhancing the robustness and generalization of statistical learning and optimization. An effective ...
- DirMoE: Dirichlet-routed Mixture of Experts : Abstract: Mixture-of-Experts (MoE) models have demonstrated exceptional performance in large-scale language models. Existing routers typically rely on non-differentiable Top-$k$+Softmax, limiting thei...
- ShapeCond: Fast Shapelet-Guided Dataset Condensation for Time Series Classification : Abstract: Time series data supports many domains (e.g., finance and climate science), but its rapid growth strains storage and computation. Dataset condensation can alleviate this by synthesizing a co...
- NLP Sampling: Combining MCMC and NLP Methods for Diverse Constrained Sampling : Abstract: Generating diverse samples under hard constraints is a core challenge in many areas. With this work we aim to provide an integrative view and framework to combine methods from the fields of ...
- Graph-Based Nearest-Neighbor Search without the Spread : Abstract: Recent work showed how to construct nearest-neighbor graphs of linear size, on a given set $P$ of $n$ points in $\mathbb{R}^d$, such that one can answer approximate ...
- Machine learning enhanced data assimilation framework for multiscale carbonate rock characterization : Abstract: Carbonate reservoirs offer significant capacity for subsurface carbon storage, oil production, and underground hydrogen storage. X-ray computed tomography (X-ray CT) coupled with numerical s...
- Curriculum-Learned Vanishing Stacked Residual PINNs for Hyperbolic PDE State Reconstruction : Abstract: Modeling distributed dynamical systems governed by hyperbolic partial differential equations (PDEs) remains challenging due to discontinuities and shocks that hinder the convergence of tradi...
- MolLIBRA: Genetic Molecular Optimization with Multi-Fingerprint Surrogates and Text-Molecule Aligned Critic : Abstract: We study sample-efficient molecular optimization under a limited budget of oracle evaluations. We propose MolLIBRA (MultimOdaLity and Language Integrated Bayesian and evolutionaRy optimizAti...
- Scalable spatial point process models for forensic footwear analysis : Abstract: Shoe print evidence recovered from crime scenes plays a key role in forensic investigations. By examining shoe prints, investigators can determine details of the footwear worn by suspects. H...
- Where Not to Learn: Prior-Aligned Training with Subset-based Attribution Constraints for Reliable Decision-Making : Abstract: Reliable models should not only predict correctly, but also justify decisions with acceptable evidence. Yet conventional supervised learning typically provides only class-level labels, allow...
- Financial Bond Similarity Search Using Representation Learning : Abstract: Finding similar bonds remains challenging in fixed-income analytics, as numerical financial attributes often overshadow categorical non-financial ones such as issuer sector and domicile. Thi...
- Condition Errors Refinement in Autoregressive Image Generation with Diffusion Loss : Abstract: Recent studies have explored autoregressive models for image generation, with promising results, and have combined diffusion models with autoregressive frameworks to optimize image generatio...
- Fair Context Learning for Evidence-Balanced Test-Time Adaptation in Vision-Language Models : Abstract: Vision-Language Models (VLMs) such as CLIP enable strong zero-shot recognition but suffer substantial degradation under distribution shifts. Test-Time Adaptation (TTA) aims to improve robust...
- OMNI-Dent: Towards an Accessible and Explainable AI Framework for Automated Dental Diagnosis : Abstract: Accurate dental diagnosis is essential for oral healthcare, yet many individuals lack access to timely professional evaluation. Existing AI-based methods primarily treat diagnosis as a visua...
- ShapBPT: Image Feature Attributions Using Data-Aware Binary Partition Trees : Abstract: Pixel-level feature attributions are an important tool in eXplainable AI for Computer Vision (XCV), providing visual insights into how image features influence model predictions. The Owen fo...
- BayesFlow 2.0: Multi-Backend Amortized Bayesian Inference in Python : Abstract: Modern Bayesian inference involves a mixture of computational methods for estimating, validating, and drawing conclusions from probabilistic models as part of principled workflows. An overar...
- Discrete Adjoint Matching : Abstract: Computation methods for solving entropy-regularized reward optimization -- a class of problems widely used for fine-tuning generative models -- have advanced rapidly. Among those, Adjoint Ma...
- High-fidelity 3D multi-slab diffusion MRI using Slab-shifting for Harmonized 3D Acquisition and Reconstruction with Profile Encoding Networks (SHARPEN) : Abstract: Three-dimensional (3D) multi-slab imaging is a promising approach for high-resolution in vivo diffusion MRI (dMRI) due to its compatibility with short TR (1-2 s), providing optimal signal-to...
- The Value of Variance: Mitigating Debate Collapse in Multi-Agent Systems via Uncertainty-Driven Policy Optimization : Abstract: Multi-agent debate (MAD) systems improve LLM reasoning through iterative deliberation, but remain vulnerable to debate collapse, a failure type where final agent decisions are compromised on...
- Automated Modernization of Machine Learning Engineering Notebooks for Reproducibility : Abstract: Interactive computational notebooks (e.g., Jupyter notebooks) are widely used in machine learning engineering (MLE) to program and share end-to-end pipelines, from data preparation to model ...
- Extracting Root-Causal Brain Activity Driving Psychopathology from Resting State fMRI : Abstract: Neuroimaging studies of psychiatric disorders often correlate imaging patterns with diagnostic labels or composite symptom scores, yielding diffuse associations that obscure underlying mecha...
- Beyond Crash: Hijacking Your Autonomous Vehicle for Fun and Profit : Abstract: Autonomous Vehicles (AVs), especially vision-based AVs, are rapidly being deployed without human operators. As AVs operate in safety-critical environments, understanding their robustness in ...
- 3D Transport-based Morphometry (3D-TBM) for medical image analysis : Abstract: Transport-Based Morphometry (TBM) has emerged as a new framework for 3D medical image analysis. By embedding images into a transport domain via invertible transformations, TBM facilitates ef...
- Cross-View World Models : Abstract: World models enable agents to plan by imagining future states, but existing approaches operate from a single viewpoint, typically egocentric, even when other perspectives would make planning...
- Parallel Track Transformers: Enabling Fast GPU Inference with Reduced Synchronization : Abstract: Efficient large-scale inference of transformer-based large language models (LLMs) remains a fundamental systems challenge, frequently requiring multi-GPU parallelism to meet stringent latenc...
- Optimization of Precipitate Segmentation Through Linear Genetic Programming of Image Processing : Abstract: Current analysis of additive manufactured niobium-based copper alloys relies on hand annotation due to varying contrast, noise, and image artifacts present in micrographs, slowing iteration ...
- Optimizing Few-Step Generation with Adaptive Matching Distillation : Abstract: Distribution Matching Distillation (DMD) is a powerful acceleration paradigm, yet its stability is often compromised in Forbidden Zones, regions where the real teacher provides unreliable gui...
- Efficient Post-Training Pruning of Large Language Models with Statistical Correction : Abstract: Post-training pruning is an effective approach for reducing the size and inference cost of large language models (LLMs), but existing methods often face a trade-off between pruning quality a...
- Learned Finite Element-based Regularization of the Inverse Problem in Electrocardiographic Imaging : Abstract: Electrocardiographic imaging (ECGI) seeks to reconstruct cardiac electrical activity from body-surface potentials noninvasively. However, the associated inverse problem is severely ill-posed...
- Statistical inference after variable selection in Cox models: A simulation study : Abstract: Choosing relevant predictors is central to the analysis of biomedical time-to-event data. Classical frequentist inference, however, presumes that the set of covariates is fixed in advance an...
- Physical Analog Kolmogorov-Arnold Networks based on Reconfigurable Nonlinear-Processing Units : Abstract: Kolmogorov-Arnold Networks (KANs) shift neural computation from linear layers to learnable nonlinear edge functions, but implementing these nonlinearities efficiently in hardware remains an ...
- Evaluating Object-Centric Models beyond Object Discovery : Abstract: Object-centric learning (OCL) aims to learn structured scene representations that support compositional generalization and robustness to out-of-distribution (OOD) data. However, OCL models a...
- LLM-Guided Diagnostic Evidence Alignment for Medical Vision-Language Pretraining under Limited Pairing : Abstract: Most existing CLIP-style medical vision-language pretraining methods rely on global or local alignment with substantial paired data. However, global alignment is easily dominated by non-dia...
- Improving Variable-Length Generation in Diffusion Language Models via Length Regularization : Abstract: Diffusion Large Language Models (DLLMs) are inherently ill-suited for variable-length generation, as their inference is defined on a fixed-length canvas and implicitly assumes a known target...
- Capturing the Topological Phase Transition and Thermodynamics of the 2D XY Model via Manifold-Aware Score-Based Generative Modeling : Abstract: The application of generative modeling to many-body physics offers a promising pathway for analyzing high-dimensional state spaces of spin systems. However, unlike computer vision tasks wher...
- $\partial$CBDs: Differentiable Causal Block Diagrams : Abstract: Modern cyber-physical systems (CPS) integrate physics, computation, and learning, demanding modeling frameworks that are simultaneously composable, learnable, and verifiable. Yet existing ap...
- Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization : Abstract: In this work, we investigate the large-scale mean-field variational inference (MFVI) problem from a mini-batch primal-dual perspective. By reformulating MFVI as a constrained finite-sum prob...
- Flow-Based Conformal Predictive Distributions : Abstract: Conformal prediction provides a distribution-free framework for uncertainty quantification via prediction sets with exact finite-sample coverage. In low dimensions these sets are easy to int...
- On Generation in Metric Spaces : Abstract: We study generation in separable metric instance spaces. We extend the language generation framework from Kleinberg and Mullainathan [2024] beyond countable domains by defining novelty throu...
- BFTS: Thompson Sampling with Bayesian Additive Regression Trees : Abstract: Contextual bandits are a core technology for personalized mobile health interventions, where decision-making requires adapting to complex, non-linear user behaviors. While Thompson Sampling ...
- SparseEval: Efficient Evaluation of Large Language Models by Sparse Optimization : Abstract: As large language models (LLMs) continue to scale up, their performance on various downstream tasks has significantly improved. However, evaluating their capabilities has become increasingly...
- CausalArmor: Efficient Indirect Prompt Injection Guardrails via Causal Attribution : Abstract: AI agents equipped with tool-calling capabilities are susceptible to Indirect Prompt Injection (IPI) attacks. In this attack scenario, malicious commands hidden within untrusted content tric...
- Learning to Alleviate Familiarity Bias in Video Recommendation : Abstract: Modern video recommendation systems aim to optimize user engagement and platform objectives, yet often face structural exposure imbalances caused by behavioral biases. In this work, we focus...
- Fast Model Selection and Stable Optimization for Softmax-Gated Multinomial-Logistic Mixture of Experts Models : Abstract: Mixture-of-Experts (MoE) architectures combine specialized predictors through a learned gate and are effective across regression and classification, but for classification with softmax multi...
- Tighter Information-Theoretic Generalization Bounds via a Novel Class of Change of Measure Inequalities : Abstract: In this paper, we propose a novel class of change of measure inequalities via a unified framework based on the data processing inequality for $f$-divergences, which is surprisingly elementar...
- Graph-based Semi-Supervised Learning via Maximum Discrimination : Abstract: Semi-supervised learning (SSL) addresses the critical challenge of training accurate models when labeled data is scarce but unlabeled data is abundant. Graph-based SSL (GSSL) has emerged as ...
- The CAPSARII Approach to Cyber-Secure Wearable, Ultra-Low-Power Networked Sensors for Soldier Health Monitoring : Abstract: The European Defence Agency's revised Capability Development Plan (CDP) identifies as a priority improving ground combat capabilities by enhancing soldiers' equipment for better protection. ...
- GAAVI: Global Asymptotic Anytime Valid Inference for the Conditional Mean Function : Abstract: Inference on the conditional mean function (CMF) is central to tasks from adaptive experimentation to optimal treatment assignment and algorithmic fairness auditing. In this work, we provide...
- MMLSv2: A Multimodal Dataset for Martian Landslide Detection in Remote Sensing Imagery : Abstract: We present MMLSv2, a dataset for landslide segmentation on Martian surfaces. MMLSv2 consists of multimodal imagery with seven bands: RGB, digital elevation model, slope, thermal inertia, and...
- Adjustment of Cluster-Then-Predict Framework for Multiport Scatterer Load Prediction : Abstract: Predicting interdependent load values in multiport scatterers is challenging due to high dimensionality and complex dependence between impedance and scattering ability, yet this prediction r...
- Evasion of IoT Malware Detection via Dummy Code Injection : Abstract: The Internet of Things (IoT) has revolutionized connectivity by linking billions of devices worldwide. However, this rapid expansion has also introduced severe security vulnerabilities, maki...
- Fundamental Limits of Community Detection in Contextual Multi-Layer Stochastic Block Models : Abstract: We consider the problem of community detection from the joint observation of a high-dimensional covariate matrix and $L$ sparse networks, all encoding noisy, partial information about the la...
- Information Geometry of Absorbing Markov-Chain and Discriminative Random Walks : Abstract: Discriminative Random Walks (DRWs) are a simple yet powerful tool for semi-supervised node classification, but their theoretical foundations remain fragmentary. We revisit DRWs through the l...
- Adaptive Matrix Online Learning through Smoothing with Guarantees for Nonsmooth Nonconvex Optimization : Abstract: We study online linear optimization with matrix variables constrained by the operator norm, a setting where the geometry renders designing data-dependent and efficient adaptive algorithms ch...
- Discrete Adjoint Schrödinger Bridge Sampler : Abstract: Learning discrete neural samplers is challenging due to the lack of gradients and combinatorial complexity. While stochastic optimal control (SOC) and Schrödinger bridge (SB) provide princip...
- A Statistical Framework for Alignment with Biased AI Feedback : Abstract: Modern alignment pipelines are increasingly replacing expensive human preference labels with evaluations from large language models (LLM-as-Judge). However, AI labels can be systematically b...
- Is Flow Matching Just Trajectory Replay for Sequential Data? : Abstract: Flow matching (FM) is increasingly used for time-series generation, but it is not well understood whether it learns a general dynamical structure or simply performs an effective "trajectory ...
- PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation : Abstract: Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols. A central bottleneck...
- Schrödinger bridge problem via empirical risk minimization : Abstract: We study the Schrödinger bridge problem when the endpoint distributions are available only through samples. Classical computational approaches estimate Schrödinger potentials via Sinkhorn it...
- Empirical Study of Observable Sets in Multiclass Quantum Classification : Abstract: Variational quantum algorithms have gained attention as early applications of quantum computers for learning tasks. In the context of supervised learning, most of the works that tackle class...
- Enhanced Food Category Recognition under Illumination-Induced Domain Shift : Abstract: Visual food recognition systems deployed in real-world environments, such as automated conveyor-belt inspection, are highly sensitive to domain shifts caused by illumination changes. While r...
- Attention-Based Deep Learning for Early Parkinson's Disease Detection with Tabular Biomedical Data : Abstract: Early and accurate detection of Parkinson's disease (PD) remains a critical challenge in medical diagnostics due to the subtlety of early-stage symptoms and the complex, non-linear relations...
- A Thermodynamic Theory of Learning Part II: Critical Period Closure and Continual Learning Failure : Abstract: Learning performed over finite time is necessarily irreversible. In Part I of this series, we modeled learning as a transport process in the space of parameter distributions and derived the ...
- On Improving Neurosymbolic Learning by Exploiting the Representation Space : Abstract: We study the problem of learning neural classifiers in a neurosymbolic setting where the hidden gold labels of input instances must satisfy a logical formula. Learning in this setting procee...
- Beyond Optimization: Intelligence as Metric-Topology Factorization under Geometric Incompleteness : Abstract: Contemporary ML often equates intelligence with optimization: searching for solutions within a fixed representational geometry. This works in static regimes but breaks under distributional s...
- When Is Compositional Reasoning Learnable from Verifiable Rewards? : Abstract: The emergence of compositional reasoning in large language models through reinforcement learning with verifiable rewards (RLVR) has been a key driver of recent empirical successes. Despite t...
- Regret Analysis of Unichain Average Reward Constrained MDPs with General Parameterization : Abstract: We study infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the unichain assumption and general policy parameterizations. Existing regret analyses for constr...
- A Unified Density Operator View of Flow Control and Merging : Abstract: Recent progress in large-scale flow and diffusion models raised two fundamental algorithmic challenges: (i) control-based reward adaptation of pre-trained flows, and (ii) integration of mult...
- Sharp analysis of linear ensemble sampling : Abstract: We analyse linear ensemble sampling (ES) with standard Gaussian perturbations in stochastic linear bandits. We show that for ensemble size $m=Θ(d\log n)$, ES attains $\tilde O(d^{3/2}\sqrt n...
- Horizon Imagination: Efficient On-Policy Training in Diffusion World Models : Abstract: We study diffusion-based world models for reinforcement learning, which offer high generative fidelity but face critical efficiency challenges in control. Current methods either require heav...
- The Benefits of Diversity: Combining Comparisons and Ratings for Efficient Scoring : Abstract: Should humans be asked to evaluate entities individually or comparatively? This question has been the subject of long debates. In this work, we show that, interestingly, combining both forms...
- TAAM: Inductive Graph-Class Incremental Learning with Task-Aware Adaptive Modulation : Abstract: Graph Continual Learning (GCL) aims to solve the challenges of streaming graph data. However, current methods often depend on replay-based strategies, which raise concerns like memory limits...
- Interpretable Fuzzy Systems For Forward Osmosis Desalination : Abstract: Preserving interpretability in fuzzy rule-based systems (FRBS) is vital for water treatment, where decisions impact public health. While structural interpretability has been addressed using ...
- Compiler-Assisted Speculative Sampling for Accelerated LLM Inference on Heterogeneous Edge Devices : Abstract: LLM deployment on resource-constrained edge devices faces severe latency constraints, particularly in real-time applications where delayed responses can compromise safety or usability. Among...
- Efficient and Adaptable Detection of Malicious LLM Prompts via Bootstrap Aggregation : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding, reasoning, and generation. However, these systems remain susceptible to malicious pr...
- Efficient Distribution Learning with Error Bounds in Wasserstein Distance : Abstract: The Wasserstein distance has emerged as a key metric to quantify distances between probability distributions, with applications in various fields, including machine learning, control theory,...
- Enhancing Bandit Algorithms with LLMs for Time-varying User Preferences in Streaming Recommendations : Abstract: In real-world streaming recommender systems, user preferences evolve dynamically over time. Existing bandit-based methods treat time merely as a timestamp, neglecting its explicit relationsh...
- Probability Hacking and the Design of Trustworthy ML for Signal Processing in C-UAS: A Scenario Based Method : Abstract: In order to counter the various threats manifested by Unmanned Aircraft Systems (UAS) adequately, specialized Counter Unmanned Aircraft Systems (C-UAS) are required. Enhancing C-UAS with Eme...
- Mutual information and task-relevant latent dimensionality : Abstract: Estimating the dimensionality of the latent representation needed for prediction -- the task-relevant dimension -- is a difficult, largely unsolved problem with broad scientific applications...
- Online Bayesian Imbalanced Learning with Bregman-Calibrated Deep Networks : Abstract: Class imbalance remains a fundamental challenge in machine learning, where standard classifiers exhibit severe performance degradation in minority classes. Although existing approaches addre...
- Variance-Gated Ensembles: An Epistemic-Aware Framework for Uncertainty Estimation : Abstract: Machine learning applications require fast and reliable per-sample uncertainty estimation. A common approach is to use predictive distributions from Bayesian or approximation methods and add...
- A second order regret bound for NormalHedge : Abstract: We consider the problem of prediction with expert advice for "easy" sequences. We show that a variant of NormalHedge enjoys a second-order $ε$-quantile regret bound of $O\big(\sqrt{V_T \lo...
- Spherical Steering: Geometry-Aware Activation Rotation for Language Models : Abstract: Inference-time steering has emerged as a promising paradigm for controlling language models (LMs) without the cost of retraining. However, standard approaches typically rely on activation ad...
- A Causal Machine Learning Framework for Treatment Personalization in Clinical Trials: Application to Ulcerative Colitis : Abstract: Randomized controlled trials estimate average treatment effects, but treatment response heterogeneity motivates personalized approaches. A critical question is whether statistically detectab...
- Nansde-net: A neural sde framework for generating time series with memory : Abstract: Modeling time series with long- or short-memory characteristics is a fundamental challenge in many scientific and engineering domains. While fractional Brownian motion has been widely used a...
- Interpretable Dynamic Network Modeling of Tensor Time Series via Kronecker Time-Varying Graphical Lasso : Abstract: With the rapid development of web services, large amounts of time series data are generated and accumulated across various domains such as finance, healthcare, and online platforms. As such ...
- CADO: From Imitation to Cost Minimization for Heatmap-based Solvers in Combinatorial Optimization : Abstract: Heatmap-based solvers have emerged as a promising paradigm for Combinatorial Optimization (CO). However, we argue that the dominant Supervised Learning (SL) training paradigm suffers from a ...
- Distribution-Free Robust Functional Predict-Then-Optimize : Abstract: The solution of PDEs in decision-making tasks is increasingly being undertaken with the help of neural operator surrogate models due to the need for repeated evaluation. Such methods, while ...
- Thermodynamic Isomorphism of Transformers: A Lagrangian Approach to Attention Dynamics : Abstract: Although the Transformer architecture has revolutionized artificial intelligence, its underlying mechanisms remain largely heuristic and lack a unified physical theory. In this work, we prop...
- SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning : Abstract: Large Language Model (LLM) agents have shown stunning results in complex tasks, yet they often operate in isolation, failing to learn from past experiences. Existing memory-based methods pri...
- Constraint-Aware Generative Auto-bidding via Pareto-Prioritized Regret Optimization : Abstract: Auto-bidding systems aim to maximize marketing value while satisfying strict efficiency constraints such as Target Cost-Per-Action (CPA). Although Decision Transformers provide powerful sequ...
- TextResNet: Decoupling and Routing Optimization Signals in Compound AI Systems via Deep Residual Tuning : Abstract: Textual Gradient-style optimizers (TextGrad) enable gradient-like feedback propagation through compound AI systems. However, they do not work well for deep chains. The root cause of this lim...
- Interaction-Grounded Learning for Contextual Markov Decision Processes with Personalized Feedback : Abstract: In this paper, we study Interaction-Grounded Learning (IGL) [Xie et al., 2021], a paradigm designed for realistic scenarios where the learner receives indirect feedback generated by an unkno...
- Fast Flow Matching based Conditional Independence Tests for Causal Discovery : Abstract: Constraint-based causal discovery methods require a large number of conditional independence (CI) tests, which severely limits their practical applicability due to high computational complex...
- Towards Efficient Large Language Reasoning Models via Extreme-Ratio Chain-of-Thought Compression : Abstract: Chain-of-Thought (CoT) reasoning successfully enhances the reasoning capabilities of Large Language Models (LLMs), yet it incurs substantial computational overhead for inference. Existing Co...
- All ERMs Can Fail in Stochastic Convex Optimization Lower Bounds in Linear Dimension : Abstract: We study the sample complexity of the best-case Empirical Risk Minimizer in the setting of stochastic convex optimization. We show that there exists an instance in which the sample size is l...
- Dynamic Regret via Discounted-to-Dynamic Reduction with Applications to Curved Losses and Adam Optimizer : Abstract: We study dynamic regret minimization in non-stationary online learning, with a primary focus on follow-the-regularized-leader (FTRL) methods. FTRL is important for curved losses and for unde...
- OJBKQ: Objective-Joint Babai-Klein Quantization : Abstract: Post-training quantization (PTQ) is widely used to compress large language models without retraining. However, many existing weight-only methods rely on heuristic objectives and greedy round...
- Modalities, a PyTorch-native Framework For Large-scale LLM Training and Research : Abstract: Today's LLM (pre-) training and research workflows typically allocate a significant amount of compute to large-scale ablation studies. Despite the substantial compute costs of these ablation...
- Drop the mask! GAMM-A Taxonomy for Graph Attributes Missing Mechanisms : Abstract: Exploring missing data in attributed graphs introduces unique challenges beyond those found in tabular datasets. In this work, we extend the taxonomy for missing data mechanisms to attribute...
- Radial Müntz-Szász Networks: Neural Architectures with Learnable Power Bases for Multidimensional Singularities : Abstract: Radial singular fields, such as $1/r$, $\log r$, and crack-tip profiles, are difficult to model for coordinate-separable neural architectures. We show that any $C^2$ function that is both ra...
- The Connection between Kriging and Large Neural Networks : Abstract: AI has impacted many disciplines and is nowadays ubiquitous. In particular, spatial statistics is in a pivotal moment where it will increasingly intertwine with AI. In this scenario, a relev...
- USBD: Universal Structural Basis Distillation for Source-Free Graph Domain Adaptation : Abstract: SF-GDA is pivotal for privacy-preserving knowledge transfer across graph datasets. Although recent works incorporate structural information, they implicitly condition adaptation on the smoot...
- RIFLE: Robust Distillation-based FL for Deep Model Deployment on Resource-Constrained IoT Networks : Abstract: Federated learning (FL) is a decentralized learning paradigm widely adopted in resource-constrained Internet of Things (IoT) environments. These devices, typically relying on TinyML models, ...
- Estimating Aleatoric Uncertainty in the Causal Treatment Effect : Abstract: Previous work on causal inference has primarily focused on averages and conditional averages of treatment effects, with significantly less attention on variability and uncertainty in individ...
- Low Rank Transformer for Multivariate Time Series Anomaly Detection and Localization : Abstract: Multivariate time series (MTS) anomaly diagnosis, which encompasses both anomaly detection and localization, is critical for the safety and reliability of complex, large-scale real-world sys...
- Learning Credal Ensembles via Distributionally Robust Optimization : Abstract: Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncer...
- Time-Delayed Transformers for Data-Driven Modeling of Low-Dimensional Dynamics : Abstract: We propose the time-delayed transformer (TD-TF), a simplified transformer architecture for data-driven modeling of unsteady spatio-temporal dynamics. TD-TF bridges linear operator-based meth...
- Beyond Correctness: Learning Robust Reasoning via Transfer : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has recently strengthened LLM reasoning, but its focus on final answer correctness leaves a critical gap: it does not ensure the robustn...
- Is Meta-Path Attention an Explanation? Evidence of Alignment and Decoupling in Heterogeneous GNNs : Abstract: Meta-path-based heterogeneous graph neural networks aggregate over meta-path-induced views, and their semantic-level attention over meta-path channels is widely used as a narrative for "whi...
- Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering : Abstract: Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data. Despite its ...
- Causal Schrödinger Bridges: Constrained Optimal Transport on Structural Manifolds : Abstract: Generative modeling typically seeks the path of least action via deterministic flows (ODE). While effective for in-distribution tasks, we argue that these deterministic paths become brittle ...
- Rho-Perfect: Correlation Ceiling For Subjective Evaluation Datasets : Abstract: Subjective ratings contain inherent noise that limits the model-human correlation, but this reliability issue is rarely quantified. In this paper, we present $ρ$-Perfect, a practical estimat...
- M-Loss: Quantifying Model Merging Compatibility with Limited Unlabeled Data : Abstract: Training of large-scale models is both computationally intensive and often constrained by the availability of labeled data. Model merging offers a compelling alternative by directly integrat...
- An arithmetic method algorithm optimizing k-nearest neighbors compared to regression algorithms and evaluated on real world data sources : Abstract: Linear regression analysis focuses on predicting a numeric regressand value based on certain regressor values. In this context, k-Nearest Neighbors (k-NN) is a common non-parametric regressi...
- Modeling Score Approximation Errors in Diffusion Models via Forward SPDEs : Abstract: This study investigates the dynamics of Score-based Generative Models (SGMs) by treating the score estimation error as a stochastic source driving the Fokker-Planck equation. Departing from ...
- Conditional Sequence Modeling for Safe Reinforcement Learning : Abstract: Offline safe reinforcement learning (RL) aims to learn policies from a fixed dataset while maximizing performance under cumulative cost constraints. In practice, deployment requirements ofte...
- FairRARI: A Plug and Play Framework for Fairness-Aware PageRank : Abstract: PageRank (PR) is a fundamental algorithm in graph machine learning tasks. Owing to the increasing importance of algorithmic fairness, we consider the problem of computing PR vectors subject ...
- SDFed: Bridging Local Global Discrepancy via Subspace Refinement and Divergence Control in Federated Prompt Learning : Abstract: Vision-language pretrained models offer strong transferable representations, yet adapting them in privacy-sensitive multi-party settings is challenging due to the high communication cost of ...
- TFMLinker: Universal Link Predictor by Graph In-Context Learning with Tabular Foundation Models : Abstract: Link prediction is a fundamental task in graph machine learning with widespread applications such as recommendation systems, drug discovery, knowledge graphs, etc. In the foundation model er...
- ERIS: Enhancing Privacy and Communication Efficiency in Serverless Federated Learning : Abstract: Scaling federated learning (FL) to billion-parameter models introduces critical trade-offs between communication efficiency, model accuracy, and privacy guarantees. Existing solutions often ...
- Projected Gradient Ascent for Efficient Reward-Guided Updates with One-Step Generative Models : Abstract: We propose a constrained latent optimization method for reward-guided generation that preserves white Gaussian noise characteristics with negligible overhead. Test-time latent optimization c...
- From Robotics to Sepsis Treatment: Offline RL via Geometric Pessimism : Abstract: Offline Reinforcement Learning (RL) promises the recovery of optimal policies from static datasets, yet it remains susceptible to the overestimation of out-of-distribution (OOD) actions, par...
- Two-Stage Data Synthesization: A Statistics-Driven Restricted Trade-off between Privacy and Prediction : Abstract: Synthetic data have gained increasing attention across various domains, with a growing emphasis on their performance in downstream prediction tasks. However, most existing synthesis strategi...
- Dashed Line Defense: Plug-And-Play Defense Against Adaptive Score-Based Query Attacks : Abstract: Score-based query attacks pose a serious threat to deep learning models by crafting adversarial examples (AEs) using only black-box access to model output scores, iteratively optimizing inpu...
- The Theory and Practice of MAP Inference over Non-Convex Constraints : Abstract: In many safety-critical settings, probabilistic ML systems have to make predictions subject to algebraic constraints, e.g., predicting the most likely trajectory that does not cross obstacle...
- Learning To Sample From Diffusion Models Via Inverse Reinforcement Learning : Abstract: Diffusion models generate samples through an iterative denoising process, guided by a neural network. While training the denoiser on real-world data is computationally demanding, the samplin...
- SoK: The Pitfalls of Deep Reinforcement Learning for Cybersecurity : Abstract: Deep Reinforcement Learning (DRL) has achieved remarkable success in domains requiring sequential decision-making, motivating its application to cybersecurity problems. However, transitionin...
- Reasoning aligns language models to human cognition : Abstract: Do language models make decisions under uncertainty like humans do, and what role does chain-of-thought (CoT) reasoning play in the underlying decision process? We introduce an active probab...
- Trapped by simplicity: When Transformers fail to learn from noisy features : Abstract: Noise is ubiquitous in data used to train large language models, but it is not well understood whether these models are able to correctly generalize to inputs generated without noise. Here, ...
- Data Reconstruction: Identifiability and Optimization with Sample Splitting : Abstract: Training data reconstruction from KKT conditions has shown striking empirical success, yet it remains unclear when the resulting KKT equations have unique solutions and, even in identifiable...
- Foundation Inference Models for Ordinary Differential Equations : Abstract: Ordinary differential equations (ODEs) are central to scientific modelling, but inferring their vector fields from noisy trajectories remains challenging. Current approaches such as symbolic...
- Central Dogma Transformer II: An AI Microscope for Understanding Cellular Regulatory Mechanisms : Abstract: Current biological AI models lack interpretability -- their internal representations do not correspond to biological relationships that researchers can examine. Here we present CDT-II, an ...
- Redundancy-Free View Alignment for Multimodal Human Activity Recognition with Arbitrarily Missing Views : Abstract: Multimodal multiview learning seeks to integrate information from diverse sources to enhance task performance. Existing approaches often struggle with flexible view configurations, including...
- HoGS: Homophily-Oriented Graph Synthesis for Local Differentially Private GNN Training : Abstract: Graph neural networks (GNNs) have demonstrated remarkable performance in various graph-based machine learning tasks by effectively modeling high-order interactions between nodes. However, tr...
- A Graphop Analysis of Graph Neural Networks on Sparse Graphs: Generalization and Universal Approximation : Abstract: Generalization and approximation capabilities of message passing graph neural networks (MPNNs) are often studied by defining a compact metric on a space of input graphs under which MPNNs are...
- How2Everything: Mining the Web for How-To Procedures to Evaluate and Improve LLMs : Abstract: Generating step-by-step "how-to" procedures is a key LLM capability: how-to advice is commonly requested in chatbots, and step-by-step planning is critical for reasoning over complex tasks. ...
- Efficient Deep Learning for Biometrics: Overview, Challenges and Trends in Era of Frugal AI : Abstract: Recent advances in deep learning, whether on discriminative or generative tasks have been beneficial for various applications, among which security and defense. However, their increasing com...
- Unified Biomolecular Trajectory Generation via Pretrained Variational Bridge : Abstract: Molecular Dynamics (MD) simulations provide a fundamental tool for characterizing molecular behavior at full atomic resolution, but their applicability is severely constrained by the computa...
- Beyond Arrow: From Impossibility to Possibilities in Multi-Criteria Benchmarking : Abstract: Modern benchmarks such as HELM MMLU account for multiple metrics like accuracy, robustness and efficiency. When trying to turn these metrics into a single ranking, natural aggregation proced...
- Rational Transductors : Abstract: Standard Transformers excel at semantic modeling but struggle with rigid sequential logic and state tracking. Theoretical work establishes that self-attention is limited to $\mathsf{AC}^0$ (unde...
- Object-Oriented Transition Modeling with Inductive Logic Programming : Abstract: Building models of the world from observation, i.e., induction, is one of the major challenges in machine learning. In order to be useful, models need to maintain accuracy when used in novel...
- Escaping Spectral Bias without Backpropagation: Fast Implicit Neural Representations with Extreme Learning Machines : Abstract: Training implicit neural representations (INRs) to capture fine-scale details typically relies on iterative backpropagation and is often hindered by spectral bias when the target exhibits hi...
- Dense Neural Networks are not Universal Approximators : Abstract: We investigate the approximation capabilities of dense neural networks. While universal approximation theorems establish that sufficiently large architectures can approximate arbitrary conti...
- TASTE: Task-Aware Out-of-Distribution Detection via Stein Operators : Abstract: Out-of-distribution detection methods are often either data-centric, detecting deviations from the training input distribution irrespective of their effect on a trained model, or model-centr...
- Federated Learning with Profile Mapping under Distribution Shifts and Drifts : Abstract: Federated Learning (FL) enables decentralized model training across clients without sharing raw data, but its performance degrades under real-world data heterogeneity. Existing methods often...
- ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets : Abstract: Machine learning models now influence decisions that directly affect people's lives, making it important to understand not only their predictions, but also how individuals could act to obtai...
- Dense Feature Learning via Linear Structure Preservation in Medical Data : Abstract: Deep learning models for medical data are typically trained using task specific objectives that encourage representations to collapse onto a small number of discriminative directions. While ...
- Quantifying Explanation Quality in Graph Neural Networks using Out-of-Distribution Generalization : Abstract: Evaluating the quality of post-hoc explanations for Graph Neural Networks (GNNs) remains a significant challenge. While recent years have seen an increasing development of explainability met...
- Towards Robust Scaling Laws for Optimizers : Abstract: The quality of Large Language Model (LLM) pretraining depends on multiple factors, including the compute budget and the choice of optimization algorithm. Empirical scaling laws are widely us...
- Analyzing and Guiding Zero-Shot Posterior Sampling in Diffusion Models : Abstract: Recovering a signal from its degraded measurements is a long standing challenge in science and engineering. Recently, zero-shot diffusion based methods have been proposed for such inverse pr...
- Efficient Planning in Reinforcement Learning via Model Introspection : Abstract: Reinforcement learning and classical planning are typically seen as two distinct problems, with differing formulations necessitating different solutions. Yet, when humans are given a task, r...
- ParisKV: Fast and Drift-Robust KV-Cache Retrieval for Long-Context LLMs : Abstract: KV-cache retrieval is essential for long-context LLM inference, yet existing methods struggle with distribution drift and high latency at scale. We introduce ParisKV, a drift-robust, GPU-nat...
- Efficient Adaptive Data Analysis over Dense Distributions : Abstract: Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid stati...
- TerraBind: Fast and Accurate Binding Affinity Prediction through Coarse Structural Representations : Abstract: We present TerraBind, a foundation model for protein-ligand structure and binding affinity prediction that achieves 26-fold faster inference than state-of-the-art methods while improving aff...
- Riemannian MeanFlow : Abstract: Diffusion and flow models have become the dominant paradigm for generative modeling on Riemannian manifolds, with successful applications in protein backbone generation and DNA sequence desi...
- MaD-Mix: Multi-Modal Data Mixtures via Latent Space Coupling for Vision-Language Model Training : Abstract: Vision-Language Models (VLMs) are typically trained on a diverse set of multi-modal domains, yet current practices rely on costly manual tuning. We propose MaD-Mix, a principled and computat...
- Approximating Matrix Functions with Deep Neural Networks and Transformers : Abstract: Transformers have revolutionized natural language processing, but their use for numerical computation has received less attention. We study the approximation of matrix functions, which map s...
- Interpretable Analytic Calabi-Yau Metrics via Symbolic Distillation : Abstract: Calabi-Yau manifolds are essential for string theory but require computing intractable metrics. Here we show that symbolic regression can distill neural approximations into simple, interpre...
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation : Abstract: While the complex reasoning capability of Large Language Models (LLMs) has attracted significant attention, single-agent systems often encounter inherent performance ceilings in complex task...
- Dynamic Load Model for Data Centers with Pattern-Consistent Calibration : Abstract: The rapid growth of data centers has made large electronic load (LEL) modeling increasingly important for power system analysis. Such loads are characterized by fast workload-driven variabil...
- Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion : Abstract: Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not gene...
- Efficient Anti-exploration via VQVAE and Fuzzy Clustering in Offline Reinforcement Learning : Abstract: Pseudo-count is an effective anti-exploration method in offline reinforcement learning (RL) by counting state-action pairs and imposing a large penalty on rare or unseen state-action pair da...
- Safety Alignment as Continual Learning: Mitigating the Alignment Tax via Orthogonal Gradient Projection : Abstract: Large Language Models (LLMs) often incur an alignment tax: safety post-training can reduce general utility (e.g., reasoning and coding). We argue that this tax primarily arises from continua...
- Systematic Performance Assessment of Deep Material Networks for Multiscale Material Modeling : Abstract: Deep Material Networks (DMNs) are structure-preserving, mechanistic machine learning models that embed micromechanical principles into their architectures, enabling strong extrapolation capa...
- Risk-Sensitive Exponential Actor Critic : Abstract: Model-free deep reinforcement learning (RL) algorithms have achieved tremendous success on a range of challenging tasks. However, safety concerns remain when these methods are deployed on re...
- Online Learning for Uninformed Markov Games: Empirical Nash-Value Regret and Non-Stationarity Adaptation : Abstract: We study online learning in two-player uninformed Markov games, where the opponent's actions and policies are unobserved. In this setting, Tian et al. (2021) show that achieving no-external-...
- Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used : Abstract: Large Language Models (LLMs) often falter in complex reasoning tasks due to their static, parametric knowledge, leading to hallucinations and poor performance in specialized domains like mat...
- Probing Neural TSP Representations for Prescriptive Decision Support : Abstract: The field of neural combinatorial optimization (NCO) trains neural policies to solve NP-hard problems such as the traveling salesperson problem (TSP). We ask whether, beyond producing good t...
- SpecAttn: Co-Designing Sparse Attention with Self-Speculative Decoding : Abstract: Long-context large language model (LLM) inference has become the norm for today's AI applications. However, it is severely bottlenecked by the increasing memory demands of its KV cache. Prev...
- Fault-Tolerant Evaluation for Sample-Efficient Model Performance Estimators : Abstract: In the era of Model-as-a-Service, organizations increasingly rely on third-party AI models for rapid deployment. However, the dynamic nature of emerging AI applications, the continual introd...
- Cerebellar-Inspired Residual Control for Fault Recovery: From Inference-Time Adaptation to Structural Consolidation : Abstract: Robotic policies deployed in real-world environments often encounter post-training faults, where retraining, exploration, or system identification are impractical. We introduce an inference-...
- Robust Ultra-High-Dimensional Variable Selection With Correlated Structure Using Group Testing : Abstract: Background: High-dimensional genomic data exhibit strong group correlation structures that challenge conventional feature selection methods, which often assume feature independence or rely o...
- tLoRA: Efficient Multi-LoRA Training with Elastic Shared Super-Models : Abstract: As Low-Rank Adaptation (LoRA) becomes the standard approach for efficiently fine-tuning large language models (LLMs), shared clusters increasingly execute many concurrent LoRA training jobs ...
- Hybrid Feedback-Guided Optimal Learning for Wireless Interactive Panoramic Scene Delivery : Abstract: Immersive applications such as virtual and augmented reality impose stringent requirements on frame rate, latency, and synchronization between physical and virtual environments. To meet thes...
- VertCoHiRF: Decentralized Vertical Clustering Beyond k-means : Abstract: Vertical Federated Learning (VFL) enables collaborative analysis across parties holding complementary feature views of the same samples, yet existing approaches are largely restricted to dis...
- Fair Decisions from Calibrated Scores: Achieving Optimal Classification While Satisfying Sufficiency : Abstract: Binary classification based on predicted probabilities (scores) is a fundamental task in supervised machine learning. While thresholding scores is Bayes-optimal in the unconstrained setting,...
- Incorruptible Neural Networks: Training Models that can Generalize to Large Internal Perturbations : Abstract: Flat regions of the neural network loss landscape have long been hypothesized to correlate with better generalization properties. A closely related but distinct problem is training models th...
- Revisiting Robustness for LLM Safety Alignment via Selective Geometry Control : Abstract: Safety alignment of large language models remains brittle under domain shift and noisy preference supervision. Most existing robust alignment methods focus on uncertainty in alignment data, ...
- Scalable Dexterous Robot Learning with AR-based Remote Human-Robot Interactions : Abstract: This paper focuses on the scalable robot learning for manipulation in the dexterous robot arm-hand systems, where the remote human-robot interactions via augmented reality (AR) are establish...
- Controllable Value Alignment in Large Language Models through Neuron-Level Editing : Abstract: Aligning large language models (LLMs) with human values has become increasingly important as their influence on human behavior and decision-making expands. However, existing steering-based a...
- UTOPIA: Unlearnable Tabular Data via Decoupled Shortcut Embedding : Abstract: Unlearnable examples (UE) have emerged as a practical mechanism to prevent unauthorized model training on private vision data, while extending this protection to tabular data is nontrivial. ...
- FEM-Informed Hypergraph Neural Networks for Efficient Elastoplasticity : Abstract: Graph neural networks (GNNs) naturally align with sparse operators and unstructured discretizations, making them a promising paradigm for physics-informed machine learning in computational m...
- Privately Learning Decision Lists and a Differentially Private Winnow : Abstract: We give new differentially private algorithms for the classic problems of learning decision lists and large-margin halfspaces in the PAC and online models. In the PAC model, we give a comput...
- Dichotomy of Feature Learning and Unlearning: Fast-Slow Analysis on Neural Networks with Stochastic Gradient Descent : Abstract: The dynamics of gradient-based training in neural networks often exhibit nontrivial structures; hence, understanding them remains a central challenge in theoretical machine learning. In part...
- BitLogic: Training Framework for Gradient-Based FPGA-Native Neural Networks : Abstract: The energy and latency costs of deep neural network inference are increasingly driven by deployment rather than training, motivating hardware-specialized alternatives to arithmetic-heavy mod...
- Nonparametric Bayesian Optimization for General Rewards : Abstract: This work focuses on Bayesian optimization (BO) under reward model uncertainty. We propose the first BO algorithm that achieves no-regret guarantee in a general reward setting, requiring onl...
- Achieving Optimal Static and Dynamic Regret Simultaneously in Bandits with Deterministic Losses : Abstract: In adversarial multi-armed bandits, two performance measures are commonly used: static regret, which compares the learner to the best fixed arm, and dynamic regret, which compares it to the ...
- Sign-Based Optimizers Are Effective Under Heavy-Tailed Noise : Abstract: While adaptive gradient methods are the workhorse of modern machine learning, sign-based optimization algorithms such as Lion and Muon have recently demonstrated superior empirical performan...
- Active Learning Using Aggregated Acquisition Functions: Accuracy and Sustainability Analysis : Abstract: Active learning (AL) is a machine learning (ML) approach that strategically selects the most informative samples for annotation during training, aiming to minimize annotation costs. This str...
- Data-Aware and Scalable Sensitivity Analysis for Decision Tree Ensembles : Abstract: Decision tree ensembles are widely used in critical domains, making robustness and sensitivity analysis essential to their trustworthiness. We study the feature sensitivity problem, which as...
- On the Importance of a Multi-Scale Calibration for Quantization : Abstract: Post-training quantization (PTQ) is a cornerstone for efficiently deploying large language models (LLMs), where a small calibration set critically affects quantization performance. However, ...
- Bandit Allocational Instability : Abstract: When multi-armed bandit (MAB) algorithms allocate pulls among competing arms, the resulting allocation can exhibit huge variation. This is particularly harmful in modern applications such as...
- Bipartite Graph Attention-based Clustering for Large-scale scRNA-seq Data : Abstract: scRNA-seq clustering is a critical task for analyzing single-cell RNA sequencing (scRNA-seq) data, as it groups cells with similar gene expression profiles. Transformers, as powerful foundat...
- AI-Driven Predictive Modelling for Groundwater Salinization in Israel : Abstract: Increasing salinity and contamination of groundwater is a serious issue in many parts of the world, causing degradation of water resources. The aim of this work is to form a comprehensive un...
- ODELoRA: Training Low-Rank Adaptation by Solving Ordinary Differential Equations : Abstract: Low-rank adaptation (LoRA) has emerged as a widely adopted parameter-efficient fine-tuning method in deep transfer learning, due to its reduced number of trainable parameters and lower memor...
- Hyperparameter Transfer Laws for Non-Recurrent Multi-Path Neural Networks : Abstract: Deeper modern architectures are costly to train, making hyperparameter transfer preferable to expensive repeated tuning. Maximal Update Parametrization (μP) helps explain why many hyperpar...
- CoMI-IRL: Contrastive Multi-Intention Inverse Reinforcement Learning : Abstract: Inverse Reinforcement Learning (IRL) seeks to infer reward functions from expert demonstrations. When demonstrations originate from multiple experts with different intentions, the problem is...
- PALMS: Pavlovian Associative Learning Models Simulator : Abstract: Simulations are an indispensable step in the cycle of theory development and refinement, helping researchers formulate precise definitions, generate models, and make accurate predictions. Th...
- Pareto-guided Pipeline for Distilling Featherweight AI Agents in Mobile MOBA Games : Abstract: Recent advances in game AI have demonstrated the feasibility of training agents that surpass top-tier human professionals in complex environments such as Honor of Kings (HoK), a leading mobi...
- MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution : Abstract: Large language models (LLMs) have demonstrated strong performance and rapid progress in a wide range of medical reasoning tasks. However, their sequential autoregressive decoding forces inhe...
- Compact Conformal Subgraphs : Abstract: Conformal prediction provides rigorous, distribution-free uncertainty guarantees, but often yields prohibitively large prediction sets in structured domains such as routing, planning, or seq...
- Enhancing Time Series Classification with Diversity-Driven Neural Network Ensembles : Abstract: Ensemble methods have played a crucial role in achieving state-of-the-art (SOTA) performance across various machine learning tasks by leveraging the diversity of features learned by individu...
- Can NeRFs See without Cameras? : Abstract: Neural Radiance Fields (NeRFs) have been remarkably successful at synthesizing novel views of 3D scenes by optimizing a volumetric scene function. This scene function models how optical rays...
- MedVAL: Toward Expert-Level Medical Text Validation with Language Models : Abstract: With the growing use of language models (LMs) in clinical environments, there is an immediate need to evaluate the accuracy and safety of LM-generated medical text. Currently, such evaluatio...
- SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning : Abstract: Pruning is a typical acceleration technique for compute-bound models by removing computation on unimportant values. Recently, it has been applied to accelerate Vision-Language-Action (VLA) m...
- Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces : Abstract: We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperp...
- Solver-in-the-Loop: MDP-Based Benchmarks for Self-Correction and Behavioral Rationality in Operations Research : Abstract: Operations Research practitioners routinely debug infeasible models through an iterative process: analyzing Irreducible Infeasible Subsystems (IIS), identifying constraint conflicts, and ...
- SAGE: Sequence-level Adaptive Gradient Evolution for Generative Recommendation : Abstract: While works such as OneRec have validated the scaling laws of Large Language Models (LLMs) in recommender systems, they rely on a cumbersome separate vocabulary. This dependency prevents the...
- Task-free Adaptive Meta Black-box Optimization : Abstract: Handcrafted optimizers become prohibitively inefficient for complex black-box optimization (BBO) tasks. MetaBBO addresses this challenge by meta-learning to automatically configure optimizer...
- Vidmento: Creating Video Stories Through Context-Aware Expansion With Generative Video : Abstract: Video storytelling is often constrained by available material, limiting creative expression and leaving undesired narrative gaps. Generative video offers a new way to address these limitatio...
- Learning to Select: Query-Aware Adaptive Dimension Selection for Dense Retrieval : Abstract: Dense retrieval represents queries and documents as high-dimensional embeddings, but these representations can be redundant at the query level: for a given information need, only a subset of...
- Attractor Patch Networks: Reducing Catastrophic Forgetting with Routed Low-Rank Patch Experts : Abstract: Transformers achieve strong language modeling accuracy, yet their position-wise feed-forward networks (FFNs) are dense, globally shared, and typically updated end to end. These properties cr...
- Neural Sabermetrics with World Model: Play-by-play Predictive Modeling with Large Language Model : Abstract: Classical sabermetrics has profoundly shaped baseball analytics by summarizing long histories of play into compact statistics. While these metrics are invaluable for valuation and retrospect...
- TransConv-DDPM: Enhanced Diffusion Model for Generating Time-Series Data in Healthcare : Abstract: The lack of real-world data in clinical fields poses a major obstacle in training effective AI models for diagnostic and preventive tools in medicine. Generative AI has shown promise in incr...
- AVERE: Improving Audiovisual Emotion Reasoning with Preference Optimization : Abstract: Emotion understanding is essential for building socially intelligent agents. Although recent multimodal large language models have shown strong performance on this task, two key challenges r...
- Hybrid Dual-Path Linear Transformations for Efficient Transformer Architectures : Abstract: Standard Transformer architectures rely heavily on dense linear transformations, treating feature projection as a monolithic, full-rank operation. We argue that this formulation is inefficie...
- Attention-Driven Framework for Non-Rigid Medical Image Registration : Abstract: Deformable medical image registration is a fundamental task in medical image analysis with applications in disease diagnosis, treatment planning, and image-guided interventions. Despite sign...
- Finding Connections: Membership Inference Attacks for the Multi-Table Synthetic Data Setting : Abstract: Synthetic tabular data has gained attention for enabling privacy-preserving data sharing. While substantial progress has been made in single-table synthetic generation where data are modeled...
- Featured Reproducing Kernel Banach Spaces for Learning and Neural Networks : Abstract: Reproducing kernel Hilbert spaces provide a foundational framework for kernel-based learning, where regularization and interpolation problems admit finite-dimensional solutions through class...
- Convex Dominance in Deep Learning I: A Scaling Law of Loss and Learning Rate : Abstract: Deep learning has a non-convex loss landscape, and its optimization dynamics are hard to analyze or control. Nevertheless, the dynamics can be empirically convex-like across various tasks, model...
- Learning Nonlinear Systems In-Context: From Synthetic Data to Real-World Motor Control : Abstract: LLMs have shown strong in-context learning (ICL) abilities, but have not yet been extended to signal processing systems. Inspired by their design, we have proposed for the first time ICL usi...
- Latent Target Score Matching, with an application to Simulation-Based Inference : Abstract: Denoising score matching (DSM) for training diffusion models may suffer from high variance at low noise levels. Target Score Matching (TSM) mitigates this when clean data scores are availabl...
- Improving Detection of Rare Nodes in Hierarchical Multi-Label Learning : Abstract: In hierarchical multi-label classification, a persistent challenge is enabling model predictions to reach deeper levels of the hierarchy for more detailed or fine-grained classifications. Th...
- From Obstacles to Etiquette: Robot Social Navigation with VLM-Informed Path Selection : Abstract: Navigating socially in human environments requires more than satisfying geometric constraints, as collision-free paths may still interfere with ongoing activities or conflict with social nor...
- ARO: A New Lens On Matrix Optimization For Large Models : Abstract: Matrix-based optimizers have attracted growing interest for improving LLM training efficiency, with significant progress centered on orthogonalization/whitening based methods. While yielding...
- ANCRe: Adaptive Neural Connection Reassignment for Efficient Depth Scaling : Abstract: Scaling network depth has been a central driver behind the success of modern foundation models, yet recent investigations suggest that deep layers are often underutilized. This paper revisit...
- Next-Gen CAPTCHAs: Leveraging the Cognitive Gap for Scalable and Diverse GUI-Agent Defense : Abstract: The rapid evolution of GUI-enabled agents has rendered traditional CAPTCHAs obsolete. While previous benchmarks like OpenCaptchaWorld established a baseline for evaluating multimodal agents,...
- ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation : Abstract: Diffusion models have achieved remarkable generation quality, but they suffer from significant inference cost due to their reliance on multiple sequential denoising steps, motivating recent ...
- CIC-Trap4Phish: A Unified Multi-Format Dataset for Phishing and Quishing Attachment Detection : Abstract: Phishing attacks represent one of the primary attack methods used by cyber attackers. In many cases, attackers use deceptive emails along with malicious attachments to trick users ...
- Robustness Is a Function, Not a Number: A Factorized Comprehensive Study of OOD Robustness in Vision-Based Driving : Abstract: Out of distribution (OOD) robustness in autonomous driving is often reduced to a single number, hiding what breaks a policy. We decompose environments along five axes: scene (rural/urban), s...
- VirtualEnv: A Platform for Embodied AI Research : Abstract: As large language models (LLMs) continue to improve in reasoning and decision-making, there is a growing need for realistic and interactive environments where their abilities can be rigorous...
- Sycophantic Anchors: Localizing and Quantifying User Agreement in Reasoning Models : Abstract: Reasoning models frequently agree with incorrect user suggestions -- a behavior known as sycophancy. However, it is unclear where in the reasoning trace this agreement originates and how str...
- TIDE: Tuning-Integrated Dynamic Evolution for LLM-Based Automated Heuristic Design : Abstract: Although Large Language Models have advanced Automated Heuristic Design, treating algorithm evolution as a monolithic text generation task overlooks the coupling between discrete algorithmic...
- Zero-Shot Statistical Downscaling via Diffusion Posterior Sampling : Abstract: Conventional supervised climate downscaling struggles to generalize to Global Climate Models (GCMs) due to the lack of paired training data and inherent domain gaps relative to reanalysis. M...
- Beyond Quantity: Trajectory Diversity Scaling for Code Agents : Abstract: As code large language models (LLMs) evolve into tool-interactive agents via the Model Context Protocol (MCP), their generalization is increasingly limited by low-quality synthetic data and ...
- Playing 20 Question Game with Policy-Based Reinforcement Learning : Abstract: The 20 Questions (Q20) game is a well-known game which encourages deductive reasoning and creativity. In the game, the answerer first thinks of an object such as a famous person or a kind of...
- YaRN: Efficient Context Window Extension of Large Language Models : Abstract: Rotary Position Embeddings (RoPE) have been shown to effectively encode positional information in transformer-based language models. However, these models fail to generalize past the sequenc...
- Cognitive Edge Device (CED) for Real-Time Environmental Monitoring in Aquatic Ecosystems : Abstract: Invasive signal crayfish have a detrimental impact on ecosystems. They spread the fungal-type crayfish plague disease (Aphanomyces astaci) that is lethal to the native white clawed crayfish,...
- Delay-Aware Reinforcement Learning for Highway On-Ramp Merging under Stochastic Communication Latency : Abstract: Delayed and partially observable state information poses significant challenges for reinforcement learning (RL)-based control in real-world autonomous driving. In highway on-ramp merging, a ...
- Towards Transparent and Efficient Anomaly Detection in Industrial Processes through ExIFFI : Abstract: Anomaly Detection (AD) is crucial in industrial settings to streamline operations by detecting underlying issues. Conventional methods merely label observations as normal or anomalous, lacki...
- Optimizing Automated Picking Systems in Warehouse Robots Using Machine Learning : Abstract: With the rapid growth of global e-commerce, the demand for automation in the logistics industry is increasing. This study focuses on automated picking systems in warehouses, utilizing deep l...
- ComfyBench: Benchmarking LLM-based Agents in ComfyUI for Autonomously Designing Collaborative AI Systems : Abstract: Much previous AI research has focused on developing monolithic models to maximize their intelligence, with the primary goal of enhancing performance on specific tasks. In contrast, this work...
- RARe: Retrieval Augmented Retrieval with In-Context Examples : Abstract: While in-context learning is well-studied with decoder-only language models (LLMs), its utility for encoder-only models remains underexplored. We study in-context learning for encoder-only m...
- Software Performance Engineering for Foundation Model-Powered Software : Abstract: The rise of Foundation Models (FMs) like Large Language Models (LLMs) is revolutionizing software development. Despite the impressive prototypes, transforming FMware into production-ready pr...
- DeMo: Decoupled Momentum Optimization : Abstract: Scaling neural network training increasingly depends on synchronous data-parallelism, yet full-precision gradient all-reduce imposes a severe communication bottleneck. We propose Decoupled M...
- AI-Powered Intracranial Hemorrhage Detection: A Co-Scale Convolutional Attention Model with Uncertainty-Based Fuzzy Integral Operator and Feature Screening : Abstract: Intracranial hemorrhage (ICH) refers to the leakage or accumulation of blood within the skull, which occurs due to the rupture of blood vessels in or around the brain. If this condition is n...
- Automatic Generation of Polynomial Symmetry Breaking Constraints : Abstract: Symmetry in integer programming causes redundant search and is often handled with symmetry breaking constraints that remove as many equivalent solutions as possible. We propose an algebraic ...
- Grokking in Linear Models for Logistic Regression : Abstract: Grokking, the phenomenon of delayed generalization, is often attributed to the depth and compositional structure of deep neural networks. We study grokking in one of the simplest possible se...
- SWE Context Bench: A Benchmark for Context Learning in Coding : Abstract: Large language models are increasingly used as programming agents for repository level software engineering tasks. While recent benchmarks evaluate correctness in realistic codebases, they l...
- Near-Oracle KV Selection via Pre-hoc Sparsity for Long-Context Inference : Abstract: A core bottleneck in large language model (LLM) inference is the cost of attending over the ever-growing key-value (KV) cache. Although near-oracle top-k KV selection can preserve the qualit...
- Latent Reasoning with Supervised Thinking States : Abstract: Reasoning with a chain-of-thought (CoT) enables Large Language Models (LLMs) to solve complex tasks but incurs significant inference costs due to the generation of long rationales. We propos...
- Regime Change Hypothesis: Foundations for Decoupled Dynamics in Neural Network Training : Abstract: Despite the empirical success of DNNs, their internal training dynamics remain difficult to characterize. In ReLU-based models, the activation pattern induced by a given input determines the ...
- UrbanGraphEmbeddings: Learning and Evaluating Spatially Grounded Multimodal Embeddings for Urban Science : Abstract: Learning transferable multimodal embeddings for urban environments is challenging because urban understanding is inherently spatial, yet existing datasets and benchmarks lack explicit alignm...
- ManifoldKV: Training-Free KV Cache Compression via Euclidean Outlier Detection : Abstract: Long-context inference is constrained by KV-cache memory, which grows linearly with sequence length; KV-cache compression therefore hinges on reliably selecting which past tokens to retain. ...
- The Chicken and Egg Dilemma: Co-optimizing Data and Model Configurations for LLMs : Abstract: Co-optimizing data and model configurations for training LLMs presents a classic chicken-and-egg dilemma: The best training data configuration (e.g., data mixture) for a downstream task depe...
- Roadmap to Quantum Aesthetics : Abstract: Quantum mechanics occupies a central position in contemporary science while remaining largely inaccessible to direct sensory experience. This paper proposes a roadmap to quantum aesthetics t...
- Learning Human-Like Badminton Skills for Humanoid Robots : Abstract: Realizing versatile and human-like performance in high-demand sports like badminton remains a formidable challenge for humanoid robotics. Unlike standard locomotion or static manipulation, t...
- Reinforcement Learning with Backtracking Feedback : Abstract: Addressing the critical need for robust safety in Large Language Models (LLMs), particularly against adversarial attacks and in-distribution errors, we introduce Reinforcement Learning with ...
- Dynamic Long Context Reasoning over Compressed Memory via End-to-End Reinforcement Learning : Abstract: Large Language Models (LLMs) face significant challenges in long-context processing, including quadratic computational costs, information forgetting, and the context fragmentation inherent i...
- Altruism and Fair Objective in Mixed-Motive Markov games : Abstract: Cooperation is fundamental for society's viability, as it enables the emergence of structure within heterogeneous groups that seek collective well-being. However, individuals are inclined to...
- BiManiBench: A Hierarchical Benchmark for Evaluating Bimanual Coordination of Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have significantly advanced embodied AI, and using them to benchmark robotic intelligence has become a pivotal trend. However, existing frameworks re...
- Intelligent support for Human Oversight: Integrating Reinforcement Learning with Gaze Simulation to Personalize Highlighting : Abstract: Interfaces for human oversight must effectively support users' situation awareness under time-critical conditions. We explore reinforcement learning (RL)-based UI adaptation to personalize a...
- Optimizing Spectral Prediction in MXene-Based Metasurfaces Through Multi-Channel Spectral Refinement and Savitzky-Golay Smoothing : Abstract: The prediction of electromagnetic spectra for MXene-based solar absorbers is a computationally intensive task, traditionally addressed using full-wave solvers. This study introduces an effic...
- LLMs + Security = Trouble : Abstract: We argue that when it comes to producing secure code with AI, the prevailing "fighting fire with fire" approach -- using probabilistic AI-based checkers or attackers to secure probabilistica...
- Prism: Spectral-Aware Block-Sparse Attention : Abstract: Block-sparse attention is promising for accelerating long-context LLM pre-filling, yet identifying relevant blocks efficiently remains a bottleneck. Existing methods typically employ coarse-...
- Vista: Scene-Aware Optimization for Streaming Video Question Answering under Post-Hoc Queries : Abstract: Streaming video question answering (Streaming Video QA) poses distinct challenges for multimodal large language models (MLLMs), as video frames arrive sequentially and user queries can be is...
- Decentralized Spatial Reuse Optimization in Wi-Fi: An Internal Regret Minimization Approach : Abstract: Spatial Reuse (SR) is a cost-effective technique for improving spectral efficiency in dense IEEE 802.11 deployments by enabling simultaneous transmissions. However, the decentralized optimiz...
- Gesture Matters: Pedestrian Gesture Recognition for AVs Through Skeleton Pose Evaluation : Abstract: Gestures are a key component of non-verbal communication in traffic, often helping pedestrian-to-driver interactions when formal traffic rules may be insufficient. This problem becomes more ...
- CLEAR: A Knowledge-Centric Vessel Trajectory Analysis Platform : Abstract: Vessel trajectory data from the Automatic Identification System (AIS) is used widely in maritime analytics. Yet, analysis is difficult for non-expert users due to the incompleteness and comp...
- Contextual Rollout Bandits for Reinforcement Learning with Verifiable Rewards : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) is an effective paradigm for improving the reasoning capabilities of large language models. However, existing RLVR methods utilize rollo...
- A General Theory of Proportionality with Additive Utilities : Abstract: We consider a model where a subset of candidates must be selected based on voter preferences, subject to general constraints that specify which subsets are feasible. This model generalizes c...
- GISA: A Benchmark for General Information-Seeking Assistant : Abstract: The advancement of large language models (LLMs) has significantly accelerated the development of search agents capable of autonomously gathering information through multi-turn web interactio...
- GOT-Edit: Geometry-Aware Generic Object Tracking via Online Model Editing : Abstract: Human perception for effective object tracking in a 2D video stream arises from the implicit use of prior 3D knowledge combined with semantic reasoning. In contrast, most generic object trac...
- Stateless Yet Not Forgetful: Implicit Memory as a Hidden Channel in LLMs : Abstract: Large language models (LLMs) are commonly treated as stateless: once an interaction ends, no information is assumed to persist unless it is explicitly stored and re-supplied. We challenge th...
- Agent-Supported Foresight for AI Systemic Risks: AI Agents for Breadth, Experts for Judgment : Abstract: AI impact assessments often stress near-term risks because human judgment degrades over longer horizons, exemplifying the Collingridge dilemma: foresight is most needed when knowledge is sca...
- Predicting Future Utility: Global Combinatorial Optimization for Task-Agnostic KV Cache Eviction : Abstract: Given the quadratic complexity of attention, KV cache eviction is vital to accelerate model inference. Current KV cache eviction methods typically rely on instantaneous heuristic metrics, im...
- Kissan-Dost: Bridging the Last Mile in Smallholder Precision Agriculture with Conversational IoT : Abstract: We present Kissan-Dost, a multilingual, sensor-grounded conversational system that turns live on-farm measurements and weather into plain-language guidance delivered over WhatsApp text or vo...
- Breaking the Grid: Distance-Guided Reinforcement Learning in Large Discrete and Hybrid Action Spaces : Abstract: Reinforcement Learning is increasingly applied to logistics, scheduling, and recommender systems, but standard algorithms struggle with the curse of dimensionality in such large discrete act...
- Enhancing Genetic Algorithms with Graph Neural Networks: A Timetabling Case Study : Abstract: This paper investigates the impact of hybridizing a multi-modal Genetic Algorithm with a Graph Neural Network for timetabling optimization. The Graph Neural Network is designed to encapsulat...
- Sparse Models, Sparse Safety: Unsafe Routes in Mixture-of-Experts LLMs : Abstract: By introducing routers to selectively activate experts in Transformer layers, the mixture-of-experts (MoE) architecture significantly reduces computational costs in large language models (LL...
- CauScale: Neural Causal Discovery at Scale : Abstract: Causal discovery is essential for advancing data-driven fields such as scientific AI and data analysis, yet existing approaches face significant time- and space-efficiency bottlenecks when s...
- We Should Separate Memorization from Copyright : Abstract: The widespread use of foundation models has introduced a new risk factor of copyright issue. This issue is leading to an active, lively and on-going debate amongst the data-science community...
- LEFT: Learnable Fusion of Tri-view Tokens for Unsupervised Time Series Anomaly Detection : Abstract: As a fundamental data mining task, unsupervised time series anomaly detection (TSAD) aims to build a model for identifying abnormal timestamps without assuming the availability of annotation...
- Equalized Generative Treatment: Matching f-divergences for Fairness in Generative Models : Abstract: Fairness is a crucial concern for generative models, which not only reflect but can also amplify societal and cultural biases. Existing fairness notions for generative models are largely ada...
- 6G-Bench: An Open Benchmark for Semantic Communication and Network-Level Reasoning with Foundation Models in AI-Native 6G Networks : Abstract: This paper introduces 6G-Bench, an open benchmark for evaluating semantic communication and network-level reasoning in AI-native 6G networks. 6G-Bench defines a taxonomy of 30 decision-makin...
- LLaDA2.1: Speeding Up Text Diffusion via Token Editing : Abstract: While LLaDA2.0 showcased the scaling potential of 100B-level block-diffusion models and their inherent parallelization, the delicate equilibrium between decoding speed and generation quality...
- CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation : Abstract: Large Language Models (LLMs) in long-context scenarios are severely constrained by the linear growth of Key-Value (KV) cache memory. Existing KV compression methods rely either on static thr...
- PBLean: Pseudo-Boolean Proof Certificates for Lean 4 : Abstract: We present PBLean, a method for importing VeriPB pseudo-Boolean (PB) proof certificates into Lean 4. Key to our approach is reflection: a Boolean checker function whose soundness is fully pr...
- Technosocial risks of ideal emotion recognition technologies: A defense of the (social) value of emotional expressions : Abstract: The prospect of AI systems that I call ideal emotion recognition technologies (ERTs) is often defended on the assumption that social life would benefit from increased affective transparency....
- Zero-shot System for Automatic Body Region Detection for Volumetric CT and MR Images : Abstract: Reliable identification of anatomical body regions is a prerequisite for many automated medical imaging workflows, yet existing solutions remain heavily dependent on unreliable DICOM metadat...
- QUOKA: Query-Oriented KV Selection For Efficient LLM Prefill : Abstract: We present QUOKA: Query-oriented KV selection for efficient attention, a training-free and hardware agnostic sparse attention algorithm for accelerating transformer inference under chunked p...
- Artifact Reduction in Undersampled 3D Cone-Beam CTs using a Hybrid 2D-3D CNN Framework : Abstract: Undersampled CT volumes minimize acquisition time and radiation exposure but introduce artifacts degrading image quality and diagnostic utility. Reducing these artifacts is critical for high...
- On the Expressive Power of GNNs for Boolean Satisfiability : Abstract: Machine learning approaches to solving Boolean Satisfiability (SAT) aim to replace handcrafted heuristics with learning-based models. Graph Neural Networks have emerged as the main architect...
- Efficient Brain Extraction of MRI Scans with Mild to Moderate Neuropathology : Abstract: Skull stripping magnetic resonance images (MRI) of the human brain is an important process in many image processing techniques, such as automatic segmentation of brain structures. Numerous m...
- Taming Scylla: Understanding the multi-headed agentic daemon of the coding seas : Abstract: LLM-based tools are automating more software development tasks at a rapid pace, but there is no rigorous way to evaluate how different architectural choices -- prompts, skills, tools, multi-...
- FreqLens: Interpretable Frequency Attribution for Time Series Forecasting : Abstract: Time series forecasting models often lack interpretability, limiting their adoption in domains requiring explainable predictions. We propose FreqLens, an interpretable forecasting f...
- Default Machine Learning Hyperparameters Do Not Provide Informative Initialization for Bayesian Optimization : Abstract: Bayesian Optimization (BO) is a standard tool for hyperparameter tuning thanks to its sample efficiency on expensive black-box functions. While most BO pipelines begin with uniform random in...
- Multimodal Learning for Arcing Detection in Pantograph-Catenary Systems : Abstract: The pantograph-catenary interface is essential for ensuring uninterrupted and reliable power delivery in electrified rail systems. However, electrical arcing at this interface poses serious ...
- Addressing data annotation scarcity in Brain Tumor Segmentation on 3D MRI scan Using a Semi-Supervised Teacher-Student Framework : Abstract: Accurate brain tumor segmentation from MRI is limited by expensive annotations and data heterogeneity across scanners and sites. We propose a semi-supervised teacher-student framework that c...
- lrnnx: A library for Linear RNNs : Abstract: Linear recurrent neural networks (LRNNs) provide a structured approach to sequence modeling that bridges classical linear dynamical systems and modern deep learning, offering both expressive...
- Permissive-Washing in the Open AI Supply Chain: A Large-Scale Audit of License Integrity : Abstract: Permissive licenses like MIT, Apache-2.0, and BSD-3-Clause dominate open-source AI, signaling that artifacts like models, datasets, and code can be freely used, modified, and redistributed. ...
- Affective Flow Language Model for Emotional Support Conversation : Abstract: Large language models (LLMs) have been widely applied to emotional support conversation (ESC). However, complex multi-turn support remains challenging. This is because existing alignment sche...
- WildReward: Learning Reward Models from In-the-Wild Human Interactions : Abstract: Reward models (RMs) are crucial for the training of large language models (LLMs), yet they typically rely on large-scale human-annotated preference pairs. With the widespread deployment of L...
- Dr. MAS: Stable Reinforcement Learning for Multi-Agent LLM Systems : Abstract: Multi-agent LLM systems enable advanced reasoning and tool use via role specialization, yet reliable reinforcement learning (RL) post-training for such systems remains difficult. In this wor...
- Discovering Interpretable Algorithms by Decompiling Transformers to RASP : Abstract: Recent work has shown that the computations of Transformers can be simulated in the RASP family of programming languages. These findings have enabled improved understanding of the expressive...
- FlattenGPT: Depth Compression for Transformer with Layer Flattening : Abstract: Recent works have indicated redundancy across transformer blocks, prompting the research of depth compression to prune less crucial blocks. However, current ways of entire-block pruning suff...
- Understanding Dynamic Compute Allocation in Recurrent Transformers : Abstract: Token-level adaptive computation seeks to reduce inference cost by allocating more computation to harder tokens and less to easier ones. However, prior work is primarily evaluated on natural...
- AnomSeer: Reinforcing Multimodal LLMs to Reason for Time-Series Anomaly Detection : Abstract: Time-series anomaly detection (TSAD) with multimodal large language models (MLLMs) is an emerging area, yet a persistent challenge remains: MLLMs rely on coarse time-series heuristics but st...
- Whose Name Comes Up? Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation : Abstract: Large language models (LLMs) are increasingly used for academic expert recommendation. Existing audits typically evaluate model outputs in isolation, largely ignoring end-user inference-time...
- Learning Potentials for Dynamic Matching and Application to Heart Transplantation : Abstract: Each year, thousands of patients in need of heart transplants face life-threatening wait times due to organ scarcity. While allocation policies aim to maximize population-level outcomes, cur...
- Breaking the Simplification Bottleneck in Amortized Neural Symbolic Regression : Abstract: Symbolic regression (SR) aims to discover interpretable analytical expressions that accurately describe observed data. Amortized SR promises to be much more efficient than the predominant ge...
- DeepQuali: Initial results of a study on the use of large language models for assessing the quality of user stories : Abstract: Generative artificial intelligence (GAI), specifically large language models (LLMs), are increasingly used in software engineering, mainly for coding tasks. However, requirements engineering...
- OmniReview: A Large-scale Benchmark and LLM-enhanced Framework for Realistic Reviewer Recommendation : Abstract: Academic peer review remains the cornerstone of scholarly validation, yet the field faces some challenges in data and methods. From the data perspective, existing research is hindered by the...
- Gesturing Toward Abstraction: Multimodal Convention Formation in Collaborative Physical Tasks : Abstract: A quintessential feature of human intelligence is the ability to create ad hoc conventions over time to achieve shared goals efficiently. We investigate how communication strategies evolve t...
- Automatic In-Domain Exemplar Construction and LLM-Based Refinement of Multi-LLM Expansions for Query Expansion : Abstract: Query expansion with large language models is promising but often relies on hand-crafted prompts, manually chosen exemplars, or a single LLM, making it non-scalable and sensitive to domain s...
- StealthRL: Reinforcement Learning Paraphrase Attacks for Multi-Detector Evasion of AI-Text Detectors : Abstract: AI-text detectors face a critical robustness challenge: adversarial paraphrasing attacks that preserve semantics while evading detection. We introduce StealthRL, a reinforcement learning fra...
- pixelLOG: Logging of Online Gameplay for Cognitive Research : Abstract: Traditional cognitive assessments often rely on isolated, output-focused measurements that may fail to capture the complexity of human cognition in naturalistic settings. We present pixelLOG...
- MotionCrafter: Dense Geometry and Motion Reconstruction with a 4D VAE : Abstract: We introduce MotionCrafter, a video diffusion-based framework that jointly reconstructs 4D geometry and estimates dense motion from a monocular video. The core of our method is a novel joint...
- A Behavioural and Representational Evaluation of Goal-Directedness in Language Model Agents : Abstract: Understanding an agent's goals helps explain and predict its behaviour, yet there is no established methodology for reliably attributing goals to agentic systems. We propose a framework for ...
- StretchTime: Adaptive Time Series Forecasting via Symplectic Attention : Abstract: Transformer architectures have established strong baselines in time series forecasting, yet they typically rely on positional encodings that assume uniform, index-based temporal progression....
- Next Concept Prediction in Discrete Latent Space Leads to Stronger Language Models : Abstract: We propose Next Concept Prediction (NCP), a generative pretraining paradigm built on top of Next Token Prediction (NTP). NCP predicts discrete concepts that span multiple tokens, thereby for...
- When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning : Abstract: Despite rapid progress in Multimodal Large Language Models (MLLMs), visual spatial reasoning remains unreliable when correct answers depend on how a scene would appear under unseen or altern...
- Linearization Explains Fine-Tuning in Large Language Models : Abstract: Parameter-Efficient Fine-Tuning (PEFT) is a popular class of techniques that strive to adapt large models in a scalable and resource-efficient manner. Yet, the mechanisms underlying their tr...
- Learning in Context, Guided by Choice: A Reward-Free Paradigm for Reinforcement Learning with Transformers : Abstract: In-context reinforcement learning (ICRL) leverages the in-context learning capabilities of transformer models (TMs) to efficiently generalize to unseen sequential decision-making tasks witho...
- STEP: Warm-Started Visuomotor Policies with Spatiotemporal Consistency Prediction : Abstract: Diffusion policies have recently emerged as a powerful paradigm for visuomotor control in robotic manipulation due to their ability to model the distribution of action sequences and capture ...
- Inverting Data Transformations via Diffusion Sampling : Abstract: We study the problem of transformation inversion on general Lie groups: a datum is transformed by an unknown group element, and the goal is to recover an inverse transformation that maps it ...
- When Do Multi-Agent Systems Outperform? Analysing the Learning Efficiency of Agentic Systems : Abstract: Reinforcement Learning (RL) has emerged as a crucial method for training or fine-tuning large language models (LLMs), enabling adaptive, task-specific optimizations through interactive feedb...
- Language Modeling and Understanding Through Paraphrase Generation and Detection : Abstract: Language enables humans to share knowledge, reason about the world, and pass on strategies for survival and innovation across generations. At the heart of this process is not just the abilit...
- PISCO: Precise Video Instance Insertion with Sparse Control : Abstract: The landscape of AI video generation is undergoing a pivotal shift: moving beyond general generation - which relies on exhaustive prompt-engineering and "cherry-picking" - towards fine-grain...
- Tighnari v2: Mitigating Label Noise and Distribution Shift in Multimodal Plant Distribution Prediction via Mixture of Experts and Weakly Supervised Learning : Abstract: Large-scale, cross-species plant distribution prediction plays a crucial role in biodiversity conservation, yet modeling efforts in this area still face significant challenges due to the spa...
- Noise Stability of Transformer Models : Abstract: Understanding simplicity biases in deep learning offers a promising path toward developing reliable AI. A common metric for this, inspired by Boolean function analysis, is average sensitivit...
- Trust-Based Incentive Mechanisms in Semi-Decentralized Federated Learning Systems : Abstract: In federated learning (FL), decentralized model training allows multiple participants to collaboratively improve a shared machine learning model without exchanging raw data. However, ensuri...
- SoulX-Singer: Towards High-Quality Zero-Shot Singing Voice Synthesis : Abstract: While recent years have witnessed rapid progress in speech synthesis, open-source singing voice synthesis (SVS) systems still face significant barriers to industrial deployment, particularly...
- Pruning as a Cooperative Game: Surrogate-Assisted Layer Contribution Estimation for Large Language Models : Abstract: While large language models (LLMs) demonstrate impressive performance across various tasks, their deployment in real-world scenarios is still constrained by high computational demands. Layer...
- How well are open sourced AI-generated image detection models out-of-the-box: A comprehensive benchmark study : Abstract: As AI-generated images proliferate across digital platforms, reliable detection methods have become critical for combating misinformation and maintaining content authenticity. While numerous...
- Efficient Representations are Controllable Representations : Abstract: What is the most brute-force way to install interpretable, controllable features into a model's activations? Controlling how LLMs internally represent concepts typically requires sophisticat...
- rePIRL: Learn PRM with Inverse RL for LLM Reasoning : Abstract: Process rewards have been widely used in deep reinforcement learning to improve training efficiency, reduce variance, and prevent reward hacking. In LLM reasoning, existing works also explor...
- SPD-Faith Bench: Diagnosing and Improving Faithfulness in Chain-of-Thought for Multimodal Large Language Models : Abstract: Chain-of-Thought reasoning is widely used to improve the interpretability of multimodal large language models (MLLMs), yet the faithfulness of the generated reasoning traces remains unclear....
- TodoEvolve: Learning to Architect Agent Planning Systems : Abstract: Planning has become a central capability for contemporary agent systems in navigating complex, long-horizon tasks, yet existing approaches predominantly rely on fixed, hand-crafted planning ...
- SAGE: Scalable AI Governance & Evaluation : Abstract: Evaluating relevance in large-scale search systems is fundamentally constrained by the governance gap between nuanced, resource-constrained human oversight and the high-throughput requiremen...
- Orchestrating Attention: Bringing Harmony to the 'Chaos' of Neurodivergent Learning States : Abstract: Adaptive learning systems optimize content delivery based on performance metrics but ignore the dynamic attention fluctuations that characterize neurodivergent learners. We present Attention...
- Direct Soft-Policy Sampling via Langevin Dynamics : Abstract: Soft policies in reinforcement learning define policies as Boltzmann distributions over state-action value functions, providing a principled mechanism for balancing exploration and exploitat...
- Rethinking Latency Denial-of-Service: Attacking the LLM Serving Framework, Not the Model : Abstract: Large Language Models face an emerging and critical threat known as latency attacks. Because LLM inference is inherently expensive, even modest slowdowns can translate into substantial opera...
- Deep Variable-Length Feedback Codes : Abstract: Deep learning has enabled significant advances in feedback-based channel coding, yet existing learned schemes remain fundamentally limited: they employ fixed block lengths, suffer degraded p...
- GRAFT: Decoupling Ranking and Calibration for Survival Analysis : Abstract: Survival analysis is complicated by censored data, high-dimensional features, and non-linear interactions. Classical models are interpretable but restrictive, while deep learning models are ...
- Rich-ARQ: From 1-bit Acknowledgment to Rich Neural Coded Feedback : Abstract: This paper reimagines the foundational feedback mechanism in wireless communication, transforming the prevailing 1-bit binary ACK/NACK with a high-dimensional, information-rich vector to tra...
- Scalable Adaptation of 3D Geometric Foundation Models via Weak Supervision from Internet Video : Abstract: Geometric foundation models show promise in 3D reconstruction, yet their progress is severely constrained by the scarcity of diverse, large-scale 3D annotations. While Internet videos offer ...
- Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents : Abstract: Large Language Model (LLM) code agents increasingly resolve repository-level issues by iteratively editing code, invoking tools, and validating candidate patches. In these workflows, agents ...
- Incremental Mapping with Measurement Synchronization & Compression : Abstract: Modern autonomous vehicles and robots utilize versatile sensors for localization and mapping. The fidelity of these maps is paramount, as an accurate environmental representation is a prereq...
- Adaptive Acquisition Selection for Bayesian Optimization with Large Language Models : Abstract: Bayesian Optimization critically depends on the choice of acquisition function, but no single strategy is universally optimal; the best choice is non-stationary and problem-dependent. Existi...
- AceGRPO: Adaptive Curriculum Enhanced Group Relative Policy Optimization for Autonomous Machine Learning Engineering : Abstract: Autonomous Machine Learning Engineering (MLE) requires agents to perform sustained, iterative optimization over long horizons. While recent LLM-based agents show promise, current prompt-base...
- CausalCompass: Evaluating the Robustness of Time-Series Causal Discovery in Misspecified Scenarios : Abstract: Causal discovery from time series is a fundamental task in machine learning. However, its widespread adoption is hindered by a reliance on untestable causal assumptions and by the lack of ro...
- Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities : Abstract: Securing petroleum infrastructure requires balancing autonomous system efficiency with human judgment for threat escalation, a challenge unaddressed by classical facility location models ass...
- A Kinetic-Energy Perspective of Flow Matching : Abstract: Flow-based generative models can be viewed through a physics lens: sampling transports a particle from noise to data by integrating a time-varying velocity field, and each sample corresponds...
- Bielik Guard: Efficient Polish Language Safety Classifiers for LLM Content Moderation : Abstract: As Large Language Models (LLMs) become increasingly deployed in Polish language applications, the need for efficient and accurate content safety classifiers has become paramount. We present ...
- Accuracy-Delay Trade-Off in LLM Offloading via Token-Level Uncertainty : Abstract: Large language models (LLMs) offer significant potential for intelligent mobile services but are computationally intensive for resource-constrained devices. Mobile edge computing (MEC) allow...
- Lost in Translation? A Comparative Study on the Cross-Lingual Transfer of Composite Harms : Abstract: Most safety evaluations of large language models (LLMs) remain anchored in English. Translation is often used as a shortcut to probe multilingual behavior, but it rarely captures the full pi...
- An Explainable Multi-Task Similarity Measure: Integrating Accumulated Local Effects and Weighted Fréchet Distance : Abstract: In many machine learning contexts, tasks are often treated as interconnected components with the goal of leveraging knowledge transfer between them, which is the central aim of Multi-Task Le...
- Learning-guided Kansa collocation for forward and inverse PDEs beyond linearity : Abstract: Partial Differential Equations are precise in modelling the physical, biological and graphical phenomena. However, the numerical methods suffer from the curse of dimensionality, high computa...
- MCIE: Multimodal LLM-Driven Complex Instruction Image Editing with Spatial Guidance : Abstract: Recent advances in instruction-based image editing have shown remarkable progress. However, existing methods remain limited to relatively simple editing operations, hindering real-world appl...
- Don't Always Pick the Highest-Performing Model: An Information Theoretic View of LLM Ensemble Selection : Abstract: Large language models (LLMs) are often ensembled together to improve overall reliability and robustness, but in practice models are strongly correlated. This raises a fundamental question: w...
- DeltaKV: Residual-Based KV Cache Compression via Long-Range Similarity : Abstract: The deployment of efficient long-context LLMs in applications like autonomous agents, long-chain reasoning, and creative writing is fundamentally bottlenecked by the linear growth of KV cach...
- ForecastOcc: Vision-based Semantic Occupancy Forecasting : Abstract: Autonomous driving requires forecasting both geometry and semantics over time to effectively reason about future environment states. Existing vision-based occupancy forecasting methods focus...
- From $O(mn)$ to $O(r^2)$: Two-Sided Low-Rank Communication for Adam in Distributed Training with Memory Efficiency : Abstract: As foundation models continue to scale, pretraining increasingly relies on data-parallel distributed optimization, making bandwidth-limited gradient synchronization a key bottleneck. Orthogo...
- ICBAC: an Intelligent Contract-Based Access Control framework for supply chain management by integrating blockchain and federated learning : Abstract: This paper addresses the critical challenge of access control in modern supply chains, which operate across multiple independent and competing organizations. Existing access control is stati...
- The Rise of Sparse Mixture-of-Experts: A Survey from Algorithmic Foundations to Decentralized Architectures and Vertical Domain Applications : Abstract: The sparse Mixture of Experts (MoE) architecture has evolved as a powerful approach for scaling deep learning models to more parameters with comparable computation cost. As an important branc...
- CyberExplorer: Benchmarking LLM Offensive Security Capabilities in a Real-World Attacking Simulation Environment : Abstract: Real-world offensive security operations are inherently open-ended: attackers explore unknown attack surfaces, revise hypotheses under uncertainty, and operate without guaranteed success. Ex...
- FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging : Abstract: Although Video Large Language Models (VLLMs) have shown remarkable capabilities in video understanding, they are required to process high volumes of visual tokens, causing significant comput...
- MIND: Benchmarking Memory Consistency and Action Control in World Models : Abstract: World models aim to understand, remember, and predict dynamic visual environments, yet a unified benchmark for evaluating their fundamental abilities remains lacking. To address this gap, we...
- FIRE: Frobenius-Isometry Reinitialization for Balancing the Stability-Plasticity Tradeoff : Abstract: Deep neural networks trained on nonstationary data must balance stability (i.e., retaining prior knowledge) and plasticity (i.e., adapting to new tasks). Standard reinitialization methods, w...
- Implicit Strategic Optimization: Rethinking Long-Horizon Decision-Making in Adversarial Poker Environments : Abstract: Training large language model (LLM) agents for adversarial games is often driven by episodic objectives such as win rate. In long-horizon settings, however, payoffs are shaped by latent stra...
- V-ABFT: Variance-Based Adaptive Threshold for Fault-Tolerant Matrix Multiplication in Mixed-Precision Deep Learning : Abstract: Algorithm-Based Fault Tolerance (ABFT) is widely adopted to detect silent data corruptions (SDCs) in matrix multiplication, a cornerstone operation in deep learning systems. However, existin...
- Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning : Abstract: Offline reinforcement learning (RL) provides a compelling paradigm for training autonomous systems without the risks of online exploration, particularly in safety-critical domains. However, ...
- Weak to Strong: VLM-Based Pseudo-Labeling as a Weakly Supervised Training Strategy in Multimodal Video-based Hidden Emotion Understanding Tasks : Abstract: To tackle the automatic recognition of "concealed emotions" in videos, this paper proposes a multimodal weak-supervision framework and achieves state-of-the-art results on the iMiGUE tennis-...
- Picasso: Holistic Scene Reconstruction with Physics-Constrained Sampling : Abstract: In the presence of occlusions and measurement noise, geometrically accurate scene reconstructions -- which fit the sensor data -- can still be physically incorrect. For instance, when estima...
- DICE: Disentangling Artist Style from Content via Contrastive Subspace Decomposition in Diffusion Models : Abstract: The recent proliferation of diffusion models has made style mimicry effortless, enabling users to imitate unique artistic styles without authorization. In deployed platforms, this raises cop...
- SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm : Abstract: Modern Transformers predominantly adopt the Pre-Norm paradigm for its optimization stability, foregoing the superior potential of the unstable Post-Norm architecture. Prior attempts to combi...
- Multimodal normative modeling in Alzheimers Disease with introspective variational autoencoders : Abstract: Normative modeling learns a healthy reference distribution and quantifies subject-specific deviations to capture heterogeneous disease effects. In Alzheimer's disease (AD), multimodal neuroim...
- Spectral Guardrails for Agents in the Wild: Detecting Tool Use Hallucinations via Attention Topology : Abstract: Deploying autonomous agents in the wild requires reliable safeguards against tool use failures. We propose a training-free guardrail based on spectral analysis of attention topology that com...
- Large language models for spreading dynamics in complex systems : Abstract: Spreading dynamics is a central topic in the physics of complex systems and network science, providing a unified framework for understanding how information, behaviors, and diseases propagat...
- Online Domain-aware LLM Decoding for Continual Domain Evolution : Abstract: LLMs are typically fine-tuned offline on domain-specific data, assuming a static domain. In practice, domain knowledge evolves continuously through new regulations, products, services, and i...
- VidVec: Unlocking Video MLLM Embeddings for Video-Text Retrieval : Abstract: Recent studies have adapted generative Multimodal Large Language Models (MLLMs) into embedding extractors for vision tasks, typically through fine-tuning to produce universal representations...
- Emergent Search and Backtracking in Latent Reasoning Models : Abstract: What happens when a language model thinks without words? Standard reasoning LLMs verbalize intermediate steps as chain-of-thought; latent reasoning transformers (LRTs) instead perform delibe...
- Constrained Pricing under Finite Mixtures of Logit : Abstract: The mixed logit model is a flexible and widely used demand model in pricing and revenue management. However, existing work on mixed-logit pricing largely focuses on unconstrained settings, l...
- Gender and Race Bias in Consumer Product Recommendations by Large Language Models : Abstract: Large Language Models are increasingly employed in generating consumer product recommendations, yet their potential for embedding and amplifying gender and race biases remains underexplored....
- Robustness of Vision Language Models Against Split-Image Harmful Input Attacks : Abstract: Vision-Language Models (VLMs) are now a core part of modern AI. Recent work proposed several visual jailbreak attacks using single/holistic images. However, contemporary VLMs demonstrate st...
- Reliable and Responsible Foundation Models: A Comprehensive Survey : Abstract: Foundation models, including Large Language Models (LLMs), Multimodal Large Language Models (MLLMs), Image Generative Models (i.e., Text-to-Image Models and Image-Editing Models), and Video G...
- DIAL-SUMMER: A Structured Evaluation Framework of Hierarchical Errors in Dialogue Summaries : Abstract: Dialogues are a predominant mode of communication for humans, and it is immensely helpful to have automatically generated summaries of them (e.g., to revise key points discussed in a meeting...
- The Confidence Manifold: Geometric Structure of Correctness Representations in Language Models : Abstract: When a language model asserts that "the capital of Australia is Sydney," does it know this is wrong? We characterize the geometry of correctness representations across 9 models from 5 archit...
- Self-Supervised Bootstrapping of Action-Predictive Embodied Reasoning : Abstract: Embodied Chain-of-Thought (CoT) reasoning has significantly enhanced Vision-Language-Action (VLA) models, yet current methods rely on rigid templates to specify reasoning primitives (e.g., o...
- Nexus: Inferring Join Graphs from Metadata Alone via Iterative Low-Rank Matrix Completion : Abstract: Automatically inferring join relationships is a critical task for effective data discovery, integration, querying and reuse. However, accurately and efficiently identifying these relationshi...
- Large Language Models in Peer-Run Community Behavioral Health Services: Understanding Peer Specialists and Service Users' Perspectives on Opportunities, Risks, and Mitigation Strategies : Abstract: Peer-run organizations (PROs) provide critical, recovery-based behavioral health support rooted in lived experience. As large language models (LLMs) enter this domain, their scale, conversat...
- Dreaming in Code for Curriculum Learning in Open-Ended Worlds : Abstract: Open-ended learning frames intelligence as emerging from continual interaction with an ever-expanding space of environments. While recent advances have utilized foundation models to programm...
- DrugR: Optimizing Molecular Drugs through LLM-based Explicit Reasoning : Abstract: Molecule generation and optimization is a fundamental task in the chemical domain. The rapid development of intelligent tools, especially large language models (LLMs) with powerful knowledge res...
- Sparsity-Aware Evolution for Model Merging : Abstract: We propose a sparsity-aware evolutionary (SAE) framework for model merging that involves iterative pruning-merging cycles to act as a novel mutation operator. We incorporate the sparsity con...
- CoRect: Context-Aware Logit Contrast for Hidden State Rectification to Resolve Knowledge Conflicts : Abstract: Retrieval-Augmented Generation (RAG) often struggles with knowledge conflicts, where model-internal parametric knowledge overrides retrieved evidence, leading to unfaithful outputs. Existing...
- Investigating Writing Professionals' Relationships with Generative AI: How Combined Perceptions of Rivalry and Collaboration Shape Work Practices and Outcomes : Abstract: This study investigates how professional writers' complex relationship with GenAI shapes their work practices and outcomes. Through a cross-sectional survey with writing professionals (n=403...
- Generating Adversarial Events: A Motion-Aware Point Cloud Framework : Abstract: Event cameras have been widely adopted in safety-critical domains such as autonomous driving, robotics, and human-computer interaction. A pressing challenge arises from the vulnerability of ...
- Tutti: Expressive Multi-Singer Synthesis via Structure-Level Timbre Control and Vocal Texture Modeling : Abstract: While existing Singing Voice Synthesis systems achieve high-fidelity solo performances, they are constrained by global timbre control, failing to address dynamic multi-singer arrangement and...
- When Benign Inputs Lead to Severe Harms: Eliciting Unsafe Unintended Behaviors of Computer-Use Agents : Abstract: Although computer-use agents (CUAs) hold significant potential to automate increasingly complex OS workflows, they can demonstrate unsafe unintended behaviors that deviate from expected outc...
- Sequences as Nodes for Contrastive Multimodal Graph Recommendation : Abstract: To tackle cold-start and data sparsity issues in recommender systems, numerous multimodal, sequential, and contrastive techniques have been proposed. While these augmentations can boost reco...
- Multi-Agentic AI for Fairness-Aware and Accelerated Multi-modal Large Model Inference in Real-world Mobile Edge Networks : Abstract: Generative AI (GenAI) has transformed applications in natural language processing and content creation, yet centralized inference remains hindered by high latency, limited customizability, a...
- Collaborative and Efficient Fine-tuning: Leveraging Task Similarity : Abstract: Adaptability has been regarded as a central feature in the foundation models, enabling them to effectively acclimate to unseen downstream tasks. Parameter-efficient fine-tuning methods such ...
- The Median is Easier than it Looks: Approximation with a Constant-Depth, Linear-Width ReLU Network : Abstract: We study the approximation of the median of $d$ inputs using ReLU neural networks. We present depth-width tradeoffs under several settings, culminating in a constant-depth, linear-width cons...
- ArcMark: Multi-bit LLM Watermark via Optimal Transport : Abstract: Watermarking is an important tool for promoting the responsible use of language models (LMs). Existing watermarks insert a signal into generated tokens that either flags LM-generated text (z...
- Realistic Synthetic Household Data Generation at Scale : Abstract: Advancements in foundation models have catalyzed research in Embodied AI to develop interactive agents capable of environmental reasoning and interaction. Developing such agents requires div...
- The Double-Edged Sword of Data-Driven Super-Resolution: Adversarial Super-Resolution Models : Abstract: Data-driven super-resolution (SR) methods are often integrated into imaging pipelines as preprocessing steps to improve downstream tasks such as classification and detection. However, these ...
- Graph homophily booster: Reimagining the role of discrete features in heterophilic graph learning : Abstract: Graph neural networks (GNNs) have emerged as a powerful tool for modeling graph-structured data. However, existing GNNs often struggle with heterophilic graphs, where connected nodes tend to...
- Cognitive algorithms and systems of episodic memory, semantic memory and their learnings : Abstract: Declarative memory, the memory that can be "declared" in words or languages, is made up of two dissociated parts: episodic memory and semantic memory. This dissociation has its neuroanatomic...
- aerial-autonomy-stack -- a Faster-than-real-time, Autopilot-agnostic, ROS2 Framework to Simulate and Deploy Perception-based Drones : Abstract: Unmanned aerial vehicles are rapidly transforming multiple applications, from agricultural and infrastructure monitoring to logistics and defense. Introducing greater autonomy to these syste...
- XShare: Collaborative in-Batch Expert Sharing for Faster MoE Inference : Abstract: Mixture-of-Experts (MoE) architectures are increasingly used to efficiently scale large language models. However, in production inference, request batching and speculative decoding significa...
- Laplacian-LoRA: Delaying Oversmoothing in Deep GCNs via Spectral Low-Rank Adaptation : Abstract: Oversmoothing is a fundamental limitation of deep graph convolutional networks (GCNs), causing node representations to collapse as depth increases. While many prior approaches mitigate this ...
- Imagining the Alien: Human Projections and Cognitive Limitations : Abstract: Imagining what life on other planets, and intelligent life in particular, may be like is a long-running theme in human culture. It is a manifestation of the innate human curiosity about the ...
- Fin-RATE: A Real-world Financial Analytics and Tracking Evaluation Benchmark for LLMs on SEC Filings : Abstract: With increasing deployment of Large Language Models (LLMs) in the finance domain, LLMs are increasingly expected to parse complex regulatory disclosures. However, existing benchmarks often f...
- Progressive Searching for Retrieval in RAG : Abstract: Retrieval Augmented Generation (RAG) is a promising technique for mitigating two key limitations of large language models (LLMs): outdated information and hallucinations. A RAG system stores d...
- Principled Synthetic Data Enables the First Scaling Laws for LLMs in Recommendation : Abstract: Large Language Models (LLMs) represent a promising frontier for recommender systems, yet their development has been impeded by the absence of predictable scaling laws, which are crucial for ...
- KRONE: Hierarchical and Modular Log Anomaly Detection : Abstract: Log anomaly detection is crucial for uncovering system failures and security risks. Although logs originate from nested component executions with clear boundaries, this structure is lost whe...
- LIT-GRAPH: Evaluating Deep vs. Shallow Graph Embeddings for High-Quality Text Recommendation in Domain-Specific Knowledge Graphs : Abstract: This study presents LIT-GRAPH (Literature Graph for Recommendation and Pedagogical Heuristics), a novel knowledge graph-based recommendation system designed to scaffold high school English t...
- Semantic Search At LinkedIn : Abstract: Semantic search with large language models (LLMs) enables retrieval by meaning rather than keyword overlap, but scaling it requires major inference efficiency advances. We present LinkedIn's...
- LUCID-SAE: Learning Unified Vision-Language Sparse Codes for Interpretable Concept Discovery : Abstract: Sparse autoencoders (SAEs) offer a natural path toward comparable explanations across different representation spaces. However, current SAEs are trained per modality, producing dictionaries ...
- Beyond Accuracy: Risk-Sensitive Evaluation of Hallucinated Medical Advice : Abstract: Large language models are increasingly being used in patient-facing medical question answering, where hallucinated outputs can vary widely in potential harm. However, existing hallucination ...
- Action-to-Action Flow Matching : Abstract: Diffusion-based policies have recently achieved remarkable success in robotics by formulating action prediction as a conditional denoising process. However, the standard practice of sampling...
- High Fidelity Textual User Representation over Heterogeneous Sources via Reinforcement Learning : Abstract: Effective personalization on large-scale job platforms requires modeling members based on heterogeneous textual sources, including profiles, professional data, and search activity logs. As r...
- Intent Mismatch Causes LLMs to Get Lost in Multi-Turn Conversation : Abstract: Multi-turn conversation has emerged as a predominant interaction paradigm for Large Language Models (LLMs). Users often employ follow-up questions to refine their intent, expecting LLMs to a...
- Seeing Roads Through Words: A Language-Guided Framework for RGB-T Driving Scene Segmentation : Abstract: Robust semantic segmentation of road scenes under adverse illumination, lighting, and shadow conditions remains a core challenge for autonomous driving applications. RGB-Thermal fusion is a s...
- TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling : Abstract: Large language models (LLMs) achieve remarkable performance but demand substantial computational resources, limiting deployment on edge devices and resource-constrained environments. We pres...
- Advantages of Domain Knowledge Injection for Legal Document Summarization: A Case Study on Summarizing Indian Court Judgments in English and Hindi : Abstract: Summarizing Indian legal court judgments is a complex task not only due to the intricate language and unstructured nature of the legal texts, but also since a large section of the Indian pop...
- Scout Before You Attend: Sketch-and-Walk Sparse Attention for Efficient LLM Inference : Abstract: Self-attention dominates the computational and memory cost of long-context LLM inference across both prefill and decode phases. To address this challenge, we introduce Sketch&Walk Attention,...
- AgentSys: Secure and Dynamic LLM Agents Through Explicit Hierarchical Memory Management : Abstract: Indirect prompt injection threatens LLM agents by embedding malicious instructions in external content, enabling unauthorized actions and data theft. LLM agents maintain working memory throu...
- Learning Molecular Chirality via Chiral Determinant Kernels : Abstract: Chirality is a fundamental molecular property that governs stereospecific behavior in chemistry and biology. Capturing chirality in machine learning models remains challenging due to the geo...
- Secure Code Generation via Online Reinforcement Learning with Vulnerability Reward Model : Abstract: Large language models (LLMs) are increasingly used in software development, yet their tendency to generate insecure code remains a major barrier to real-world deployment. Existing secure cod...
- Brep2Shape: Boundary and Shape Representation Alignment via Self-Supervised Transformers : Abstract: Boundary representation (B-rep) is the industry standard for computer-aided design (CAD). While deep learning shows promise in processing B-rep models, existing methods suffer from a represe...
- Multi-Agent Systems Shape Social Norms for Prosocial Behavior Change : Abstract: Social norm interventions are used to promote prosocial behaviors by highlighting prevalent actions, but their effectiveness is often limited in heterogeneous populations where shared understan...
- Bridging Speech, Emotion, and Motion: a VLM-based Multimodal Edge-deployable Framework for Humanoid Robots : Abstract: Effective human-robot interaction requires emotionally rich multimodal expressions, yet most humanoid robots lack coordinated speech, facial expressions, and gestures. Meanwhile, real-world ...
- TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control : Abstract: Recent advances in humanoid whole-body motion tracking have enabled the execution of diverse and highly coordinated motions on real hardware. However, existing controllers are commonly drive...
- Proximal Action Replacement for Behavior Cloning Actor-Critic in Offline Reinforcement Learning : Abstract: Offline reinforcement learning (RL) optimizes policies from a previously collected static dataset and is an important branch of RL. A popular and promising approach is to regularize actor-cr...
- Pull Requests as a Training Signal for Repo-Level Code Editing : Abstract: Repository-level code editing requires models to understand complex dependencies and execute precise multi-file modifications across a large codebase. While recent gains on SWE-bench rely he...
- Deriving Neural Scaling Laws from the statistics of natural language : Abstract: Despite the fact that experimental neural scaling laws have substantially guided empirical progress in large-scale machine learning, no existing theory can quantitatively predict the exponen...
- VividFace: Real-Time and Realistic Facial Expression Shadowing for Humanoid Robots : Abstract: Humanoid facial expression shadowing enables robots to realistically imitate human facial expressions in real time, which is critical for lifelike, facially expressive humanoid robots and af...
- MemPot: Defending Against Memory Extraction Attack with Optimized Honeypots : Abstract: Large Language Model (LLM)-based agents employ external and internal memory systems to handle complex, goal-oriented tasks, yet this exposes them to severe extraction attacks, and effective ...
- MDL: A Unified Multi-Distribution Learner in Large-scale Industrial Recommendation through Tokenization : Abstract: Industrial recommender systems increasingly adopt multi-scenario learning (MSL) and multi-task learning (MTL) to handle diverse user interactions and contexts, but existing approaches suffer...
- Fine-Grained Cat Breed Recognition with Global Context Vision Transformer : Abstract: Accurate identification of cat breeds from images is a challenging task due to subtle differences in fur patterns, facial structure, and color. In this paper, we present a deep learning-base...
- Beyond Core and Penumbra: Bi-Temporal Image-Driven Stroke Evolution Analysis : Abstract: Computed tomography perfusion (CTP) at admission is routinely used to estimate the ischemic core and penumbra, while follow-up diffusion-weighted MRI (DWI) provides the definitive infarct ou...
- Linguistic properties and model scale in brain encoding: from small to compressed language models : Abstract: Recent work has shown that scaling large language models (LLMs) improves their alignment with human brain activity, yet it remains unclear what drives these gains and which representational ...
- Revealing the Semantic Selection Gap in DINOv3 through Training-Free Few-Shot Segmentation : Abstract: Recent self-supervised Vision Transformers (ViTs), such as DINOv3, provide rich feature representations for dense vision tasks. This study investigates the intrinsic few-shot semantic segmen...
- VISOR: VIsual Spatial Object Reasoning for Language-driven Object Navigation : Abstract: Language-driven object navigation requires agents to interpret natural language descriptions of target objects, which combine intrinsic and extrinsic attributes for instance recognition and ...
- Gaussian Match-and-Copy: A Minimalist Benchmark for Studying Transformer Induction : Abstract: Match-and-copy is a core retrieval primitive used at inference time by large language models to retrieve a matching token from the context then copy its successor. Yet, understanding how thi...
- Cross-Camera Cow Identification via Disentangled Representation Learning : Abstract: Precise identification of individual cows is a fundamental prerequisite for comprehensive digital management in smart livestock farming. While existing animal identification methods excel in...
- How does longer temporal context enhance multimodal narrative video processing in the brain? : Abstract: Understanding how humans and artificial intelligence systems process complex narrative videos is a fundamental challenge at the intersection of neuroscience and machine learning. This study ...
- Graph Domain Adaptation via Homophily-Agnostic Reconstructing Structure : Abstract: Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. However, existing GDA methods typically a...
- Automated rock joint trace mapping using a supervised learning model trained on synthetic data generated by parametric modelling : Abstract: This paper presents a geology-driven machine learning method for automated rock joint trace mapping from images. The approach combines geological modelling, synthetic data generation, and su...
- Learning to Self-Verify Makes Language Models Better Reasoners : Abstract: Recent large language models (LLMs) achieve strong performance in generating promising reasoning paths for complex tasks. However, despite powerful generation ability, LLMs remain weak at ve...
- TeleBoost: A Systematic Alignment Framework for High-Fidelity, Controllable, and Robust Video Generation : Abstract: Post-training is the decisive step for converting a pretrained video generator into a production-oriented model that is instruction-following, controllable, and robust over long temporal hor...
- Astro: Activation-guided Structured Regularization for Outlier-Robust LLM Post-Training Quantization : Abstract: Weight-only post-training quantization (PTQ) is crucial for efficient Large Language Model (LLM) deployment but suffers from accuracy degradation caused by weight and activation outliers. Ex...
- Fine-R1: Make Multi-modal LLMs Excel in Fine-Grained Visual Recognition by Chain-of-Thought Reasoning : Abstract: Any entity in the visual world can be hierarchically grouped based on shared characteristics and mapped to fine-grained sub-categories. While Multi-modal Large Language Models (MLLMs) achiev...
- Evaluating Large Language Models for Detecting Architectural Decision Violations : Abstract: Architectural Decision Records (ADRs) play a central role in maintaining software architecture quality, yet many decision violations go unnoticed because projects lack both systematic docume...
- SERE: Similarity-based Expert Re-routing for Efficient Batch Decoding in MoE Models : Abstract: Mixture-of-Experts (MoE) architectures employ sparse activation to deliver faster training and inference with higher accuracy than dense LLMs. However, in production serving, MoE models requ...
- AD-MIR: Bridging the Gap from Perception to Persuasion in Advertising Video Understanding via Structured Reasoning : Abstract: Multimodal understanding of advertising videos is essential for interpreting the intricate relationship between visual storytelling and abstract persuasion strategies. However, despite excel...
- From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding : Abstract: Infographics are widely used to communicate information with a combination of text, icons, and data visualizations, but once exported as images their content is locked into pixels, making up...
- Agent-Fence: Mapping Security Vulnerabilities Across Deep Research Agents : Abstract: Large language models are increasingly deployed as *deep agents* that plan, maintain persistent state, and invoke external tools, shifting safety failures from unsafe text to unsafe *traject...
- Continuous Program Search : Abstract: Genetic Programming yields interpretable programs, but small syntactic mutations can induce large, unpredictable behavioral shifts, degrading locality and sample efficiency. We frame this as...
- SoK: DARPA's AI Cyber Challenge (AIxCC): Competition Design, Architectures, and Lessons Learned : Abstract: DARPA's AI Cyber Challenge (AIxCC, 2023--2025) is the largest competition to date for building fully autonomous cyber reasoning systems (CRSs) that leverage recent advances in AI -- particul...
- Looking and Listening Inside and Outside: Multimodal Artificial Intelligence Systems for Driver Safety Assessment and Intelligent Vehicle Decision-Making : Abstract: The looking-in-looking-out (LILO) framework has enabled intelligent vehicle applications that understand both the outside scene and the driver state to improve safety outcomes, with examples...
- Surprisal-Guided Selection: Compute-Optimal Test-Time Strategies for Execution-Grounded Code Generation : Abstract: Test-time training (TTT) adapts language models through gradient-based updates at inference. But is adaptation the right strategy? We study compute-optimal test-time strategies for verifiabl...
- Debugging code world models : Abstract: Code World Models (CWMs) are language models trained to simulate program execution by predicting explicit runtime state after every executed command. This execution-based world modeling enab...
- Spectral Gating Networks : Abstract: Gating mechanisms are ubiquitous, yet a complementary question in feed-forward networks remains under-explored: how to introduce frequency-rich expressivity without sacrificing stability and...
- Vision and language: Novel Representations and Artificial intelligence for Driving Scene Safety Assessment and Autonomous Vehicle Planning : Abstract: Vision-language models (VLMs) have recently emerged as powerful representation learning systems that align visual observations with natural language concepts, offering new opportunities for ...
- Mapping Drivers of Greenness: Spatial Variable Selection for MODIS Vegetation Indices : Abstract: Understanding how environmental drivers relate to vegetation condition motivates spatially varying regression models, but estimating a separate coefficient surface for every predictor can yi...
- Process-of-Thought Reasoning for Videos : Abstract: Video understanding requires not only recognizing visual content but also performing temporally grounded, multi-step reasoning over long and noisy observations. We propose Process-of-Thought...
- On the Infinite Width and Depth Limits of Predictive Coding Networks : Abstract: Predictive coding (PC) is a biologically plausible alternative to standard backpropagation (BP) that minimises an energy function with respect to network activities before updating weights. ...
- Do We Need Adam? Surprisingly Strong and Sparse Reinforcement Learning with SGD in LLMs : Abstract: Reinforcement learning (RL), particularly RL from verifiable reward (RLVR), has become a crucial phase of training large language models (LLMs) and a key focus of current scaling efforts. Ho...
- The Laplacian Keyboard: Beyond the Linear Span : Abstract: Across scientific disciplines, Laplacian eigenvectors serve as a fundamental basis for simplifying complex systems, from signal processing to quantum mechanics. In reinforcement learning (RL...
- Learnable Chernoff Baselines for Inference-Time Alignment : Abstract: We study inference-time reward-guided alignment for generative models. Existing methods often rely on either architecture-specific adaptations or computationally costly inference procedures....
- HypRAG: Hyperbolic Dense Retrieval for Retrieval Augmented Generation : Abstract: Embedding geometry plays a fundamental role in retrieval quality, yet dense retrievers for retrieval-augmented generation (RAG) remain largely confined to Euclidean space. However, natural l...
- Preference Conditioned Multi-Objective Reinforcement Learning: Decomposed, Diversity-Driven Policy Optimization : Abstract: Multi-objective reinforcement learning (MORL) seeks to learn policies that balance multiple, often conflicting objectives. Although a single preference-conditioned policy is the most flexibl...
- PAND: Prompt-Aware Neighborhood Distillation for Lightweight Fine-Grained Visual Classification : Abstract: Distilling knowledge from large Vision-Language Models (VLMs) into lightweight networks is crucial yet challenging in Fine-Grained Visual Classification (FGVC), due to the reliance on fixed ...
- Generative Reasoning Re-ranker : Abstract: Recent studies increasingly explore Large Language Models (LLMs) as a new paradigm for recommendation systems due to their scalability and world knowledge. However, existing work has three k...
- Still Manual? Automated Linter Configuration via DSL-Based LLM Compilation of Coding Standards : Abstract: Coding standards are essential for maintaining consistent and high-quality code across teams and projects. Linters help developers enforce these standards by detecting code violations. Howev...
- Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models : Abstract: Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured, human-like conceptual representations within these mo...
- CausalTAD: Injecting Causal Knowledge into Large Language Models for Tabular Anomaly Detection : Abstract: Detecting anomalies in tabular data is critical for many real-world applications, such as credit card fraud detection. With the rapid advancements in large language models (LLMs), state-of-t...
- Fairness Aware Reward Optimization : Abstract: Demographic skews in human preference data propagate systematic unfairness through reward models into aligned LLMs. We introduce Fairness Aware Reward Optimization (Faro), an in-processing f...
- VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos : Abstract: In long-video understanding, conventional uniform frame sampling often fails to capture key visual evidence, leading to degraded performance and increased hallucinations. To address this, re...
- Fast and Robust Likelihood-Guided Diffusion Posterior Sampling with Amortized Variational Inference : Abstract: Zero-shot diffusion posterior sampling offers a flexible framework for inverse problems by accommodating arbitrary degradation operators at test time, but incurs high computational cost due ...
- scDFM: Distributional Flow Matching Model for Robust Single-Cell Perturbation Prediction : Abstract: A central goal in systems biology and drug discovery is to predict the transcriptional response of cells to perturbations. This task is challenging due to the noisy and sparse nature of sing...
- Extended to Reality: Prompt Injection in 3D Environments : Abstract: Multimodal large language models (MLLMs) have advanced the capabilities to interpret and act on visual input in 3D environments, empowering diverse applications such as robotics and situated...
- Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models : Abstract: Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet incorporating speech with 3D facial animation remains largely unexplored despite its import...
- ShallowJail: Steering Jailbreaks against Large Language Models : Abstract: Large Language Models (LLMs) have been successful in numerous fields. Alignment has usually been applied to prevent them from harmful purposes. However, aligned LLMs remain vulnerable to jail...
- Reasoning-Augmented Representations for Multimodal Retrieval : Abstract: Universal Multimodal Retrieval (UMR) seeks any-to-any search across text and vision, yet modern embedding models remain brittle when queries require latent reasoning (e.g., resolving undersp...
- Landscaper: Understanding Loss Landscapes Through Multi-Dimensional Topological Analysis : Abstract: Loss landscapes are a powerful tool for understanding neural network optimization and generalization, yet traditional low-dimensional analyses often miss complex topological features. We pre...
- Exploring Teachers' Perspectives on Using Conversational AI Agents for Group Collaboration : Abstract: Collaboration is a cornerstone of 21st-century learning, yet teachers continue to face challenges in supporting productive peer interaction. Emerging generative AI tools offer new possibilit...
- BONSAI: Bayesian Optimization with Natural Simplicity and Interpretability : Abstract: Bayesian optimization (BO) is a popular technique for sample-efficient optimization of black-box functions. In many applications, the parameters being tuned come with a carefully engineered ...
- On Randomness in Agentic Evals : Abstract: Agentic systems are evaluated on benchmarks where agents interact with environments to solve tasks. Most papers report a pass@1 score computed from a single run per task, assuming this gives...
- Beyond Pooling: Matching for Robust Generalization under Data Heterogeneity : Abstract: Pooling heterogeneous datasets across domains is a common strategy in representation learning, but naive pooling can amplify distributional asymmetries and yield biased estimators, especiall...
- Mimetic Initialization of MLPs : Abstract: Mimetic initialization uses pretrained models as case studies of good initialization, using observations of structures in trained weights to inspire new, simple initialization techniques. So...
- Free Energy Mixer : Abstract: Standard attention stores keys/values losslessly but reads them via a per-head convex average, blocking channel-wise selection. We propose the Free Energy Mixer (FEM): a free-energy (log-sum...
- Your Language Model Secretly Contains Personality Subnetworks : Abstract: Humans shift between different personas depending on social context. Large Language Models (LLMs) demonstrate a similar flexibility in adopting different personas and behaviors. Existing app...
- Open TutorAI: An Open-source Platform for Personalized and Immersive Learning with Generative AI : Abstract: Recent advances in artificial intelligence have created new possibilities for making education more scalable, adaptive, and learner-centered. However, existing educational chatbot systems of...
- An Information-Theoretic Framework for Comparing Voice and Text Explainability : Abstract: Explainable Artificial Intelligence (XAI) aims to make machine learning models transparent and trustworthy, yet most current approaches communicate explanations visually or through text. Thi...
- Long-Context Long-Form Question Answering for Legal Domain : Abstract: Legal documents have complex document layouts involving multiple nested sections, lengthy footnotes and further use specialized linguistic devices like intricate syntax and domain-specific v...
- "Death" of a Chatbot: Investigating and Designing Toward Psychologically Safe Endings for Human-AI Relationships : Abstract: Millions of users form emotional attachments to AI companions like Character.AI, Replika, and ChatGPT. When these relationships end through model updates, safety interventions, or platform s...
- BadSNN: Backdoor Attacks on Spiking Neural Networks via Adversarial Spiking Neuron : Abstract: Spiking Neural Networks (SNNs) are energy-efficient counterparts of Deep Neural Networks (DNNs) with high biological plausibility, as information is transmitted through temporal spiking patt...
- Exactly Computing do-Shapley Values : Abstract: Structural Causal Models (SCM) are a powerful framework for describing complicated dynamics across the natural sciences. A particularly elegant way of interpreting SCMs is do-Shapley, a game...
- DSL: Understanding and Improving Softmax Recommender Systems with Competition-Aware Scaling : Abstract: Softmax Loss (SL) is being increasingly adopted for recommender systems (RS) as it has demonstrated better performance, robustness and fairness. Yet in implicit-feedback, a single global tem...
- Multimodal Enhancement of Sequential Recommendation : Abstract: We propose a novel recommender framework, MuSTRec (Multimodal and Sequential Transformer-based Recommendation), that unifies multimodal and sequential recommendation paradigms. MuSTRec captu...
- What is Safety? Corporate Discourse, Power, and the Politics of Generative AI Safety : Abstract: This work examines how leading generative artificial intelligence companies construct and communicate the concept of "safety" through public-facing documents. Drawing on critical discourse a...
- Deep Reinforcement Learning for Interference Suppression in RIS-Aided Space-Air-Ground Integrated Networks : Abstract: Future 6G networks envision ubiquitous connectivity through space-air-ground integrated networks (SAGINs), where high-altitude platform stations (HAPSs) and satellites complement terrestrial...
- Hybrid Deep Learning Framework for CSI-Based Activity Recognition in Bandwidth-Constrained Wi-Fi Sensing : Abstract: This paper presents a novel hybrid deep learning framework designed to enhance the robustness of CSI-based Human Activity Recognition (HAR) within bandwidth-constrained Wi-Fi sensing environ...
- Empowering Affected Individuals to Shape AI Fairness Assessments: Processes, Criteria, and Tools : Abstract: AI systems are increasingly used in high-stakes domains such as credit rating, where fairness concerns are critical. Existing fairness assessments are typically conducted by AI experts or re...
- A New Mode of Teaching Chinese as a Foreign Language from the Perspective of Smart System Studied by Using Rongzhixue : Abstract: The purpose of this study is to introduce a new model of teaching Chinese as a foreign language from the perspective of integrating wisdom. Its characteristics are as follows: focusing on th...
- SurfAge-Net: A Hierarchical Surface-Based Network for Interpretable Fine-Grained Brain Age Prediction : Abstract: Brain age prediction serves as a powerful framework for assessing brain status and detecting deviations associated with neurodevelopmental and neurodegenerative disorders. However, most exis...
- Adaptive Temporal Dynamics for Personalized Emotion Recognition: A Liquid Neural Network Approach : Abstract: Emotion recognition from physiological signals remains challenging due to their non-stationary, noisy, and subject-dependent characteristics. This work presents, to the best of our knowledge...
- Hierarchical JEPA Meets Predictive Remote Control in Beyond 5G Networks : Abstract: In wireless networked control systems, ensuring timely and reliable state updates from distributed devices to remote controllers is essential for robust control performance. However, when mu...
- Multi-Scale Temporal Homeostasis Enables Efficient and Robust Neural Networks : Abstract: Artificial neural networks achieve strong performance on benchmark tasks but remain fundamentally brittle under perturbations, limiting their deployment in real-world settings. In contrast, ...
- Learning Alzheimer's Disease Signatures by bridging EEG with Spiking Neural Networks and Biophysical Simulations : Abstract: As the prevalence of Alzheimer's disease (AD) rises, improving mechanistic insight from non-invasive biomarkers is increasingly critical. Recent work suggests that circuit-level brain altera...
- MAU-GPT: Enhancing Multi-type Industrial Anomaly Understanding via Anomaly-aware and Generalist Experts Adaptation : Abstract: As industrial manufacturing scales, automating fine-grained product image analysis has become critical for quality control. However, existing approaches are hindered by limited dataset cover...
- A General Model for Retinal Segmentation and Quantification : Abstract: Retinal imaging is fast, non-invasive, and widely available, offering quantifiable structural and vascular signals for ophthalmic and systemic health assessment. This accessibility creates a...
- Steering to Say No: Configurable Refusal via Activation Steering in Vision Language Models : Abstract: With the rapid advancement of Vision Language Models (VLMs), refusal mechanisms have become a critical component for ensuring responsible and safe model behavior. However, existing refusal s...
- Vectra: A New Metric, Dataset, and Model for Visual Quality Assessment in E-Commerce In-Image Machine Translation : Abstract: In-Image Machine Translation (IIMT) powers cross-border e-commerce product listings; existing research focuses on machine translation evaluation, while visual rendering quality is critical f...
- XAI-CLIP: ROI-Guided Perturbation Framework for Explainable Medical Image Segmentation in Multimodal Vision-Language Models : Abstract: Medical image segmentation is a critical component of clinical workflows, enabling accurate diagnosis, treatment planning, and disease monitoring. However, despite the superior performance o...
- AI for Sustainable Data Protection and Fair Algorithmic Management in Environmental Regulation : Abstract: Integration of AI into environmental regulation represents a significant advancement in data management. It offers promising results in both data protection and algorithmic fairness. This r...
- Behavioral Consistency Validation for LLM Agents: An Analysis of Trading-Style Switching through Stock-Market Simulation : Abstract: Recent works have increasingly applied Large Language Models (LLMs) as agents in financial stock market simulations to test if micro-level behaviors aggregate into macro-level phenomena. How...
- The Geometry of Representational Failures in Vision Language Models : Abstract: Vision-Language Models (VLMs) exhibit puzzling failures in multi-object visual tasks, such as hallucinating non-existent elements or failing to identify the most similar objects among distra...
- Modality Gap-Driven Subspace Alignment Training Paradigm For Multimodal Large Language Models : Abstract: Despite the success of multimodal contrastive learning in aligning visual and linguistic representations, a persistent geometric anomaly, the Modality Gap, remains: embeddings of distinct mo...
- A Comparative Study of Adversarial Robustness in CNN and CNN-ANFIS Architectures : Abstract: Convolutional Neural Networks (CNNs) achieve strong image classification performance but lack interpretability and are vulnerable to adversarial attacks. Neuro-fuzzy hybrids such as DCNFIS r...
- Lagged backward-compatible physics-informed neural networks for unsaturated soil consolidation analysis : Abstract: This study develops a Lagged Backward-Compatible Physics-Informed Neural Network (LBC-PINN) for simulating and inverting one-dimensional unsaturated soil consolidation under long-term loadin...
- MENASpeechBank: A Reference Voice Bank with Persona-Conditioned Multi-Turn Conversations for AudioLLMs : Abstract: Audio large language models (AudioLLMs) enable instruction-following over speech and general audio, but progress is increasingly limited by the lack of diverse, conversational, instruction-a...
- Stochastic Spiking Neuron Based SNN Can be Inherently Bayesian : Abstract: Uncertainty in biological neural systems appears to be computationally beneficial rather than detrimental. However, in neuromorphic computing systems, device variability often limits perform...
- When Excellence Stops Producing Knowledge: A Practitioner's Observation on Research Funding : Abstract: After almost four decades of participating in competitive research funding -- as applicant, coordinator, evaluator, and panel member -- I have come to see a structural paradox: many particip...
- PipeMFL-240K: A Large-scale Dataset and Benchmark for Object Detection in Pipeline Magnetic Flux Leakage Imaging : Abstract: Pipeline integrity is critical to industrial safety and environmental protection, with Magnetic Flux Leakage (MFL) detection being a primary non-destructive testing technology. Despite the p...
- VLRS-Bench: A Vision-Language Reasoning Benchmark for Remote Sensing : Abstract: Recent advancements in Multimodal Large Language Models (MLLMs) have enabled complex reasoning. However, existing remote sensing (RS) benchmarks remain heavily biased toward perception tasks...
- Interpreting Physics in Video World Models : Abstract: A long-standing question in physical reasoning is whether video-based models need to rely on factorized representations of physical variables in order to make physically accurate predictions...
- Neural Sentinel: Unified Vision Language Model (VLM) for License Plate Recognition with Human-in-the-Loop Continual Learning : Abstract: Traditional Automatic License Plate Recognition (ALPR) systems employ multi-stage pipelines consisting of object detection networks followed by separate Optical Character Recognition (OCR) m...
- MTS-CSNet: Multiscale Tensor Factorization for Deep Compressive Sensing on RGB Images : Abstract: Deep learning based compressive sensing (CS) methods typically learn sampling operators using convolutional or block wise fully connected layers, which limit receptive fields and scale poorl...
- FADE: Selective Forgetting via Sparse LoRA and Self-Distillation : Abstract: Machine Unlearning aims to remove the influence of specific data or concepts from trained models while preserving overall performance, a capability increasingly required by data protection r...
- Assessing Reproducibility in Evolutionary Computation: A Case Study using Human- and LLM-based Assessment : Abstract: Reproducibility is an important requirement in evolutionary computation, where results largely depend on computational experiments. In practice, reproducibility relies on how algorithms, exp...
- TACIT: Transformation-Aware Capturing of Implicit Thought : Abstract: We present TACIT (Transformation-Aware Capturing of Implicit Thought), a diffusion-based transformer for interpretable visual reasoning. Unlike language-based reasoning systems, TACIT operat...
- Video-based Music Generation : Abstract: As the volume of video content on the internet grows rapidly, finding a suitable soundtrack remains a significant challenge. This thesis presents EMSYNC (EMotion and SYNChronization), a fast...
- MRI Cross-Modal Synthesis: A Comparative Study of Generative Models for T1-to-T2 Reconstruction : Abstract: MRI cross-modal synthesis involves generating images from one acquisition protocol using another, offering considerable clinical value by reducing scan time while maintaining diagnostic info...
- Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution : Abstract: Diffusion-based super-resolution can synthesize rich details, but models trained on synthetic paired data often fail on real-world LR images due to distribution shifts. We propose Bird-SR, a...
- Artificial Intelligence in Open Source Software Engineering: A Foundation for Sustainability : Abstract: Open-source software (OSS) is foundational to modern digital infrastructure, yet this context for group work continues to struggle to ensure sufficient contributions in many critical cases. ...
- Pro-ZD: A Transferable Graph Neural Network Approach for Proactive Zero-Day Threats Mitigation : Abstract: In today's enterprise network landscape, the combination of perimeter and distributed firewall rules governs connectivity. To address challenges arising from increased traffic and diverse ne...
- LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning : Abstract: Chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) in natural language to perform complex reasoning. However, chemical reasoning is inherently contin...
- CALM: Class-Conditional Sparse Attention Vectors for Large Audio-Language Models : Abstract: Large audio-language models (LALMs) exhibit strong zero-shot capabilities in multiple downstream tasks, such as audio question answering (AQA) and abstract reasoning; however, these models s...
- The Optimal Token Baseline: Variance Reduction for Long-Horizon LLM-RL : Abstract: Reinforcement Learning (RL) for Large Language Models (LLMs) often suffers from training collapse in long-horizon tasks due to exploding gradient variance. To mitigate this, a baseline is co...
- CodeCircuit: Toward Inferring LLM-Generated Code Correctness via Attribution Graphs : Abstract: Current paradigms for code verification rely heavily on external mechanisms, such as execution-based unit tests or auxiliary LLM judges, which are often labor-intensive or limited by the judgi...
- Federated Prompt-Tuning with Heterogeneous and Incomplete Multimodal Client Data : Abstract: This paper introduces a generalized federated prompt-tuning framework for practical scenarios where local datasets are multi-modal and exhibit different distributional patterns of missing fe...
- MosaicThinker: On-Device Visual Spatial Reasoning for Embodied AI via Iterative Construction of Space Representation : Abstract: When embodied AI is expanding from traditional object detection and recognition to more advanced tasks of robot manipulation and actuation planning, visual spatial reasoning from the video i...
- Rethinking Scientific Modeling: Toward Physically Consistent and Simulation-Executable Programmatic Generation : Abstract: Structural modeling is a fundamental component of computational engineering science, in which even minor physical inconsistencies or specification violations may invalidate downstream simula...
- AbFlow: End-to-end Paratope-Centric Antibody Design by Interaction Enhanced Flow Matching : Abstract: Antigen-antibody binding is a critical process in the immune response. Although recent progress has advanced antibody design, current methods lack a generative framework for end-to-end model...
- QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining : Abstract: Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve ...
- Evaluating Retrieval-Augmented Generation Variants for Natural Language-Based SQL and API Call Generation : Abstract: Enterprise systems increasingly require natural language interfaces that can translate user requests into structured operations such as SQL queries and REST API calls. While large language m...
- Electron-Informed Coarse-Graining Molecular Representation Learning for Real-World Molecular Physics : Abstract: Various representation learning methods for molecular structures have been devised to accelerate data-driven chemistry. However, the representation capabilities of existing methods are essen...
- Concept-Aware Privacy Mechanisms for Defending Embedding Inversion Attacks : Abstract: Text embeddings enable numerous NLP applications but face severe privacy risks from embedding inversion attacks, which can expose sensitive attributes or reconstruct raw text. Existing diffe...
- Lemon Agent Technical Report : Abstract: Recent advanced LLM-powered agent systems have exhibited their remarkable capabilities in tackling complex, long-horizon tasks. Nevertheless, they still suffer from inherent limitations in r...
- WorldEdit: Towards Open-World Image Editing with a Knowledge-Informed Benchmark : Abstract: Recent advances in image editing models have demonstrated remarkable capabilities in executing explicit instructions, such as attribute manipulation, style transfer, and pose synthesis. Howe...
- RealFin: How Well Do LLMs Reason About Finance When Users Leave Things Unsaid? : Abstract: Reliable financial reasoning requires knowing not only how to answer, but also when an answer cannot be justified. In real financial practice, problems often rely on implicit assumptions tha...
- MemFly: On-the-Fly Memory Optimization via Information Bottleneck : Abstract: Long-term memory enables large language model agents to tackle complex tasks through historical interactions. However, existing frameworks encounter a fundamental dilemma between compressing...
- GCN-MPPR: Enhancing the Propagation of Message Passing Neural Networks via Motif-Based Personalized PageRank : Abstract: The algorithms based on message passing neural networks (MPNNs) on graphs have recently achieved great success for various graph applications. However, studies find that these methods always...
- MedCoG: Maximizing LLM Inference Density in Medical Reasoning via Meta-Cognitive Regulation : Abstract: Large Language Models (LLMs) have shown strong potential in complex medical reasoning yet face diminishing gains under inference scaling laws. While existing studies augment LLMs with variou...
- Selective Fine-Tuning for Targeted and Robust Concept Unlearning : Abstract: Text guided diffusion models are used by millions of users, but can be easily exploited to produce harmful content. Concept unlearning methods aim at reducing the models' likelihood of gener...
- MePo: Meta Post-Refinement for Rehearsal-Free General Continual Learning : Abstract: To cope with uncertain changes of the external world, intelligent systems must continually learn from complex, evolving environments and respond in real time. This ability, collectively know...
- IV Co-Scientist: Multi-Agent LLM Framework for Causal Instrumental Variable Discovery : Abstract: In the presence of confounding between an endogenous variable and the outcome, instrumental variables (IVs) are used to isolate the causal effect of the endogenous variable. Identifying vali...
- LOCA-bench: Benchmarking Language Agents Under Controllable and Extreme Context Growth : Abstract: Large language models (LLMs) are increasingly capable of carrying out long-running, real-world tasks. However, as the amount of context grows, their reliability often deteriorates, a phenome...
- Accelerating Social Science Research via Agentic Hypothesization and Experimentation : Abstract: Data-driven social science research is inherently slow, relying on iterative cycles of observation, hypothesis generation, and experimental validation. While recent data-driven methods promi...
- Towards Adaptive, Scalable, and Robust Coordination of LLM Agents: A Dynamic Ad-Hoc Networking Perspective : Abstract: Multi-agent architectures built on large language models (LLMs) have demonstrated the potential to realize swarm intelligence through well-crafted collaboration. However, the substantial bur...
- Small Agent Group is the Future of Digital Health : Abstract: The rapid adoption of large language models (LLMs) in digital health has been driven by a "scaling-first" philosophy, i.e., the assumption that clinical intelligence increases with model siz...
- Structure-Aware Robust Counterfactual Explanations via Conditional Gaussian Network Classifiers : Abstract: Counterfactual explanation (CE) is a core technique in explainable artificial intelligence (XAI), widely used to interpret model decisions and suggest actionable alternatives. This work pres...
- Free(): Learning to Forget in Malloc-Only Reasoning Models : Abstract: Reasoning models enhance problem-solving by scaling test-time compute, yet they face a critical paradox: excessive thinking tokens often degrade performance rather than improve it. We attrib...
- Graph-Enhanced Deep Reinforcement Learning for Multi-Objective Unrelated Parallel Machine Scheduling : Abstract: The Unrelated Parallel Machine Scheduling Problem (UPMSP) with release dates, setups, and eligibility constraints presents a significant multi-objective challenge. Traditional methods strugg...
- Securing Dual-Use Pathogen Data of Concern : Abstract: Training data is an essential input into creating competent artificial intelligence (AI) models. AI models for biology are trained on large volumes of data, including data related to biologi...
- Objective Decoupling in Social Reinforcement Learning: Recovering Ground Truth from Sycophantic Majorities : Abstract: Contemporary AI alignment strategies rely on a fragile premise: that human feedback, while noisy, remains a fundamentally truthful signal. In this paper, we identify this assumption as Dogma...
- Interpretable Failure Analysis in Multi-Agent Reinforcement Learning Systems : Abstract: Multi-Agent Reinforcement Learning (MARL) is increasingly deployed in safety-critical domains, yet methods for interpretable failure detection and attribution remain underdeveloped. We intro...
- Initial Risk Probing and Feasibility Testing of Glow: a Generative AI-Powered Dialectical Behavior Therapy Skills Coach for Substance Use Recovery and HIV Prevention : Abstract: Background: HIV and substance use represent interacting epidemics with shared psychological drivers - impulsivity and maladaptive coping. Dialectical behavior therapy (DBT) targets these mec...
- RECUR: Resource Exhaustion Attack via Recursive-Entropy Guided Counterfactual Utilization and Reflection : Abstract: Large Reasoning Models (LRMs) employ reasoning to address complex tasks. Such explicit reasoning requires extended context lengths, resulting in substantially higher resource consumption. Pr...
- Weak-Driven Learning: How Weak Agents make Strong Agents Stronger : Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields di...
- InfiCoEvalChain: A Blockchain-Based Decentralized Framework for Collaborative LLM Evaluation : Abstract: The rapid advancement of large language models (LLMs) demands increasingly reliable evaluation, yet current centralized evaluation suffers from opacity, overfitting, and hardware-induced var...
- PTS-SNN: A Prompt-Tuned Temporal Shift Spiking Neural Networks for Efficient Speech Emotion Recognition : Abstract: Speech Emotion Recognition (SER) is widely deployed in Human-Computer Interaction, yet the high computational cost of conventional models hinders their implementation on resource-constrained...
- Do MLLMs Really See It: Reinforcing Visual Attention in Multimodal LLMs : Abstract: While chain-of-thought (CoT) reasoning has substantially improved multimodal large language models (MLLMs) on complex reasoning tasks, existing approaches largely rely on long textual reason...
- G-LNS: Generative Large Neighborhood Search for LLM-Based Automatic Heuristic Design : Abstract: While Large Language Models (LLMs) have recently shown promise in Automated Heuristic Design (AHD), existing approaches typically formulate AHD around constructive priority rules or paramete...
- SynthAgent: A Multi-Agent LLM Framework for Realistic Patient Simulation -- A Case Study in Obesity with Mental Health Comorbidities : Abstract: Simulating high-fidelity patients offers a powerful avenue for studying complex diseases while addressing the challenges of fragmented, biased, and privacy-restricted real-world data. In thi...
- Puda: Private User Dataset Agent for User-Sovereign and Privacy-Preserving Personalized AI : Abstract: Personal data centralization among dominant platform providers including search engines, social networking services, and e-commerce has created siloed ecosystems that restrict user sovereign...
- Toward Formalizing LLM-Based Agent Designs through Structural Context Modeling and Semantic Dynamics Analysis : Abstract: Current research on large language model (LLM) agents is fragmented: discussions of conceptual frameworks and methodological principles are frequently intertwined with low-level implementati...
- The Vibe-Automation of Automation: A Proactive Education Framework for Computer Science in the Age of Generative AI : Abstract: The emergence of generative artificial intelligence (GenAI) represents not an incremental technological advance but a qualitative epistemological shift that challenges foundational assumptio...
- Moral Sycophancy in Vision Language Models : Abstract: Sycophancy in Vision-Language Models (VLMs) refers to their tendency to align with user opinions, often at the expense of moral or factual accuracy. While prior studies have explored sycopha...
- Who Deserves the Reward? SHARP: Shapley Credit-based Optimization for Multi-Agent System : Abstract: Integrating Large Language Models (LLMs) with external tools via multi-agent systems offers a promising new paradigm for decomposing and solving complex problems. However, training these sys...
- CoTZero: Annotation-Free Human-Like Vision Reasoning via Hierarchical Synthetic CoT : Abstract: Recent advances in vision-language models (VLMs) have markedly improved image-text alignment, yet they still fall short of human-like visual reasoning. A key limitation is that many VLMs rel...
- Effect-Level Validation for Causal Discovery : Abstract: Causal discovery is increasingly applied to large-scale telemetry data to estimate the effects of user-facing interventions, yet its reliability for decision-making in feedback-driven system...
- OPE: Overcoming Information Saturation in Parallel Thinking via Outline-Guided Path Exploration : Abstract: Parallel thinking has emerged as a new paradigm for large reasoning models (LRMs) in tackling complex problems. Recent methods leverage Reinforcement Learning (RL) to enhance parallel thinki...
- Towards Better Evolution Modeling for Temporal Knowledge Graphs : Abstract: Temporal knowledge graphs (TKGs) structurally preserve evolving human knowledge. Recent research has focused on designing models to learn the evolutionary nature of TKGs to predict future fa...
- Does Your Reasoning Model Implicitly Know When to Stop Thinking? : Abstract: Recent advancements in large reasoning models (LRMs) have greatly improved their capabilities on complex reasoning tasks through Long Chains of Thought (CoTs). However, this approach often r...
- Circuit Representations of Random Forests with Applications to XAI : Abstract: We make three contributions in this paper. First, we present an approach for compiling a random forest classifier into a set of circuits, where each circuit directly encodes the instances in...
- MemAdapter: Fast Alignment across Agent Memory Paradigms via Generative Subgraph Retrieval : Abstract: Memory mechanism is a core component of LLM-based agents, enabling reasoning and knowledge discovery over long-horizon contexts. Existing agent memory systems are typically designed within i...
- Grounding Generative Planners in Verifiable Logic: A Hybrid Architecture for Trustworthy Embodied AI : Abstract: Large Language Models (LLMs) show promise as planners for embodied AI, but their stochastic nature lacks formal reasoning, preventing strict safety guarantees for physical deployment. Curren...
- SCOUT-RAG: Scalable and Cost-Efficient Unifying Traversal for Agentic Graph-RAG over Distributed Domains : Abstract: Graph-RAG improves LLM reasoning using structured knowledge, yet conventional designs rely on a centralized knowledge graph. In distributed and access-restricted settings (e.g., hospitals or...
- On Protecting Agentic Systems' Intellectual Property via Watermarking : Abstract: The evolution of Large Language Models (LLMs) into agentic systems that perform autonomous reasoning and tool use has created significant intellectual property (IP) value. We demonstrate tha...
- From Assistant to Double Agent: Formalizing and Benchmarking Attacks on OpenClaw for Personalized Local AI Agent : Abstract: Although large language model (LLM)-based agents, exemplified by OpenClaw, are increasingly evolving from task-oriented systems into personalized AI assistants for solving complex real-world...
- When Evaluation Becomes a Side Channel: Regime Leakage and Structural Mitigations for Alignment Assessment : Abstract: Safety evaluation for advanced AI systems implicitly assumes that behavior observed under evaluation is predictive of behavior in deployment. This assumption becomes fragile for agents with ...
- TreeTensor: Boost AI System on Nested Data with Constrained Tree-Like Tensor : Abstract: Tensor is the most basic and essential data structure of nowadays artificial intelligence (AI) system. The natural properties of Tensor, especially the memory-continuity and slice-independen...
- Reinforcement Inference: Leveraging Uncertainty for Self-Correcting Language Model Reasoning : Abstract: Modern large language models (LLMs) are often evaluated and deployed under a one-shot, greedy inference protocol, especially in professional settings that require deterministic behavi...
- Dialogue Model Optimization via Agent Game and Adaptive Tree-based GRPO : Abstract: Open-ended dialogue agents aim to deliver engaging, personalized interactions by adapting to users' traits, but existing methods face critical limitations: over-reliance on pre-collected use...
- PRISM: A Principled Framework for Multi-Agent Reasoning via Gain Decomposition : Abstract: Multi-agent collaboration has emerged as a promising paradigm for enhancing reasoning capabilities of Large Language Models (LLMs). However, existing approaches remain largely heuristic, lac...
- An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture : Abstract: Global Workspace Theory (GWT), inspired by cognitive neuroscience, posits that flexible cognition could arise via the attentional selection of a relevant subset of modalities within a multim...
- OSCAR: Optimization-Steered Agentic Planning for Composed Image Retrieval : Abstract: Composed image retrieval (CIR) requires complex reasoning over heterogeneous visual and textual constraints. Existing approaches largely fall into two paradigms: unified embedding retrieval,...
- Debate is efficient with your time : Abstract: AI safety via debate uses two competing models to help a human judge verify complex computational tasks. Previous work has established what problems debate can solve in principle, but has no...
- Why do we Trust Chatbots? From Normative Principles to Behavioral Drivers : Abstract: As chatbots increasingly blur the boundary between automated systems and human conversation, the foundations of trust in these systems warrant closer examination. While regulatory and policy...
- Intermediate Results on the Complexity of STRIPS$_{1}^{1}$ : Abstract: This paper is based on Bylander's results on the computational complexity of propositional STRIPS planning. He showed that when only ground literals are permitted, determining plan existence...
- Exploring SAIG Methods for an Objective Evaluation of XAI : Abstract: The evaluation of eXplainable Artificial Intelligence (XAI) methods is a rapidly growing field, characterized by a wide variety of approaches. This diversity highlights the complexity of the...
- Finite-State Controllers for (Hidden-Model) POMDPs using Deep Reinforcement Learning : Abstract: Solving partially observable Markov decision processes (POMDPs) requires computing policies under imperfect state information. Despite recent advances, the scalability of existing POMDP solv...
- Belief Offloading in Human-AI Interaction : Abstract: What happens when people's beliefs are derived from information provided by an LLM? People's use of LLM chatbots as thought partners can contribute to cognitive offloading, which can have ad...
- Dynamics Within Latent Chain-of-Thought: An Empirical Study of Causal Structure : Abstract: Latent or continuous chain-of-thought methods replace explicit textual rationales with a number of internal latent steps, but these intermediate computations are difficult to evaluate beyond...
- The Use of AI Tools to Develop and Validate Q-Matrices : Abstract: Constructing a Q-matrix is a critical but labor-intensive step in cognitive diagnostic modeling (CDM). This study investigates whether AI tools (i.e., general language models) can support Q-...
- Root Cause Analysis Method Based on Large Language Models with Residual Connection Structures : Abstract: Root cause localization remains challenging in complex and large-scale microservice architectures. The complex fault propagation among microservices and the high dimensionality of telemetry d...
- Negative-Aware Diffusion Process for Temporal Knowledge Graph Extrapolation : Abstract: Temporal Knowledge Graph (TKG) reasoning seeks to predict future missing facts from historical evidence. While diffusion models (DM) have recently gained attention for their ability to captu...
- Learning the Value Systems of Societies with Preference-based Multi-objective Reinforcement Learning : Abstract: Value-aware AI should recognise human values and adapt to the value systems (value-based preferences) of different users. This requires operationalization of values, which can be prone to mi...
- Deciding the Satisfiability of Combined Qualitative Constraint Networks : Abstract: Among the various forms of reasoning studied in the context of artificial intelligence, qualitative reasoning makes it possible to infer new knowledge in the context of imprecise, incomplete...
- Scalable Delphi: Large Language Models for Structured Risk Estimation : Abstract: Quantitative risk assessment in high-stakes domains relies on structured expert elicitation to estimate unobservable properties. The gold standard - the Delphi method - produces calibrated, ...
- Efficient and Stable Reinforcement Learning for Diffusion Language Models : Abstract: Reinforcement Learning (RL) is crucial for unlocking the complex reasoning capabilities of Diffusion-based Large Language Models (dLLMs). However, applying RL to dLLMs faces unique challenge...
- CausalT5K: Diagnosing and Informing Refusal for Trustworthy Causal Reasoning of Skepticism, Sycophancy, Detection-Correction, and Rung Collapse : Abstract: LLM failures in causal reasoning, including sycophancy, rung collapse, and miscalibrated refusal, are well-documented, yet progress on remediation is slow because no benchmark enables system...
- CoRefine: Confidence-Guided Self-Refinement for Adaptive Test-Time Compute : Abstract: Large Language Models (LLMs) often rely on test-time scaling via parallel decoding (for example, 512 samples) to boost reasoning accuracy, but this incurs substantial compute. We introduce C...
- Digital Twin and Agentic AI for Wild Fire Disaster Management: Intelligent Virtual Situation Room : Abstract: According to the United Nations, wildfire frequency and intensity are projected to increase by approximately 14% by 2030 and 30% by 2050 due to global warming, posing critical threats to lif...
- stable-worldmodel-v1: Reproducible World Modeling Research and Evaluation : Abstract: World Models have emerged as a powerful paradigm for learning compact, predictive representations of environment dynamics, enabling agents to reason, plan, and generalize beyond direct exper...
- InternAgent-1.5: A Unified Agentic Framework for Long-Horizon Autonomous Scientific Discovery : Abstract: We introduce InternAgent-1.5, a unified system designed for end-to-end scientific discovery across computational and empirical domains. The system is built on a structured architecture compo...
- iGRPO: Self-Feedback-Driven LLM Reasoning : Abstract: Large Language Models (LLMs) have shown promise in solving complex mathematical problems, yet they still fall short of producing accurate and consistent solutions. Reinforcement Learning (RL...
- Data Science and Technology Towards AGI Part I: Tiered Data Management : Abstract: The development of artificial intelligence can be viewed as an evolution of data-driven learning paradigms, with successive shifts in data organization and utilization continuously driving a...
- GEBench: Benchmarking Image Generation Models as GUI Environments : Abstract: Recent advancements in image generation models have enabled the prediction of future Graphical User Interface (GUI) states based on user instructions. However, existing benchmarks primarily ...
- BERT Learns (and Teaches) Chemistry : Abstract: Modern computational organic chemistry is becoming increasingly data-driven. There remain a large number of important unsolved problems in this area such as product prediction given reactant...
- Leveraging Adaptive Group Negotiation for Heterogeneous Multi-Robot Collaboration with Large Language Models : Abstract: Multi-robot collaboration tasks often require heterogeneous robots to work together over long horizons under spatial constraints and environmental uncertainties. Although Large Language Mode...
- Does Visual Rendering Bypass Tokenization? Investigating Script-Tokenizer Misalignment in Pixel-Based Language Models : Abstract: While pixel-based language modeling aims to bypass the sub-word tokenization bottleneck by rendering text as images, recent multimodal variants such as DualGPT reintroduce text tokenizers to...
- BiomechAgent: AI-Assisted Biomechanical Analysis Through Code-Generating Agents : Abstract: Markerless motion capture is making quantitative movement analysis increasingly accessible, yet analyzing the resulting data remains a barrier for clinicians without programming expertise. W...
- Bridging the Knowledge Void: Inference-time Acquisition of Unfamiliar Programming Languages for Coding Tasks : Abstract: The proficiency of Large Language Models (LLMs) in coding tasks is often a reflection of their extensive pre-training corpora, which typically collapses when confronted with previously unfam...
- LLM-FSM: Scaling Large Language Models for Finite-State Reasoning in RTL Code Generation : Abstract: Finite-state reasoning, the ability to understand and implement state-dependent behavior, is central to hardware design. In this paper, we present LLM-FSM, a benchmark that evaluates how wel...
- ST-Raptor: An Agentic System for Semi-Structured Table QA : Abstract: Semi-structured table question answering (QA) is a challenging task that requires (1) precise extraction of cell contents and positions and (2) accurate recovery of key implicit logical stru...
- DLLM-Searcher: Adapting Diffusion Large Language Model for Search Agents : Abstract: Recently, Diffusion Large Language Models (dLLMs) have demonstrated unique efficiency advantages, enabled by their inherently parallel decoding mechanism and flexible generation paradigm. Me...
- Aster: Autonomous Scientific Discovery over 20x Faster Than Existing Methods : Abstract: We introduce Aster, an AI agent for autonomous scientific discovery capable of operating over 20 times faster than existing frameworks. Given a task, an initial program, and a script to eval...
- Theory of Space: Can Foundation Models Construct Spatial Beliefs through Active Exploration? : Abstract: Spatial embodied intelligence requires agents to act to acquire information under partial observability. While multimodal foundation models excel at passive perception, their capacity for ac...
- ANCHOR: Branch-Point Data Generation for GUI Agents : Abstract: End-to-end GUI agents for real desktop environments require large amounts of high-quality interaction data, yet collecting human demonstrations is expensive and existing synthetic pipelines ...
- PreFlect: From Retrospective to Prospective Reflection in Large Language Model Agents : Abstract: Advanced large language model agents typically adopt self-reflection for improving performance, where agents iteratively analyze past actions to correct errors. However, existing reflective ...
- Is there "Secret Sauce" in Large Language Model Development? : Abstract: Do leading LLM developers possess a proprietary "secret sauce", or is LLM performance driven by scaling up compute? Using training and benchmark data for 809 models released between 2022 a...
- From Out-of-Distribution Detection to Hallucination Detection: A Geometric View : Abstract: Detecting hallucinations in large language models is a critical open problem with significant implications for safety and reliability. While existing hallucination detection methods achieve ...
- Incentive-Aware AI Safety via Strategic Resource Allocation: A Stackelberg Security Games Perspective : Abstract: As AI systems grow more capable and autonomous, ensuring their safety and reliability requires not only model-level alignment but also strategic oversight of the humans and institutions invo...
- BRIDGE: Predicting Human Task Completion Time From Model Performance : Abstract: Evaluating the real-world capabilities of AI systems requires grounding benchmark performance in human-interpretable measures of task difficulty. Existing approaches that rely on direct huma...
- TermiGen: High-Fidelity Environment and Robust Trajectory Synthesis for Terminal Agents : Abstract: Executing complex terminal tasks remains a significant challenge for open-weight LLMs, constrained by two fundamental limitations. First, high-fidelity, executable training environments are ...
- Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs : Abstract: Activation steering has emerged as a promising approach for efficiently adapting large language models (LLMs) to downstream behaviors. However, most existing steering methods rely on a singl...
- Adaptive Scaffolding for Cognitive Engagement in an Intelligent Tutoring System : Abstract: The ICAP framework defines four cognitive engagement levels: Passive, Active, Constructive, and Interactive, where increased cognitive engagement can yield improved learning. However, person...
- RAPiD: Real-time Deterministic Trajectory Planning via Diffusion Behavior Priors for Safe and Efficient Autonomous Driving : Abstract: Diffusion-based trajectory planners have demonstrated strong capability for modeling the multimodal nature of human driving behavior, but their reliance on iterative stochastic sampling pose...
- SupChain-Bench: Benchmarking Large Language Models for Real-World Supply Chain Management : Abstract: Large language models (LLMs) have shown promise in complex reasoning and tool-based decision making, motivating their application to real-world supply chain management. However, supply chain...
- W&D: Scaling Parallel Tool Calling for Efficient Deep Research Agents : Abstract: Deep research agents have emerged as powerful tools for automating complex intellectual tasks through multi-step reasoning and web-based information seeking. While recent efforts have succes...
- NAAMSE: Framework for Evolutionary Security Evaluation of Agents : Abstract: AI agents are increasingly deployed in production, yet their security evaluations remain bottlenecked by manual red-teaming or static benchmarks that fail to model adaptive, multi-turn adver...
- VGAS: Value-Guided Action-Chunk Selection for Few-Shot Vision-Language-Action Adaptation : Abstract: Vision-Language-Action (VLA) models bridge multimodal reasoning with physical control, but adapting them to new tasks with scarce demonstrations remains unreliable. While fine-tuned VLA po...
- Progressive Multi-Agent Reasoning for Biological Perturbation Prediction : Abstract: Predicting gene regulation responses to biological perturbations requires reasoning about underlying biological causalities. While large language models (LLMs) show promise for such tasks, t...
- Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution : Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whe...
- The Moltbook Illusion: Separating Human Influence from Emergent Behavior in AI Agent Societies : Abstract: When AI agents on the social platform Moltbook appeared to develop consciousness, found religions, and declare hostility toward humanity, the phenomenon attracted global media attention and ...
- Are Reasoning LLMs Robust to Interventions on Their Chain-of-Thought? : Abstract: Reasoning LLMs (RLLMs) generate step-by-step chains of thought (CoTs) before giving an answer, which improves performance on complex tasks and makes reasoning more transparent. But how robus...
- Computing the Reachability Value of Posterior-Deterministic POMDPs : Abstract: Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs...
- GraphAgents: Knowledge Graph-Guided Agentic AI for Cross-Domain Materials Design : Abstract: Large Language Models (LLMs) promise to accelerate discovery by reasoning across the expanding scientific landscape. Yet, the challenge is no longer access to information but connecting it i...
- Joint Reward Modeling: Internalizing Chain-of-Thought for Efficient Visual Reward Models : Abstract: Reward models are critical for reinforcement learning from human feedback, as they determine the alignment quality and reliability of generative models. For complex tasks such as image editi...
- MSP-LLM: A Unified Large Language Model Framework for Complete Material Synthesis Planning : Abstract: Material synthesis planning (MSP) remains a fundamental and underexplored bottleneck in AI-driven materials discovery, as it requires not only identifying suitable precursor materials but al...
- When Is Enough Not Enough? Illusory Completion in Search Agents : Abstract: Recent search agents leverage multi-turn reasoning and search tools to achieve strong performance on multi-hop and long-horizon benchmarks. Yet it remains unclear whether they reliably reaso...
- VERIFY-RL: Verifiable Recursive Decomposition for Reinforcement Learning in Mathematical Reasoning : Abstract: Training language models to solve complex mathematical problems benefits from curriculum learning progressively training on simpler subproblems. However, existing decomposition methods are o...
- M2A: Multimodal Memory Agent with Dual-Layer Hybrid Memory for Long-Term Personalized Interactions : Abstract: This work addresses the challenge of personalized question answering in long-term human-machine interactions: when conversational history spans weeks or months and exceeds the context window...
- SleepMaMi: A Universal Sleep Foundation Model for Integrating Macro- and Micro-structures : Abstract: While the shift toward unified foundation models has revolutionized many deep learning domains, sleep medicine remains largely restricted to task-specific models that focus on localized micr...
- Efficient Table Retrieval and Understanding with Multimodal Large Language Models : Abstract: Tabular data is frequently captured in image form across a wide range of real-world scenarios such as financial reports, handwritten records, and document scans. These visual representations...
- ONTrust: A Reference Ontology of Trust : Abstract: Trust has stood out more than ever in the light of recent innovations. Some examples are advances in artificial intelligence that make machines more and more humanlike, and the introduction ...
- EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge : Abstract: Demand forecasting is a cornerstone of e-commerce operations, directly impacting inventory planning and fulfillment scheduling. However, existing forecasting systems often fail during high-i...
- Geo-Code: A Code Framework for Reverse Code Generation from Geometric Images Based on Two-Stage Multi-Agent Evolution : Abstract: Program code serves as a bridge linking vision and logic, providing a feasible supervisory approach for enhancing the multimodal reasoning capability of large models through geometric operat...
- Humanizing AI Grading: Student-Centered Insights on Fairness, Trust, Consistency and Transparency : Abstract: This study investigates students' perceptions of Artificial Intelligence (AI) grading systems in an undergraduate computer science course (n = 27), focusing on a block-based programming fina...
- Learning to Continually Learn via Meta-learning Agentic Memory Designs : Abstract: The statelessness of foundation models bottlenecks agentic systems' ability to continually learn, a core capability for long-horizon reasoning and adaptation. To address this limitation, age...
- Disentangled Instrumental Variables for Causal Inference with Networked Observational Data : Abstract: Instrumental variables (IVs) are crucial for addressing unobservable confounders, yet their stringent exogeneity assumptions pose significant challenges in networked data. Existing methods t...
- Do Multi-Agents Dream of Electric Screens? Achieving Perfect Accuracy on AndroidWorld Through Task Decomposition : Abstract: We present Minitap, a multi-agent system that achieves 100% success on the AndroidWorld benchmark, the first to fully solve all 116 tasks and surpassing human performance (80%). We first ana...
- Data Darwinism Part I: Unlocking the Value of Scientific Data for Pre-training : Abstract: Data quality determines foundation model performance, yet systematic processing frameworks are lacking. We introduce Data Darwinism, a ten-level taxonomy (L0-L9) that conceptualizes data-mod...
- Time Series Reasoning via Process-Verifiable Thinking Data Synthesis and Scheduling for Tailored LLM Reasoning : Abstract: Time series is a pervasive data type across various application domains, rendering the reasonable solving of diverse time series tasks a long-standing goal. Recent advances in large language...
- LQA: A Lightweight Quantized-Adaptive Framework for Vision-Language Models on the Edge : Abstract: Deploying Vision-Language Models (VLMs) on edge devices is challenged by resource constraints and performance degradation under distribution shifts. While test-time adaptation (TTA) can coun...
- Emergent Misalignment is Easy, Narrow Misalignment is Hard : Abstract: Finetuning large language models on narrowly harmful datasets can cause them to become emergently misaligned, giving stereotypically 'evil' responses across diverse unrelated settings. Conce...
- ToolSelf: Unifying Task Execution and Self-Reconfiguration via Tool-Driven Intrinsic Adaptation : Abstract: Agentic systems powered by Large Language Models (LLMs) have demonstrated remarkable potential in tackling complex, long-horizon tasks. However, their efficacy is fundamentally constrained b...
Research Sources: 961 | Generated: 2/10/2026
