AI RESEARCH PAPERS & ACADEMIC SOURCES
- From Pre- to Intra-operative MRI: Predicting Brain Shift in Temporal Lobe Resection for Epilepsy Surgery : Abstract: Introduction: In neurosurgery, image-guided neurosurgery systems (IGNS) rely heavily on preoperative brain magnetic resonance images (MRI) to assist surgeons in locating surgical targets and ...
- 3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation : Abstract: Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals. However, 2D poses rigidly bi...
- Progressive Checkerboards for Autoregressive Multiscale Image Generation : Abstract: A key challenge in autoregressive image generation is to efficiently sample independent locations in parallel, while still modeling mutual dependencies with serial conditioning. Some recent ...
- Continuous Control of Editing Models via Adaptive-Origin Guidance : Abstract: Diffusion-based editing models have emerged as a powerful tool for semantic image and video manipulation. However, existing models lack a mechanism for smoothly controlling the intensity of ...
- EventNeuS: 3D Mesh Reconstruction from a Single Event Camera : Abstract: Event cameras offer a considerable alternative to RGB cameras in many scenarios. While there are recent works on event-based novel-view synthesis, dense 3D mesh reconstruction remains scarce...
- Unsupervised Super-Resolution of Remote Sensing Hyperspectral Images Using Fully Synthetic Training (Super-résolution non supervisée d'images hyperspectrales de télédétection utilisant un entraînement entièrement synthétique) : Abstract: Hyperspectral single image super-resolution (SISR) aims to enhance spatial resolution while preserving the rich spectral information of hyperspectral images. Most existing methods rely on su...
- EchoJEPA: A Latent Predictive Foundation Model for Echocardiography : Abstract: Foundation models for echocardiography promise to reduce annotation burden and improve diagnostic consistency by learning generalizable representations from large unlabeled video archives. H...
- Perfusion Imaging and Single Material Reconstruction in Polychromatic Photon Counting CT : Abstract: Background: Perfusion computed tomography (CT) images the dynamics of a contrast agent through the body over time, and is one of the highest X-ray dose scans in medical imaging. Recently, a ...
- Physics-based generation of multilayer corneal OCT data via Gaussian modeling and MCML for AI-driven diagnostic and surgical guidance applications : Abstract: Training deep learning models for corneal optical coherence tomography (OCT) imaging is limited by the availability of large, well-annotated datasets. We present a configurable Monte Carlo s...
- Real-time topology-aware M-mode OCT segmentation for robotic deep anterior lamellar keratoplasty (DALK) guidance : Abstract: Robotic deep anterior lamellar keratoplasty (DALK) requires accurate real-time depth feedback to approach Descemet's membrane (DM) without perforation. M-mode intraoperative optical coherenc...
- WebSplatter: Enabling Cross-Device Efficient Gaussian Splatting in Web Browsers via WebGPU : Abstract: We present WebSplatter, an end-to-end GPU rendering pipeline for the heterogeneous web ecosystem. Unlike naive ports, WebSplatter introduces a wait-free hierarchical radix sort that circumve...
- Time Is All It Takes: Spike-Retiming Attacks on Event-Driven Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) compute with discrete spikes and exploit temporal structure, yet most adversarial attacks change intensities or event counts instead of timing. We study a timi...
- Pi-GS: Sparse-View Gaussian Splatting with Dense π³ Initialization : Abstract: Novel view synthesis has evolved rapidly, advancing from Neural Radiance Fields to 3D Gaussian Splatting (3DGS), which offers real-time rendering and rapid training without compromising visu...
- PlanTRansformer: Unified Prediction and Planning with Goal-conditioned Transformer : Abstract: Trajectory prediction and planning are fundamental yet disconnected components in autonomous driving. Prediction models forecast surrounding agent motion under unknown intentions, producing ...
- Origin Lens: A Privacy-First Mobile Framework for Cryptographic Image Provenance and AI Detection : Abstract: The proliferation of generative AI poses challenges for information integrity assurance, requiring systems that connect model governance with end-user verification. We present Origin Lens, a...
- HetroD: A High-Fidelity Drone Dataset and Benchmark for Autonomous Driving in Heterogeneous Traffic : Abstract: We present HetroD, a dataset and benchmark for developing autonomous driving systems in heterogeneous environments. HetroD targets the critical challenge of navigating real-world heterogen...
- AffordanceGrasp-R1: Leveraging Reasoning-Based Affordance Segmentation with Reinforcement Learning for Robotic Grasping : Abstract: We introduce AffordanceGrasp-R1, a reasoning-driven affordance segmentation framework for robotic grasping that combines a chain-of-thought (CoT) cold-start strategy with reinforcement learn...
- MVP-LAM: Learning Action-Centric Latent Action via Cross-Viewpoint Reconstruction : Abstract: Learning latent actions from diverse human videos enables scaling robot learning beyond embodiment-specific robot datasets, and these latent actions have recently been used as pseudo-...
- BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks : Abstract: Embodied world models have emerged as a promising paradigm in robotics, most of which leverage large-scale Internet videos or pretrained video generation models to enrich visual and motion p...
- Split&Splat: Zero-Shot Panoptic Segmentation via Explicit Instance Modeling and 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting (GS) enables fast and high-quality scene reconstruction, but it lacks an object-consistent and semantically aware structure. We propose Split&Splat, a framework for pan...
- Deep-learning-based pan-phenomic data reveals the explosive evolution of avian visual disparity : Abstract: The evolution of biological morphology is critical for understanding the diversity of the natural world, yet traditional analyses often involve subjective biases in the selection and coding ...
- PrevizWhiz: Combining Rough 3D Scenes and 2D Video to Guide Generative Video Previsualization : Abstract: In pre-production, filmmakers and 3D animation experts must rapidly prototype ideas to explore a film's possibilities before full-scale production, yet conventional approaches involve trade-o...
- HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition : Abstract: Scene Text Recognition (STR) is challenging because effective character representations are hard to extract from visual data when text is unreadable. Permutation language modeling (PLM) is introduced to r...
- Saliency-Guided DETR for Moment Retrieval and Highlight Detection : Abstract: Existing approaches for video moment retrieval and highlight detection are not able to align text and video features efficiently, resulting in unsatisfying performance and limited production...
- Rectification Reimagined: A Unified Mamba Model for Image Correction and Rectangling with Prompts : Abstract: Image correction and rectangling are valuable tasks in practical photography systems such as smartphones. Recent remarkable advancements in deep learning have undeniably brought about substa...
- Creative Image Generation with Diffusion Models : Abstract: Creative image generation has emerged as a compelling area of research, driven by the need to produce novel and high-quality images that expand the boundaries of imagination. In this work, w...
- AdaptMMBench: Benchmarking Adaptive Multimodal Reasoning for Mode Selection and Reasoning Process : Abstract: Adaptive multimodal reasoning has emerged as a promising frontier in Vision-Language Models (VLMs), aiming to dynamically modulate between tool-augmented visual reasoning and text reasoning ...
- End-to-end reconstruction of OCT optical properties and speckle-reduced structural intensity via physics-based learning : Abstract: Inverse scattering in optical coherence tomography (OCT) seeks to recover both structural images and intrinsic tissue optical properties, including refractive index, scattering coefficient, ...
- SVD-ViT: Does SVD Make Vision Transformers Attend More to the Foreground? : Abstract: Vision Transformers (ViT) have been established as large-scale foundation models. However, because self-attention operates globally, they lack an explicit mechanism to distinguish foreground...
- Self-Supervised Uncalibrated Multi-View Video Anonymization in the Operating Room : Abstract: Privacy preservation is a prerequisite for using video data in Operating Room (OR) research. Effective anonymization relies on the exhaustive localization of every individual; even a single ...
- ViThinker: Active Vision-Language Reasoning via Dynamic Perceptual Querying : Abstract: Chain-of-Thought (CoT) reasoning excels in language models but struggles in vision-language models due to premature visual-to-text conversion that discards continuous information such as geo...
- FaceLinkGen: Rethinking Identity Leakage in Privacy-Preserving Face Recognition with Identity Extraction : Abstract: Transformation-based privacy-preserving face recognition (PPFR) aims to verify identities while hiding facial data from attackers and malicious service providers. Existing evaluations mostly...
- SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation : Abstract: Synthetic data, an appealing alternative to extensive expert-annotated data for medical image segmentation, consistently fails to improve segmentation performance despite its visual realism....
- TRACE: Temporal Radiology with Anatomical Change Explanation for Grounded X-ray Report Generation : Abstract: Temporal comparison of chest X-rays is fundamental to clinical radiology, enabling detection of disease progression, treatment response, and new findings. While vision-language models have a...
- Dynamic High-frequency Convolution for Infrared Small Target Detection : Abstract: Infrared small targets are typically tiny and locally salient, and thus belong to high-frequency components (HFCs) in images. Single-frame infrared small target (SIRST) detection is challenging...
- Fisheye Stereo Vision: Depth and Range Error : Abstract: This study derives analytical expressions for the depth and range error of fisheye stereo vision systems as a function of object distance, specifically accounting for accuracy at large angle...
- SceneLinker: Compositional 3D Scene Generation via Semantic Scene Graph from RGB Sequences : Abstract: We introduce SceneLinker, a novel framework that generates compositional 3D scenes via semantic scene graph from RGB sequences. To adaptively experience Mixed Reality (MR) content based on e...
- SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation : Abstract: Novel view synthesis of dynamic scenes is fundamental to achieving photorealistic 4D reconstruction and immersive visual experiences. Recent progress in Gaussian-based representations has si...
- Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation : Abstract: Reinforcement learning has emerged as a principled post-training paradigm for Temporal Video Grounding (TVG) due to its on-policy optimization, yet existing GRPO-based methods remain fundame...
- Thinking inside the Convolution for Image Inpainting: Reconstructing Texture via Structure under Global and Local Side : Abstract: Image inpainting has made substantial progress, owing to the encoder-decoder pipeline, which benefits from Convolutional Neural Networks (CNNs) with convolutional downsampling ...
- A Vision-Based Analysis of Congestion Pricing in New York City : Abstract: We examine the impact of New York City's congestion pricing program through automated analysis of traffic camera data. Our computer vision pipeline processes footage from over 900 cameras di...
- MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration : Abstract: Generating long-form audio-visual stories from a short user prompt remains challenging due to an intent-execution gap, where high-level narrative intent must be preserved across coherent, sh...
- Bongards at the Boundary of Perception and Reasoning: Programs or Language? : Abstract: Vision-Language Models (VLMs) have made great strides in everyday visual tasks, such as captioning a natural image, or answering commonsense questions about such images. But humans possess t...
- HP-GAN: Harnessing pretrained networks for GAN improvement with FakeTwins and discriminator consistency : Abstract: Generative Adversarial Networks (GANs) have made significant progress in enhancing the quality of image synthesis. Recent methods frequently leverage pretrained networks to calculate percept...
- IVC-Prune: Revealing the Implicit Visual Coordinates in LVLMs for Vision Token Pruning : Abstract: Large Vision-Language Models (LVLMs) achieve impressive performance across multiple tasks. A significant challenge, however, is their prohibitive inference cost when processing high-resoluti...
- JRDB-Pose3D: A Multi-person 3D Human Pose and Shape Estimation Dataset for Robotics : Abstract: Real-world scenes are inherently crowded. Hence, estimating 3D poses of all nearby humans, tracking their movements over time, and understanding their activities within social and environmen...
- Finding Optimal Video Moment without Training: Gaussian Boundary Optimization for Weakly Supervised Video Grounding : Abstract: Weakly supervised temporal video grounding aims to localize query-relevant segments in untrimmed videos using only video-sentence pairs, without requiring ground-truth segment annotations th...
- A generalizable large-scale foundation model for musculoskeletal radiographs : Abstract: Artificial intelligence (AI) has shown promise in detecting and characterizing musculoskeletal diseases from radiographs. However, most existing models remain task-specific, annotation-depen...
- Gromov Wasserstein Optimal Transport for Semantic Correspondences : Abstract: Establishing correspondences between image pairs is a long-studied problem in computer vision. With recent large-scale foundation models showing strong zero-shot performance on downstream ta...
- Beyond Cropping and Rotation: Automated Evolution of Powerful Task-Specific Augmentations with Generative Models : Abstract: Data augmentation has long been a cornerstone for reducing overfitting in vision models, with methods like AutoAugment automating the design of task-specific augmentations. Recent advances i...
- Flexible Geometric Guidance for Probabilistic Human Pose Estimation with Diffusion Models : Abstract: 3D human pose estimation from 2D images is a challenging problem due to depth ambiguity and occlusion. Because of these challenges, the task is underdetermined: there exist multiple --...
- FinMTM: A Multi-Turn Multimodal Benchmark for Financial Reasoning and Agent Evaluation : Abstract: The financial domain poses substantial challenges for vision-language models (VLMs) due to specialized chart formats and knowledge-intensive reasoning requirements. However, existing financi...
- SwiftVLM: Efficient Vision-Language Model Inference via Cross-Layer Token Bypass : Abstract: Visual token pruning is a promising approach for reducing the computational cost of vision-language models (VLMs), and existing methods often rely on early pruning decisions to improve effic...
- FSOD-VFM: Few-Shot Object Detection with Vision Foundation Models and Graph Diffusion : Abstract: In this paper, we present FSOD-VFM: Few-Shot Object Detectors with Vision Foundation Models, a framework that leverages vision foundation models to tackle the challenge of few-shot object de...
- Diversity-Preserved Distribution Matching Distillation for Fast Visual Synthesis : Abstract: Distribution matching distillation (DMD) aligns a multi-step generator with its few-step counterpart to enable high-quality generation under low inference cost. However, DMD tends to suffer ...
- Human-in-the-loop Adaptation in Group Activity Feature Learning for Team Sports Video Retrieval : Abstract: This paper proposes human-in-the-loop adaptation for Group Activity Feature Learning (GAFL) without group activity annotations. This human-in-the-loop adaptation is employed in a group-activ...
- BinaryDemoire: Moiré-Aware Binarization for Image Demoiréing : Abstract: Image demoiréing aims to remove structured moiré artifacts in recaptured imagery, where degradations are highly frequency-dependent and vary across scales and directions. While recent deep n...
- LSGQuant: Layer-Sensitivity Guided Quantization for One-Step Diffusion Real-World Video Super-Resolution : Abstract: One-Step Diffusion Models have demonstrated promising capability and fast inference in real-world video super-resolution (VSR). Nevertheless, the substantial model size and high computat...
- From Single Scan to Sequential Consistency: A New Paradigm for LIDAR Relocalization : Abstract: LiDAR relocalization aims to estimate the global 6-DoF pose of a sensor in the environment. However, existing regression-based approaches are vulnerable to dynamic or ambiguous scenarios, as they...
- Hand3R: Online 4D Hand-Scene Reconstruction in the Wild : Abstract: For Embodied AI, jointly reconstructing dynamic hands and the dense scene context is crucial for understanding physical interaction. However, most existing methods recover isolated hands in ...
- VIRAL: Visual In-Context Reasoning via Analogy in Diffusion Transformers : Abstract: Replicating In-Context Learning (ICL) in computer vision remains challenging due to task heterogeneity. We propose VIRAL, a framework that elicits visual reasoning from a pre-traine...
- ConsisDrive: Identity-Preserving Driving World Models for Video Generation by Instance Mask : Abstract: Autonomous driving relies on robust models trained on large-scale, high-quality multi-view driving videos. Although world models provide a cost-effective solution for generating realistic dr...
- FARTrack: Fast Autoregressive Visual Tracking with High Performance : Abstract: Inference speed and tracking performance are two critical evaluation metrics in the field of visual tracking. However, high-performance trackers often suffer from slow processing speeds, mak...
- PokeFusion Attention: Enhancing Reference-Free Style-Conditioned Generation : Abstract: This paper studies reference-free style-conditioned character generation in text-to-image diffusion models, where high-quality synthesis requires both stable character structure and consiste...
- Spiral RoPE: Rotate Your Rotary Positional Embeddings in the 2D Plane : Abstract: Rotary Position Embedding (RoPE) is the de facto positional encoding in large language models due to its ability to encode relative positions and support length extrapolation. When adapted t...
- EventFlash: Towards Efficient MLLMs for Event-Based Vision : Abstract: Event-based multimodal large language models (MLLMs) enable robust perception in high-speed and low-light scenarios, addressing key limitations of frame-based MLLMs. However, current event-b...
- InstaDrive: Instance-Aware Driving World Models for Realistic and Consistent Video Generation : Abstract: Autonomous driving relies on robust models trained on high-quality, large-scale multi-view driving videos. While world models offer a cost-effective solution for generating realistic driving...
- LaVPR: Benchmarking Language and Vision for Place Recognition : Abstract: Visual Place Recognition (VPR) often fails under extreme environmental changes and perceptual aliasing. Furthermore, standard systems cannot perform "blind" localization from verbal descript...
- Global Geometry Is Not Enough for Vision Representations : Abstract: A common assumption in representation learning is that globally well-distributed embeddings support robust and generalizable representations. This focus has shaped both training objectives a...
- A3-TTA: Adaptive Anchor Alignment Test-Time Adaptation for Image Segmentation : Abstract: Test-Time Adaptation (TTA) offers a practical solution for deploying image segmentation models under domain shift without accessing source data or retraining. Among existing TTA strategies, ...
- LEVIO: Lightweight Embedded Visual Inertial Odometry for Resource-Constrained Devices : Abstract: Accurate, infrastructure-less sensor systems for motion tracking are essential for mobile robotics and augmented reality (AR) applications. The most popular state-of-the-art visual-inertial ...
- Full end-to-end diagnostic workflow automation of 3D OCT via foundation model-driven AI for retinal diseases : Abstract: Optical coherence tomography (OCT) has revolutionized retinal disease diagnosis with its high-resolution and three-dimensional imaging nature, yet its full diagnostic automation in clinical ...
- PQTNet: Pixel-wise Quantitative Thermography Neural Network for Estimating Defect Depth in Polylactic Acid Parts by Additive Manufacturing : Abstract: Defect depth quantification in additively manufactured (AM) components remains a significant challenge for non-destructive testing (NDT). This study proposes a Pixel-wise Quantitative Thermo...
- Invisible Clean-Label Backdoor Attacks for Generative Data Augmentation : Abstract: With the rapid advancement of image generative models, generative data augmentation has become an effective way to enrich training images, especially when only small-scale datasets are avail...
- MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning : Abstract: Medical image segmentation is evolving from task-specific models toward generalizable frameworks. Recent research leverages Multi-modal Large Language Models (MLLMs) as autonomous agents, em...
- PWAVEP: Purifying Imperceptible Adversarial Perturbations in 3D Point Clouds via Spectral Graph Wavelets : Abstract: Recent progress in adversarial attacks on 3D point clouds, particularly in achieving spatial imperceptibility and high attack performance, presents significant challenges for defenders. Curr...
- Composable Visual Tokenizers with Generator-Free Diagnostics of Learnability : Abstract: We introduce CompTok, a training framework for learning visual tokenizers whose tokens are enhanced for compositionality. CompTok uses a token-conditioned diffusion decoder. By employing an ...
- Z3D: Zero-Shot 3D Visual Grounding from Images : Abstract: 3D visual grounding (3DVG) aims to localize objects in a 3D scene based on natural language queries. In this work, we explore zero-shot 3DVG from multi-view images alone, without requiring a...
- Multi-Resolution Alignment for Voxel Sparsity in Camera-Based 3D Semantic Scene Completion : Abstract: Camera-based 3D semantic scene completion (SSC) offers a cost-effective solution for assessing the geometric occupancy and semantic labels of each voxel in the surrounding 3D scene with imag...
- SLIM-Diff: Shared Latent Image-Mask Diffusion with Lp loss for Data-Scarce Epilepsy FLAIR MRI : Abstract: Focal cortical dysplasia (FCD) lesions in epilepsy FLAIR MRI are subtle and scarce, making joint image-mask generative modeling prone to instability and memorization. We propose SLIM-Diff, ...
- Unifying Watermarking via Dimension-Aware Mapping : Abstract: Deep watermarking methods often share similar encoder-decoder architectures, yet differ substantially in their functional behaviors. We propose DiM, a new multi-dimensional watermarking fram...
- Seeing Through the Chain: Mitigate Hallucination in Multimodal Reasoning Models via CoT Compression and Contrastive Preference Optimization : Abstract: While multimodal reasoning models (MLRMs) have exhibited impressive capabilities, they remain prone to hallucinations, and effective solutions are still underexplored. In this paper, we expe...
- UnHype: CLIP-Guided Hypernetworks for Dynamic LoRA Unlearning : Abstract: Recent advances in large-scale diffusion models have intensified concerns about their potential misuse, particularly in generating realistic yet harmful or socially disruptive content. This ...
- Socratic-Geo: Synthetic Data Generation and Geometric Reasoning via Multi-Agent Interaction : Abstract: Multimodal Large Language Models (MLLMs) have significantly advanced vision-language understanding. However, even state-of-the-art models struggle with geometric reasoning, revealing a criti...
- ConsistentRFT: Reducing Visual Hallucinations in Flow-based Reinforcement Fine-Tuning : Abstract: Reinforcement Fine-Tuning (RFT) on flow-based models is crucial for preference alignment. However, they often introduce visual hallucinations like over-optimized details and semantic misalig...
- Hierarchical Concept-to-Appearance Guidance for Multi-Subject Image Generation : Abstract: Multi-subject image generation aims to synthesize images that faithfully preserve the identities of multiple reference subjects while following textual instructions. However, existing method...
- Contextualized Visual Personalization in Vision-Language Models : Abstract: Despite recent progress in vision-language models (VLMs), existing approaches often fail to generate personalized responses based on the user's specific experiences, as they lack the ability...
- Inlier-Centric Post-Training Quantization for Object Detection Models : Abstract: Object detection is pivotal in computer vision, yet its immense computational demands make deployment slow and power-hungry, motivating quantization. However, task-irrelevant morphologies su...
- Semantic Routing: Exploring Multi-Layer LLM Feature Weighting for Diffusion Transformers : Abstract: Recent DiT-based text-to-image models increasingly adopt LLMs as text encoders, yet text conditioning remains largely static and often utilizes only a single LLM layer, despite pronounced se...
- Interpretable Logical Anomaly Classification via Constraint Decomposition and Instruction Fine-Tuning : Abstract: Logical anomalies are violations of predefined constraints on object quantity, spatial layout, and compositional relationships in industrial images. While prior work largely treats anomaly d...
- PnP-U3D: Plug-and-Play 3D Framework Bridging Autoregression and Diffusion for Unified Understanding and Generation : Abstract: The rapid progress of large multimodal models has inspired efforts toward unified frameworks that couple understanding and generation. While such paradigms have shown remarkable success in 2...
- Constrained Dynamic Gaussian Splatting : Abstract: While Dynamic Gaussian Splatting enables high-fidelity 4D reconstruction, its deployment is severely hindered by a fundamental dilemma: unconstrained densification leads to excessive memory ...
- Cut to the Mix: Simple Data Augmentation Outperforms Elaborate Ones in Limited Organ Segmentation Datasets : Abstract: Multi-organ segmentation is a widely applied clinical routine, and automated organ segmentation tools dramatically improve radiologists' workflows. Recently, deep learning (DL) based...
- ELIQ: A Label-Free Framework for Quality Assessment of Evolving AI-Generated Images : Abstract: Generative text-to-image models are advancing at an unprecedented pace, continuously shifting the perceptual quality ceiling and rendering previously collected labels unreliable for newer ge...
- SlowFocus: Enhancing Fine-grained Temporal Understanding in Video LLM : Abstract: Large language models (LLMs) have demonstrated exceptional capabilities in text understanding, which has paved the way for their expansion into video LLMs (Vid-LLMs) to analyze video data. H...
- High-Resolution Underwater Camouflaged Object Detection: GBU-UCOD Dataset and Topology-Aware and Frequency-Decoupled Networks : Abstract: Underwater Camouflaged Object Detection (UCOD) is a challenging task due to the extreme visual similarity between targets and backgrounds across varying marine depths. Existing methods often...
- TIPS Over Tricks: Simple Prompts for Effective Zero-shot Anomaly Detection : Abstract: Anomaly detection identifies departures from expected behavior in safety-critical settings. When target-domain normal data are unavailable, zero-shot anomaly detection (ZSAD) leverages visio...
- Refer-Agent: A Collaborative Multi-Agent System with Reasoning and Reflection for Referring Video Object Segmentation : Abstract: Referring Video Object Segmentation (RVOS) aims to segment objects in videos based on textual queries. Current methods mainly rely on large-scale supervised fine-tuning (SFT) of Multi-modal ...
- A Lightweight Library for Energy-Based Joint-Embedding Predictive Architectures : Abstract: We present EB-JEPA, an open-source library for learning representations and world models using Joint-Embedding Predictive Architectures (JEPAs). JEPAs learn to predict in representation spac...
- KTV: Keyframes and Key Tokens Selection for Efficient Training-Free Video LLMs : Abstract: Training-free video understanding leverages the strong image comprehension capabilities of pre-trained vision language models (VLMs) by treating a video as a sequence of static frames, thus ...
- Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis : Abstract: Retinal diseases spanning a broad spectrum can be effectively identified and diagnosed using complementary signals from multimodal data. However, multimodal diagnosis in ophthalmic practice ...
- Multi-Objective Optimization for Synthetic-to-Real Style Transfer : Abstract: Semantic segmentation networks require large amounts of pixel-level annotated data, which are costly to obtain for real-world images. Computer graphics engines can generate synthetic images ...
- SPWOOD: Sparse Partial Weakly-Supervised Oriented Object Detection : Abstract: A consistent trend throughout the research of oriented object detection has been the pursuit of maintaining comparable performance with fewer and weaker annotations. This is particularly cru...
- MM-SCALE: Grounded Multimodal Moral Reasoning via Scalar Judgment and Listwise Alignment : Abstract: Vision-Language Models (VLMs) continue to struggle to make morally salient judgments in multimodal and socially ambiguous contexts. Prior works typically rely on binary or pairwise supervisi...
- Referring Industrial Anomaly Segmentation : Abstract: Industrial Anomaly Detection (IAD) is vital for manufacturing, yet traditional methods face significant challenges: unsupervised approaches yield rough localizations requiring manual thresho...
- RegionReasoner: Region-Grounded Multi-Round Visual Reasoning : Abstract: Large vision-language models have achieved remarkable progress in visual reasoning, yet most existing systems rely on single-step or text-only reasoning, limiting their ability to iterativel...
- Edge-Optimized Vision-Language Models for Underground Infrastructure Assessment : Abstract: Autonomous inspection of underground infrastructure, such as sewer and culvert systems, is critical to public safety and urban sustainability. Although robotic platforms equipped with visual...
- LIVE: Long-horizon Interactive Video World Modeling : Abstract: Autoregressive video world models predict future visual observations conditioned on actions. While effective over short horizons, these models often struggle with long-horizon generation, as...
- See-through: Single-image Layer Decomposition for Anime Characters : Abstract: We introduce a framework that automates the transformation of static anime illustrations into manipulatable 2.5D models. Current professional workflows require tedious manual segmentation an...
- Zero-shot large vision-language model prompting for automated bone identification in paleoradiology x-ray archives : Abstract: Paleoradiology, the use of modern imaging technologies to study archaeological and anthropological remains, offers new windows on millennial scale patterns of human health. Unfortunately, th...
- Test-Time Conditioning with Representation-Aligned Visual Features : Abstract: While representation alignment with self-supervised models has been shown to improve diffusion model training, its potential for enhancing inference-time conditioning remains largely unexplo...
- RAWDet-7: A Multi-Scenario Benchmark for Object Detection and Description on Quantized RAW Images : Abstract: Most vision models are trained on RGB images processed through ISP pipelines optimized for human perception, which can discard sensor-level information useful for machine reasoning. RAW imag...
- FOVI: A biologically-inspired foveated interface for deep vision models : Abstract: Human vision is foveated, with variable resolution peaking at the center of a large field of view; this reflects an efficient trade-off for active sensing, allowing eye-movements to bring di...
- QVLA: Not All Channels Are Equal in Vision-Language-Action Model's Quantization : Abstract: The advent of Vision-Language-Action (VLA) models represents a significant leap for embodied intelligence, yet their immense computational demands critically hinder deployment on resource-co...
- SAES-SVD: Self-Adaptive Suppression of Accumulated and Local Errors for SVD-based LLM Compression : Abstract: The rapid growth in the parameter scale of large language models (LLMs) has created a high demand for efficient compression techniques. As a hardware-agnostic and highly compatible technique...
- ReMiT: RL-Guided Mid-Training for Iterative LLM Evolution : Abstract: Standard training pipelines for large language models (LLMs) are typically unidirectional, progressing from pre-training to post-training. However, the potential for a bidirectional process-...
- AERO: Autonomous Evolutionary Reasoning Optimization via Endogenous Dual-Loop Feedback : Abstract: Large Language Models (LLMs) have achieved significant success in complex reasoning but remain bottlenecked by reliance on expert-annotated data and external verifiers. While existing self-e...
- Test-time Recursive Thinking: Self-Improvement without External Feedback : Abstract: Modern Large Language Models (LLMs) have shown rapid improvements in reasoning capabilities, driven largely by reinforcement learning (RL) with verifiable rewards. Here, we ask whether these...
- Task-Specificity Score: Measuring How Much Instructions Really Matter for Supervision : Abstract: Instruction tuning is now the default way to train and adapt large language models, but many instruction-input-output pairs are only weakly specified: for a given input, the same output ca...
- The Mask of Civility: Benchmarking Chinese Mock Politeness Comprehension in Large Language Models : Abstract: From a pragmatic perspective, this study systematically evaluates the differences in performance among representative large language models (LLMs) in recognizing politeness, impoliteness, an...
- ChemPro: A Progressive Chemistry Benchmark for Large Language Models : Abstract: We introduce ChemPro, a progressive benchmark with 4100 natural language question-answer pairs in Chemistry, across 4 coherent sections of difficulty designed to assess the proficiency of La...
- One Model, All Roles: Multi-Turn, Multi-Agent Self-Play Reinforcement Learning for Conversational Social Intelligence : Abstract: This paper introduces OMAR: One Model, All Roles, a reinforcement learning framework that enables AI to develop social intelligence through multi-turn, multi-agent conversational self-play. ...
- Short Chains, Deep Thoughts: Balancing Reasoning Efficiency and Intra-Segment Capability via Split-Merge Optimization : Abstract: While Large Reasoning Models (LRMs) have demonstrated impressive capabilities in solving complex tasks through the generation of long reasoning chains, this reliance on verbose generation re...
- FASA: Frequency-aware Sparse Attention : Abstract: The deployment of Large Language Models (LLMs) faces a critical bottleneck when handling lengthy inputs: the prohibitive memory footprint of the Key Value (KV) cache. To address this bottlen...
- Privasis: Synthesizing the Largest "Public" Private Dataset from Scratch : Abstract: Research involving privacy-sensitive data has always been constrained by data scarcity, standing in sharp contrast to other areas that have benefited from data scaling. This challenge is bec...
- ATACompressor: Adaptive Task-Aware Compression for Efficient Long-Context Processing in LLMs : Abstract: Long-context inputs in large language models (LLMs) often suffer from the "lost in the middle" problem, where critical information becomes diluted or ignored due to excessive length. Context...
- POP: Prefill-Only Pruning for Efficient Large Model Inference : Abstract: Large Language Models (LLMs) and Vision-Language Models (VLMs) have demonstrated remarkable capabilities. However, their deployment is hindered by significant computational costs. Existing s...
- MIRROR: A Multi-Agent Framework with Iterative Adaptive Revision and Hierarchical Retrieval for Optimization Modeling in Operations Research : Abstract: Operations Research (OR) relies on expert-driven modeling, a slow and fragile process ill-suited to novel scenarios. While large language models (LLMs) can automatically translate natural lan...
- PEGRL: Improving Machine Translation by Post-Editing Guided Reinforcement Learning : Abstract: Reinforcement learning (RL) has shown strong promise for LLM-based machine translation, with recent methods such as GRPO demonstrating notable gains; nevertheless, translation-oriented RL re...
- Pursuing Best Industrial Practices for Retrieval-Augmented Generation in the Medical Domain : Abstract: While retrieval augmented generation (RAG) has been swiftly adopted in industrial applications based on large language models (LLMs), there is no consensus on what are the best practices for...
- Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective : Abstract: Proprietary large language models (LLMs) embody substantial economic value and are generally exposed only as black-box APIs, yet adversaries can still exploit their outputs to extract knowle...
- Verified Critical Step Optimization for LLM Agents : Abstract: As large language model agents tackle increasingly complex long-horizon tasks, effective post-training becomes critical. Prior work faces fundamental challenges: outcome-only rewards fail to...
- FactNet: A Billion-Scale Knowledge Graph for Multilingual Factual Grounding : Abstract: While LLMs exhibit remarkable fluency, their utility is often compromised by factual hallucinations and a lack of traceable provenance. Existing resources for grounding mitigate this but typ...
- A-RAG: Scaling Agentic Retrieval-Augmented Generation via Hierarchical Retrieval Interfaces : Abstract: Frontier language models have demonstrated strong reasoning and long-horizon tool-use capabilities. However, existing RAG systems fail to leverage these capabilities. They still rely on two ...
- Preferences for Idiomatic Language are Acquired Slowly -- and Forgotten Quickly: A Case Study on Swedish : Abstract: In this study, we investigate how language models develop preferences for idiomatic as compared to linguistically acceptable Swedish, both during pretraining and when adapt...
- Learning to Reason Faithfully through Step-Level Faithfulness Maximization : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has markedly improved the performance of Large Language Models (LLMs) on tasks requiring multi-step reasoning. However, most RLVR pipeli...
- SEAD: Self-Evolving Agent for Multi-Turn Service Dialogue : Abstract: Large Language Models have demonstrated remarkable capabilities in open-domain dialogues. However, current methods exhibit suboptimal performance in service dialogues, as they rely on noisy,...
- Assessing the Impact of Typological Features on Multilingual Machine Translation in the Age of Large Language Models : Abstract: Despite major advances in multilingual modeling, large quality disparities persist across languages. Besides the obvious impact of uneven training resources, typological properties have also...
- HySparse: A Hybrid Sparse Attention Architecture with Oracle Token Selection and KV Cache Sharing : Abstract: This work introduces Hybrid Sparse Attention (HySparse), a new architecture that interleaves each full attention layer with several sparse attention layers. While conceptually simple, HySpar...
- ACL: Aligned Contrastive Learning Improves BERT and Multi-exit BERT Fine-tuning : Abstract: Despite its success in self-supervised learning, contrastive learning is less studied in the supervised setting. In this work, we first use a set of pilot experiments to show that in the sup...
- Use Graph When It Needs: Efficiently and Adaptively Integrating Retrieval-Augmented Generation with Graphs : Abstract: Large language models (LLMs) often struggle with knowledge-intensive tasks due to hallucinations and outdated parametric knowledge. While Retrieval-Augmented Generation (RAG) addresses this ...
- CL-bench: A Benchmark for Context Learning : Abstract: Current language models (LMs) excel at reasoning over prompts using pre-trained knowledge. However, real-world tasks are far more complex and context-dependent: models must learn from task-s...
- Efficient Algorithms for Partial Constraint Satisfaction Problems over Control-flow Graphs : Abstract: In this work, we focus on the Partial Constraint Satisfaction Problem (PCSP) over control-flow graphs (CFGs) of programs. PCSP serves as a generalization of the well-known Constraint Satisfa...
- Controlling Output Rankings in Generative Engines for LLM-based Search : Abstract: The way customers search for and choose products is changing with the rise of large language models (LLMs). LLM-based search, or generative engines, provides direct product recommendations t...
- Learning Query-Specific Rubrics from Human Preferences for DeepResearch Report Generation : Abstract: Nowadays, training and evaluating DeepResearch-generated reports remain challenging due to the lack of verifiable reward signals. Accordingly, rubric-based evaluation has become a common pra...
- BIRDTurk: Adaptation of the BIRD Text-to-SQL Dataset to Turkish : Abstract: Text-to-SQL systems have achieved strong performance on English benchmarks, yet their behavior in morphologically rich, low-resource languages remains largely unexplored. We introduce BIRDTu...
- RAGTurk: Best Practices for Retrieval Augmented Generation in Turkish : Abstract: Retrieval-Augmented Generation (RAG) enhances LLM factuality, yet design guidance remains English-centric, limiting insights for morphologically rich languages like Turkish. We address this ...
- Instruction Anchors: Dissecting the Causal Dynamics of Modality Arbitration : Abstract: Modality following serves as the capacity of multimodal large language models (MLLMs) to selectively utilize multimodal contexts based on user instructions. It is fundamental to ensuring saf...
- Rethinking the Reranker: Boundary-Aware Evidence Selection for Robust Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) systems remain brittle under realistic retrieval noise, even when the required evidence appears in the top-K results. A key reason is that retrievers and...
- OCRTurk: A Comprehensive OCR Benchmark for Turkish : Abstract: Document parsing is now widely used in applications, such as large-scale document digitization, retrieval-augmented generation, and domain-specific pipelines in healthcare and education. Ben...
- Cognitively Diverse Multiple-Choice Question Generation: A Hybrid Multi-Agent Framework with Large Language Models : Abstract: Recent advances in large language models (LLMs) have made automated multiple-choice question (MCQ) generation increasingly feasible; however, reliably producing items that satisfy controlled...
- OmniRAG-Agent: Agentic Omnimodal Reasoning for Low-Resource Long Audio-Video Question Answering : Abstract: Long-horizon omnimodal question answering answers questions by reasoning over text, images, audio, and video. Despite recent progress on OmniLLMs, low-resource long audio-video QA still suff...
- Beyond Tokens: Semantic-Aware Speculative Decoding for Efficient Inference by Probing Internal States : Abstract: Large Language Models (LLMs) achieve strong performance across many tasks but suffer from high inference latency due to autoregressive decoding. The issue is exacerbated in Large Reasoning M...
- No Shortcuts to Culture: Indonesian Multi-hop Question Answering for Complex Cultural Understanding : Abstract: Understanding culture requires reasoning across context, tradition, and implicit social knowledge, far beyond recalling isolated facts. Yet most culturally focused question answering (QA) be...
- Training Multi-Turn Search Agent via Contrastive Dynamic Branch Sampling : Abstract: Agentic reinforcement learning has enabled large language models to perform complex multi-turn planning and tool use. However, learning in long-horizon settings remains challenging due to sp...
- CUBO: Self-Contained Retrieval-Augmented Generation on Consumer Laptops 10 GB Corpora, 16 GB RAM, Single-Device Deployment : Abstract: Organizations handling sensitive documents face a tension: cloud-based AI risks GDPR violations, while local systems typically require 18-32 GB RAM. This paper presents CUBO, a systems-orien...
- Context Compression via Explicit Information Transmission : Abstract: Long-context inference with Large Language Models (LLMs) is costly due to quadratic attention and growing key-value caches, motivating context compression. In this work, we study soft contex...
- They Said Memes Were Harmless-We Found the Ones That Hurt: Decoding Jokes, Symbols, and Cultural References : Abstract: Meme-based social abuse detection is challenging because harmful intent often relies on implicit cultural symbolism and subtle cross-modal incongruence. Prior approaches, from fusion-based m...
- Accelerating Scientific Research with Gemini: Case Studies and Common Techniques : Abstract: Recent advances in large language models (LLMs) have opened new avenues for accelerating scientific research. While models are increasingly capable of assisting with routine tasks, their abi...
- Parallel-Probe: Towards Efficient Parallel Thinking via 2D Probing : Abstract: Parallel thinking has emerged as a promising paradigm for reasoning, yet it imposes significant computational burdens. Existing efficiency methods primarily rely on local, per-trajectory sig...
- Beyond Translation: Cross-Cultural Meme Transcreation with Vision-Language Models : Abstract: Memes are a pervasive form of online communication, yet their cultural specificity poses significant challenges for cross-cultural adaptation. We study cross-cultural meme transcreation, a m...
- Social Catalysts, Not Moral Agents: The Illusion of Alignment in LLM Societies : Abstract: The rapid evolution of Large Language Models (LLMs) has led to the emergence of Multi-Agent Systems where collective cooperation is often threatened by the "Tragedy of the Commons." This stu...
- Fine-Tuning Language Models to Know What They Know : Abstract: Metacognition is a critical component of intelligence, specifically regarding the awareness of one's own knowledge. While humans rely on shared internal memory for both answering questions a...
- WAXAL: A Large-Scale Multilingual African Language Speech Corpus : Abstract: The advancement of speech technology has predominantly favored high-resource languages, creating a significant digital divide for speakers of most Sub-Saharan African languages. To address t...
- Scaling Small Agents Through Strategy Auctions : Abstract: Small language models are increasingly viewed as a promising, cost-effective approach to agentic AI, with proponents claiming they are sufficiently capable for agentic workflows. However, wh...
- Aligning Language Model Benchmarks with Pairwise Preferences : Abstract: Language model benchmarks are pervasive and computationally-efficient proxies for real-world performance. However, many recent works find that benchmarks often fail to predict real utility. ...
- A vector logic for intensional formal semantics : Abstract: Formal semantics and distributional semantics are distinct approaches to linguistic meaning: the former models meaning as reference via model-theoretic structures; the latter represents mean...
- Nüwa: Mending the Spatial Integrity Torn by VLM Token Pruning : Abstract: Vision token pruning has proven to be an effective acceleration technique for the efficient Vision Language Model (VLM). However, existing pruning methods demonstrate excellent performance p...
- WST-X Series: Wavelet Scattering Transform for Interpretable Speech Deepfake Detection : Abstract: Designing front-ends for speech deepfake detectors primarily focuses on two categories. Hand-crafted filterbank features are transparent but are limited in capturing high-level semantic deta...
- RC-GRPO: Reward-Conditioned Group Relative Policy Optimization for Multi-Turn Tool Calling Agents : Abstract: Multi-turn tool calling is challenging for Large Language Models (LLMs) because rewards are sparse and exploration is expensive. A common recipe, SFT followed by GRPO, can stall when within-...
- MAS-ProVe: Understanding the Process Verification of Multi-Agent Systems : Abstract: Multi-Agent Systems (MAS) built on Large Language Models (LLMs) often exhibit high variance in their reasoning trajectories. Process verification, which evaluates intermediate steps in traje...
- From Speech-to-Spatial: Grounding Utterances on A Live Shared View with Augmented Reality : Abstract: We introduce Speech-to-Spatial, a referent disambiguation framework that converts verbal remote-assistance instructions into spatially grounded AR guidance. Unlike prior systems that rely on...
- VALUEFLOW: Toward Pluralistic and Steerable Value-based Alignment in Large Language Models : Abstract: Aligning Large Language Models (LLMs) with the diverse spectrum of human values remains a central challenge: preference-based methods often fail to capture deeper motivational principles. Va...
- Mići Princ -- A Little Boy Teaching Speech Technologies the Chakavian Dialect : Abstract: This paper documents our efforts in releasing the printed and audio book of the translation of the famous novel The Little Prince into the Chakavian dialect, as a computer-readable, AI-ready...
- SWE-Master: Unleashing the Potential of Software Engineering Agents via Post-Training : Abstract: In this technical report, we present SWE-Master, an open-source and fully reproducible post-training framework for building effective software engineering agents. SWE-Master systematically e...
- SWE-World: Building Software Engineering Agents in Docker-Free Environments : Abstract: Recent advances in large language models (LLMs) have enabled software engineering agents to tackle complex code modification tasks. Most existing approaches rely on execution feedback from c...
- Decoupling Skeleton and Flesh: Efficient Multimodal Table Reasoning with Disentangled Alignment and Structure-aware Guidance : Abstract: Reasoning over table images remains challenging for Large Vision-Language Models (LVLMs) due to complex layouts and tightly coupled structure-content information. Existing solutions often de...
- Tutorial on Reasoning for IR & IR for Reasoning : Abstract: Information retrieval has long focused on ranking documents by semantic relatedness. Yet many real-world information needs demand more: enforcement of logical constraints, multi-step inferen...
- Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration : Abstract: Search-integrated reasoning enables language agents to transcend static parametric knowledge by actively querying external sources. However, training these agents via reinforcement learning ...
- Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems : Abstract: While existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, they are often highly task-specific, relying on manually crafted agent r...
- AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration : Abstract: Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for mu...
- WebSentinel: Detecting and Localizing Prompt Injection Attacks for Web Agents : Abstract: Prompt injection attacks manipulate webpage content to cause web agents to execute attacker-specified tasks instead of the user's intended ones. Existing methods for detecting and localizing...
- FullStack-Agent: Enhancing Agentic Full-Stack Web Coding via Development-Oriented Testing and Repository Back-Translation : Abstract: Assisting non-expert users to develop complex interactive websites has become a popular task for LLM-powered code agents. However, existing code agents tend to only generate frontend web pag...
- AutoFigure: Generating and Refining Publication-Ready Scientific Illustrations : Abstract: High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in b...
- Merged ChemProt-DrugProt for Relation Extraction from Biomedical Literature : Abstract: The extraction of chemical-gene relations plays a pivotal role in understanding the intricate interactions between chemical compounds and genes, with significant implications for drug discov...
- Inferring Scientific Cross-Document Coreference and Hierarchy with Definition-Augmented Relational Reasoning : Abstract: We address the fundamental task of inferring cross-document coreference and hierarchy in scientific texts, which has important applications in knowledge graph construction, search, recommend...
- MemoryFormer: Minimize Transformer Computation by Removing Fully-Connected Layers : Abstract: In order to reduce the computational complexity of large language models, great efforts have been made to improve the efficiency of transformer models such as linear attention and flash-a...
- Dataset-Driven Channel Masks in Transformers for Multivariate Time Series : Abstract: Recent advancements in foundation models have been successfully extended to the time series (TS) domain, facilitated by the emergence of large-scale TS datasets. However, previous efforts ha...
- Agnostic Learning of Arbitrary ReLU Activation under Gaussian Marginals : Abstract: We consider the problem of learning an arbitrarily-biased ReLU activation (or neuron) over Gaussian marginals with the squared loss objective. Despite the ReLU neuron being the basic buildin...
- Exploring the Global-to-Local Attention Scheme in Graph Transformers: An Empirical Study : Abstract: Graph Transformers (GTs) show considerable potential in graph representation learning. The architecture of GTs typically integrates Graph Neural Networks (GNNs) with global attention mechani...
- Simple Denoising Diffusion Language Models : Abstract: Recent Uniform State Diffusion Models (USDMs), initialized from a uniform prior, offer the promise of fast text generation due to their inherent self-correction ability compared to masked di...
- Textual Equilibrium Propagation for Deep Compound AI Systems : Abstract: Large language models (LLMs) are increasingly deployed as part of compound AI systems that coordinate multiple modules (e.g., retrievers, tools, verifiers) over long-horizon workflows. Recen...
- The Powers of Precision: Structure-Informed Detection in Complex Systems -- From Customer Churn to Seizure Onset : Abstract: Emergent phenomena -- onset of epileptic seizures, sudden customer churn, or pandemic outbreaks -- often arise from hidden causal interactions in complex systems. We propose a machine learni...
- Transferable Graph Condensation from the Causal Perspective : Abstract: The increasing scale of graph datasets has significantly improved the performance of graph representation learning methods, but it has also introduced substantial training challenges. Graph ...
- Convex Loss Functions for Support Vector Machines (SVMs) and Neural Networks : Abstract: We propose a new convex loss for Support Vector Machines, both for the binary classification and for the regression models. Therefore, we show the mathematical derivation of the dual problem...
- Scalable Linearized Laplace Approximation via Surrogate Neural Kernel : Abstract: We introduce a scalable method to approximate the kernel of the Linearized Laplace Approximation (LLA). For this, we use a surrogate deep neural network (DNN) that learns a compact feature r...
- AlignAtt: Using Attention-based Audio-Translation Alignments as a Guide for Simultaneous Speech Translation : Abstract: Attention is the core mechanism of today's most used architectures for natural language processing and has been analyzed from many perspectives, including its effectiveness for machine trans...
- NRR-Phi: Text-to-State Mapping for Ambiguity Preservation in LLM Inference : Abstract: Large language models exhibit a systematic tendency toward early semantic commitment: given ambiguous input, they collapse multiple valid interpretations into a single response before suffic...
- Bias-Reduced Estimation of Finite Mixtures: An Application to Latent Group Structures in Panel Data : Abstract: Finite mixture models are widely used in econometric analyses to capture unobserved heterogeneity. This paper shows that maximum likelihood estimation of finite mixtures of parametric densit...
- SERA: Soft-Verified Efficient Repository Agents : Abstract: Open-weight coding agents should hold a fundamental advantage over closed-source systems: they can be specialized to private codebases, encoding repository-specific information directly in t...
- Linear representations in language models can change dramatically over a conversation : Abstract: Language model representations often contain linear directions that correspond to high-level concepts. Here, we study the dynamics of these representations: how representations evolve along ...
- The Hypocrisy Gap: Quantifying Divergence Between Internal Belief and Chain-of-Thought Explanation via Sparse Autoencoders : Abstract: Large Language Models (LLMs) frequently exhibit unfaithful behavior, producing a final answer that differs significantly from their internal chain of thought (CoT) reasoning in order to appe...
- STEMVerse: A Dual-Axis Diagnostic Framework for STEM Reasoning in Large Language Models : Abstract: As Large Language Models (LLMs) achieve significant breakthroughs in complex reasoning tasks, evaluating their proficiency in science, technology, engineering, and mathematics (STEM) has bec...
- ROSA-Tuning: Enhancing Long-Context Modeling via Suffix Matching : Abstract: Long-context capability and computational efficiency are among the central challenges facing today's large language models. Existing efficient attention methods reduce computational complexi...
- Graph-Augmented Reasoning with Large Language Models for Tobacco Pest and Disease Management : Abstract: This paper proposes a graph-augmented reasoning framework for tobacco pest and disease management that integrates structured domain knowledge into large language models. Building on GraphRAG...
- WideSeek: Advancing Wide Research via Multi-Agent Scaling : Abstract: Search intelligence is evolving from Deep Research to Wide Research, a paradigm essential for retrieving and synthesizing comprehensive information under complex constraints in parallel. How...
- InfMem: Learning System-2 Memory Control for Long-Context Agent : Abstract: Reasoning over ultra-long documents requires synthesizing sparse evidence scattered across distant segments under strict memory constraints. While streaming agents enable scalable processing...
- Predicting first-episode homelessness among US Veterans using longitudinal EHR data: time-varying models and social risk factors : Abstract: Homelessness among US veterans remains a critical public health challenge, yet risk prediction offers a pathway for proactive intervention. In this retrospective prognostic study, we analyze...
- Time-Critical Multimodal Medical Transportation: Organs, Patients, and Medical Supplies : Abstract: Timely transportation of organs, patients, and medical supplies is critical to modern healthcare, particularly in emergencies and transplant scenarios where even short delays can severely im...
- AmharicStoryQA: A Multicultural Story Question Answering Benchmark in Amharic : Abstract: With the growing emphasis on multilingual and cultural evaluation benchmarks for large language models, language and culture are often treated as synonymous, and performance is commonly used...
- When Efficient Communication Explains Convexity : Abstract: Much recent work has argued that the variation in the languages of the world can be explained from the perspective of efficient communication; in particular, languages can be seen as optimal...
- R2-Router: A New Paradigm for LLM Routing with Reasoning : Abstract: As LLMs proliferate with diverse capabilities and costs, LLM routing has emerged by learning to predict each LLM's quality and cost for a given query, then selecting the one with high qualit...
- CATNIP: LLM Unlearning via Calibrated and Tokenized Negative Preference Alignment : Abstract: Pretrained knowledge memorized in LLMs raises critical concerns over safety and privacy, which has motivated LLM Unlearning as a technique for selectively removing the influences of undesira...
- Act or Clarify? Modeling Sensitivity to Uncertainty and Cost in Communication : Abstract: When deciding how to act under uncertainty, agents may choose to act to reduce uncertainty or they may act despite that uncertainty. In communicative settings, an important way of reducing un...
- Which course? Discourse! Teaching Discourse and Generation in the Era of LLMs : Abstract: The field of NLP has undergone vast, continuous transformations over the past few years, sparking debates going beyond discipline boundaries. This begs important questions in education: how ...
- HALT: Hallucination Assessment via Log-probs as Time series : Abstract: Hallucinations remain a major obstacle for large language models (LLMs), especially in safety-critical domains. We present HALT (Hallucination Assessment via Log-probs as Time series), a lig...
- Equal Access, Unequal Interaction: A Counterfactual Audit of LLM Fairness : Abstract: Prior work on fairness in large language models (LLMs) has primarily focused on access-level behaviors such as refusals and safety filtering. However, equitable access does not ensure equita...
- CPMobius: Iterative Coach-Player Reasoning for Data-Free Reinforcement Learning : Abstract: Large Language Models (LLMs) have demonstrated strong potential in complex reasoning, yet their progress remains fundamentally constrained by reliance on massive high-quality human-curated t...
- Monotonicity as an Architectural Bias for Robust Language Models : Abstract: Large language models (LLMs) are known to exhibit brittle behavior under adversarial prompts and jailbreak attacks, even after extensive alignment and fine-tuning. This fragility reflects a ...
- Eidolon: A Practical Post-Quantum Signature Scheme Based on k-Colorability in the Age of Graph Neural Networks : Abstract: We propose Eidolon, a practical post-quantum signature scheme based on the NP-complete k-colorability problem. Our construction generalizes the Goldreich-Micali-Wigderson zero-knowledge prot...
- NSC-SL: A Bandwidth-Aware Neural Subspace Compression for Communication-Efficient Split Learning : Abstract: The expanding scale of neural networks poses a major challenge for distributed machine learning, particularly under limited communication resources. While split learning (SL) alleviates clie...
- Near-Universal Multiplicative Updates for Nonnegative Einsum Factorization : Abstract: Despite the ubiquity of multiway data across scientific domains, there are few user-friendly tools that fit tailored nonnegative tensor factorizations. Researchers may use gradient-based aut...
- From Task Solving to Robust Real-World Adaptation in LLM Agents : Abstract: Large language models are increasingly deployed as specialized agents that plan, call tools, and take actions over extended horizons. Yet many existing evaluations assume a "clean interface"...
- Scaling-Aware Adapter for Structure-Grounded LLM Reasoning : Abstract: Large language models (LLMs) are enabling reasoning over biomolecular structures, yet existing methods remain modality-specific and typically compress structural inputs through sequence-base...
- Evaluating False Alarm and Missing Attacks in CAN IDS : Abstract: Modern vehicles rely on electronic control units (ECUs) interconnected through the Controller Area Network (CAN), making in-vehicle communication a critical security concern. Machine learnin...
- Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks : Abstract: We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete time...
- LmPT: Conditional Point Transformer for Anatomical Landmark Detection on 3D Point Clouds : Abstract: Accurate identification of anatomical landmarks is crucial for various medical applications. Traditional manual landmarking is time-consuming and prone to inter-observer variability, while r...
- Downscaling land surface temperature data using edge detection and block-diagonal Gaussian process regression : Abstract: Accurate and high-resolution estimation of land surface temperature (LST) is crucial in estimating evapotranspiration, a measure of plant water use and a central quantity in agricultural app...
- Beyond Content: Behavioral Policies Reveal Actors in Information Operations : Abstract: The detection of online influence operations -- coordinated campaigns by malicious actors to spread narratives -- has traditionally depended on content analysis or network features. These ap...
- Chain of Simulation: A Dual-Mode Reasoning Framework for Large Language Models with Dynamic Problem Routing : Abstract: We present Chain of Simulation (CoS), a novel dual-mode reasoning framework that dynamically routes problems to specialized reasoning strategies in Large Language Models (LLMs). Unlike exist...
- IMAGINE: Intelligent Multi-Agent Godot-based Indoor Networked Exploration : Abstract: The exploration of unknown, Global Navigation Satellite System (GNSS) denied environments by an autonomous communication-aware and collaborative group of Unmanned Aerial Vehicles (UAVs) pres...
- STEER: Inference-Time Risk Control via Constrained Quality-Diversity Search : Abstract: Large Language Models (LLMs) trained for average correctness often exhibit mode collapse, producing narrow decision behaviors on tasks where multiple responses may be reasonable. This limita...
- "I May Not Have Articulated Myself Clearly": Diagnosing Dynamic Instability in LLM Reasoning at Inference Time : Abstract: Reasoning failures in large language models (LLMs) are typically measured only at the end of a generation, yet many failures manifest as a process-level breakdown: the model "loses the threa...
- DoubleTake: Contrastive Reasoning for Faithful Decision-Making in Medical Imaging : Abstract: Accurate decision making in medical imaging requires reasoning over subtle visual differences between confusable conditions, yet most existing approaches rely on nearest neighbor retrieval t...
- Reasoning about Reasoning: BAPO Bounds on Chain-of-Thought Token Complexity in LLMs : Abstract: Inference-time scaling via chain-of-thought (CoT) reasoning is a major driver of state-of-the-art LLM performance, but it comes with substantial latency and compute costs. We address a funda...
- A Multi-scale Linear-time Encoder for Whole-Slide Image Analysis : Abstract: We introduce Multi-scale Adaptive Recurrent Biomedical Linear-time Encoder (MARBLE), the first purely Mamba-based multi-state multiple instance learning (MIL) framework for whole-sl...
- DeltaEvolve: Accelerating Scientific Discovery through Momentum-Driven Evolution : Abstract: LLM-driven evolutionary systems have shown promise for automated science discovery, yet existing approaches such as AlphaEvolve rely on full-code histories that are context-inefficient and p...
- Training-Free Self-Correction for Multimodal Masked Diffusion Models : Abstract: Masked diffusion models have emerged as a powerful framework for text and multimodal generation. However, their sampling procedure updates multiple tokens simultaneously and treats generated...
- Weighted Sum-of-Trees Model for Clustered Data : Abstract: Clustered data, which arise when observations are nested within groups, are incredibly common in clinical, education, and social science research. Traditionally, a linear mixed model, which ...
- Synthetic Data Augmentation for Medical Audio Classification: A Preliminary Evaluation : Abstract: Medical audio classification remains challenging due to low signal-to-noise ratios, subtle discriminative features, and substantial intra-class variability, often compounded by class imbalan...
- Embodiment-Aware Generalist Specialist Distillation for Unified Humanoid Whole-Body Control : Abstract: Humanoid Whole-Body Controllers trained with reinforcement learning (RL) have recently achieved remarkable performance, yet many target a single robot embodiment. Variations in dynamics, deg...
- Learning Fast Monomial Orders for Gröbner Basis Computations : Abstract: The efficiency of Gröbner basis computation, the standard engine for solving systems of polynomial equations, depends on the choice of monomial ordering. Despite a near-continuum of possible...
- Where Norms and References Collide: Evaluating LLMs on Normative Reasoning : Abstract: Embodied agents, such as robots, will need to interact in situated environments where successful communication often depends on reasoning over social norms: shared expectations that constrai...
- Aligning Forest and Trees in Images and Long Captions for Visually Grounded Understanding : Abstract: Large vision-language models such as CLIP struggle with long captions because they align images and texts as undifferentiated wholes. Fine-grained vision-language understanding requires hier...
- Methods and Open Problems in Differentiable Social Choice: Learning Mechanisms, Decisions, and Alignment : Abstract: Social choice is no longer a peripheral concern of political theory or economics; it has become a foundational component of modern machine learning systems. From auctions and resource allocat...
- VOILA: Value-of-Information Guided Fidelity Selection for Cost-Aware Multimodal Question Answering : Abstract: Despite significant costs from retrieving and processing high-fidelity visual inputs, most multimodal vision-language systems operate at fixed fidelity levels. We introduce VOILA, a framewor...
- Rethinking Music Captioning with Music Metadata LLMs : Abstract: Music captioning, or the task of generating a natural language description of music, is useful for both music understanding and controllable music generation. Training captioning models, how...
- Physics-inspired transformer quantum states via latent imaginary-time evolution : Abstract: Neural quantum states (NQS) are powerful ansätze in the variational Monte Carlo framework, yet their architectures are often treated as black boxes. We propose a physically transparent frame...
- Generalizable and Interpretable RF Fingerprinting with Shapelet-Enhanced Large Language Models : Abstract: Deep neural networks (DNNs) have achieved remarkable success in radio frequency (RF) fingerprinting for wireless device authentication. However, their practical deployment faces two major li...
- LatentMem: Customizing Latent Memory for Multi-Agent Systems : Abstract: Large language model (LLM)-powered multi-agent systems (MAS) demonstrate remarkable collective intelligence, wherein multi-agent memory serves as a pivotal mechanism for continual adaptation...
- Unified Inference Framework for Single and Multi-Player Performative Prediction: Method and Asymptotic Optimality : Abstract: Performative prediction characterizes environments where predictive models alter the very data distributions they aim to forecast, triggering complex feedback loops. While prior research tre...
- Training and Simulation of Quadrupedal Robot in Adaptive Stair Climbing for Indoor Firefighting: An End-to-End Reinforcement Learning Approach : Abstract: Quadruped robots are used for primary searches during the early stages of indoor fires. A typical primary search involves quickly and thoroughly looking for victims under hazardous condition...
- Feature, Alignment, and Supervision in Category Learning: A Comparative Approach with Children and Neural Networks : Abstract: Understanding how humans and machines learn from sparse data is central to cognitive science and machine learning. Using a species-fair design, we compare children and convolutional neural n...
- Fully Kolmogorov-Arnold Deep Model in Medical Image Segmentation : Abstract: Deeply stacked KANs are practically impossible due to high training difficulties and substantial memory requirements. Consequently, existing studies can only incorporate few KAN layers, hind...
- Online Conformal Prediction via Universal Portfolio Algorithms : Abstract: Online conformal prediction (OCP) seeks prediction intervals that achieve long-run $1-\alpha$ coverage for arbitrary (possibly adversarial) data streams, while remaining as informative as possibl...
- NeuralFLoC: Neural Flow-Based Joint Registration and Clustering of Functional Data : Abstract: Clustering functional data in the presence of phase variation is challenging, as temporal misalignment can obscure intrinsic shape differences and degrade clustering performance. Most existi...
- ForesightKV: Optimizing KV Cache Eviction for Reasoning Models by Learning Long-Term Contribution : Abstract: Recently, large language models (LLMs) have shown remarkable reasoning abilities by producing long reasoning traces. However, as the sequence length grows, the key-value (KV) cache expands l...
- Latent Neural-ODE for Model-Informed Precision Dosing: Overcoming Structural Assumptions in Pharmacokinetics : Abstract: Accurate estimation of tacrolimus exposure, quantified by the area under the concentration-time curve (AUC), is essential for precision dosing after renal transplantation. Current practice r...
- Token Sparse Attention: Efficient Long-Context Inference with Interleaved Token Selection : Abstract: The quadratic complexity of attention remains the central bottleneck in long-context inference for large language models. Prior acceleration methods either sparsify the attention map with st...
- TAME: A Trustworthy Test-Time Evolution of Agent Memory with Systematic Benchmarking : Abstract: Test-time evolution of agent memory serves as a pivotal paradigm for achieving AGI by bolstering complex reasoning through experience accumulation. However, even during benign task evolution...
- Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning : Abstract: Scaling test-time compute via long Chain-of-Thought unlocks remarkable gains in reasoning capabilities, yet it faces practical limits due to the linear growth of KV cache and quadratic attent...
- Principled Federated Random Forests for Heterogeneous Data : Abstract: Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike ...
- HypCBC: Domain-Invariant Hyperbolic Cross-Branch Consistency for Generalizable Medical Image Analysis : Abstract: Robust generalization beyond training distributions remains a critical challenge for deep neural networks. This is especially pronounced in medical image analysis, where data is often scarce...
- Agentic Proposing: Enhancing Large Language Model Reasoning via Compositional Skill Synthesis : Abstract: Advancing complex reasoning in large language models relies on high-quality, verifiable datasets, yet human annotation remains cost-prohibitive and difficult to scale. Current synthesis para...
- RDT2: Exploring the Scaling Limit of UMI Data Towards Zero-Shot Cross-Embodiment Generalization : Abstract: Vision-Language-Action (VLA) models hold promise for generalist robotics but currently struggle with data scarcity, architectural inefficiencies, and the inability to generalize across diffe...
- Multiparameter Uncertainty Mapping in Quantitative Molecular MRI using a Physics-Structured Variational Autoencoder (PS-VAE) : Abstract: Quantitative imaging methods, such as magnetic resonance fingerprinting (MRF), aim to extract interpretable pathology biomarkers by estimating biophysical tissue parameters from signal evolu...
- A Novel approach to portfolio construction : Abstract: This paper proposes a machine learning-based framework for asset selection and portfolio construction, termed the Best-Path Algorithm Sparse Graphical Model (BPASGM). The method extends the ...
- Accurate Failure Prediction in Agents Does Not Imply Effective Failure Prevention : Abstract: Proactive interventions by LLM critic models are often assumed to improve reliability, yet their effects at deployment time are poorly understood. We show that a binary LLM critic with stron...
- Tiled Prompts: Overcoming Prompt Underspecification in Image and Video Super-Resolution : Abstract: Text-conditioned diffusion models have advanced image and video super-resolution by using prompts as semantic priors, but modern super-resolution pipelines typically rely on latent tiling to...
- Building Interpretable Models for Moral Decision-Making : Abstract: We build a custom transformer model to study how neural networks make moral decisions on trolley-style dilemmas. The model processes structured scenarios using embeddings that encode who is ...
- PACE: Pretrained Audio Continual Learning : Abstract: Audio is a fundamental modality for analyzing speech, music, and environmental sounds. Although pretrained audio models have significantly advanced audio understanding, they remain fragile i...
- GFlowPO: Generative Flow Network as a Language Model Prompt Optimizer : Abstract: Finding effective prompts for language models (LMs) is critical yet notoriously difficult: the prompt space is combinatorially large, rewards are sparse due to expensive target-LM evaluation...
- Symbol-Aware Reasoning with Masked Discrete Diffusion for Handwritten Mathematical Expression Recognition : Abstract: Handwritten Mathematical Expression Recognition (HMER) requires reasoning over diverse symbols and 2D structural layouts, yet autoregressive models struggle with exposure bias and syntactic ...
- From Vicious to Virtuous Cycles: Synergistic Representation Learning for Unsupervised Video Object-Centric Learning : Abstract: Unsupervised object-centric learning models, particularly slot-based architectures, have shown great promise in decomposing complex scenes. However, their reliance on reconstruction-based tr...
- Improving the Linearized Laplace Approximation via Quadratic Approximations : Abstract: Deep neural networks (DNNs) often produce overconfident out-of-distribution predictions, motivating Bayesian uncertainty quantification. The Linearized Laplace Approximation (LLA) achieves t...
- Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility : Abstract: Vision language models (VLMs) extend the reasoning capabilities of large language models (LLMs) to cross-modal settings, yet remain highly vulnerable to multimodal jailbreak attacks. Existin...
- Enhancing Quantum Diffusion Models for Complex Image Generation : Abstract: Quantum generative models offer a novel approach to exploring high-dimensional Hilbert spaces but face significant challenges in scalability and expressibility when applied to multi-modal di...
- CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering : Abstract: Emotional expression in human speech is nuanced and compositional, often involving multiple, sometimes conflicting, affective cues that may diverge from linguistic content. In contrast, most...
- DiscoverLLM: From Executing Intents to Discovering Them : Abstract: To handle ambiguous and open-ended requests, Large Language Models (LLMs) are increasingly trained to interact with users to surface intents they have not yet expressed (e.g., ask clarificat...
- Acceleration of Atomistic NEGF: Algorithms, Parallelization, and Machine Learning : Abstract: The Non-equilibrium Green's function (NEGF) formalism is a particularly powerful method to simulate the quantum transport properties of nanoscale devices such as transistors, photo-diodes, o...
- CRL-VLA: Continual Vision-Language-Action Learning : Abstract: Lifelong learning is critical for embodied agents in open-world environments, where reinforcement learning fine-tuning has emerged as an important paradigm to enable Vision-Language-Action (...
- Score-based diffusion models for diffuse optical tomography with uncertainty quantification : Abstract: Score-based diffusion models are a recently developed framework for posterior sampling in Bayesian inverse problems with state-of-the-art performance for severely ill-posed problems by lev...
- IntentRL: Training Proactive User-intent Agents for Open-ended Deep Research via Reinforcement Learning : Abstract: Deep Research (DR) agents extend Large Language Models (LLMs) beyond parametric knowledge by autonomously retrieving and synthesizing evidence from large web corpora into long-form reports, ...
- Self-Verification Dilemma: Experience-Driven Suppression of Overused Checking in LLM Reasoning : Abstract: Large Reasoning Models (LRMs) achieve strong performance by generating long reasoning traces with reflection. Through a large-scale empirical analysis, we find that a substantial fraction of...
- DALI: A Workload-Aware Offloading Framework for Efficient MoE Inference on Local PCs : Abstract: Mixture of Experts (MoE) architectures significantly enhance the capacity of LLMs without proportional increases in computation, but at the cost of a vast parameter size. Offloading MoE expe...
- Generative Decompression: Optimal Lossy Decoding Against Distribution Mismatch : Abstract: This paper addresses optimal decoding strategies in lossy compression where the assumed distribution for compressor design mismatches the actual (true) distribution of the source. This probl...
- Can Large Language Models Generalize Procedures Across Representations? : Abstract: Large language models (LLMs) are trained and tested extensively on symbolic representations such as code and graphs, yet real-world user tasks are often specified in natural language. To wha...
- EHRWorld: A Patient-Centric Medical World Model for Long-Horizon Clinical Trajectories : Abstract: World models offer a principled framework for simulating future states under interventions, but realizing such models in complex, high-stakes domains like medicine remains challenging. Recen...
- $V_0$: A Generalist Value Model for Any Policy at State Zero : Abstract: Policy gradient methods rely on a baseline to measure the relative advantage of an action, ensuring the model reinforces behaviors that outperform its current average capability. In the trai...
- Generator-based Graph Generation via Heat Diffusion : Abstract: Graph generative modelling has become an essential task due to the wide range of applications in chemistry, biology, social networks, and knowledge representation. In this work, we propose a...
- Simulation-Based Inference via Regression Projection and Batched Discrepancies : Abstract: We analyze a lightweight simulation-based inference method that infers simulator parameters using only a regression-based projection of the observed data. After fitting a surrogate linear re...
- TRE: Encouraging Exploration in the Trust Region : Abstract: Entropy regularization is a standard technique in reinforcement learning (RL) to enhance exploration, yet it yields negligible effects or even degrades performance in Large Language Models (...
- Mitigating Conversational Inertia in Multi-Turn Agents : Abstract: Large language models excel as few-shot learners when provided with appropriate demonstrations, yet this strength becomes problematic in multiturn agent scenarios, where LLMs erroneously mim...
- Efficient Sequential Neural Network with Spatial-Temporal Attention and Linear LSTM for Robust Lane Detection Using Multi-Frame Images : Abstract: Lane detection is a crucial perception task for all levels of automated vehicles (AVs) and Advanced Driver Assistance Systems, particularly in mixed-traffic environments where AVs must inter...
- Neural Attention Search Linear: Towards Adaptive Token-Level Hybrid Attention Models : Abstract: The quadratic computational complexity of softmax transformers has become a bottleneck in long-context scenarios. In contrast, linear attention model families provide a promising direction t...
- Improved Analysis of the Accelerated Noisy Power Method with Applications to Decentralized PCA : Abstract: We analyze the Accelerated Noisy Power Method, an algorithm for Principal Component Analysis in the setting where only inexact matrix-vector products are available, which can arise for insta...
- VR-VFL: Joint Rate and Client Selection for Vehicular Federated Learning Under Imperfect CSI : Abstract: Federated learning in vehicular edge networks faces major challenges in efficient resource allocation, largely due to high vehicle mobility and the presence of imperfect channel state inform...
- Efficient Variance-reduced Estimation from Generative EHR Models: The SCOPE and REACH Estimators : Abstract: Generative models trained using self-supervision of tokenized electronic health record (EHR) timelines show promise for clinical outcome prediction. This is typically done using Monte Carlo ...
- Conditional Flow Matching for Visually-Guided Acoustic Highlighting : Abstract: Visually-guided acoustic highlighting seeks to rebalance audio in alignment with the accompanying video, creating a coherent audio-visual experience. While visual saliency and enhancement ha...
- Fast Sampling for Flows and Diffusions with Lazy and Point Mass Stochastic Interpolants : Abstract: Stochastic interpolants unify flows and diffusions, popular generative modeling frameworks. A primary hyperparameter in these methods is the interpolation schedule that determines how to bri...
- Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity : Abstract: LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increas...
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget : Abstract: Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when th...
- Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning : Abstract: Multimodal Large Language Models (MLLMs) suffer from severe training inefficiency, which is associated with their massive model sizes and visual token numbers. Existing efforts in effi...
- Preference-based Conditional Treatment Effects and Policy Learning : Abstract: We introduce a new preference-based framework for conditional treatment effect estimation and policy learning, built on the Conditional Preference-based Treatment Effect (CPTE). CPTE require...
- Investigating Quantum Circuit Designs Using Neuro-Evolution : Abstract: Designing effective quantum circuits remains a central challenge in quantum computing, as circuit structure strongly influences expressivity, trainability, and hardware feasibility. Current ...
- On the Convergence of Experience Replay in Policy Optimization: Characterizing Bias, Variance, and Finite-Time Convergence : Abstract: Experience replay is a core ingredient of modern deep reinforcement learning, yet its benefits in policy optimization are poorly understood beyond empirical heuristics. This paper develops a...
- Discrete Latent Structure in Neural Networks : Abstract: Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences...
- Sparse maximal update parameterization: A holistic approach to sparse training dynamics : Abstract: Several challenges make it difficult for sparse neural networks to compete with dense models. First, setting a large fraction of weights to zero impairs forward and gradient signal propagati...
- Achieving Linear Speedup for Composite Federated Learning : Abstract: This paper proposes FedNMap, a normal map-based method for composite federated learning, where the objective consists of a smooth loss and a possibly nonsmooth regularizer. FedNMap leverages...
- MeKi: Memory-based Expert Knowledge Injection for Efficient LLM Scaling : Abstract: Scaling Large Language Models (LLMs) typically relies on increasing the number of parameters or test-time computations to boost performance. However, these strategies are impractical for edg...
- Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures : Abstract: Machine unlearning aims to remove specific content from trained models while preserving overall performance. However, the phenomenon of benign relearning, in which forgotten information reem...
- Dynamic Topology Optimization for Non-IID Data in Decentralized Learning : Abstract: Decentralized learning (DL) enables a set of nodes to train a model collaboratively without central coordination, offering benefits for privacy and scalability. However, DL struggles to trai...
- An Approximate Ascent Approach To Prove Convergence of PPO : Abstract: Proximal Policy Optimization (PPO) is among the most widely used deep reinforcement learning algorithms, yet its theoretical foundations remain incomplete. Most importantly, convergence and ...
- Chain-of-Goals Hierarchical Policy for Long-Horizon Offline Goal-Conditioned RL : Abstract: Offline goal-conditioned reinforcement learning remains challenging for long-horizon tasks. While hierarchical approaches mitigate this issue by decomposing tasks, most existing methods rely...
- On the Entropy Dynamics in Reinforcement Fine-Tuning of Large Language Models : Abstract: Entropy serves as a critical metric for measuring the diversity of outputs generated by large language models (LLMs), providing valuable insights into their exploration capabilities. While r...
- The Label Horizon Paradox: Rethinking Supervision Targets in Financial Forecasting : Abstract: While deep learning has revolutionized financial forecasting through sophisticated architectures, the design of the supervision signal itself is rarely scrutinized. We challenge the canonica...
- Most Convolutional Networks Suffer from Small Adversarial Perturbations : Abstract: The existence of adversarial examples is relatively well understood for random fully connected neural networks, but much less so for convolutional neural networks (CNNs). The recent work [Daniely...
- Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing : Abstract: Reinforcement learning with verifiable rewards (RLVR) is effective for training large language models on deterministic outcome reasoning tasks. Prior work shows RLVR works with few prompts, ...
- Causal Inference on Networks under Misspecified Exposure Mappings: A Partial Identification Framework : Abstract: Estimating treatment effects in networks is challenging, as each potential outcome depends on the treatments of all other nodes in the network. To overcome this difficulty, existing methods ...
- Soft-Radial Projection for Constrained End-to-End Learning : Abstract: Integrating hard constraints into deep learning is essential for safety-critical systems. Yet existing constructive layers that project predictions onto constraint boundaries face a fundamen...
- Scaling Continual Learning with Bi-Level Routing Mixture-of-Experts : Abstract: Continual learning, especially class-incremental learning (CIL), on the basis of a pre-trained model (PTM) has garnered substantial research interest in recent years. However, how to effecti...
- ScDiVa: Masked Discrete Diffusion for Joint Modeling of Single-Cell Identity and Expression : Abstract: Single-cell RNA-seq profiles are high-dimensional, sparse, and unordered, causing autoregressive generation to impose an artificial ordering bias and suffer from error accumulation. To addre...
- DeepDFA: Injecting Temporal Logic in Deep Learning for Sequential Subsymbolic Applications : Abstract: Integrating logical knowledge into deep neural network training is still a hard challenge, especially for sequential or temporally extended domains involving subsymbolic observations. To add...
- A Minimal Task Reveals Emergent Path Integration and Object-Location Binding in a Predictive Sequence Model : Abstract: Adaptive cognition requires structured internal models representing objects and their relations. Predictive neural networks are often proposed to form such "world models", yet their underlyi...
- Least but not Last: Fine-tuning Intermediate Principal Components for Better Performance-Forgetting Trade-Offs : Abstract: Low-Rank Adaptation (LoRA) methods have emerged as crucial techniques for adapting large pre-trained models to downstream tasks under computational and memory constraints. However, they face...
- Lookahead Path Likelihood Optimization for Diffusion LLMs : Abstract: Diffusion Large Language Models (dLLMs) support arbitrary-order generation, yet their inference performance critically depends on the unmasking order. Existing strategies rely on heuristics ...
- Reparameterization Flow Policy Optimization : Abstract: Reparameterization Policy Gradient (RPG) has emerged as a powerful paradigm for model-based reinforcement learning, enabling high sample efficiency by backpropagating gradients through diffe...
- Explaining the Explainer: Understanding the Inner Workings of Transformer-based Symbolic Regression Models : Abstract: Following their success across many domains, transformers have also proven effective for symbolic regression (SR); however, the internal mechanisms underlying their generation of mathematica...
- A Function-Space Stability Boundary for Generalization in Interpolating Learning Systems : Abstract: Modern learning systems often interpolate training data while still generalizing well, yet it remains unclear when algorithmic stability explains this behavior. We model training as a functi...
- Mitigating Staleness in Asynchronous Pipeline Parallelism via Basis Rotation : Abstract: Asynchronous pipeline parallelism maximizes hardware utilization by eliminating the pipeline bubbles inherent in synchronous execution, offering a path toward efficient large-scale distribut...
- Not All Negative Samples Are Equal: LLMs Learn Better from Plausible Reasoning : Abstract: Learning from negative samples holds great promise for improving Large Language Model (LLM) reasoning capability, yet existing methods treat all incorrect responses as equally informative, o...
- Rank-Learner: Orthogonal Ranking of Treatment Effects : Abstract: Many decision-making problems require ranking individuals by their treatment effects rather than estimating the exact effect magnitudes. Examples include prioritizing patients for preventive...
- Live or Lie: Action-Aware Capsule Multiple Instance Learning for Risk Assessment in Live Streaming Platforms : Abstract: Live streaming has become a cornerstone of today's internet, enabling massive real-time social interactions. However, it faces severe risks arising from sparse, coordinated malicious behavio...
- WARP Logic Neural Networks : Abstract: Fast and efficient AI inference is increasingly important, and recent models that directly learn low-level logic operations have achieved state-of-the-art performance. However, existing logi...
- Robust Representation Learning in Masked Autoencoders : Abstract: Masked Autoencoders (MAEs) achieve impressive performance in image classification tasks, yet the internal representations they learn remain less understood. This work started as an attempt t...
- Sparse Training of Neural Networks based on Multilevel Mirror Descent : Abstract: We introduce a dynamic sparse training algorithm based on linearized Bregman iterations / mirror descent that exploits the naturally incurred sparsity by alternating between periods of stati...
- MatGPTQ: Accurate and Efficient Post-Training Matryoshka Quantization : Abstract: Matryoshka Quantization (MatQuant) is a recent quantization approach showing that a single integer-quantized model can be served across multiple precisions, by slicing the most significant b...
- How to Train Your Resistive Network: Generalized Equilibrium Propagation and Analytical Learning : Abstract: Machine learning is a powerful method of extracting meaning from data; unfortunately, current digital hardware is extremely energy-intensive. There is interest in an alternative analog compu...
- When Single Answer Is Not Enough: Rethinking Single-Step Retrosynthesis Benchmarks for LLMs : Abstract: Recent progress has expanded the use of large language models (LLMs) in drug discovery, including synthesis planning. However, objective evaluation of retrosynthesis performance remains limi...
- NPCNet: Navigator-Driven Pseudo Text for Deep Clustering of Early Sepsis Phenotyping : Abstract: Sepsis is a heterogeneous syndrome. Identifying clinically distinct phenotypes may enable more precise treatment strategies. In recent years, many researchers have applied clustering algorit...
- CoGenCast: A Coupled Autoregressive-Flow Generative Framework for Time Series Forecasting : Abstract: Time series forecasting can be viewed as a generative problem that requires both semantic understanding over contextual conditions and stochastic modeling of continuous temporal dynamics. Ex...
- Riemannian Neural Optimal Transport : Abstract: Computational optimal transport (OT) offers a principled framework for generative modeling. Neural OT methods, which use neural networks to learn an OT map (or potential) from data in an amo...
- EVE: Efficient Verification of Data Erasure through Customized Perturbation in Approximate Unlearning : Abstract: Verifying whether the machine unlearning process has been properly executed is critical but remains underexplored. Some existing approaches propose unlearning verification methods based on b...
- Asymmetric Hierarchical Anchoring for Audio-Visual Joint Representation: Resolving Information Allocation Ambiguity for Robust Cross-Modal Generalization : Abstract: Audio-visual joint representation learning under Cross-Modal Generalization (CMG) aims to transfer knowledge from a labeled source modality to an unlabeled target modality through a unified ...
- Optimization and Generation in Aerodynamics Inverse Design : Abstract: Inverse design with physics-based objectives is challenging because it couples high-dimensional geometry with expensive simulations, as exemplified by aerodynamic shape optimization for drag...
- APEX: Probing Neural Networks via Activation Perturbation : Abstract: Prior work on probing neural networks primarily relies on input-space analysis or parameter perturbation, both of which face fundamental limitations in accessing structural information encod...
- SAGE-5GC: Security-Aware Guidelines for Evaluating Anomaly Detection in the 5G Core Network : Abstract: Machine learning-based anomaly detection systems are increasingly being adopted in 5G Core networks to monitor complex, high-volume traffic. However, most existing approaches are evaluated u...
- Explanations Leak: Membership Inference with Differential Privacy and Active Learning Defense : Abstract: Counterfactual explanations (CFs) are increasingly integrated into Machine Learning as a Service (MLaaS) systems to improve transparency; however, ML models deployed via APIs are already vul...
- Quantization-Aware Regularizers for Deep Neural Networks Compression : Abstract: Deep Neural Networks reached state-of-the-art performance across numerous domains, but this progress has come at the cost of increasingly large and over-parameterized models, posing serious ...
- Ultra Fast PDE Solving via Physics Guided Few-step Diffusion : Abstract: Diffusion-based models have demonstrated impressive accuracy and generalization in solving partial differential equations (PDEs). However, they still face significant limitations, such as hi...
- CTTVAE: Latent Space Structuring for Conditional Tabular Data Generation on Imbalanced Datasets : Abstract: Generating synthetic tabular data under severe class imbalance is essential for domains where rare but high-impact events drive decision-making. However, most generative models either overlo...
- Reinforcement Fine-Tuning for History-Aware Dense Retriever in RAG : Abstract: Retrieval-augmented generation (RAG) enables large language models (LLMs) to produce evidence-based responses, and its performance hinges on the matching between the retriever and LLMs. Retr...
- Sequential Group Composition: A Window into the Mechanics of Deep Learning : Abstract: How do neural networks trained over sequences acquire the ability to perform structured operations, such as arithmetic, geometric, and algorithmic computation? To gain insight into this ques...
- Equilibrium Propagation for Non-Conservative Systems : Abstract: Equilibrium Propagation (EP) is a physics-inspired learning algorithm that uses stationary states of a dynamical system both for inference and learning. In its original formulation it is lim...
- ContraLog: Log File Anomaly Detection with Contrastive Learning and Masked Language Modeling : Abstract: Log files record computational events that reflect system state and behavior, making them a primary source of operational insights in modern computer systems. Automated anomaly detection on ...
- Universal One-third Time Scaling in Learning Peaked Distributions : Abstract: Training large language models (LLMs) is computationally expensive, partly because the loss exhibits slow power-law convergence whose origin remains debatable. Through systematic analysis of...
- QuAIL: Quality-Aware Inertial Learning for Robust Training under Data Corruption : Abstract: Tabular machine learning systems are frequently trained on data affected by non-uniform corruption, including noisy measurements, missing entries, and feature-specific biases. In practice, t...
- LLM-Inspired Pretrain-Then-Finetune for Small-Data, Large-Scale Optimization : Abstract: We consider small-data, large-scale decision problems in which a firm must make many operational decisions simultaneously (e.g., across a large product portfolio) while observing only a few,...
- Conflict-Resolving and Sharpness-Aware Minimization for Generalized Knowledge Editing with Multiple Updates : Abstract: Large language models (LLMs) rely on internal knowledge to solve many downstream tasks, making it crucial to keep them up to date. Since full retraining is expensive, prior work has explored...
- Data-Driven Graph Filters via Adaptive Spectral Shaping : Abstract: We introduce Adaptive Spectral Shaping, a data-driven framework for graph filtering that learns a reusable baseline spectral kernel and modulates it with a small set of Gaussian factors. The...
- Anytime Pretraining: Horizon-Free Learning-Rate Schedules with Weight Averaging : Abstract: Large language models are increasingly trained in continual or open-ended settings, where the total training horizon is not known in advance. Despite this, most existing pretraining recipes ...
- Efficient Training of Boltzmann Generators Using Off-Policy Log-Dispersion Regularization : Abstract: Sampling from unnormalized probability densities is a central challenge in computational science. Boltzmann generators are generative models that enable independent sampling from the Boltzma...
- Fast-MWEM: Private Data Release in Sublinear Time : Abstract: The Multiplicative Weights Exponential Mechanism (MWEM) is a fundamental iterative framework for private data analysis, with broad applications such as answering $m$ linear queries, or priva...
- Soft Sensor for Bottom-Hole Pressure Estimation in Petroleum Wells Using Long Short-Term Memory and Transfer Learning : Abstract: Monitoring bottom-hole variables in petroleum wells is essential for production optimization, safety, and emissions reduction. Permanent Downhole Gauges (PDGs) provide real-time pressure dat...
- Decision-oriented benchmarking to transform AI weather forecast access: Application to the Indian monsoon : Abstract: Artificial intelligence weather prediction (AIWP) models now often outperform traditional physics-based models on common metrics while requiring orders-of-magnitude less computing resources ...
- Reasoning with Latent Tokens in Diffusion Language Models : Abstract: Discrete diffusion models have recently become competitive with autoregressive models for language modeling, even outperforming them on reasoning tasks requiring planning and global coherenc...
- UniGeM: Unifying Data Mixing and Selection via Geometric Exploration and Mining : Abstract: The scaling of Large Language Models (LLMs) is increasingly limited by data quality. Most methods handle data mixing and sample selection separately, which can break the structure in code co...
- Reasoning Cache: Continual Improvement Over Long Horizons via Short-Horizon RL : Abstract: Large Language Models (LLMs) that can continually improve beyond their training budgets are able to solve increasingly difficult problems by adapting at test time, a property we refer to as ...
- Reward Redistribution for CVaR MDPs using a Bellman Operator on L-infinity : Abstract: Tail-end risk measures such as static conditional value-at-risk (CVaR) are used in safety-critical applications to prevent rare, yet catastrophic events. Unlike risk-neutral objectives, the ...
- Efficient Estimation of Kernel Surrogate Models for Task Attribution : Abstract: Modern AI agents such as large language models are trained on diverse tasks (translation, code generation, mathematical reasoning, and text prediction) simultaneously. A key question is ...
- Inference-time Unlearning Using Conformal Prediction : Abstract: Machine unlearning is the process of efficiently removing specific information from a trained machine learning model without retraining from scratch. Existing unlearning methods, which often...
- Should I use Synthetic Data for That? An Analysis of the Suitability of Synthetic Data for Data Sharing and Augmentation : Abstract: Recent advances in generative modelling have led many to see synthetic data as the go-to solution for a range of problems around data access, scarcity, and under-representation. In this pape...
- Manifold Random Features : Abstract: We present a new paradigm for creating random features to approximate bi-variate functions (in particular, kernels) defined on general manifolds. This new mechanism of Manifold Random Featur...
- Prediction of Critical Heat Flux in Rod Bundles Using Tube-Based Hybrid Machine Learning Models in CTF : Abstract: The prediction of critical heat flux (CHF) using machine learning (ML) approaches has become a highly active research activity in recent years, the goal of which is to build models more accu...
- Bridging Online and Offline RL: Contextual Bandit Learning for Multi-Turn Code Generation : Abstract: Recently, there have been significant research interests in training large language models (LLMs) with reinforcement learning (RL) on real-world tasks, such as multi-turn code generation. Wh...
- Enhancing Imbalanced Node Classification via Curriculum-Guided Feature Learning and Three-Stage Attention Network : Abstract: Imbalanced node classification in graph neural networks (GNNs) happens when some labels are much more common than others, which causes the model to learn unfairly and perform badly on the le...
- Antidistillation Fingerprinting : Abstract: Model distillation enables efficient emulation of frontier large language models (LLMs), creating a need for robust mechanisms to detect when a third-party student model has trained on a tea...
- SymPlex: A Structure-Aware Transformer for Symbolic PDE Solving : Abstract: We propose SymPlex, a reinforcement learning framework for discovering analytical symbolic solutions to partial differential equations (PDEs) without access to ground-truth expressions. SymP...
- Robust Intervention Learning from Emergency Stop Interventions : Abstract: Human interventions are a common source of data in autonomous systems during testing. These interventions provide an important signal about where the current policy needs improvement, but ar...
- Understanding and Exploiting Weight Update Sparsity for Communication-Efficient Distributed RL : Abstract: Reinforcement learning (RL) is a critical component for post-training large language models (LLMs). However, in bandwidth-constrained distributed RL, scalability is often bottlenecked by the...
- PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning : Abstract: We develop a continual learning method for pretrained models that requires no access to old-task data, addressing a practical barrier in foundation model adaptation where pretraining ...
- Test-Time Detoxification without Training or Learning Anything : Abstract: Large language models can produce toxic or inappropriate text even for benign inputs, creating risks when deployed at scale. Detoxification is therefore important for safety and user trust, ...
- Learning-augmented smooth integer programs with PAC-learnable oracles : Abstract: This paper investigates learning-augmented algorithms for smooth integer programs, covering canonical problems such as MAX-CUT and MAX-k-SAT. We introduce a framework that incorporates a pre...
- Design and Evaluation of Whole-Page Experience Optimization for E-commerce Search : Abstract: E-commerce Search Results Pages (SRPs) are evolving from linear lists to complex, non-linear layouts, rendering traditional position-biased ranking models insufficient. Moreover, existing op...
- CreditAudit: 2D Auditing for LLM Evaluation and Selection : Abstract: Leaderboard scores on public benchmarks have been steadily rising and converging, with many frontier language models now separated by only marginal differences. However, these scores often f...
- Measuring Individual User Fairness with User Similarity and Effectiveness Disparity : Abstract: Individual user fairness is commonly understood as treating similar users similarly. In Recommender Systems (RSs), several evaluation measures exist for quantifying individual user fairness....
- GASTON: Graph-Aware Social Transformer for Online Networks : Abstract: Online communities have become essential places for socialization and support, yet they also possess toxicity, echo chambers, and misinformation. Detecting this harmful content is difficult ...
- HMVLA: Hyperbolic Multimodal Fusion for Vision-Language-Action Models : Abstract: Vision Language Action (VLA) models have recently shown great potential in bridging multimodal perception with robotic control. However, existing methods often rely on direct fine-tuning of ...
- WorldVQA: Measuring Atomic World Knowledge in Multimodal Large Language Models : Abstract: We introduce WorldVQA, a benchmark designed to evaluate the atomic visual world knowledge of Multimodal Large Language Models (MLLMs). Unlike current evaluations, which often conflate visual...
- Experience-Driven Multi-Agent Systems Are Training-free Context-aware Earth Observers : Abstract: Recent advances have enabled large language model (LLM) agents to solve complex tasks by orchestrating external tools. However, these agents often struggle in specialized, tool-intensive dom...
- MathlibLemma: Folklore Lemma Generation and Benchmark for Formal Mathematics : Abstract: While the ecosystem of Lean and Mathlib has enjoyed celebrated success in formal mathematical reasoning with the help of large language models (LLMs), the absence of many folklore lemmas in ...
- Relaxed Triangle Inequality for Kullback-Leibler Divergence Between Multivariate Gaussian Distributions : Abstract: The Kullback-Leibler (KL) divergence is not a proper distance metric and does not satisfy the triangle inequality, posing theoretical challenges in certain practical applications. Existing w...
- Uncertainty and Fairness Awareness in LLM-Based Recommendation Systems : Abstract: Large language models (LLMs) enable powerful zero-shot recommendations by leveraging broad contextual knowledge, yet predictive uncertainty and embedded biases threaten reliability and fairn...
- PeerRank: Autonomous LLM Evaluation Through Web-Grounded, Bias-Controlled Peer Review : Abstract: Evaluating large language models typically relies on human-authored benchmarks, reference answers, and human or single-model judgments, approaches that scale poorly, become quickly outdated,...
- Position: 3D Gaussian Splatting Watermarking Should Be Scenario-Driven and Threat-Model Explicit : Abstract: 3D content acquisition and creation are expanding rapidly in the new era of machine learning and AI. 3D Gaussian Splatting (3DGS) has become a promising high-fidelity and real-time represent...
- CryoLVM: Self-supervised Learning from Cryo-EM Density Maps with Large Vision Models : Abstract: Cryo-electron microscopy (cryo-EM) has revolutionized structural biology by enabling near-atomic-level visualization of biomolecular assemblies. However, the exponential growth in cryo-EM da...
- Trustworthy Blockchain-based Federated Learning for Electronic Health Records: Securing Participant Identity with Decentralized Identifiers and Verifiable Credentials : Abstract: The digitization of healthcare has generated massive volumes of Electronic Health Records (EHRs), offering unprecedented opportunities for training Artificial Intelligence (AI) models. Howev...
- Rethinking Test-Time Training: Tilting The Latent Distribution For Few-Shot Source-Free Adaptation : Abstract: Often, constraints arise in deployment settings where even lightweight parameter updates, e.g., parameter-efficient fine-tuning, could induce model shift or tuning instability. We study test-ti...
- A Positive Case for Faithfulness: LLM Self-Explanations Help Predict Model Behavior : Abstract: LLM self-explanations are often presented as a promising tool for AI oversight, yet their faithfulness to the model's true reasoning process is poorly understood. Existing faithfulness metri...
- Koopman Autoencoders with Continuous-Time Latent Dynamics for Fluid Dynamics Forecasting : Abstract: Data-driven surrogate models have emerged as powerful tools for accelerating the simulation of turbulent flows. However, classical approaches which perform autoregressive rollouts often trad...
- Tabula RASA: Exposing and Breaking the Relational Bottleneck in Transformers : Abstract: Transformers achieve remarkable performance across many domains, yet struggle with tasks requiring multi-hop relational reasoning over structured data. We analyze this limitation through cir...
- Semantics-Aware Generative Latent Data Augmentation for Learning in Low-Resource Domains : Abstract: Despite strong performance in data-rich regimes, deep learning often underperforms in the data-scarce settings common in practice. While foundation models (FMs) trained on massive datasets d...
- Causal Flow Q-Learning for Robust Offline Reinforcement Learning : Abstract: Expressive policies based on flow-matching have been successfully applied in reinforcement learning (RL) more recently due to their ability to model complex action distributions from offline...
- Zero Sum SVD: Balancing Loss Sensitivity for Low Rank LLM Compression : Abstract: Advances in large language models have driven strong performance across many tasks, but their memory and compute costs still hinder deployment. SVD-based compression reduces storage and can ...
- Recurrent Equivariant Constraint Modulation: Learning Per-Layer Symmetry Relaxation from Data : Abstract: Equivariant neural networks exploit underlying task symmetries to improve generalization, but strict equivariance constraints can induce more complex optimization dynamics that can hinder le...
- When pre-training hurts LoRA fine-tuning: a dynamical analysis via single-index models : Abstract: Pre-training on a source task is usually expected to facilitate fine-tuning on similar downstream problems. In this work, we mathematically show that this naive intuition is not always true:...
- Late-Stage Generalization Collapse in Grokking: Detecting anti-grokking with Weightwatcher : Abstract: Memorization in neural networks lacks a precise operational definition and is often inferred from the grokking regime, where training accuracy saturates while test accuracy remains ve...
- A Geometry-Aware Efficient Algorithm for Compositional Entropic Risk Minimization : Abstract: This paper studies optimization for a family of problems termed compositional entropic risk minimization, in which each data's loss is formulated as a Log-Expectation-Exponential ...
- Mixture of Concept Bottleneck Experts : Abstract: Concept Bottleneck Models (CBMs) promote interpretability by grounding predictions in human-understandable concepts. However, existing CBMs typically fix their task predictor to a single lin...
- Self-Soupervision: Cooking Model Soups without Labels : Abstract: Model soups are strange and strangely effective combinations of parameters. They take a model (the stock), fine-tune it into multiple models (the ingredients), and then mix their parameters ...
- TraceNAS: Zero-shot LLM Pruning via Gradient Trace Correlation : Abstract: Structured pruning is essential for efficient deployment of Large Language Models (LLMs). The varying sensitivity of LLM sub-blocks to pruning necessitates the identification of optimal non-...
- Controlled disagreement improves generalization in decentralized training : Abstract: Decentralized training is often regarded as inferior to centralized training because the consensus errors between workers are thought to undermine convergence and generalization, even with h...
- Manifold-Constrained Energy-Based Transition Models for Offline Reinforcement Learning : Abstract: Model-based offline reinforcement learning is brittle under distribution shift: policy improvement drives rollouts into state-action regions weakly supported by the dataset, where compoundi...
- Spatiotemporal Decision Transformer for Traffic Coordination : Abstract: Traffic signal control is a critical challenge in urban transportation, requiring coordination among multiple intersections to optimize network-wide traffic flow. While reinforcement learnin...
- A Random Matrix Theory Perspective on the Consistency of Diffusion Models : Abstract: Diffusion models trained on different, non-overlapping subsets of a dataset often produce strikingly similar outputs when given the same noise seed. We trace this consistency to a simple lin...
- Notes on the Reward Representation of Posterior Updates : Abstract: Many ideas in modern control and reinforcement learning treat decision-making as inference: start from a baseline distribution and update it when a signal arrives. We ask when this can be ma...
- Weighted Temporal Decay Loss for Learning Wearable PPG Data with Sparse Clinical Labels : Abstract: Advances in wearable computing and AI have increased interest in leveraging PPG for health monitoring over the past decade. One of the biggest challenges in developing health algorithms base...
- A Reproducible Framework for Bias-Resistant Machine Learning on Small-Sample Neuroimaging Data : Abstract: We introduce a reproducible, bias-resistant machine learning framework that integrates domain-informed feature engineering, nested cross-validation, and calibrated decision-threshold optimiz...
- How Does the Lagrangian Guide Safe Reinforcement Learning through Diffusion Models? : Abstract: Diffusion policy sampling enables reinforcement learning (RL) to represent multimodal action distributions beyond suboptimal unimodal Gaussian policies. However, existing diffusion-based RL ...
- Refining Decision Boundaries In Anomaly Detection Using Similarity Search Within the Feature Space : Abstract: Detecting rare and diverse anomalies in highly imbalanced datasets, such as Advanced Persistent Threats (APTs) in cybersecurity, remains a fundamental challenge for machine learning systems. A...
- Distance Marching for Generative Modeling : Abstract: Time-unconditional generative models learn time-independent denoising vector fields. But without time conditioning, the same noisy input may correspond to multiple noise levels and different...
- RPG-AE: Neuro-Symbolic Graph Autoencoders with Rare Pattern Mining for Provenance-Based Anomaly Detection : Abstract: Advanced Persistent Threats (APTs) are sophisticated, long-term cyberattacks that are difficult to detect because they operate stealthily and often blend into normal system behavior. This pa...
- Rare Event Early Detection: A Dataset of Sepsis Onset for Critically Ill Trauma Patients : Abstract: Sepsis is a major public health concern due to its high morbidity, mortality, and cost. Its clinical outcome can be substantially improved through early detection and timely intervention. By...
- 3D-Learning: Diffusion-Augmented Distributionally Robust Decision-Focused Learning : Abstract: Predict-then-Optimize (PTO) pipelines are widely employed in computing and networked systems, where Machine Learning (ML) models are used to predict critical contextual information for downs...
- Variational Sparse Paired Autoencoders (vsPAIR) for Inverse Problems and Uncertainty Quantification : Abstract: Inverse problems are fundamental to many scientific and engineering disciplines; they arise when one seeks to reconstruct hidden, underlying quantities from noisy measurements. Many applicat...
- Quant VideoGen: Auto-Regressive Long Video Generation via 2-Bit KV-Cache Quantization : Abstract: Despite rapid progress in autoregressive video diffusion, an emerging system algorithm bottleneck limits both deployability and generation capability: KV cache memory. In autoregressive vide...
- Human-Centric Traffic Signal Control for Equity: A Multi-Agent Action Branching Deep Reinforcement Learning Approach : Abstract: Coordinating traffic signals along multimodal corridors is challenging because many multi-agent deep reinforcement learning (DRL) approaches remain vehicle-centric and struggle with high-dim...
- Q-ShiftDP: A Differentially Private Parameter-Shift Rule for Quantum Machine Learning : Abstract: Quantum Machine Learning (QML) promises significant computational advantages, but preserving training data privacy remains challenging. Classical approaches like differentially private stoch...
- Co2PO: Coordinated Constrained Policy Optimization for Multi-Agent RL : Abstract: Constrained multi-agent reinforcement learning (MARL) faces a fundamental tension between exploration and safety-constrained optimization. Existing leading approaches, such as Lagrangian met...
- Why Some Models Resist Unlearning: A Linear Stability Perspective : Abstract: Machine unlearning, the ability to erase the effect of specific training samples without retraining from scratch, is critical for privacy, regulation, and efficiency. However, most progress ...
- NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computation...
- Learning to Repair Lean Proofs from Compiler Feedback : Abstract: As neural theorem provers become increasingly agentic, the ability to interpret and act on compiler feedback is critical. However, existing Lean datasets consist almost exclusively of correc...
- Adaptive Batch Sizes Using Non-Euclidean Gradient Noise Scales for Stochastic Sign and Spectral Descent : Abstract: To maximize hardware utilization, modern machine learning systems typically employ large constant or manually tuned batch size schedules, relying on heuristics that are brittle and costly to...
- Causal Graph Spatial-Temporal Autoencoder for Reliable and Interpretable Process Monitoring : Abstract: To improve the reliability and interpretability of industrial process monitoring, this article proposes a Causal Graph Spatial-Temporal Autoencoder (CGSTAE). The network architecture of CGST...
- From Zero to Hero: Advancing Zero-Shot Foundation Models for Tabular Outlier Detection : Abstract: Outlier detection (OD) is widely used in practice; but its effective deployment on new tasks is hindered by lack of labeled outliers, which makes algorithm and hyperparameter selection notor...
- FedKRSO: Communication and Memory Efficient Federated Fine-Tuning of Large Language Models : Abstract: Fine-tuning is essential to adapt general-purpose large language models (LLMs) to domain-specific tasks. As a privacy-preserving framework to leverage decentralized data for collaborative mo...
- Consistency Deep Equilibrium Models : Abstract: Deep Equilibrium Models (DEQs) have emerged as a powerful paradigm in deep learning, offering the ability to model infinite-depth networks with constant memory usage. However, DEQs incur sig...
- SAFE-KD: Risk-Controlled Early-Exit Distillation for Vision Backbones : Abstract: Early-exit networks reduce inference cost by allowing "easy" inputs to stop early, but practical deployment hinges on knowing when early exit is safe. We introduce SAFE-KD, a univer...
- Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation : Abstract: Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural language prompts. In practice, however, geometric descri...
- CoBA-RL: Capability-Oriented Budget Allocation for Reinforcement Learning in LLMs : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a key approach for enhancing LLM reasoning. However, standard frameworks like Group Relative Policy Optimization (GRPO) ty...
- Fedcompass: Federated Clustered and Periodic Aggregation Framework for Hybrid Classical-Quantum Models : Abstract: Federated learning enables collaborative model training across decentralized clients under privacy constraints. Quantum computing offers potential for alleviating computational and communica...
- Evaluating LLMs When They Do Not Know the Answer: Statistical Evaluation of Mathematical Reasoning via Comparative Signals : Abstract: Evaluating mathematical reasoning in LLMs is constrained by limited benchmark sizes and inherent model stochasticity, yielding high-variance accuracy estimates and unstable rankings across p...
- Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and More : Abstract: One of the chronic problems of deep-learning models is shortcut learning. In a case where the majority of training data are dominated by a certain feature, neural networks prefer to learn su...
- FlashSinkhorn: IO-Aware Entropic Optimal Transport : Abstract: Entropic optimal transport (EOT) via Sinkhorn iterations is widely used in modern machine learning, yet GPU solvers remain inefficient at scale. Tensorized implementations suffer quadratic H...
- TMS: Trajectory-Mixed Supervision for Reward-Free, On-Policy SFT : Abstract: Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) are the two dominant paradigms for enhancing Large Language Model (LLM) performance on downstream tasks. While RL generally prese...
- Geometry-Preserving Neural Architectures on Manifolds with Boundary : Abstract: Preserving geometric structure is important in learning. We propose a unified class of geometry-aware architectures that interleave geometric updates between layers, where both projection la...
- Neural Predictor-Corrector: Solving Homotopy Problems with Reinforcement Learning : Abstract: The Homotopy paradigm, a general principle for solving challenging problems, appears across diverse domains such as robust optimization, global optimization, polynomial root-finding, and sam...
- PRISM: Structured Optimization via Anisotropic Spectral Shaping : Abstract: We propose PRISM, an optimizer that enhances first-order spectral descent methods like Muon with partial second-order information. It constructs an efficient, low-rank quasi-second-order pre...
- TextME: Bridging Unseen Modalities Through Text Descriptions : Abstract: Expanding multimodal representations to novel modalities is constrained by reliance on large-scale paired datasets (e.g., text-image, text-audio, text-3D, text-molecule), which are costly an...
- Consensus Group Relative Policy Optimization for Text Generation : Abstract: Many strong decoding methods for text generation follow a sample-and-rerank paradigm: they draw multiple candidates, score each under a utility (reward) function using consensus across sampl...
- Function-Space Empirical Bayes Regularisation with Large Vision-Language Model Priors : Abstract: Bayesian deep learning (BDL) provides a principled framework for reliable uncertainty quantification by combining deep neural networks with Bayesian inference. A central challenge in BDL lie...
- Quantized Evolution Strategies: High-precision Fine-tuning of Quantized LLMs at Low-precision Cost : Abstract: Post-Training Quantization (PTQ) is essential for deploying Large Language Models (LLMs) on memory-constrained devices, yet it renders models static and difficult to fine-tune. Standard fine...
- Contrastive Concept-Tree Search for LLM-Assisted Algorithm Discovery : Abstract: Large Language Model (LLM)-assisted algorithm discovery is an iterative, black-box optimization process over programs to approximately solve a target task, where an LLM proposes candidate ...
- Enhanced Parcel Arrival Forecasting for Logistic Hubs: An Ensemble Deep Learning Approach : Abstract: The rapid expansion of online shopping has increased the demand for timely parcel delivery, compelling logistics service providers to enhance the efficiency, agility, and predictability of t...
- SATORIS-N: Spectral Analysis based Traffic Observation Recovery via Informed Subspaces and Nuclear-norm minimization : Abstract: Traffic-density matrices from different days exhibit both low rank and stable correlations in their singular-vector subspaces. Leveraging this, we introduce SATORIS-N, a framework for imputi...
- Self-Hinting Language Models Enhance Reinforcement Learning : Abstract: Group Relative Policy Optimization (GRPO) has recently emerged as a practical recipe for aligning large language models with verifiable objectives. However, under sparse terminal rewards, GR...
- What Makes a Good Example? Modeling Exemplar Selection with Neural Network Representations : Abstract: Teaching requires distilling a rich category distribution into a small set of informative exemplars. Although prior work shows that humans consider both representativeness and diversity when...
- MemCast: Memory-Driven Time Series Forecasting with Experience-Conditioned Reasoning : Abstract: Time series forecasting (TSF) plays a critical role in decision-making for many real-world applications. Recently, LLM-based forecasters have made promising advancements. Despite their effec...
- StepScorer: Accelerating Reinforcement Learning with Step-wise Scoring and Psychological Regret Modeling : Abstract: Reinforcement learning algorithms often suffer from slow convergence due to sparse reward signals, particularly in complex environments where feedback is delayed or infrequent. This paper in...
- Adversarial construction as a potential solution to the experiment design problem in large task spaces : Abstract: Despite decades of work, we still lack a robust, task-general theory of human behavior even in the simplest domains. In this paper we tackle the generality problem head-on, by aiming to deve...
- Probe-then-Commit Multi-Objective Bandits: Theoretical Benefits of Limited Multi-Arm Feedback : Abstract: We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/server...
- DynSplit-KV: Dynamic Semantic Splitting for KVCache Compression in Efficient Long-Context LLM Inference : Abstract: Although Key-Value (KV) Cache is essential for efficient large language model (LLM) inference, its growing memory footprint in long-context scenarios poses a significant bottleneck, making...
- Prompt Augmentation Scales up GRPO Training on Mathematical Reasoning : Abstract: Reinforcement learning algorithms such as group-relative policy optimization (GRPO) have demonstrated strong potential for improving the mathematical reasoning capabilities of large language...
- Reinforcement Learning with Promising Tokens for Large Language Models : Abstract: Reinforcement learning (RL) has emerged as a key paradigm for aligning and optimizing large language models (LLMs). Standard approaches treat the LLM as the policy and apply RL directly over...
- From Scalar Rewards to Potential Trends: Shaping Potential Landscapes for Model-Based Reinforcement Learning : Abstract: Model-based reinforcement learning (MBRL) achieves high sample efficiency by simulating future trajectories with learned dynamics and reward models. However, its effectiveness is severely co...
- Sparsity is Combinatorial Depth: Quantifying MoE Expressivity via Tropical Geometry : Abstract: While Mixture-of-Experts (MoE) architectures define the state-of-the-art, their theoretical success is often attributed to heuristic efficiency rather than geometric expressivity. In this wo...
- Spectral Evolution Search: Efficient Inference-Time Scaling for Reward-Aligned Image Generation : Abstract: Inference-time scaling offers a versatile paradigm for aligning visual generative models with downstream objectives without parameter updates. However, existing approaches that optimize the ...
- Lookahead Sample Reward Guidance for Test-Time Scaling of Diffusion Models : Abstract: Diffusion models have demonstrated strong generative performance; however, generated samples often fail to fully align with human intent. This paper studies a test-time scaling method that e...
- Topology Matters: A Cautionary Case Study of Graph SSL on Neuro-Inspired Benchmarks : Abstract: Understanding how local interactions give rise to global brain organization requires models that can represent information across multiple scales. We introduce a hierarchical self-supervised...
- BayeSQP: Bayesian Optimization through Sequential Quadratic Programming : Abstract: We introduce BayeSQP, a novel algorithm for general black-box optimization that merges the structure of sequential quadratic programming with concepts from Bayesian optimization. BayeSQP emp...
- Merging Beyond: Streaming LLM Updates via Activation-Guided Rotations : Abstract: The escalating scale of Large Language Models (LLMs) necessitates efficient adaptation techniques. Model merging has gained prominence for its efficiency and controllability. However, existi...
- GraDE: A Graph Diffusion Estimator for Frequent Subgraph Discovery in Neural Architectures : Abstract: Finding frequently occurring subgraph patterns or network motifs in neural architectures is crucial for optimizing efficiency, accelerating design, and uncovering structural insights. Howeve...
- Beyond Suffixes: Token Position in GCG Adversarial Attacks on Large Language Models : Abstract: Large Language Models (LLMs) have seen widespread adoption across multiple domains, creating an urgent need for robust safety alignment mechanisms. However, robustness remains challenging du...
- Unveiling Covert Toxicity in Multimodal Data via Toxicity Association Graphs: A Graph-Based Metric and Interpretable Detection Framework : Abstract: Detecting toxicity in multimodal data remains a significant challenge, as harmful meanings often lurk beneath seemingly benign individual modalities, only emerging when modalities are combin...
- BlockRR: A Unified Framework of RR-type Algorithms for Label Differential Privacy : Abstract: In this paper, we introduce BlockRR, a novel and unified randomized-response mechanism for label differential privacy. This framework generalizes existing RR-type mechanisms as special cases ...
- Universal Approximation of Continuous Functionals on Compact Subsets via Linear Measurements and Scalar Nonlinearities : Abstract: We study universal approximation of continuous functionals on compact subsets of products of Hilbert spaces. We prove that any such functional can be uniformly approximated by models that fi...
- Anomaly Detection via Mean Shift Density Enhancement : Abstract: Unsupervised anomaly detection stands as an important problem in machine learning, with applications in financial fraud prevention, network security and medical diagnostics. Existing unsuper...
- Lipschitz Multiscale Deep Equilibrium Models: A Theoretically Guaranteed and Accelerated Approach : Abstract: Deep equilibrium models (DEQs) achieve infinitely deep network representations without stacking layers by exploring fixed points of layer transformations in neural networks. Such models cons...
- R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model? : Abstract: In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end...
- Periodic Regularized Q-Learning : Abstract: In reinforcement learning (RL), Q-learning is a fundamental algorithm whose convergence is guaranteed in the tabular setting. However, this convergence guarantee does not hold under linear f...
- medR: Reward Engineering for Clinical Offline Reinforcement Learning via Tri-Drive Potential Functions : Abstract: Reinforcement Learning (RL) offers a powerful framework for optimizing dynamic treatment regimes (DTRs). However, clinical RL is fundamentally bottlenecked by reward engineering: the challen...
- Entropy-Gated Selective Policy Optimization: Token-Level Gradient Allocation for Hybrid Training of Large Language Models : Abstract: Hybrid training methods for large language models combine supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) on model rollouts, typically at the sample le...
- Information-Theoretic Multi-Model Fusion for Target-Oriented Adaptive Sampling in Materials Design : Abstract: Target-oriented discovery under limited evaluation budgets requires making reliable progress in high-dimensional, heterogeneous design spaces where each new measurement is costly, whether ex...
- From Inexact Gradients to Byzantine Robustness: Acceleration and Optimization under Similarity : Abstract: Standard federated learning algorithms are vulnerable to adversarial nodes, a.k.a. Byzantine failures. To solve this issue, robust distributed learning algorithms have been developed, which ...
- Bayesian Conformal Prediction as a Decision Risk Problem : Abstract: Bayesian posterior predictive densities as non-conformity scores and Bayesian quadrature are used to estimate and minimise the expected prediction set size. Operating within a split conforma...
- Robustness as an Emergent Property of Task Performance : Abstract: Robustness is often regarded as a critical future challenge for real-world applications, where stability is essential. However, as models often learn tasks in a similar order, we hypothesize...
- Causal Graph Learning via Distributional Invariance of Cause-Effect Relationship : Abstract: This paper introduces a new framework for recovering causal graphs from observational data, leveraging the observation that the distribution of an effect, conditioned on its causes, remains ...
- GraphDancer: Training LLMs to Explore and Reason over Graphs via Curriculum Reinforcement Learning : Abstract: Large language models (LLMs) increasingly rely on external knowledge to improve factuality, yet many real-world knowledge sources are organized as heterogeneous graphs rather than plain text...
- Scaled Dot-Product Attention implements projection of inputs onto a common surface : Abstract: Scaled dot-product attention (SDPA) is a fundamental component responsible for the success of large language models and other nonlinear signal processing applications. The rationale for SDPA...
- IMU-1: Sample-Efficient Pre-training of Small Language Models : Abstract: We present IMU-1, a 430M-parameter language model trained on 72B tokens that approaches the benchmark performance of models trained on 56x more data. We describe a validated training recipe ...
- TabularMath: Evaluating Computational Extrapolation in Tabular Learning via Program-Verified Synthesis : Abstract: Standard tabular benchmarks mainly focus on the evaluation of a model's capability to interpolate values inside a data manifold, where models good at performing local statistical smoothing a...
- The "Robert Boulton" Singularity: Semantic Tunneling and Manifold Unfolding in Recursive AI : Abstract: The stability of generative artificial intelligence trained on recursive synthetic data is conventionally monitored via Perplexity (PPL). We demonstrate that PPL is a deceptive metric in con...
- Incident-Guided Spatiotemporal Traffic Forecasting : Abstract: Recent years have witnessed the rapid development of deep-learning-based, graph-neural-network-based forecasting methods for modern intelligent transportation systems. However, most existing...
- Formulating Reinforcement Learning for Human-Robot Collaboration through Off-Policy Evaluation : Abstract: Reinforcement learning (RL) has the potential to transform real-world decision-making systems by enabling autonomous agents to learn from experience. Deploying RL in real-world settings, esp...
- Hypersonic Flow Control: Generalized Deep Reinforcement Learning for Hypersonic Intake Unstart Control under Uncertainty : Abstract: The hypersonic unstart phenomenon poses a major challenge to reliable air-breathing propulsion at Mach 5 and above, where strong shock-boundary-layer interactions and rapid pressure fluctuat...
- CADENT: Gated Hybrid Distillation for Sample-Efficient Transfer in Reinforcement Learning : Abstract: Transfer learning promises to reduce the high sample complexity of deep reinforcement learning (RL), yet existing methods struggle with domain shift between source and target environments. P...
- Enhancing Psychologists' Understanding through Explainable Deep Learning Framework for ADHD Diagnosis : Abstract: Attention Deficit Hyperactivity Disorder (ADHD) is a neurodevelopmental disorder that is challenging to diagnose and requires advanced approaches for reliable and transparent identification ...
- From Sparse Decisions to Dense Reasoning: A Multi-attribute Trajectory Paradigm for Multimodal Moderation : Abstract: Safety moderation is pivotal for identifying harmful content. Despite the success of textual safety moderation, its multimodal counterparts remain hindered by a dual sparsity of data and sup...
- Enhancing Post-Training Quantization via Future Activation Awareness : Abstract: Post-training quantization (PTQ) is a widely used method to compress large language models (LLMs) without fine-tuning. It typically sets quantization hyperparameters (e.g., scaling factors) ...
- How Much Information Can a Vision Token Hold? A Scaling Law for Recognition Limits in VLMs : Abstract: Recent vision-centric approaches have made significant strides in long-context modeling. Represented by DeepSeek-OCR, these models encode rendered text into continuous vision tokens, achievi...
- Auto-Augmentation Contrastive Learning for Wearable-based Human Activity Recognition : Abstract: For low-semantic sensor signals from human activity recognition (HAR), contrastive learning (CL) is essential to implement novel applications or generic models without manual annotation, whi...
- Toward Ultra-Long-Horizon Sequential Model Editing : Abstract: Model editing has emerged as a practical approach for mitigating factual errors and outdated knowledge in large language models (LLMs). Among existing methods, the Locate-and-Edit (L&E) para...
- SPA-Cache: Singular Proxies for Adaptive Caching in Diffusion Language Models : Abstract: While Diffusion Language Models (DLMs) offer a flexible, arbitrary-order alternative to the autoregressive paradigm, their non-causal nature precludes standard KV caching, forcing costly hid...
- Beyond Alignment: Expanding Reasoning Capacity via Manifold-Reshaping Policy Optimization : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has demonstrated remarkable success in enhancing the reasoning capabilities of Large Language Models (LLMs). However, recent studies que...
- D²Quant: Accurate Low-bit Post-Training Weight Quantization for LLMs : Abstract: Large language models (LLMs) deliver strong performance, but their high compute and memory costs make deployment difficult in resource-constrained scenarios. Weight-only post-training quanti...
- naPINN: Noise-Adaptive Physics-Informed Neural Networks for Recovering Physics from Corrupted Measurement : Abstract: Physics-Informed Neural Networks (PINNs) are effective methods for solving inverse problems and discovering governing equations from observational data. However, their performance degrades s...
- ToolTok: Tool Tokenization for Efficient and Generalizable GUI Agents : Abstract: Existing GUI agent models relying on coordinate-based one-step visual grounding struggle with generalizing to varying input resolutions and aspect ratios. Alternatives introduce coordinate-f...
- HyPAC: Cost-Efficient LLMs-Human Hybrid Annotation with PAC Error Guarantees : Abstract: Data annotation often involves multiple sources with different cost-quality trade-offs, such as fast large language models (LLMs), slow reasoning models, and human experts. In this work, we ...
- EEO-TFV: Escape-Explore Optimizer for Web-Scale Time-Series Forecasting and Vision Analysis : Abstract: Transformer-based foundation models have achieved remarkable progress in tasks such as time-series forecasting and image segmentation. However, they frequently suffer from error accumulation...
- BatCoder: Self-Supervised Bidirectional Code-Documentation Learning via Back-Translation : Abstract: Training LLMs for code-related tasks typically depends on high-quality code-documentation pairs, which are costly to curate and often scarce for niche programming languages. We introduce Bat...
- Learning to Explore with Parameter-Space Noise: A Deep Dive into Parameter-Space Noise for Reinforcement Learning with Verifiable Rewards : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) improves LLM reasoning, yet growing evidence indicates an exploration ceiling: it often reweights existing solution traces rather than d...
- Beyond Experience Retrieval: Learning to Generate Utility-Optimized Structured Experience for Frozen LLMs : Abstract: Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can int...
- The Alignment Curse: Cross-Modality Jailbreak Transfer in Omni-Models : Abstract: Recent advances in end-to-end trained omni-models have significantly improved multimodal understanding. At the same time, safety red-teaming has expanded beyond text to encompass audio-based...
- PA-MIL: Phenotype-Aware Multiple Instance Learning Guided by Language Prompting and Genotype-to-Phenotype Relationships : Abstract: Deep learning has been extensively researched in the analysis of pathology whole-slide images (WSIs). However, most existing methods are limited to providing prediction interpretability by l...
- Auditing Sybil: Explaining Deep Lung Cancer Risk Prediction Through Generative Interventional Attributions : Abstract: Lung cancer remains the leading cause of cancer mortality, driving the development of automated screening tools to alleviate radiologist workload. Standing at the frontier of this effort is ...
- A General ReLearner: Empowering Spatiotemporal Prediction by Re-learning Input-label Residual : Abstract: Prevailing spatiotemporal prediction models typically operate under a forward (unidirectional) learning paradigm, in which models extract spatiotemporal features from historical observation ...
- Label Curation Using Agentic AI : Abstract: Data annotation is essential for supervised learning, yet producing accurate, unbiased, and scalable labels remains challenging as datasets grow in size and modality. Traditional human-centr...
- High Rank Matrix Completion via Grassmannian Proxy Fusion : Abstract: This paper approaches high-rank matrix completion (HRMC) by filling missing entries in a data matrix where columns lie near a union of subspaces, clustering these columns, and identifying th...
- A Comparative Simulation Study of the Fairness and Accuracy of Predictive Policing Systems in Baltimore City : Abstract: There are ongoing discussions about predictive policing systems, such as those deployed in Los Angeles, California, and Baltimore, Maryland, being unfair, for example, by exhibiting racial bi...
- IceBench-S2S: A Benchmark of Deep Learning for Challenging Subseasonal-to-Seasonal Daily Arctic Sea Ice Forecasting in Deep Latent Space : Abstract: Arctic sea ice plays a critical role in regulating Earth's climate system, significantly influencing polar ecological stability and human activities in coastal regions. Recent advances in ar...
- Mitigating Task-Order Sensitivity and Forgetting via Hierarchical Second-Order Consolidation : Abstract: We introduce Hierarchical Taylor Series-based Continual Learning (HTCL), a framework that couples fast local adaptation with conservative, second-order global consolidation to add...
- Trajectory Consistency for One-Step Generation on Euler Mean Flows : Abstract: We propose Euler Mean Flows (EMF), a flow-based generative framework for one-step and few-step generation that enforces long-range trajectory consistency with minimal sampling cost. T...
- Reward Shaping for Inference-Time Alignment: A Stackelberg Game Perspective : Abstract: Existing alignment methods directly use the reward model learned from user preference data to optimize an LLM policy, subject to KL regularization with respect to the base policy. This pract...
- Product Interaction: An Algebraic Formalism for Deep Learning Architectures : Abstract: In this paper, we introduce product interactions, an algebraic formalism in which neural network layers are constructed from compositions of a multiplication operator defined over suitable a...
- QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals : Abstract: Weight-only quantization is important for compressing Large Language Models (LLMs). Inspired by the spirit of classical magnitude pruning, we study whether the magnitude of weight updates du...
- Copula-Based Aggregation and Context-Aware Conformal Prediction for Reliable Renewable Energy Forecasting : Abstract: The rapid growth of renewable energy penetration has intensified the need for reliable probabilistic forecasts to support grid operations at aggregated (fleet or system) levels. In practice,...
- Learnable Koopman-Enhanced Transformer-Based Time Series Forecasting with Spectral Control : Abstract: This paper proposes a unified family of learnable Koopman operator parameterizations that integrate linear dynamical systems theory with modern deep learning forecasting architectures. We in...
- Effective Frontiers: A Unification of Neural Scaling Laws : Abstract: Neural scaling laws govern the predictable power-law improvement of test loss with respect to model capacity ($N$), data size ($D$), and compute ($C$). However, existing theoretical explanatio...
- Fubini-Study geometry of representation drift in high-dimensional data : Abstract: High-dimensional representation drift is commonly quantified using Euclidean or cosine distances, which presuppose fixed coordinates when comparing representations across time, training or p...
- ContextEvolve: Multi-Agent Context Compression for Systems Code Optimization : Abstract: Large language models are transforming systems research by automating the discovery of performance-critical algorithms for computer systems. Despite the plausible code generated by LLMs, produc...
- RAP: KV-Cache Compression via RoPE-Aligned Pruning : Abstract: Long-context inference in large language models is increasingly bottlenecked by the memory and compute cost of the KV-Cache. Low-rank factorization compresses KV projections by writing $W \a...
- Step-Wise Refusal Dynamics in Autoregressive and Diffusion Language Models : Abstract: Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) models, offering parallel decoding and controllable sampling dynamics while achieving...
- Discovering Data Manifold Geometry via Non-Contracting Flows : Abstract: We introduce an unsupervised approach for constructing a global reference system by learning, in the ambient space, vector fields that span the tangent spaces of an unknown data manifold. In...
- A Semi-Supervised Pipeline for Generalized Behavior Discovery from Animal-Borne Motion Time Series : Abstract: Learning behavioral taxonomies from animal-borne sensors is challenging because labels are scarce, classes are highly imbalanced, and behaviors may be absent from the annotated set. We study...
- daVinci-Agency: Unlocking Long-Horizon Agency Data-Efficiently : Abstract: While Large Language Models (LLMs) excel at short-term tasks, scaling them to long-horizon agentic workflows remains challenging. The core bottleneck lies in the scarcity of training data th...
- Learning Consistent Causal Abstraction Networks : Abstract: Causal artificial intelligence aims to enhance explainability, trustworthiness, and robustness in AI by leveraging structural causal models (SCMs). In this pursuit, recent advances formalize...
- Learning Better Certified Models from Empirically-Robust Teachers : Abstract: Adversarial training attains strong empirical robustness to specific adversarial attacks by training on concrete adversarial perturbations, but it produces neural networks that are not amena...
- Performance of Small Language Model Pretraining on FABRIC: An Empirical Study : Abstract: Large language models (LLMs) require enormous computing power to pretrain on massive datasets. When limited datasets are available, smaller-sized LLMs are a better choice to pretrain (on user-...
- A Reduction from Delayed to Immediate Feedback for Online Convex Optimization with Improved Guarantees : Abstract: We develop a reduction-based framework for online learning with delayed feedback that recovers and improves upon existing results for both first-order and bandit convex optimization. Our app...
- hSNMF: Hybrid Spatially Regularized NMF for Image-Derived Spatial Transcriptomics : Abstract: High-resolution spatial transcriptomics platforms, such as Xenium, generate single-cell images that capture both molecular and spatial context, but their extremely high dimensionality poses ...
- MARA: Continuous SE(3)-Equivariant Attention for Molecular Force Fields : Abstract: Machine learning force fields (MLFFs) have become essential for accurate and efficient atomistic modeling. Despite their high accuracy, most existing approaches rely on fixed angular expansi...
- FlexRank: Nested Low-Rank Knowledge Decomposition for Adaptive Model Deployment : Abstract: The growing scale of deep neural networks, encompassing large language models (LLMs) and vision transformers (ViTs), has made training from scratch prohibitively expensive and deployment inc...
- Expert-Data Alignment Governs Generation Quality in Decentralized Diffusion Models : Abstract: Decentralized Diffusion Models (DDMs) route denoising through experts trained independently on disjoint data clusters, which can strongly disagree in their predictions. What governs the qual...
- Sparsely Supervised Diffusion : Abstract: Diffusion models have shown remarkable success across a wide range of generative tasks. However, they often suffer from spatially inconsistent generation, arguably due to the inherent locali...
- Every Bit Counts: A Theoretical Study of Precision-Expressivity Tradeoffs in Quantized Transformers : Abstract: Quantization reduces the numerical precision of Transformer computations and is widely used to accelerate inference, yet its effect on expressivity remains poorly characterized. We demonstra...
- BinaryPPO: Efficient Policy Optimization for Binary Classification : Abstract: Supervised fine-tuning (SFT) is the standard approach for binary classification tasks such as toxicity detection, factuality verification, and causal inference. However, SFT often performs p...
- Maximum Likelihood Reinforcement Learning : Abstract: Reinforcement learning is the method of choice to train models in sampling-based setups with binary outcome feedback, such as navigation, code generation, and mathematical problem solving. I...
- Towards Understanding Steering Strength : Abstract: A popular approach to post-training control of large language models (LLMs) is the steering of intermediate latent representations. Namely, identify a well-chosen direction depending on the ...
- Neural Probabilistic Amplitude Shaping for Nonlinear Fiber Channels : Abstract: We introduce neural probabilistic amplitude shaping, a joint-distribution learning framework for coherent fiber systems. The proposed scheme provides a 0.5 dB signal-to-noise ratio gain over...
- Hierarchical Entity-centric Reinforcement Learning with Factored Subgoal Diffusion : Abstract: We propose a hierarchical entity-centric framework for offline Goal-Conditioned Reinforcement Learning (GCRL) that combines subgoal decomposition with factored structure to solve long-horizo...
- Automated Dysphagia Screening Using Noninvasive Neck Acoustic Sensing : Abstract: Pharyngeal health plays a vital role in essential human functions such as breathing, swallowing, and vocalization. Early detection of swallowing abnormalities, also known as dysphagia, is cr...
- Vector Quantized Latent Concepts: A Scalable Alternative to Clustering-Based Concept Discovery : Abstract: Deep Learning models encode rich semantic information in their hidden representations. However, it remains challenging to understand which parts of this information models actually rely on w...
- Search-Augmented Masked Diffusion Models for Constrained Generation : Abstract: Discrete diffusion models generate sequences by iteratively denoising samples corrupted by categorical noise, offering an appealing alternative to autoregressive decoding for structured and ...
- CAPS: Unifying Attention, Recurrence, and Alignment in Transformer-based Time Series Forecasting : Abstract: This paper presents CAPS (Clock-weighted Aggregation with Prefix-products and Softmax), a structured attention mechanism for time series forecasting that decouples three distinct ...
- TabPFN for Zero-shot Parametric Engineering Design Generation : Abstract: Deep generative models for engineering design often require substantial computational cost, large training datasets, and extensive retraining when design requirements or datasets change, lim...
- TopoPrune: Robust Data Pruning via Unified Latent Space Topology : Abstract: Geometric data pruning methods, while practical for leveraging pretrained models, are fundamentally unstable. Their reliance on extrinsic geometry renders them highly sensitive to latent spa...
- Entropy-Guided Dynamic Tokens for Graph-LLM Alignment in Molecular Understanding : Abstract: Molecular understanding is central to advancing areas such as scientific discovery, yet Large Language Models (LLMs) struggle to understand molecular graphs effectively. Existing graph-LLM b...
- On the Sample Efficiency of Inverse Dynamics Models for Semi-Supervised Imitation Learning : Abstract: Semi-supervised imitation learning (SSIL) consists in learning a policy from a small dataset of action-labeled trajectories and a much larger dataset of action-free trajectories. Some SSIL m...
- Exposing Vulnerabilities in Explanation for Time Series Classifiers via Dual-Target Attacks : Abstract: Interpretable time series deep learning systems are often assessed by checking temporal consistency on explanations, implicitly treating this as evidence of robustness. We show that this ass...
- Privately Fine-Tuned LLMs Preserve Temporal Dynamics in Tabular Data : Abstract: Research on differentially private synthetic tabular data has largely focused on independent and identically distributed rows where each record corresponds to a unique individual. This persp...
- Provable Effects of Data Replay in Continual Learning: A Feature Learning Perspective : Abstract: Continual learning (CL) aims to train models on a sequence of tasks while retaining performance on previously learned ones. A core challenge in this setting is catastrophic forgetting, where...
- BiTimeCrossNet: Time-Aware Self-Supervised Learning for Pediatric Sleep : Abstract: We present BiTimeCrossNet (BTCNet), a multimodal self-supervised learning framework for long physiological recordings such as overnight sleep studies. While many existing approaches train on...
- VerIde ECG Biometrics: Verification and Identification : Abstract: This work studies electrocardiogram (ECG) biometrics at large scale, evaluating how strongly an ECG can be linked to an individual and, consequently, how its anonymization may be compromised...
- Cross-Temporal Attention Fusion (CTAF) for Multimodal Physiological Signals in Self-Supervised Learning : Abstract: We study multimodal affect modeling when EEG and peripheral physiology are asynchronous, which most fusion methods ignore or handle with costly warping. We propose Cross-Temporal Attention F...
- LEMON: Local Explanations via Modality-aware OptimizatioN : Abstract: Multimodal models are ubiquitous, yet existing explainability methods are often single-modal, architecture-dependent, or too computationally expensive to run at scale. We introduce LEMON (Lo...
- Structure-Preserving Learning Improves Geometry Generalization in Neural PDEs : Abstract: We aim to develop physics foundation models for science and engineering that provide real-time solutions to Partial Differential Equations (PDEs) which preserve structure and accuracy under ...
- Causality–Δ: Jacobian-Based Dependency Analysis in Flow Matching Models : Abstract: Flow matching learns a velocity field that transports a base distribution to data. We study how small latent perturbations propagate through these flows and show that Jacobian-vector product...
- Joint Learning of Hierarchical Neural Options and Abstract World Model : Abstract: Building agents that can perform new skills by composing existing skills is a long-standing goal of AI agent research. Towards this end, we investigate how to efficiently acquire a sequence ...
- Membership Inference Attacks from Causal Principles : Abstract: Membership Inference Attacks (MIAs) are widely used to quantify training data memorization and assess privacy risks. Standard evaluation requires repeated retraining, which is computationall...
- From Tokens to Numbers: Continuous Number Modeling for SVG Generation : Abstract: For certain image generation tasks, vector graphics such as Scalable Vector Graphics (SVGs) offer clear benefits such as increased flexibility, size efficiency, and editing ease, but remain ...
- A Single Revision Step Improves Token-Efficient LLM Reasoning : Abstract: Large language models (LLMs) achieve higher accuracy on challenging reasoning tasks by scaling test-time compute through multiple trajectory sampling. However, standard aggregation methods l...
- SC3D: Dynamic and Differentiable Causal Discovery for Temporal and Instantaneous Graphs : Abstract: Discovering causal structures from multivariate time series is a key problem because interactions span across multiple lags and possibly involve instantaneous dependencies. Additionally, the...
- UNSO: Unified Newton Schulz Orthogonalization : Abstract: The Newton-Schulz (NS) iteration has gained increasing interest for its role in the Muon optimizer and the Stiefel manifold. However, the conventional NS iteration suffers from inefficiency ...
- Augmenting Parameter-Efficient Pre-trained Language Models with Large Language Models : Abstract: Training AI models in cybersecurity with the help of vast datasets offers significant opportunities to mimic real-world behaviors effectively. However, challenges like data drift and scarcity of...
- Sparse Adapter Fusion for Continual Learning in NLP : Abstract: Continual learning in natural language processing plays a crucial role in adapting to evolving data and preventing catastrophic forgetting. Despite significant progress, existing methods sti...
- Learning ORDER-Aware Multimodal Representations for Composite Materials Design : Abstract: Artificial intelligence (AI) has shown remarkable success in materials discovery and property prediction, particularly for crystalline and polymer systems where material properties and struc...
- What Drives Length of Stay After Elective Spine Surgery? Insights from a Decade of Predictive Modeling : Abstract: Objective: Predicting length of stay after elective spine surgery is essential for optimizing patient outcomes and hospital resource use. This systematic review synthesizes computational met...
Research Sources: 577 | Generated: 2/4/2026
