AI RESEARCH PAPERS & ACADEMIC SOURCES
- Uncertainty-Aware Domain Adaptation for Vitiligo Segmentation in Clinical Photographs : Abstract: Accurately quantifying vitiligo extent in routine clinical photographs is crucial for longitudinal monitoring of treatment response. We propose a trustworthy, frequency-aware segmentation fr...
- Structure From Tracking: Distilling Structure-Preserving Motion for Video Generation : Abstract: Reality is a dance between rigid constraints and deformable structures. For video models, that means generating motion that preserves fidelity as well as structure. Despite progress in diffu...
- V-RGBX: Video Editing with Accurate Controls over Intrinsic Properties : Abstract: Large-scale video generation models have shown remarkable potential in modeling photorealistic appearance and lighting interactions in real-world scenes. However, a closed-loop framework tha...
- Moment-Based 3D Gaussian Splatting: Resolving Volumetric Occlusion with Order-Independent Transmittance : Abstract: The recent success of 3D Gaussian Splatting (3DGS) has reshaped novel view synthesis by enabling fast optimization and real-time rendering of high-quality radiance fields. However, it relies...
- Seeing to Act, Prompting to Specify: A Bayesian Factorization of Vision Language Action Policy : Abstract: The pursuit of out-of-distribution generalization in Vision-Language-Action (VLA) models is often hindered by catastrophic forgetting of the Vision-Language Model (VLM) backbone during fine-...
- Stochastics of shapes and Kunita flows : Abstract: Stochastic processes of evolving shapes are used in applications including evolutionary biology, where morphology changes stochastically as a function of evolutionary processes. Due to the n...
- Particle Image Velocimetry Refinement via Consensus ADMM : Abstract: Particle Image Velocimetry (PIV) is an imaging technique in experimental fluid dynamics that quantifies flow fields around bluff bodies by analyzing the displacement of neutrally buoyant tra...
- mViSE: A Visual Search Engine for Analyzing Multiplex IHC Brain Tissue Images : Abstract: Whole-slide multiplex imaging of brain tissue generates massive information-dense images that are challenging to analyze and require custom software. We present an alternative query-driven p...
- AnchorDream: Repurposing Video Diffusion for Embodiment-Aware Robot Data Synthesis : Abstract: The collection of large-scale and diverse robot demonstrations remains a major bottleneck for imitation learning, as real-world data acquisition is costly and simulators offer limited divers...
- Efficient Action Counting with Dynamic Queries : Abstract: Temporal repetition counting aims to quantify the repeated action cycles within a video. The majority of existing methods rely on the similarity correlation matrix to characterize the repeti...
- Conditional Text-to-Image Generation with Reference Guidance : Abstract: Text-to-image diffusion models have demonstrated tremendous success in synthesizing visually stunning images given textual instructions. Despite remarkable progress in creating high-fidelity...
- Denoising Diffusion Models for Anomaly Localization in Medical Images : Abstract: This review explores anomaly localization in medical images using denoising diffusion models. After providing a brief methodological background of these models, including their application t...
- Building Patient Journeys in Hebrew: A Language Model for Clinical Timeline Extraction : Abstract: We present a new Hebrew medical language model designed to extract structured clinical timelines from electronic health records, enabling the construction of patient journeys. Our model is b...
- Extending a Parliamentary Corpus with MPs' Tweets: Automatic Annotation and Evaluation Using MultiParTweet : Abstract: Social media serves as a critical medium in modern politics because it both reflects politicians' ideologies and facilitates communication with younger generations. We present MultiParTweet,...
- Speculative Decoding Speed-of-Light: Optimal Lower Bounds via Branching Random Walks : Abstract: Speculative generation has emerged as a promising technique to accelerate inference in large language models (LLMs) by leveraging parallelism to verify multiple draft tokens simultaneously. ...
- SUMFORU: An LLM-Based Review Summarization Framework for Personalized Purchase Decision Support : Abstract: Online product reviews contain rich but noisy signals that overwhelm users and hinder effective decision-making. Existing LLM-based summarizers remain generic and fail to account for individ...
- SCOUT: A Defense Against Data Poisoning Attacks in Fine-Tuned Language Models : Abstract: Backdoor attacks create significant security threats to language models by embedding hidden triggers that manipulate model behavior during inference, presenting critical risks for AI systems...
- HFS: Holistic Query-Aware Frame Selection for Efficient Video Reasoning : Abstract: Key frame selection in video understanding presents significant challenges. Traditional top-K selection methods, which score frames independently, often fail to optimize the selection as a w...
- Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation : Abstract: Despite progress in melody-to-lyric generation, a substantial singability gap remains between machine-generated lyrics and those written by human lyricists. In this work, we aim to narrow th...
- Weakly Supervised Tuberculosis Localization in Chest X-rays through Knowledge Distillation : Abstract: Tuberculosis (TB) remains one of the leading causes of mortality worldwide, particularly in resource-limited countries. Chest X-ray (CXR) imaging serves as an accessible and cost-effective d...
- Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning : Abstract: Vision-Language Models (VLMs) offer a promising path toward interpretable medical diagnosis by allowing users to ask about clinical explanations alongside predictions and across different mo...
- VDAWorld: World Modelling via VLM-Directed Abstraction and Simulation : Abstract: Generative video models, a leading approach to world modeling, face fundamental limitations. They often violate physical and logical rules, lack interactivity, and operate as opaque black bo...
- E-CHUM: Event-based Cameras for Human Detection and Urban Monitoring : Abstract: Understanding human movement and city dynamics has always been challenging. From traditional methods of manually observing the city's inhabitants, to using cameras, to now using sensors and m...
- Vision-Language Models for Infrared Industrial Sensing in Additive Manufacturing Scene Description : Abstract: Many manufacturing environments operate in low-light conditions or within enclosed machines where conventional vision systems struggle. Infrared cameras provide complementary advantages in s...
- VGent: Visual Grounding via Modular Design for Disentangling Reasoning and Prediction : Abstract: Current visual grounding models are either based on a Multimodal Large Language Model (MLLM) that performs auto-regressive decoding, which is slow and risks hallucinations, or on re-aligning...
- Information-driven Fusion of Pathology Foundation Models for Enhanced Disease Characterization : Abstract: Foundation models (FMs) have demonstrated strong performance across diverse pathology tasks. While there are similarities in the pre-training objectives of FMs, there is still limited unders...
- Learning from a Generative Oracle: Domain Adaptation for Restoration : Abstract: Pre-trained image restoration models often fail on real-world, out-of-distribution degradations due to significant domain gaps. Adapting to these unseen domains is challenging, as out-of-dis...
- Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching : Abstract: Stereo foundation models achieve strong zero-shot generalization but remain computationally prohibitive for real-time applications. Efficient stereo architectures, on the other hand, sacrifi...
- Learning complete and explainable visual representations from itemized text supervision : Abstract: Training vision models with language supervision enables general and transferable representations. However, many visual domains, especially non-object-centric domains such as medical imaging...
- Lightweight 3D Gaussian Splatting Compression via Video Codec : Abstract: Current video-based GS compression methods rely on using Parallel Linear Assignment Sorting (PLAS) to convert 3D GS into smooth 2D maps, which are computationally expensive and time-consumin...
- Multi-task Learning with Extended Temporal Shift Module for Temporal Action Localization : Abstract: We present our solution to the BinEgo-360 Challenge at ICCV 2025, which focuses on temporal action localization (TAL) in multi-perspective and multi-modal video settings. The challenge provi...
- AutoRefiner: Improving Autoregressive Video Diffusion Models via Reflective Refinement Over the Stochastic Sampling Path : Abstract: Autoregressive video diffusion models (AR-VDMs) show strong promise as scalable alternatives to bidirectional VDMs, enabling real-time and interactive applications. Yet there remains room fo...
- SmokeBench: Evaluating Multimodal Large Language Models for Wildfire Smoke Detection : Abstract: Wildfire smoke is transparent, amorphous, and often visually confounded with clouds, making early-stage detection particularly challenging. In this work, we introduce a benchmark, called Smo...
- FutureX: Enhance End-to-End Autonomous Driving via Latent Chain-of-Thought World Model : Abstract: In autonomous driving, end-to-end planners learn scene representations from raw sensor data and utilize them to generate a motion plan or control actions. However, exclusive reliance on the ...
- REST: Diffusion-based Real-time End-to-end Streaming Talking Head Generation via ID-Context Caching and Asynchronous Streaming Distillation : Abstract: Diffusion models have significantly advanced the field of talking head generation. However, the slow inference speeds and non-autoregressive paradigms severely constrain the application of d...
- RoomPilot: Controllable Synthesis of Interactive Indoor Environments via Multimodal Semantic Parsing : Abstract: Generating controllable and interactive indoor scenes is fundamental to applications in game development, architectural visualization, and embodied AI training. Yet existing approaches eithe...
- WildCap: Facial Appearance Capture in the Wild via Hybrid Inverse Rendering : Abstract: Existing methods achieve high-quality facial appearance capture under controllable lighting, which increases capture cost and limits usability. We propose WildCap, a novel method for high-qu...
- Cross-modal Prompting for Balanced Incomplete Multi-modal Emotion Recognition : Abstract: Incomplete multi-modal emotion recognition (IMER) aims at understanding human intentions and sentiments by comprehensively exploring the partially observed multi-source data. Although the mu...
- PersonaLive! Expressive Portrait Image Animation for Live Streaming : Abstract: Current diffusion-based portrait animation models predominantly focus on enhancing visual quality and expression realism, while overlooking generation latency and real-time performance, whic...
- Do We Need Reformer for Vision? An Experimental Comparison with Vision Transformers : Abstract: Transformers have recently demonstrated strong performance in computer vision, with Vision Transformers (ViTs) leveraging self-attention to capture both low-level and high-level image featur...
- Evaluating the Efficacy of Sentinel-2 versus Aerial Imagery in Serrated Tussock Classification : Abstract: Invasive species pose major global threats to ecosystems and agriculture. Serrated tussock (Nassella trichotoma) is a highly competitive invasive grass species that disrupts native ...
- FilmWeaver: Weaving Consistent Multi-Shot Videos with Cache-Guided Autoregressive Diffusion : Abstract: Current video generation models perform well at single-shot synthesis but struggle with multi-shot videos, facing critical challenges in maintaining character and background consistency acro...
- RcAE: Recursive Reconstruction Framework for Unsupervised Industrial Anomaly Detection : Abstract: Unsupervised industrial anomaly detection requires accurately identifying defects without labeled data. Traditional autoencoder-based methods often struggle with incomplete anomaly suppressi...
- Autoregressive Video Autoencoder with Decoupled Temporal and Spatial Context : Abstract: Video autoencoders compress videos into compact latent representations for efficient reconstruction, playing a vital role in enhancing the quality and efficiency of video generation. However...
- MultiEgo: A Multi-View Egocentric Video Dataset for 4D Scene Reconstruction : Abstract: Multi-view egocentric dynamic scene reconstruction holds significant research value for applications in holographic documentation of social interactions. However, existing reconstruction dat...
- SATMapTR: Satellite Image Enhanced Online HD Map Construction : Abstract: High-definition (HD) maps are evolving from pre-annotated to real-time construction to better support autonomous driving in diverse scenarios. However, this process is hindered by low-qualit...
- KeyframeFace: From Text to Expressive Facial Keyframes : Abstract: Generating dynamic 3D facial animation from natural language requires understanding both temporally structured semantics and fine-grained expression changes. Existing datasets and methods ma...
- Physics-Informed Video Flare Synthesis and Removal Leveraging Motion Independence between Flare and Scene : Abstract: Lens flare is a degradation phenomenon caused by strong light sources. Existing research on flare removal has mainly focused on images, while the spatiotemporal characteristics of video f...
- FreqDINO: Frequency-Guided Adaptation for Generalized Boundary-Aware Ultrasound Image Segmentation : Abstract: Ultrasound image segmentation is pivotal for clinical diagnosis, yet challenged by speckle noise and imaging artifacts. Recently, DINOv3 has shown remarkable promise in medical image segment...
- UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models : Abstract: With the advancement of multi-modal Large Language Models (LLMs), Video LLMs have been further developed to perform holistic and specialized video understanding. However, existing works a...
- Task-Specific Distance Correlation Matching for Few-Shot Action Recognition : Abstract: Few-shot action recognition (FSAR) has recently made notable progress through set matching and efficient adaptation of large-scale pre-trained models. However, two key limitations persist. F...
- A Multi-Mode Structured Light 3D Imaging System with Multi-Source Information Fusion for Underwater Pipeline Detection : Abstract: Underwater pipelines are highly susceptible to corrosion, which not only shortens their service life but also poses significant safety risks. Compared with manual inspection, the intelligent r...
- Prior-Enhanced Gaussian Splatting for Dynamic Scene Reconstruction from Casual Video : Abstract: We introduce a fully automatic pipeline for dynamic scene reconstruction from casually captured monocular RGB videos. Rather than designing a new scene representation, we enhance the priors ...
- Reliable Detection of Minute Targets in High-Resolution Aerial Imagery across Temporal Shifts : Abstract: Efficient crop detection via Unmanned Aerial Vehicles is critical for scaling precision agriculture, yet it remains challenging due to the small scale of targets and environmental variabilit...
- Assisted Refinement Network Based on Channel Information Interaction for Camouflaged and Salient Object Detection : Abstract: Camouflaged Object Detection (COD) stands as a significant challenge in computer vision, dedicated to identifying and segmenting objects visually highly integrated with their backgrounds. Cu...
- The N-Body Problem: Parallel Execution from Single-Person Egocentric Video : Abstract: Humans can intuitively parallelise complex activities, but can a model learn this from observing a single person? Given one egocentric video, we introduce the N-Body Problem: how N individua...
- FlowDC: Flow-Based Decoupling-Decay for Complex Image Editing : Abstract: With the surge of pre-trained text-to-image flow matching models, text-based image editing performance has gained remarkable improvement, especially for simple editing that only ...
- Collaborative Reconstruction and Repair for Multi-class Industrial Anomaly Detection : Abstract: Industrial anomaly detection is a challenging open-set task that aims to identify unknown anomalous patterns deviating from normal data distribution. To avoid the significant memory consumpt...
- JoyAvatar: Real-time and Infinite Audio-Driven Avatar Generation with Autoregressive Diffusion : Abstract: Existing DiT-based audio-driven avatar generation methods have achieved considerable progress, yet their broader application is constrained by limitations such as high computational overhead...
- YawDD+: Frame-level Annotations for Accurate Yawn Prediction : Abstract: Driver fatigue remains a leading cause of road accidents, with 24% of crashes involving drowsy drivers. While yawning serves as an early behavioral indicator of fatigue, existing machine le...
- CADMorph: Geometry-Driven Parametric CAD Editing via a Plan-Generate-Verify Loop : Abstract: A Computer-Aided Design (CAD) model encodes an object in two coupled forms: a parametric construction sequence and its resulting visible geometric shape. During iterative design, adjustments...
- VLM2GeoVec: Toward Universal Multimodal Embeddings for Remote Sensing : Abstract: Satellite imagery differs fundamentally from natural images: its aerial viewpoint, very high resolution, diverse scale variations, and abundance of small objects demand both region-level spa...
- TSkel-Mamba: Temporal Dynamic Modeling via State Space Model for Human Skeleton-based Action Recognition : Abstract: Skeleton-based action recognition has garnered significant attention in the computer vision community. Inspired by the recent success of the selective state-space model (SSM) Mamba in modeli...
- SSA3D: Text-Conditioned Assisted Self-Supervised Framework for Automatic Dental Abutment Design : Abstract: Abutment design is a critical step in dental implant restoration. However, manual design involves tedious measurement and fitting, and research on automating this process with AI is limited,...
- On Geometric Understanding and Learned Data Priors in VGGT : Abstract: The Visual Geometry Grounded Transformer (VGGT) is a 3D foundation model that infers camera geometry and scene structure in a single feed-forward pass. Trained in a supervised, single-step f...
- Reconstruction as a Bridge for Event-Based Visual Question Answering : Abstract: Integrating event cameras with Multimodal Large Language Models (MLLMs) promises general scene understanding in challenging visual conditions, yet requires navigating a trade-off between pre...
- Infinity and Beyond: Compositional Alignment in VAR and Diffusion T2I Models : Abstract: Achieving compositional alignment between textual descriptions and generated images - covering objects, attributes, and spatial relationships - remains a core challenge for modern text-to-im...
- SSL-MedSAM2: A Semi-supervised Medical Image Segmentation Framework Powered by Few-shot Learning of SAM2 : Abstract: Despite the success of deep learning based models in medical image segmentation, most state-of-the-art (SOTA) methods perform fully-supervised learning, which commonly relies on large scale an...
- 3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation : Abstract: 3D teeth segmentation, involving the localization of tooth instances and their semantic categorization in 3D dental models, is a critical yet challenging task in digital dentistry due to the...
- Evaluating Foundation Models' 3D Understanding Through Multi-View Correspondence Analysis : Abstract: Benchmarking 3D spatial understanding of foundation models is essential for real-world applications such as robotics and autonomous driving. Existing evaluations often rely on downstream fin...
- Using GUI Agent for Electronic Design Automation : Abstract: Graphical User Interface (GUI) agents adopt an end-to-end paradigm that maps a screenshot to an action sequence, thereby automating repetitive tasks in virtual environments. However, existin...
- Embodied Image Compression : Abstract: Image Compression for Machines (ICM) has emerged as a pivotal research direction in the field of visual data compression. However, with the rapid evolution of machine intelligence, the targe...
- Fast and Explicit: Slice-to-Volume Reconstruction via 3D Gaussian Primitives with Analytic Point Spread Function Modeling : Abstract: Recovering high-fidelity 3D images from sparse or degraded 2D images is a fundamental challenge in medical imaging, with broad applications ranging from 3D ultrasound reconstruction to MRI s...
- FactorPortrait: Controllable Portrait Animation via Disentangled Expression, Pose, and Viewpoint : Abstract: We introduce FactorPortrait, a video diffusion method for controllable portrait animation that enables lifelike synthesis from disentangled control signals of facial expressions, head moveme...
- Kinetic Mining in Context: Few-Shot Action Synthesis via Text-to-Motion Distillation : Abstract: The acquisition cost for large, annotated motion datasets remains a critical bottleneck for skeletal-based Human Activity Recognition (HAR). Although Text-to-Motion (T2M) generative models o...
- Cross-modal Context-aware Learning for Visual Prompt Guided Multimodal Image Understanding in Remote Sensing : Abstract: Recent advances in image understanding have enabled methods that leverage large language models for multimodal reasoning in remote sensing. However, existing approaches still struggle to ste...
- Depth-Copy-Paste: Multimodal and Depth-Aware Compositing for Robust Face Detection : Abstract: Data augmentation is crucial for improving the robustness of face detection systems, especially under challenging conditions such as occlusion, illumination variation, and complex environmen...
- Text images processing system using artificial intelligence models : Abstract: We present a text image classifier device that identifies textual content in images and then categorizes each image into one of four predefined categories, including Invoice, Form, L...
- EditMGT: Unleashing Potentials of Masked Generative Transformers in Image Editing : Abstract: Recent advances in diffusion models (DMs) have achieved exceptional visual quality in image editing tasks. However, the global denoising dynamics of DMs inherently conflate local editing tar...
- Referring Change Detection in Remote Sensing Imagery : Abstract: Change detection in remote sensing imagery is essential for applications such as urban planning, environmental monitoring, and disaster management. Traditional change detection methods typic...
- Reframing Music-Driven 2D Dance Pose Generation as Multi-Channel Image Generation : Abstract: Recent pose-to-video models can translate 2D pose sequences into photorealistic, identity-preserving dance videos, so the key challenge is to generate temporally coherent, rhythm-aligned 2D ...
- Weak-to-Strong Generalization Enables Fully Automated De Novo Training of Multi-head Mask-RCNN Model for Segmenting Densely Overlapping Cell Nuclei in Multiplex Whole-slice Brain Images : Abstract: We present a weak to strong generalization methodology for fully automated training of a multi-head extension of the Mask-RCNN method with efficient channel attention for reliable segmentati...
- SVG-T2I: Scaling Up Text-to-Image Latent Diffusion Model Without Variational Autoencoder : Abstract: Visual generation grounded in Visual Foundation Model (VFM) representations offers a highly promising unified pathway for integrating visual understanding, perception, and generation. Despit...
- Reducing Domain Gap with Diffusion-Based Domain Adaptation for Cell Counting : Abstract: Generating realistic synthetic microscopy images is critical for training deep learning models in label-scarce environments, such as cell counting with many cells per image. However, traditi...
- MatAnyone 2: Scaling Video Matting via a Learned Quality Evaluator : Abstract: Video matting remains limited by the scale and realism of existing datasets. While leveraging segmentation data can enhance semantic stability, the lack of effective boundary supervision oft...
- LUCID: Learning-Enabled Uncertainty-Aware Certification of Stochastic Dynamical Systems : Abstract: Ensuring the safety of AI-enabled systems, particularly in high-stakes domains such as autonomous driving and healthcare, has become increasingly critical. Traditional formal verification to...
- Learning Minimal Representations of Fermionic Ground States : Abstract: We introduce an unsupervised machine-learning framework that discovers optimally compressed representations of quantum many-body ground states. Using an autoencoder neural network architectu...
- Personalized Federated Learning with Exact Stochastic Gradient Descent : Abstract: We propose a Stochastic Gradient Descent (SGD)-type algorithm for Personalized Federated Learning which can be particularly attractive for mobile energy-limited regimes due to its low per-cl...
- Data as Voters: Core Set Selection Using Approval-Based Multi-Winner Voting : Abstract: We present a novel approach to the core set/instance selection problem in machine learning. Our approach is based on recent results on (proportional) representation in approval-based multi-w...
- M2NO: An Efficient Multi-Resolution Operator Framework for Dynamic Multi-Scale PDE Solvers : Abstract: Solving high-dimensional partial differential equations (PDEs) efficiently requires handling multi-scale features across varying resolutions. To address this challenge, we present the Multiw...
- TAEGAN: Generating Synthetic Tabular Data For Data Augmentation : Abstract: Synthetic tabular data generation has gained significant attention for its potential in data augmentation and privacy-preserving data sharing. While recent methods like diffusion and auto-re...
- Breaking the Frozen Subspace: Importance Sampling for Low-Rank Optimization in LLM Pretraining : Abstract: Low-rank optimization has emerged as a promising approach to enabling memory-efficient training of large language models (LLMs). Existing low-rank optimization methods typically project grad...
- GSplit: Scaling Graph Neural Network Training on Large Graphs via Split-Parallelism : Abstract: Graph neural networks (GNNs), an emerging class of machine learning models for graphs, have gained popularity for their superior performance in various graph analytical tasks. Mini-batch tra...
- The Expressive Capacity of State Space Models: A Formal Language Perspective : Abstract: Recently, recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers. However, there is little unde...
- HyperSBINN: A Hypernetwork-Enhanced Systems Biology-Informed Neural Network for Efficient Drug Cardiosafety Assessment : Abstract: Mathematical modeling in systems toxicology enables a comprehensive understanding of the effects of pharmaceutical substances on cardiac health. However, the complexity of these models limit...
- Introducing physics-informed generative models for targeting structural novelty in the exploration of chemical space : Abstract: Discovering materials with new structural chemistry is key to achieving transformative functionality. Generative artificial intelligence offers a scalable route to propose candidate crystal ...
- ASR Under the Stethoscope: Evaluating Biases in Clinical Speech Recognition across Indian Languages : Abstract: Automatic Speech Recognition (ASR) is increasingly used to document clinical encounters, yet its reliability in multilingual and demographically diverse Indian healthcare contexts remains la...
- Benchmarking Automatic Speech Recognition Models for African Languages : Abstract: Automatic speech recognition (ASR) for African languages remains constrained by limited labeled data and the lack of systematic guidance on model selection, data scaling, and decoding strate...
- KBQA-R1: Reinforcing Large Language Models for Knowledge Base Question Answering : Abstract: Knowledge Base Question Answering (KBQA) challenges models to bridge the gap between natural language and strict knowledge graph schemas by generating executable logical forms. While Large L...
- PIAST: Rapid Prompting with In-context Augmentation for Scarce Training data : Abstract: LLMs are highly sensitive to prompt design, but handcrafting effective prompts is difficult and often requires intricate crafting of few-shot examples. We propose a fast automatic prompt con...
- Applying NLP to iMessages: Understanding Topic Avoidance, Responsiveness, and Sentiment : Abstract: What is your messaging data used for? While many users do not often think about the information companies can gather based off of their messaging platform of choice, it is nonetheless import...
- SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing : Abstract: SciLaD is a novel, large-scale dataset of scientific language constructed entirely using open-source frameworks and publicly available data sources. It comprises a curated English split cont...
- Leveraging LLMs for Title and Abstract Screening for Systematic Review: A Cost-Effective Dynamic Few-Shot Learning Approach : Abstract: Systematic reviews are a key component of evidence-based medicine, playing a critical role in synthesizing existing research evidence and guiding clinical decisions. However, with the rapid ...
- AdaSD: Adaptive Speculative Decoding for Efficient Language Model Inference : Abstract: Large language models (LLMs) have achieved remarkable performance across a wide range of tasks, but their increasing parameter sizes significantly slow down inference. Speculative decoding m...
- LegalRikai: Open Benchmark -- A Benchmark for Complex Japanese Corporate Legal Tasks : Abstract: This paper introduces LegalRikai: Open Benchmark, a new benchmark comprising four complex tasks that emulate Japanese corporate legal practices. The benchmark was created by legal profession...
- Unifying Dynamic Tool Creation and Cross-Task Experience Sharing through Cognitive Memory Architecture : Abstract: Large Language Model agents face fundamental challenges in adapting to novel tasks due to limitations in tool availability and experience reuse. Existing approaches either rely on predefined...
- qa-FLoRA: Data-free query-adaptive Fusion of LoRAs for LLMs : Abstract: The deployment of large language models for specialized tasks often requires domain-specific parameter-efficient finetuning through Low-Rank Adaptation (LoRA) modules. However, effectively f...
- Mining Legal Arguments to Study Judicial Formalism : Abstract: Courts must justify their decisions, but systematically analyzing judicial reasoning at scale remains difficult. This study refutes claims about formalistic judging in Central and Eastern Eu...
- Improving Translation Quality by Selecting Better Data for LLM Fine-Tuning: A Comparative Analysis : Abstract: We investigated the impact of data selection on machine translation fine-tuning for open LLMs. Using Japanese-English corpora, we compare five selectors: TF-IDF, COMET Kiwi, QuRate, FD-Score...
- Minimal Clips, Maximum Salience: Long Video Summarization via Key Moment Extraction : Abstract: Vision-Language Models (VLMs) are able to process increasingly longer videos. Yet, important visual information is easily lost throughout the entire context and missed by VLMs. Also, it is i...
- CLINIC: Evaluating Multilingual Trustworthiness in Language Models for Healthcare : Abstract: Integrating language models (LMs) in healthcare systems holds great promise for improving medical workflows and decision-making. However, a critical barrier to their real-world adoption is t...
- Mistake Notebook Learning: Selective Batch-Wise Context Optimization for In-Context Learning : Abstract: Large language models (LLMs) adapt to tasks via gradient fine-tuning (heavy computation, catastrophic forgetting) or In-Context Learning (ICL: low robustness, poor mistake learning). To fix ...
- RMSup: Physics-Informed Radio Map Super-Resolution for Compute-Enhanced Integrated Sensing and Communications : Abstract: Radio maps (RMs) provide a spatially continuous description of wireless propagation, enabling cross-layer optimization and unifying communication and sensing for integrated sensing and commu...
- Generalization of Long-Range Machine Learning Potentials in Complex Chemical Spaces : Abstract: The vastness of chemical space makes generalization a central challenge in the development of machine learning interatomic potentials (MLIPs). While MLIPs could enable large-scale atomistic ...
- STARK denoises spatial transcriptomics images via adaptive regularization : Abstract: We present an approach to denoising spatial transcriptomics images that is particularly effective for uncovering cell identities in the regime of ultra-low sequencing depths, and also allows...
- Boosted Random Forests for Predicting Treatment Failure of Chemotherapy Regimens : Abstract: Cancer patients may undergo lengthy and painful chemotherapy treatments, comprising several successive regimens or plans. Treatment inefficacy and other adverse events can lead to discontinu...
- An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees : Abstract: We study outlier (a.k.a., anomaly) detection for single-pass non-stationary streaming data. In the well-studied offline or batch outlier detection problem, traditional methods such as kernel...
- Provable Recovery of Locally Important Signed Features and Interactions from Random Forest : Abstract: Feature and Interaction Importance (FII) methods are essential in supervised learning for assessing the relevance of input variables and their interactions in complex prediction models. In m...
- TPV: Parameter Perturbations Through the Lens of Test Prediction Variance : Abstract: We identify test prediction variance (TPV) -- the first-order sensitivity of model outputs to parameter perturbations around a trained solution -- as a unifying quantity that links several c...
- Data-Driven Model Reduction using WeldNet: Windowed Encoders for Learning Dynamics : Abstract: Many problems in science and engineering involve time-dependent, high dimensional datasets arising from complex physical processes, which are costly to simulate. In this work, we propose Wel...
- CADKnitter: Compositional CAD Generation from Text and Geometry Guidance : Abstract: Crafting computer-aided design (CAD) models has long been a painstaking and time-intensive task, demanding both precision and expertise from designers. With the emergence of 3D generation, t...
- Theoretical Foundations of GPU-Native Compilation for Rapid Code Iteration : Abstract: Current AI code generation systems suffer from significant latency bottlenecks due to CPU-GPU data transfers during compilation, execution, and testing phases. We establish theoretical found...
- Multi-Objective Reinforcement Learning for Large-Scale Mixed Traffic Control : Abstract: Effective mixed traffic control requires balancing efficiency, fairness, and safety. Existing approaches excel at optimizing efficiency and enforcing safety constraints but lack mechanisms t...
- Integrated Prediction and Multi-period Portfolio Optimization : Abstract: Multi-period portfolio optimization is important for real portfolio management, as it accounts for transaction costs, path-dependent risks, and the intertemporal structure of trading decisio...
- When Actions Teach You to Think: Reasoning-Action Synergy via Reinforcement Learning in Conversational Agents : Abstract: Supervised fine-tuning (SFT) has emerged as one of the most effective ways to improve the performance of large language models (LLMs) in downstream tasks. However, SFT can have difficulty ge...
- Maritime object classification with SAR imagery using quantum kernel methods : Abstract: Illegal, unreported, and unregulated (IUU) fishing causes global economic losses of $10-25 billion annually and undermines marine sustainability and governance. Synthetic Aperture Radar (SA...
- Out-of-Distribution Segmentation via Wasserstein-Based Evidential Uncertainty : Abstract: Deep neural networks achieve superior performance in semantic segmentation, but are limited to a predefined set of classes, which leads to failures when they encounter unknown objects in ope...
- Emergence of Nonequilibrium Latent Cycles in Unsupervised Generative Modeling : Abstract: We show that nonequilibrium dynamics can play a constructive role in unsupervised machine learning by inducing the spontaneous emergence of latent-state cycles. We introduce a model in which...
- DOS: Distilling Observable Softmaps of Zipfian Prototypes for Self-Supervised Point Representation : Abstract: Recent advances in self-supervised learning (SSL) have shown tremendous potential for learning 3D point cloud representations without human annotations. However, SSL for 3D point clouds stil...
- FRQI Pairs method for image classification using Quantum Recurrent Neural Network : Abstract: This study aims to introduce the FRQI Pairs method to a wider audience, a novel approach to image classification using Quantum Recurrent Neural Networks (QRNN) with Flexible Representation f...
- Super-Resolved Canopy Height Mapping from Sentinel-2 Time Series Using LiDAR HD Reference Data across Metropolitan France : Abstract: Fine-scale forest monitoring is essential for understanding canopy structure and its dynamics, which are key indicators of carbon stocks, biodiversity, and forest health. Deep learning is pa...
- Visualizing token importance for black-box language models : Abstract: We consider the problem of auditing black-box large language models (LLMs) to ensure they behave reliably when deployed in production settings, particularly in high-stakes domains such as le...
- In-Context Learning for Seismic Data Processing : Abstract: Seismic processing transforms raw data into subsurface images essential for geophysical applications. Traditional methods face challenges, such as noisy data, and manual parameter tuning, am...
- Safe Bayesian optimization across noise models via scenario programming : Abstract: Safe Bayesian optimization (BO) with Gaussian processes is an effective tool for tuning control policies in safety-critical real-world systems, specifically due to its sample efficiency and ...
- Neural Network-based Partial-Linear Single-Index Models for Environmental Mixtures Analysis : Abstract: Evaluating the health effects of complex environmental mixtures remains a central challenge in environmental health research. Existing approaches vary in their flexibility, interpretability,...
- Stable spectral neural operator for learning stiff PDE systems from limited data : Abstract: Accurate modeling of spatiotemporal dynamics is crucial to understanding complex phenomena across science and engineering. However, this task faces a fundamental challenge when the governing...
- ECCO: Leveraging Cross-Camera Correlations for Efficient Live Video Continuous Learning : Abstract: Recent advances in video analytics address real-time data drift by continuously retraining specialized, lightweight DNN models for individual cameras. However, the current practice of retrai...
- RECAP: REwriting Conversations for Intent Understanding in Agentic Planning : Abstract: Understanding user intent is essential for effective planning in conversational assistants, particularly those powered by large language models (LLMs) coordinating multiple agents. However, ...
- KVSwap: Disk-aware KV Cache Offloading for Long-Context On-device Inference : Abstract: Language models (LMs) underpin emerging mobile and embedded AI applications like meeting and video summarization and document analysis, which often require processing multiple long-context i...
- MoB: Mixture of Bidders : Abstract: Mixture of Experts (MoE) architectures have demonstrated remarkable success in scaling neural networks, yet their application to continual learning remains fundamentally limited by a critica...
- TECM*: A Data-Driven Assessment to Reinforcement Learning Methods and Application to Heparin Treatment Strategy for Surgical Sepsis : Abstract: Objective: Sepsis is a life-threatening condition caused by severe infection leading to acute organ dysfunction. This study proposes a data-driven metric and a continuous reward function to ...
- Memoryless Policy Iteration for Episodic POMDPs : Abstract: Memoryless and finite-memory policies offer a practical alternative for solving partially observable Markov decision processes (POMDPs), as they operate directly in the output space rather t...
- Investigating ECG Diagnosis with Ambiguous Labels using Partial Label Learning : Abstract: Label ambiguity is an inherent problem in real-world electrocardiogram (ECG) diagnosis, arising from overlapping conditions and diagnostic disagreement. However, current ECG models are train...
- Limits and Gains of Test-Time Scaling in Vision-Language Reasoning : Abstract: Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning ability of Large Language Models (LLMs) by allocating additional computation at inference, yet its appl...
- Refining Graphical Neural Network Predictions Using Flow Matching for Optimal Power Flow with Constraint-Satisfaction Guarantee : Abstract: The DC Optimal Power Flow (DC-OPF) problem is fundamental to power system operations, requiring rapid solutions for real-time grid management. While traditional optimization solvers provide ...
- The Vekua Layer: Exact Physical Priors for Implicit Neural Representations via Generalized Analytic Functions : Abstract: Implicit Neural Representations (INRs) have emerged as a powerful paradigm for parameterizing physical fields, yet they often suffer from spectral bias and the computational expense of non-c...
- Harnessing Rich Multi-Modal Data for Spatial-Temporal Homophily-Embedded Graph Learning Across Domains and Localities : Abstract: Modern cities are increasingly reliant on data-driven insights to support decision making in areas such as transportation, public safety and environmental impact. However, city-level data of...
- Bandwidth-constrained Variational Message Encoding for Cooperative Multi-agent Reinforcement Learning : Abstract: Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent meth...
- Progress over Points: Reframing LM Benchmarks Around Scientific Objectives : Abstract: Current benchmarks that test LLMs on static, already-solved problems (e.g., math word problems) effectively demonstrated basic capability acquisition. The natural progression has been toward...
- On the failure of ReLU activation for physics-informed machine learning : Abstract: Physics-informed machine learning uses governing ordinary and/or partial differential equations to train neural networks to represent the solution field. Like any machine learning problem, t...
- Beyond Memorization: Gradient Projection Enables Selective Learning in Diffusion Models : Abstract: Memorization in large-scale text-to-image diffusion models poses significant security and intellectual property risks, enabling adversarial attribute extraction and the unauthorized reproduc...
- Latent Variable Causal Discovery under Selection Bias : Abstract: Addressing selection bias in latent variable causal discovery is important yet underexplored, largely due to a lack of suitable statistical tools: While various tools beyond basic conditiona...
- Task-Aware Multi-Expert Architecture For Lifelong Deep Learning : Abstract: Lifelong deep learning (LDL) trains neural networks to learn sequentially across tasks while preserving prior knowledge. We propose Task-Aware Multi-Expert (TAME), a continual learning algor...
- Insight Miner: A Time Series Analysis Dataset for Cross-Domain Alignment with Natural Language : Abstract: Time-series data is critical across many scientific and industrial domains, including environmental analysis, agriculture, transportation, and finance. However, mining insights from this dat...
- Features Emerge as Discrete States: The First Application of SAEs to 3D Representations : Abstract: Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations, translating the hidden state into human ideas with high semantic value des...
- SRLR: Symbolic Regression based Logic Recovery to Counter Programmable Logic Controller Attacks : Abstract: Programmable Logic Controllers (PLCs) are critical components in Industrial Control Systems (ICSs). Their potential exposure to external world makes them susceptible to cyber-attacks. Existi...
- QGEC : Quantum Golay Code Error Correction : Abstract: Quantum computers have the possibility of a much reduced calculation load compared with classical computers in specific problems. Quantum error correction (QEC) is vital for handling qubits,...
- Benchmarking the Generality of Vision-Language-Action Models : Abstract: Generalist multimodal agents are expected to unify perception, language, and control - operating robustly across diverse real world domains. However, current evaluation practices remain frag...
- Pace: Physics-Aware Attentive Temporal Convolutional Network for Battery Health Estimation : Abstract: Batteries are critical components in modern energy systems such as electric vehicles and power grid energy storage. Effective battery health management is essential for battery system safety...
- Spectral entropy prior-guided deep feature fusion architecture for magnetic core loss : Abstract: Accurate core loss modeling is critical for the design of high-efficiency power electronic systems. Traditional core loss modeling methods have limitations in prediction accuracy. To advance...
- DAPO: Design Structure-Aware Pass Ordering in High-Level Synthesis with Graph Contrastive and Reinforcement Learning : Abstract: High-Level Synthesis (HLS) tools are widely adopted in FPGA-based domain-specific accelerator design. However, existing tools rely on fixed optimization strategies inherited from software co...
- Symmetry-Aware Steering of Equivariant Diffusion Policies: Benefits and Limits : Abstract: Equivariant diffusion policies (EDPs) combine the generative expressivity of diffusion models with the strong generalization and sample efficiency afforded by geometric symmetries. While ste...
- CAT: Can Trust be Predicted with Context-Awareness in Dynamic Heterogeneous Networks? : Abstract: Trust prediction provides valuable support for decision-making, risk mitigation, and system security enhancement. Recently, Graph Neural Networks (GNNs) have emerged as a promising approach ...
- Attacking and Securing Community Detection: A Game-Theoretic Framework : Abstract: It has been demonstrated that adversarial graphs, i.e., graphs with imperceptible perturbations, can cause deep graph models to fail on classification tasks. In this work, we extend the conc...
- Mitigating the Safety Alignment Tax with Null-Space Constrained Policy Optimization : Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world applications, it is important to ensure their behaviors align with human values, societal norms, and ethical principle...
- Bhargava Cube-Inspired Quadratic Regularization for Structured Neural Embeddings : Abstract: We present a novel approach to neural representation learning that incorporates algebraic constraints inspired by Bhargava cubes from number theory. Traditional deep learning methods learn r...
- Sliced ReLU attention: Quasi-linear contextual expressivity via sorting : Abstract: We introduce sliced ReLU attention, a new attention mechanism that departs structurally from both softmax and ReLU-based alternatives. Instead of applying a nonlinearity to pairwise dot prod...
- Hyperbolic Gaussian Blurring Mean Shift: A Statistical Mode-Seeking Framework for Clustering in Curved Spaces : Abstract: Clustering is a fundamental unsupervised learning task for uncovering patterns in data. While Gaussian Blurring Mean Shift (GBMS) has proven effective for identifying arbitrarily shaped clus...
- Rethinking Expert Trajectory Utilization in LLM Post-training : Abstract: While effective post-training integrates Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), the optimal mechanism for utilizing expert trajectories remains unresolved. We propose ...
- xGR: Efficient Generative Recommendation Serving at Scale : Abstract: Recommendation system delivers substantial economic benefits by providing personalized predictions. Generative recommendation (GR) integrates LLMs to enhance the understanding of long user-i...
- Parametric Numerical Integration with (Differential) Machine Learning : Abstract: In this work, we introduce a machine/deep learning methodology to solve parametric integrals. Besides classical machine learning approaches, we consider a differential learning framework tha...
- A Multi-Criteria Automated MLOps Pipeline for Cost-Effective Cloud-Based Classifier Retraining in Response to Data Distribution Shifts : Abstract: The performance of machine learning (ML) models often deteriorates when the underlying data distribution changes over time, a phenomenon known as data distribution drift. When this happens, ...
- Elastic-Net Multiple Kernel Learning: Combining Multiple Data Sources for Prediction : Abstract: Multiple Kernel Learning (MKL) models combine several kernels in supervised and unsupervised settings to integrate multiple data representations or sources, each represented by a different k...
- Fully Inductive Node Representation Learning via Graph View Transformation : Abstract: Generalizing a pretrained model to unseen datasets without retraining is an essential step toward a foundation model. However, achieving such cross-dataset, fully inductive inference is diff...
- Brain-Semantoks: Learning Semantic Tokens of Brain Dynamics with a Self-Distilled Foundation Model : Abstract: The development of foundation models for functional magnetic resonance imaging (fMRI) time series holds significant promise for predicting phenotypes related to disease and cognition. Curren...
- Gradient Descent as a Perceptron Algorithm: Understanding Dynamics and Implicit Acceleration : Abstract: Even for the gradient descent (GD) method applied to neural network training, understanding its optimization dynamics, including convergence rate, iterate trajectories, function value oscill...
- A Fast Interpretable Fuzzy Tree Learner : Abstract: Fuzzy rule-based systems have been mostly used in interpretable decision-making because of their interpretable linguistic rules. However, interpretability requires both sensible linguistic p...
- Bridging Streaming Continual Learning via In-Context Large Tabular Models : Abstract: In streaming scenarios, models must learn continuously, adapting to concept drifts without erasing previously acquired knowledge. However, existing research communities address these challen...
- High-Dimensional Surrogate Modeling for Closed-Loop Learning of Neural-Network-Parameterized Model Predictive Control : Abstract: Learning controller parameters from closed-loop data has been shown to improve closed-loop performance. Bayesian optimization, a widely used black-box and sample-efficient learning method, c...
- SpectralKrum: A Spectral-Geometric Defense Against Byzantine Attacks in Federated Learning : Abstract: Federated Learning (FL) distributes model training across clients who retain their data locally, but this architecture exposes a fundamental vulnerability: Byzantine clients can inject arbit...
- The Adaptive Vekua Cascade: A Differentiable Spectral-Analytic Solver for Physics-Informed Representation : Abstract: Coordinate-based neural networks have emerged as a powerful tool for representing continuous physical fields, yet they face two fundamental pathologies: spectral bias, which hinders the lear...
- Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective : Abstract: Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based ...
- A General Algorithm for Detecting Higher-Order Interactions via Random Sequential Additions : Abstract: Many systems exhibit complex interactions between their components: some features or actions amplify each other's effects, others provide redundant information, and some contribute independe...
- Developmental Symmetry-Loss: A Free-Energy Perspective on Brain-Inspired Invariance Learning : Abstract: We propose Symmetry-Loss, a brain-inspired algorithmic principle that enforces invariance and equivariance through a differentiable constraint derived from environmental symmetries. The fram...
- Marti-5: A Mathematical Model of "Self in the World" as a First Step Toward Self-Awareness : Abstract: The existence of 'what' and 'where' pathways of information processing in the brain was proposed almost 30 years ago, but there is still a lack of a clear mathematical model that could show ...
- Mathematics of natural intelligence : Abstract: In the process of evolution, the brain has achieved such perfection that artificial intelligence systems do not have and which needs its own mathematics. The concept of cognitome, introduced...
- Dora: QoE-Aware Hybrid Parallelism for Distributed Edge AI : Abstract: With the proliferation of edge AI applications, satisfying user quality of experience (QoE) requirements, such as model inference latency, has become a first class objective, as these models...
- MolSculpt: Sculpting 3D Molecular Geometries from Chemical Syntax : Abstract: Generating precise 3D molecular geometries is crucial for drug discovery and material science. While prior efforts leverage 1D representations like SELFIES to ensure molecular validity, they...
- MedBioRAG: Semantic Search and Retrieval-Augmented Generation with Large Language Models for Medical and Biological QA : Abstract: Recent advancements in retrieval-augmented generation (RAG) have significantly enhanced the ability of large language models (LLMs) to perform complex question-answering (QA) tasks. In this ...
- Unambiguous Representations in Neural Networks: An Information-Theoretic Approach to Intentionality : Abstract: Representations pervade our daily experience, from letters representing sounds to bit strings encoding digital files. While such representations require externally defined decoders to convey...
- Beyond Memristor: Neuromorphic Computing Using Meminductor : Abstract: Memristor (resistor with memory), inductor with memory (meminductor) and capacitor with memory (memcapacitor) have different roles to play in novel computing architectures. We found that a c...
- Leveraging Text Guidance for Enhancing Demographic Fairness in Gender Classification : Abstract: In the quest for fairness in artificial intelligence, novel approaches to enhance it in facial image based gender classification algorithms using text guided methodologies are presented. The...
- SoccerMaster: A Vision Foundation Model for Soccer Understanding : Abstract: Soccer understanding has recently garnered growing research interest due to its domain-specific complexity and unique challenges. Unlike prior works that typically rely on isolated, task-spe...
- WholeBodyVLA: Towards Unified Latent VLA for Whole-Body Loco-Manipulation Control : Abstract: Humanoid robots require precise locomotion and dexterous manipulation to perform challenging loco-manipulation tasks. Yet existing approaches, modular or end-to-end, are deficient in manipul...
- KathDB: Explainable Multimodal Database Management System with Human-AI Collaboration : Abstract: Traditional DBMSs execute user- or application-provided SQL queries over relational data with strong semantic guarantees and advanced query optimization, but writing complex SQL is hard and ...
- MultiScript30k: Leveraging Multilingual Embeddings to Extend Cross Script Parallel Data : Abstract: Multi30k is frequently cited in the multimodal machine translation (MMT) literature, offering parallel text data for training and fine-tuning deep learning models. However, it is limited to ...
- Fast, accurate measurement of the worker populations of honey bee colonies using deep learning : Abstract: Honey bees play a crucial role in pollination, contributing significantly to global agriculture and ecosystems. Accurately estimating hive populations is essential for understanding the effe...
- A probabilistic foundation model for crystal structure denoising, phase classification, and order parameters : Abstract: Atomistic simulations generate large volumes of noisy structural data, but extracting phase labels, order parameters (OPs), and defect information in a way that is universal, robust, and int...
- Clip-and-Verify: Linear Constraint-Driven Domain Clipping for Accelerating Neural Network Verification : Abstract: State-of-the-art neural network (NN) verifiers demonstrate that applying the branch-and-bound (BaB) procedure with fast bounding techniques plays a key role in tackling many challenging veri...
- Explanation Bias is a Product: Revealing the Hidden Lexical and Position Preferences in Post-Hoc Feature Attribution : Abstract: Good quality explanations strengthen the understanding of language models and data. Feature attribution methods, such as Integrated Gradient, are a type of post-hoc explainer that can provid...
- FIBER: A Multilingual Evaluation Resource for Factual Inference Bias : Abstract: Large language models are widely used across domains, yet there are concerns about their factual reliability and biases. Factual knowledge probing offers a systematic means to evaluate these...
- In-Context Multi-Objective Optimization : Abstract: Balancing competing objectives is omnipresent across disciplines, from drug design to autonomous systems. Multi-objective Bayesian optimization is a promising solution for such expensive, bl...
- Fairness-Regularized Online Optimization with Switching Costs : Abstract: Fairness and action smoothness are two crucial considerations in many online optimization problems, but they have yet to be addressed simultaneously. In this paper, we study a new and challe...
- Autoencoder-based Semi-Supervised Dimensionality Reduction and Clustering for Scientific Ensembles : Abstract: Analyzing and visualizing scientific ensemble datasets with high dimensionality and complexity poses significant challenges. Dimensionality reduction techniques and autoencoders are powerful...
- MiniScope: A Least Privilege Framework for Authorizing Tool Calling Agents : Abstract: Tool calling agents are an emerging paradigm in LLM deployment, with major platforms such as ChatGPT, Claude, and Gemini adding connectors and autonomous capabilities. However, the inherent ...
- Image Tiling for High-Resolution Reasoning: Balancing Local Detail with Global Context : Abstract: Reproducibility remains a cornerstone of scientific progress, yet complex multimodal models often lack transparent implementation details and accessible training infrastructure. In this work...
- Fast EXP3 Algorithms : Abstract: We point out that EXP3 can be implemented in constant time per round, propose more practical algorithms, and analyze the trade-offs between the regret bounds and time complexities of these a...
- amc: The Automated Mission Classifier for Telescope Bibliographies : Abstract: Telescope bibliographies record the pulse of astronomy research by capturing publication statistics and citation metrics for telescope facilities. Robust and scalable bibliographies ensure t...
- Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery: Sublinear Memory Growth for Efficient LLM Inference : Abstract: We present Adaptive Soft Rolling KV Freeze with Entropy-Guided Recovery (ASR-KF-EGR), a training-free inference-time framework for efficient large language model generation. Our method intro...
- VFMF: World Modeling by Forecasting Vision Foundation Model Features : Abstract: Forecasting from partial observations is central to world modeling. Many recent methods represent the world through images, and reduce forecasting to stochastic video generation. Although su...
- A Simple Generalisation of the Implicit Dynamics of In-Context Learning : Abstract: In-context learning (ICL) refers to the ability of a model to learn new tasks from examples in its input without any parameter updates. In contrast to previous theories of ICL relying on toy...
- Multi-Intent Spoken Language Understanding: Methods, Trends, and Challenges : Abstract: Multi-intent spoken language understanding (SLU) involves two tasks: multiple intent detection and slot filling, which jointly handle utterances containing more than one intent. Owing to thi...
- A Scalable Multi-GPU Framework for Encrypted Large-Model Inference : Abstract: Encrypted AI using fully homomorphic encryption (FHE) provides strong privacy guarantees; but its slow performance has limited practical deployment. Recent works proposed ASICs to accelerate...
- Words to Describe What I'm Feeling: Exploring the Potential of AI Agents for High Subjectivity Decisions in Advance Care Planning : Abstract: Serious illness can deprive patients of the capacity to speak for themselves. As populations age and caregiver networks shrink, the need for reliable support in Advance Care Planning (ACP) g...
- CIP: A Plug-and-Play Causal Prompting Framework for Mitigating Hallucinations under Long-Context Noise : Abstract: Large language models often hallucinate when processing long and noisy retrieval contexts because they rely on spurious correlations rather than genuine causal relationships. We propose CIP,...
- AI Autonomy or Human Dependency? Defining the Boundary in Responsible AI with the α-Coefficient : Abstract: The integrity of contemporary AI systems is undermined by a critical design flaw: the misappropriation of Human-in-the-Loop (HITL) models to mask systems that are fundamentally reliant on hu...
- Few-Shot VLM-Based G-Code and HMI Verification in CNC Machining : Abstract: Manual generation of G-code is important for learning the operation of CNC machines. Prior work in G-code verification uses Large-Language Models (LLMs), which primarily examine errors in th...
- Condensation-Concatenation Framework for Dynamic Graph Continual Learning : Abstract: Dynamic graphs are prevalent in real-world scenarios, where continuous structural changes induce catastrophic forgetting in graph neural networks (GNNs). While continual learning has been ex...
- MLLM Machine Unlearning via Visual Knowledge Distillation : Abstract: Recently, machine unlearning approaches have been proposed to remove sensitive information from well-trained large models. However, most existing methods are tailored for LLMs, while MLLM-or...
- Surveillance Video-Based Traffic Accident Detection Using Transformer Architecture : Abstract: Road traffic accidents represent a leading cause of mortality globally, with incidence rates rising due to increasing population, urbanization, and motorization. Rising accident rates raise ...
- REMODEL-LLM: Transforming C code to Java using LLMs : Abstract: The automated translation of C code to Java code is a notoriously difficult task, fraught with challenges stemming from fundamental paradigm shifts (procedural vs. Object Oriented), memory m...
- Task-Specific Sparse Feature Masks for Molecular Toxicity Prediction with Chemical Language Models : Abstract: Reliable in silico molecular toxicity prediction is a cornerstone of modern drug discovery, offering a scalable alternative to experimental screening. However, the black-box nature of state-...
- Flowception: Temporally Expansive Flow Matching for Video Generation : Abstract: We present Flowception, a novel non-autoregressive and variable-length video generation framework. Flowception learns a probability path that interleaves discrete frame insertions with conti...
- Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation : Abstract: We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen ...
- Exploring MLLM-Diffusion Information Transfer with MetaCanvas : Abstract: Multimodal learning has rapidly advanced visual understanding, largely via multimodal large language models (MLLMs) that use powerful LLMs as cognitive cores. In visual generation, however, ...
- Towards Privacy-Preserving Code Generation: Differentially Private Code Language Models : Abstract: Large language models specialized for code (CodeLLMs) have demonstrated remarkable capabilities in generating code snippets, documentation, and test cases. However, despite their promising c...
- Does Less Hallucination Mean Less Creativity? An Empirical Investigation in LLMs : Abstract: Large Language Models (LLMs) exhibit remarkable capabilities in natural language understanding and reasoning, but suffer from hallucination: the generation of factually incorrect content. Wh...
- NeuralOGCM: Differentiable Ocean Modeling with Learnable Physics : Abstract: High-precision scientific simulation faces a long-standing trade-off between computational efficiency and physical fidelity. To address this challenge, we propose NeuralOGCM, an ocean modeli...
- Contrastive Time Series Forecasting with Anomalies : Abstract: Time series forecasting predicts future values from past data. In real-world settings, some anomalous events have lasting effects and influence the forecast, while others are short-lived and...
- Parallax: Runtime Parallelization for Operator Fallbacks in Heterogeneous Edge Systems : Abstract: The growing demand for real-time DNN applications on edge devices necessitates faster inference of increasingly complex models. Although many devices include specialized accelerators (e.g., ...
- Graph Embedding with Mel-spectrograms for Underwater Acoustic Target Recognition : Abstract: Underwater acoustic target recognition (UATR) is extremely challenging due to the complexity of ship-radiated noise and the variability of ocean environments. Although deep learning (DL) app...
- Optimizing the Training Diet: Data Mixture Search for Robust Time Series Forecasting : Abstract: The standard paradigm for training deep learning models on sensor data assumes that more data is always better. However, raw sensor streams are often imbalanced and contain significant redun...
- DentalGPT: Incentivizing Multimodal Complex Reasoning in Dentistry : Abstract: Reliable interpretation of multimodal data in dentistry is essential for automated oral healthcare, yet current multimodal large language models (MLLMs) struggle to capture fine-grained dent...
- Multi-temporal Calving Front Segmentation : Abstract: The calving fronts of marine-terminating glaciers undergo constant changes. These changes significantly affect the glacier's mass and dynamics, demanding continuous monitoring. To address th...
- Atomic Action Slicing: Planner-Aligned Options for Generalist VLA Agents : Abstract: Current vision-language-action (VLA) models generalize poorly, particularly when tasks require new compositions of skills or objects. We introduce Atomic Action Slicing (AAS), a planner-alig...
- Bounding Hallucinations: Information-Theoretic Guarantees for RAG Systems via Merlin-Arthur Protocols : Abstract: Retrieval-augmented generation (RAG) models rely on retrieved evidence to guide large language model (LLM) generators, yet current systems treat retrieval as a weak heuristic rather than ver...
- Automating Historical Insight Extraction from Large-Scale Newspaper Archives via Neural Topic Modeling : Abstract: Extracting coherent and human-understandable themes from large collections of unstructured historical newspaper archives presents significant challenges due to topic evolution, Optical Chara...
- From Verification Burden to Trusted Collaboration: Design Goals for LLM-Assisted Literature Reviews : Abstract: Large Language Models (LLMs) are increasingly embedded in academic writing practices. Although numerous studies have explored how researchers employ these tools for scientific writing, their...
- From Signal to Turn: Interactional Friction in Modular Speech-to-Speech Pipelines : Abstract: While voice-based AI systems have achieved remarkable generative capabilities, their interactions often feel conversationally broken. This paper examines the interactional friction that emer...
- CogniSNN: Enabling Neuron-Expandability, Pathway-Reusability, and Dynamic-Configurability with Random Graph Architectures in Spiking Neural Networks : Abstract: Spiking neural networks (SNNs), regarded as the third generation of artificial neural networks, are expected to bridge the gap between artificial intelligence and computational neuroscience....
- Generative Parametric Design (GPD): A framework for real-time geometry generation and on-the-fly multiparametric approximation : Abstract: This paper presents a novel paradigm in simulation-based engineering sciences by introducing a new framework called Generative Parametric Design (GPD). The GPD framework enables the generati...
- Smudged Fingerprints: A Systematic Evaluation of the Robustness of AI Image Fingerprints : Abstract: Model fingerprint detection techniques have emerged as a promising approach for attributing AI-generated images to their source models, but their robustness under adversarial conditions rema...
- Conditional Coverage Diagnostics for Conformal Prediction : Abstract: Evaluating conditional coverage remains one of the most persistent challenges in assessing the reliability of predictive systems. Although conformal methods can give guarantees on marginal c...
- Agile Flight Emerges from Multi-Agent Competitive Racing : Abstract: Through multi-agent competition and the sparse high-level objective of winning a race, we find that both agile flight (e.g., high-speed motion pushing the platform to its physical limits) an...
- Super Suffixes: Bypassing Text Generation Alignment and Guard Models Simultaneously : Abstract: The rapid deployment of Large Language Models (LLMs) has created an urgent need for enhanced security and privacy measures in Machine Learning (ML). LLMs are increasingly being used to proce...
- Particulate: Feed-Forward 3D Object Articulation : Abstract: We present Particulate, a feed-forward approach that, given a single static 3D mesh of an everyday object, directly infers all attributes of the underlying articulated structure, including i...
- Probability Bracket Notation: Multivariable Systems and Static Bayesian Networks : Abstract: We expand the Probability Bracket Notation (PBN), a symbolic framework inspired by the Dirac notation in quantum mechanics, to multivariable probability systems and static Bayesian networks ...
- AI and Jobs: Has the Inflection Point Arrived? Evidence from an Online Labor Platform : Abstract: This study investigates how artificial intelligence (AI) influences various online labor markets (OLMs) over time. Employing the Difference-in-Differences method, we discovered two distinct ...
- Grammar-Aligned Decoding : Abstract: Large Language Models (LLMs) struggle with reliably generating highly structured outputs, such as program code, mathematical formulas, or well-formed markup. Constrained decoding approaches ...
- UpBench: A Dynamically Evolving Real-World Labor-Market Agentic Benchmark Framework Built for Human-Centric AI : Abstract: As large language model (LLM) agents increasingly undertake digital work, reliable frameworks are needed to evaluate their real-world competence, adaptability, and capacity for human collabo...
- MedRule-KG: A Knowledge-Graph-Steered Scaffold for Reliable Mathematical and Biomedical Reasoning : Abstract: We study how to impose domain-consistent structure on large language models (LLMs) used for scientific reasoning and early-stage drug discovery. We present MedRule-KG, a compact knowledge-gr...
- Unified Smart Factory Model: A model-based Approach for Integrating Industry 4.0 and Sustainability for Manufacturing Systems : Abstract: This paper presents the Unified Smart Factory Model (USFM), a comprehensive framework designed to translate high-level sustainability goals into measurable factory-level indicators with a sy...
- 3DSGrasp: 3D Shape-Completion for Robotic Grasp : Abstract: Real-world robotic grasping can be done robustly if a complete 3D Point Cloud Data (PCD) of an object is available. However, in practice, PCDs are often incomplete when objects are viewed fr...
- M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation : Abstract: In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Mult...
- Multimodal Learning for Scalable Representation of High-Dimensional Medical Data : Abstract: Integrating artificial intelligence (AI) with healthcare data is rapidly transforming medical diagnostics and driving progress toward precision medicine. However, effectively leveraging mult...
- CORL: Reinforcement Learning of MILP Policies Solved via Branch and Bound : Abstract: Combinatorial sequential decision making problems are typically modeled as mixed integer linear programs (MILPs) and solved via branch and bound (B&B) algorithms. The inherent difficulty of ...
- Deep Learning-Accelerated Multi-Start Large Neighborhood Search for Real-time Freight Bundling : Abstract: Online Freight Exchange Systems (OFEX) play a crucial role in modern freight logistics by facilitating real-time matching between shippers and carriers. However, efficient combinatorial bundl...
- FutureWeaver: Planning Test-Time Compute for Multi-Agent Systems with Modularized Collaboration : Abstract: Scaling test-time computation improves large language model performance without additional training. Recent work demonstrates that techniques such as repeated sampling, self-verification, an...
- A-LAMP: Agentic LLM-Based Framework for Automated MDP Modeling and Policy Generation : Abstract: Applying reinforcement learning (RL) to real-world tasks requires converting informal descriptions into a formal Markov decision process (MDP), implementing an executable environment, and tr...
- TriFlow: A Progressive Multi-Agent Framework for Intelligent Trip Planning : Abstract: Real-world trip planning requires transforming open-ended user requests into executable itineraries under strict spatial, temporal, and budgetary constraints while aligning with user prefere...
- CAPTURE: A Benchmark and Evaluation for LVLMs in CAPTCHA Resolving : Abstract: Benefiting from strong and efficient multi-modal alignment strategies, Large Visual Language Models (LVLMs) are able to simulate human visual and reasoning capabilities, such as solving CAPT...
- Towards Trustworthy Multi-Turn LLM Agents via Behavioral Guidance : Abstract: Large Language Models demonstrate strong reasoning and generation abilities, yet their behavior in multi-turn tasks often lacks reliability and verifiability. We present a task completion fr...
- AgentBalance: Backbone-then-Topology Design for Cost-Effective Multi-Agent Systems under Budget Constraints : Abstract: Large Language Model (LLM)-based multi-agent systems (MAS) are becoming indispensable building blocks for web-scale applications such as web search, social network analytics, and online cust...
- Back to the Baseline: Examining Baseline Effects on Explainability Metrics : Abstract: Attribution methods are among the most prevalent techniques in Explainable Artificial Intelligence (XAI) and are usually evaluated and compared using Fidelity metrics, with Insertion and Del...
- Motif-2-12.7B-Reasoning: A Practitioner's Guide to RL Training Recipes : Abstract: We introduce Motif-2-12.7B-Reasoning, a 12.7B parameter language model designed to bridge the gap between open-weight systems and proprietary frontier models in complex reasoning and long-co...
- Three methods, one problem: Classical and AI approaches to no-three-in-line : Abstract: The No-Three-In-Line problem asks for the maximum number of points that can be placed on an n by n grid with no three collinear, representing a famous problem in combinatorial geometry. Whil...
- General-purpose AI models can generate actionable knowledge on agroecological crop protection : Abstract: Generative artificial intelligence (AI) offers potential for democratizing scientific knowledge and converting this to clear, actionable information, yet its application in agri-food science...
- BAID: A Benchmark for Bias Assessment of AI Detectors : Abstract: AI-generated text detectors have recently gained adoption in educational and professional contexts. Prior research has uncovered isolated cases of bias, particularly against English Language...
- EmeraldMind: A Knowledge Graph-Augmented Framework for Greenwashing Detection : Abstract: As AI and web agents become pervasive in decision-making, it is critical to design intelligent systems that not only support sustainability efforts but also guard against misinformation. Gre...
- AI-MASLD Metabolic Dysfunction and Information Steatosis of Large Language Models in Unstructured Clinical Narratives : Abstract: This study aims to simulate real-world clinical scenarios to systematically evaluate the ability of Large Language Models (LLMs) to extract core medical information from patient chief compla...
- AI Benchmark Democratization and Carpentry : Abstract: Benchmarks are a cornerstone of modern machine learning, enabling reproducibility, comparison, and scientific progress. However, AI benchmarks are increasingly complex, requiring dynamic, AI...
- Causal Inference in Energy Demand Prediction : Abstract: Energy demand prediction is critical for grid operators, industrial energy consumers, and service providers. Energy demand is influenced by multiple factors, including weather conditions...
- MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition : Abstract: Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance interacts with complex interactions among patient characteristics, disease processes, a...
- Measuring skill-based uplift from AI in a real biological laboratory : Abstract: Understanding how AI systems are used by people in real situations that mirror aspects of both legitimate and illegitimate use is key to predicting the risks and benefits of AI systems. This...
- AI as Cognitive Amplifier: Rethinking Human Judgment in the Age of Generative AI : Abstract: Through extensive experience training professionals and individual users in AI tool adoption since the GPT-3 era, I have observed a consistent pattern: the same AI tool produces dramatically...
- Scalable Data Synthesis for Computer Use Agents with Step-Level Filtering : Abstract: Computer use agents (CUAs) can operate real-world digital interfaces but remain difficult to train due to the high cost of graphical user interface (GUI) interaction and the scarcity of high...
- Emotion-Driven Personalized Recommendation for AI-Generated Content Using Multi-Modal Sentiment and Intent Analysis : Abstract: With the rapid growth of AI-generated content (AIGC) across domains such as music, video, and literature, the demand for emotionally aware recommendation systems has become increasingly impo...
- Multimodal Fusion of Regional Brain Experts for Interpretable Alzheimer's Disease Diagnosis : Abstract: Accurate and early diagnosis of Alzheimer's disease (AD) can benefit from integrating complementary information from multiple modalities, mirroring clinical practice. However, conventional f...
- Agent-Based Modular Learning for Multimodal Emotion Recognition in Human-Agent Systems : Abstract: Effective human-agent interaction (HAI) relies on accurate and adaptive perception of human emotional states. While multimodal deep learning models - leveraging facial expressions, speech, a...
- Cognitive Mirrors: Exploring the Diverse Functional Roles of Attention Heads in LLM Reasoning : Abstract: Large language models (LLMs) have achieved state-of-the-art performance in a variety of tasks, but remain largely opaque in terms of their internal mechanisms. Understanding these mechanisms...
- Reducing Fragmentation and Starvation in GPU Clusters through Dynamic Multi-Objective Scheduling : Abstract: GPU clusters have become essential for training and deploying modern AI systems, yet real deployments continue to report average utilization near 50%. This inefficiency is largely caused by ...
Research Sources: 281 | Generated: 12/15/2025
