AI Research News Feeds for January 1st, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

AI-Driven Evaluation of Surgical Skill via Action Recognition : Abstract: The development of effective training and evaluation strategies is critical. Conventional methods for assessing surgical proficiency typically rely on expert supervision, either through onsi...
Exploring Compositionality in Vision Transformers using Wavelet Representations : Abstract: While insights into the workings of the transformer model have largely emerged by analysing their behaviour on language tasks, this work investigates the representations learnt by the Vision...
Using Large Language Models To Translate Machine Results To Human Results : Abstract: Artificial intelligence (AI) has transformed medical imaging, with computer vision (CV) systems achieving state-of-the-art performance in classification and detection tasks. However, these s...
Hierarchical Vector-Quantized Latents for Perceptual Low-Resolution Video Compression : Abstract: The exponential growth of video traffic has placed increasing demands on bandwidth and storage infrastructure, particularly for content delivery networks (CDNs) and edge devices. While tradi...
PhyGDPO: Physics-Aware Groupwise Direct Preference Optimization for Physically Consistent Text-to-Video Generation : Abstract: Recent advances in text-to-video (T2V) generation have achieved good visual quality, yet synthesizing videos that faithfully follow physical laws remains an open challenge. Existing methods ...
OCP-LS: An Efficient Algorithm for Visual Localization : Abstract: This paper proposes a novel second-order optimization algorithm. It aims to address large-scale optimization problems in deep learning because it incorporates the OCP method and appropriatel...
RGBT-Ground Benchmark: Visual Grounding Beyond RGB in Complex Real-World Scenarios : Abstract: Visual Grounding (VG) aims to localize specific objects in an image according to natural language expressions, serving as a fundamental task in vision-language understanding. However, existi...
Improving Few-Shot Change Detection Visual Question Answering via Decision-Ambiguity-guided Reinforcement Fine-Tuning : Abstract: Change detection visual question answering (CDVQA) requires answering text queries by reasoning about semantic changes in bi-temporal remote sensing images. A straightforward approach is to ...
SliceLens: Fine-Grained and Grounded Error Slice Discovery for Multi-Instance Vision Tasks : Abstract: Systematic failures of computer vision models on subsets with coherent visual patterns, known as error slices, pose a critical challenge for robust model evaluation. Existing slice discovery...
Collaborative Low-Rank Adaptation for Pre-Trained Vision Transformers : Abstract: Low-rank adaptation (LoRA) has achieved remarkable success in fine-tuning pre-trained vision transformers for various downstream tasks. Existing studies mainly focus on exploring more parame...
MoniRefer: A Real-world Large-scale Multi-modal Dataset based on Roadside Infrastructure for 3D Visual Grounding : Abstract: 3D visual grounding aims to localize the object in 3D point cloud scenes that semantically corresponds to given natural language sentences. It is very critical for roadside infrastructure sy...
LLHA-Net: A Hierarchical Attention Network for Two-View Correspondence Learning : Abstract: Establishing the correct correspondence of feature points is a fundamental task in computer vision. However, the presence of numerous outliers among the feature points can significantly affe...
FireRescue: A UAV-Based Dataset and Enhanced YOLO Model for Object Detection in Fire Rescue Scenes : Abstract: Object detection in fire rescue scenarios is important for command and decision-making in firefighting operations. However, existing research still suffers from two main limitations. First,...
From Sequential to Spatial: Reordering Autoregression for Efficient Visual Generation : Abstract: Inspired by the remarkable success of autoregressive models in language modeling, this paradigm has been widely adopted in visual generation. However, the sequential token-by-token decoding ...
FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation : Abstract: In this work, we show that the impact of model capacity varies across timesteps: it is crucial for the early and late stages but largely negligible during the intermediate stage. Accordingly...
EchoFoley: Event-Centric Hierarchical Control for Video Grounded Creative Sound Generation : Abstract: Sound effects build an essential layer of multimodal storytelling, shaping the emotional atmosphere and the narrative semantics of videos. Despite recent advances in video-text-to-audio (...
Splatwizard: A Benchmark Toolkit for 3D Gaussian Splatting Compression : Abstract: The recent advent of 3D Gaussian Splatting (3DGS) has marked a significant breakthrough in real-time novel view synthesis. However, the rapid proliferation of 3DGS-based algorithms has creat...
UniC-Lift: Unified 3D Instance Segmentation via Contrastive Learning : Abstract: 3D Gaussian Splatting (3DGS) and Neural Radiance Fields (NeRF) have advanced novel-view synthesis. Recent methods extend multi-view 2D segmentation to 3D, enabling instance/semantic segmenta...
CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture : Abstract: Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlus...
VLN-MME: Diagnosing MLLMs as Language-guided Visual Navigation agents : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities across a wide range of vision-language tasks. However, their performance as embodied agents, which requires...
OFL-SAM2: Prompt SAM2 with Online Few-shot Learner for Efficient Medical Image Segmentation : Abstract: The Segment Anything Model 2 (SAM2) has demonstrated remarkable promptable visual segmentation capabilities in video data, showing potential for extension to medical image segmentation (MIS)...
FinMMDocR: Benchmarking Financial Multimodal Reasoning with Scenario Awareness, Document Understanding, and Multi-Step Computation : Abstract: We introduce FinMMDocR, a novel bilingual multimodal benchmark for evaluating multimodal large language models (MLLMs) on real-world financial numerical reasoning. Compared to existing bench...
Semi-Supervised Diversity-Aware Domain Adaptation for 3D Object detection : Abstract: 3D object detectors are fundamental components of perception systems in autonomous vehicles. While these detectors achieve remarkable performance on standard autonomous driving benchmarks, t...
VIPER: Process-aware Evaluation for Generative Video Reasoning : Abstract: Recent breakthroughs in video generation have demonstrated an emerging capability termed Chain-of-Frames (CoF) reasoning, where models resolve complex tasks through the generation of continu...
Bi-C2R: Bidirectional Continual Compatible Representation for Re-indexing Free Lifelong Person Re-identification : Abstract: Lifelong person Re-IDentification (L-ReID) exploits sequentially collected data to continuously train and update a ReID model, focusing on the overall performance of all data. Its main chall...
FoundationSLAM: Unleashing the Power of Depth Foundation Models for End-to-End Dense Visual SLAM : Abstract: We present FoundationSLAM, a learning-based monocular dense SLAM system that addresses the absence of geometric consistency in previous flow-based approaches for accurate and robust tracking...
From Inpainting to Editing: A Self-Bootstrapping Framework for Context-Rich Visual Dubbing : Abstract: Audio-driven visual dubbing aims to synchronize a video's lip movements with new speech, but is fundamentally challenged by the lack of ideal training data: paired videos where only a subjec...
FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion : Abstract: Recognizing fine-grained actions from temporally corrupted skeleton sequences remains a significant challenge, particularly in real-world scenarios where online pose estimation often yields ...
Edit3r: Instant 3D Scene Editing from Sparse Unposed Images : Abstract: We present Edit3r, a feed-forward framework that reconstructs and edits 3D scenes in a single pass from unposed, view-inconsistent, instruction-edited images. Unlike prior methods requiring ...
GaMO: Geometry-aware Multi-view Diffusion Outpainting for Sparse-View 3D Reconstruction : Abstract: Recent advances in 3D reconstruction have achieved remarkable progress in high-quality scene capture from dense multi-view imagery, yet struggle when input views are limited. Various approac...
Learning to Feel the Future: DreamTacVLA for Contact-Rich Manipulation : Abstract: Vision-Language-Action (VLA) models have shown remarkable generalization by mapping web-scale knowledge to robotic control, yet they remain blind to physical contact. Consequently, they stru...
One-Shot Structured Pruning of Quantum Neural Networks via $q$-Group Engineering and Quantum Geometric Metrics : Abstract: Quantum neural networks (QNNs) suffer from severe gate-level redundancy, which hinders their deployment on noisy intermediate-scale quantum (NISQ) devices. In this work, we propose q-iPrune,...
Targeted Semantic Segmentation of Himalayan Glacial Lakes Using Time-Series SAR: Towards Automated GLOF Early Warning : Abstract: Glacial Lake Outburst Floods (GLOFs) are one of the most devastating climate change induced hazards. Existing remote monitoring approaches often prioritise maximising spatial coverage to tra...
RANGER: A Monocular Zero-Shot Semantic Navigation Framework through Contextual Adaptation : Abstract: Efficiently finding targets in complex environments is fundamental to real-world embodied applications. While recent advances in multimodal foundation models have enabled zero-shot object go...
Geometric Multi-Session Map Merging with Learned Local Descriptors : Abstract: Multi-session map merging is crucial for extended autonomous operations in large-scale environments. In this paper, we present GMLD, a learning-based local descriptor framework for large-sca...
Training-Free Color-Aware Adversarial Diffusion Sanitization for Diffusion Stegomalware Defense at Security Gateways : Abstract: The rapid expansion of generative AI has normalized large-scale synthetic media creation, enabling new forms of covert communication. Recent generative steganography methods, particularly th...
Towards autonomous time-calibration of large quantum-dot devices: Detection, real-time feedback, and noise spectroscopy : Abstract: The performance and scalability of semiconductor quantum-dot (QD) qubits are limited by electrostatic drift and charge noise that shift operating points and destabilize qubit parameters. As ...
PhysTalk: Language-driven Real-time Physics in 3D Gaussian Scenes : Abstract: Realistic visual simulations are omnipresent, yet their creation requires computing time, rendering, and expert animation knowledge. Open-vocabulary visual effects generation from text input...
HIDFlowNet: A Flow-Based Deep Network for Hyperspectral Image Denoising : Abstract: Hyperspectral image (HSI) denoising is essentially ill-posed since a noisy HSI can be degraded from multiple clean HSIs. However, existing deep learning (DL)-based approaches only restore on...
Matching Semantically Similar Non-Identical Objects : Abstract: Not identical but similar objects are ubiquitous in our world, ranging from four-legged animals such as dogs and cats to cars of different models and flowers of various colors. This study ad...
Reconstructing Hand-Held Objects in 3D from Images and Videos : Abstract: Objects manipulated by the hand (i.e., manipulanda) are particularly challenging to reconstruct from Internet videos. Not only does the hand occlude much of the object, but also the object i...
INST-IT: Boosting Instance Understanding via Explicit Visual Prompt Instruction Tuning : Abstract: Large Multimodal Models (LMMs) have made significant breakthroughs with the advancement of instruction tuning. However, while existing models can understand images and videos at a holistic l...
Hierarchical Context Alignment with Disentangled Geometric and Temporal Modeling for Semantic Occupancy Prediction : Abstract: Camera-based 3D Semantic Occupancy Prediction (SOP) is crucial for understanding complex 3D scenes from limited 2D image observations. Existing SOP methods typically aggregate contextual fea...
OnlineVPO: Align Video Diffusion Model with Online Video-Centric Preference Optimization : Abstract: Video diffusion models (VDMs) have demonstrated remarkable capabilities in text-to-video (T2V) generation. Despite their success, VDMs still suffer from degraded image quality and flickering...
Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment : Abstract: Despite Contrastive Language-Image Pretraining (CLIP)'s remarkable capability to retrieve content across modalities, a substantial modality gap persists in its feature space. Intriguingly, w...
GaussianImage++: Boosted Image Representation and Compression with 2D Gaussian Splatting : Abstract: Implicit neural representations (INRs) have achieved remarkable success in image representation and compression, but they require substantial training time and memory. Meanwhile, recent 2D G...
Korean Canonical Legal Benchmark: Toward Knowledge-Independent Evaluation of LLMs' Legal Reasoning Capabilities : Abstract: We introduce the Korean Canonical Legal Benchmark (KCL), a benchmark designed to assess language models' legal reasoning capabilities independently of domain-specific knowledge. KCL provides...
Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models : Abstract: We introduce Youtu-LLM, a lightweight yet powerful language model that harmonizes high computational efficiency with native agentic intelligence. Unlike typical small models that rely on dis...
MUSIC: MUlti-Step Instruction Contrast for Multi-Turn Reward Models : Abstract: Evaluating the quality of multi-turn conversations is crucial for developing capable Large Language Models (LLMs), yet remains a significant challenge, often requiring costly human evaluatio...
BIOME-Bench: A Benchmark for Biomolecular Interaction Inference and Multi-Omics Pathway Mechanism Elucidation from Scientific Literature : Abstract: Multi-omics studies often rely on pathway enrichment to interpret heterogeneous molecular changes, but pathway enrichment (PE)-based workflows inherit structural limitations of pathway resou...
Uncertainty-aware Semi-supervised Ensemble Teacher Framework for Multilingual Depression Detection : Abstract: Detecting depression from social media text is still a challenging task. This is due to different language styles, informal expression, and the lack of annotated data in many languages. To t...
Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models : Abstract: Large Language Models (LLMs) are demonstrating rapid improvements on complex reasoning benchmarks, particularly when allowed to utilize intermediate reasoning steps before converging on a fi...
Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability : Abstract: Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such mode...
BEDA: Belief Estimation as Probabilistic Constraints for Performing Strategic Dialogue Acts : Abstract: Strategic dialogue requires agents to execute distinct dialogue acts, for which belief estimation is essential. While prior work often estimates beliefs accurately, it lacks a principled mec...
MAMA-Memeia! Multi-Aspect Multi-Agent Collaboration for Depressive Symptoms Identification in Memes : Abstract: Over the past years, memes have evolved from being exclusively a medium of humorous exchanges to one that allows users to express a range of emotions freely and easily. With the ever-growing...
Quantum Visual Word Sense Disambiguation: Unraveling Ambiguities Through Quantum Inference Model : Abstract: Visual word sense disambiguation focuses on polysemous words, where candidate images can be easily confused. Traditional methods use classical probability to calculate the likelihood of an i...
Vibe Coding, Interface Flattening : Abstract: Large language models are reshaping programming by enabling 'vibe coding': the development of software through natural-language interaction with model-driven toolchains. This article argues...
CPJ: Explainable Agricultural Pest Diagnosis via Caption-Prompt-Judge with LLM-Judged Refinement : Abstract: Accurate and interpretable crop disease diagnosis is essential for agricultural decision-making, yet existing methods often rely on costly supervised fine-tuning and perform poorly under dom...
Large language models and the entropy of English : Abstract: We use large language models (LLMs) to uncover long-ranged structure in English texts from a variety of sources. The conditional entropy or code length in many cases continues to decrease wi...
CascadeNS: Confidence-Cascaded Neurosymbolic Model for Sarcasm Detection : Abstract: Sarcasm detection in product reviews requires balancing domain-specific symbolic pattern recognition with deep semantic understanding. Symbolic representations capture explicit linguistic ph...
Leveraging Synthetic Priors for Monocular Depth Estimation in Specular Surgical Environments : Abstract: Accurate Monocular Depth Estimation (MDE) is critical for robotic surgery but remains fragile in specular, fluid-filled endoscopic environments. Existing self-supervised methods, typically r...
Pretraining Frame Preservation in Autoregressive Video Memory Compression : Abstract: We present PFP, a neural network structure to compress long videos into short contexts, with an explicit pretraining objective to preserve the high-frequency details of single frames at arbi...
MRI-to-CT Synthesis With Cranial Suture Segmentations Using A Variational Autoencoder Framework : Abstract: Quantifying normative pediatric cranial development and suture ossification is crucial for diagnosing and treating growth-related cephalic disorders. Computed tomography (CT) is widely used ...
Scaling Remote Sensing Foundation Models: Data Domain Tradeoffs at the Peta-Scale : Abstract: We explore the scaling behaviors of artificial intelligence to establish practical techniques for training foundation models on high-resolution electro-optical (EO) datasets that exceed the ...
Learning to learn skill assessment for fetal ultrasound scanning : Abstract: Traditionally, ultrasound skill assessment has relied on expert supervision and feedback, a process known for its subjectivity and time-intensive nature. Previous works on quantitative and a...
MGML: A Plug-and-Play Meta-Guided Multi-Modal Learning Framework for Incomplete Multimodal Brain Tumor Segmentation : Abstract: Leveraging multimodal information from Magnetic Resonance Imaging (MRI) plays a vital role in lesion segmentation, especially for brain tumors. However, in clinical practice, multimodal MRI ...
Learnable Query Aggregation with KV Routing for Cross-view Geo-localisation : Abstract: Cross-view geo-localisation (CVGL) aims to estimate the geographic location of a query image by matching it with images from a large-scale database. However, the significant view-point discr...
Kinematic-Based Assessment of Surgical Actions in Microanastomosis : Abstract: Proficiency in microanastomosis is a critical surgical skill in neurosurgery, where the ability to precisely manipulate fine instruments is crucial to successful outcomes. These procedures r...
U-Net-Like Spiking Neural Networks for Single Image Dehazing : Abstract: Image dehazing is a critical challenge in computer vision, essential for enhancing image clarity in hazy conditions. Traditional methods often rely on atmospheric scattering models, while re...
T2VAttack: Adversarial Attack on Text-to-Video Diffusion Models : Abstract: The rapid evolution of Text-to-Video (T2V) diffusion models has driven remarkable advancements in generating high-quality, temporally coherent videos from natural language descriptions. Desp...
DriveExplorer: Images-Only Decoupled 4D Reconstruction with Progressive Restoration for Driving View Extrapolation : Abstract: This paper presents an effective solution for view extrapolation in autonomous driving scenarios. Recent approaches focus on generating shifted novel view images from given viewpoints using ...
Anomaly detection in satellite imagery through temporal inpainting : Abstract: Detecting surface changes from satellite imagery is critical for rapid disaster response and environmental monitoring, yet remains challenging due to the complex interplay between atmospheri...
GCA-ResUNet: Medical Image Segmentation Using Grouped Coordinate Attention : Abstract: Accurate segmentation of heterogeneous anatomical structures is pivotal for computer-aided diagnosis and subsequent clinical decision-making. Although U-Net based convolutional neural networ...
Bridging Structure and Appearance: Topological Features for Robust Self-Supervised Segmentation : Abstract: Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as ...
Improved 3D Gaussian Splatting of Unknown Spacecraft Structure Using Space Environment Illumination Knowledge : Abstract: This work presents a novel pipeline to recover the 3D structure of an unknown target spacecraft from a sequence of images captured during Rendezvous and Proximity Operations (RPO) in space. ...
Bridging the Perception-Cognition Gap: Re-engineering SAM2 with Hilbert-Mamba for Robust VLM-based Medical Diagnosis : Abstract: Recent studies suggest that Visual Language Models (VLMs) hold great potential for tasks such as automated medical diagnosis. However, processing complex three-dimensional (3D) multimodal me...
On Exact Editing of Flow-Based Diffusion Models : Abstract: Recent methods in flow-based diffusion editing have enabled direct transformations between source and target image distribution without explicit inversion. However, the latent trajectories i...
FitControler: Toward Fit-Aware Virtual Try-On : Abstract: Realistic virtual try-on (VTON) concerns not only faithful rendering of garment details but also coordination of the style. Prior art typically pursues the former, but neglects a key factor ...
Structure-Guided Allocation of 2D Gaussians for Image Representation and Compression : Abstract: Recent advances in 2D Gaussian Splatting (2DGS) have demonstrated its potential as a compact image representation with millisecond-level decoding. However, existing 2DGS-based pipelines allo...
Reinforced Diffusion: Learning to Push the Limits of Anisotropic Diffusion for Image Denoising : Abstract: Image denoising is an important problem in low-level vision and serves as a critical module for many image recovery tasks. Anisotropic diffusion is a wide family of image denoising approache...
Neighbor-aware Instance Refining with Noisy Labels for Cross-Modal Retrieval : Abstract: In recent years, Cross-Modal Retrieval (CMR) has made significant progress in the field of multi-modal analysis. However, since it is time-consuming and labor-intensive to collect large-scal...
Balanced Hierarchical Contrastive Learning with Decoupled Queries for Fine-grained Object Detection in Remote Sensing Images : Abstract: Fine-grained remote sensing datasets often use hierarchical label structures to differentiate objects in a coarse-to-fine manner, with each object annotated across multiple levels. However, ...
RainFusion2.0: Temporal-Spatial Awareness and Hardware-Efficient Block-wise Sparse Attention : Abstract: In video and image generation tasks, Diffusion Transformer (DiT) models incur extremely high computational costs due to attention mechanisms, which limits their practical applications. Furth...
Think Before You Move: Latent Motion Reasoning for Text-to-Motion Generation : Abstract: Current state-of-the-art paradigms predominantly treat Text-to-Motion (T2M) generation as a direct translation problem, mapping symbolic language directly to continuous poses. While effectiv...
Guided Diffusion-based Generation of Adversarial Objects for Real-World Monocular Depth Estimation Attacks : Abstract: Monocular Depth Estimation (MDE) serves as a core perception module in autonomous driving systems, but it remains highly susceptible to adversarial attacks. Errors in depth estimation may pr...
GeoBench: Rethinking Multimodal Geometric Problem-Solving via Hierarchical Evaluation : Abstract: Geometric problem solving constitutes a critical branch of mathematical reasoning, requiring precise analysis of shapes and spatial relationships. Current evaluations of geometric reasoning ...
Taming Preference Mode Collapse via Directional Decoupling Alignment in Diffusion Reinforcement Learning : Abstract: Recent studies have demonstrated significant progress in aligning text-to-image diffusion models with human preference via Reinforcement Learning from Human Feedback. However, while existing...
Towards Open-Vocabulary Industrial Defect Understanding with a Large-Scale Multimodal Dataset : Abstract: We present IMDD-1M, the first large-scale Industrial Multimodal Defect Dataset comprising 1,000,000 aligned image-text pairs, designed to advance multimodal learning for manufacturing and qu...
Bayesian Self-Distillation for Image Classification : Abstract: Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-...
DiffThinker: Towards Generative Multimodal Reasoning with Diffusion Models : Abstract: While recent Multimodal Large Language Models (MLLMs) have made significant strides in multimodal reasoning, their reasoning processes remain predominantly text-centric, leading to subop...
CorGi: Contribution-Guided Block-Wise Interval Caching for Training-Free Acceleration of Diffusion Transformers : Abstract: Diffusion transformer (DiT) achieves remarkable performance in visual generation, but its iterative denoising process combined with larger capacity leads to a high inference cost. Recent wor...
ARM: A Learnable, Plug-and-Play Module for CLIP-based Open-vocabulary Semantic Segmentation : Abstract: Open-vocabulary semantic segmentation (OVSS) is fundamentally hampered by the coarse, image-level representations of CLIP, which lack precise pixel-level details. Existing training-free meth...
Mirage: One-Step Video Diffusion for Photorealistic and Coherent Asset Editing in Driving Scenes : Abstract: Vision-centric autonomous driving systems rely on diverse and scalable training data to achieve robust performance. While video object editing offers a promising path for data augmentation, ...
MambaSeg: Harnessing Mamba for Accurate and Efficient Image-Event Semantic Segmentation : Abstract: Semantic segmentation is a fundamental task in computer vision with wide-ranging applications, including autonomous driving and robotics. While RGB-based methods have achieved strong perform...
Physically-Grounded Manifold Projection with Foundation Priors for Metal Artifact Reduction in Dental CBCT : Abstract: Metal artifacts in Dental CBCT severely obscure anatomical structures, hindering diagnosis. Current deep learning for Metal Artifact Reduction (MAR) faces limitations: supervised methods suf...
LiftProj: Space Lifting and Projection-Based Panorama Stitching : Abstract: Traditional image stitching techniques have predominantly utilized two-dimensional homography transformations and mesh warping to achieve alignment on a planar surface. While effective for s...
UniAct: Unified Motion Generation and Action Streaming for Humanoid Robots : Abstract: A long-standing objective in humanoid robotics is the realization of versatile agents capable of following diverse multimodal instructions with human-level flexibility. Despite advances in h...
Robust Egocentric Referring Video Object Segmentation via Dual-Modal Causal Intervention : Abstract: Egocentric Referring Video Object Segmentation (Ego-RVOS) aims to segment the specific object actively involved in a human action, as described by a language query, within first-person video...
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning : Abstract: While Vision-Language Models (VLMs) can solve complex tasks through agentic reasoning, their capabilities remain largely constrained to text-oriented chain-of-thought or isolated tool invoca...
Spatial-aware Vision Language Model for Autonomous Driving : Abstract: While Vision-Language Models (VLMs) show significant promise for end-to-end autonomous driving by leveraging the common sense embedded in language models, their reliance on 2D image cues for...
The Mechanics of CNN Filtering with Rectification : Abstract: This paper proposes elementary information mechanics as a new model for understanding the mechanical properties of convolutional filtering with rectification, inspired by physical theories o...
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems : Abstract: The rapid advancement of autonomous systems, including self-driving vehicles and drones, has intensified the need to forge true Spatial Intelligence from multi-modal onboard sensor data. Whi...
RedunCut: Measurement-Driven Sampling and Accuracy Performance Modeling for Low-Cost Live Video Analytics : Abstract: Live video analytics (LVA) runs continuously across massive camera fleets, but inference cost with modern vision models remains high. To address this, dynamic model size selection (DMSS) is ...
DyStream: Streaming Dyadic Talking Heads Generation via Flow Matching-based Autoregressive Model : Abstract: Generating realistic, dyadic talking head video requires ultra-low latency. Existing chunk-based methods require full non-causal context windows, introducing significant delays. This high la...
Jacobian-Enhanced Neural Networks : Abstract: Jacobian-Enhanced Neural Networks (JENN) are densely connected multi-layer perceptrons, whose training process is modified to predict partial derivatives accurately. Their main benefit is be...
UnPaSt: unsupervised patient stratification by biclustering of omics data : Abstract: Unsupervised patient stratification is essential for disease subtype discovery, yet, despite growing evidence of molecular heterogeneity of non-oncological diseases, popular methods are benc...
Minibatch Optimal Transport and Perplexity Bound Estimation in Discrete Flow Matching : Abstract: Discrete flow matching, a recent framework for modeling categorical data, has shown competitive performance with autoregressive models. However, unlike continuous flow matching, the rectific...
The Generalization Error of Supervised Machine Learning Algorithms : Abstract: In this paper, the method of gaps, a technique for deriving closed-form expressions in terms of information measures for the generalization error of supervised machine learning algorithms is...
Private Linear Regression with Differential Privacy and PAC Privacy : Abstract: Linear regression is a fundamental tool for statistical analysis, which has motivated the development of linear regression methods that satisfy provable privacy guarantees so that the learne...
Revisiting Agnostic Boosting : Abstract: Boosting is a key method in statistical learning, allowing for converting weak learners into strong ones. While well studied in the realizable case, the statistical properties of weak-to-str...
Towards Privacy-Preserving and Heterogeneity-aware Split Federated Learning via Probabilistic Masking : Abstract: Split Federated Learning (SFL) has emerged as an efficient alternative to traditional Federated Learning (FL) by reducing client-side computation through model partitioning. However, exchang...
Efficient Active Learning with Abstention : Abstract: The goal of active learning is to achieve the same accuracy achievable by passive learning, while using much fewer labels. Exponential savings in terms of label complexity have been proved i...
Machine learning for option pricing: an empirical investigation of network architectures : Abstract: We consider the supervised learning problem of learning the price of an option or the implied volatility given appropriate input data (model parameters) and corresponding output data (option...
Generative Modelling of Lévy Area for High Order SDE Simulation : Abstract: It is well understood that, when numerically simulating SDEs with general noise, achieving a strong convergence rate better than $O(\sqrt{h})$ (where h is the step size) requires the use of ...
Content-based Recommendation Engine for Video Streaming Platform : Abstract: Recommendation engines suggest content, products, or services to the user by using machine learning algorithms. This paper proposes a content-based recommendation engine that provides person...
Multi-fidelity Bayesian Optimization: A Review : Abstract: Residing at the intersection of multi-fidelity optimization (MFO) and Bayesian optimization (BO), MF BO has found a niche in solving expensive engineering design optimization problems, thanks...
Distribution-Dependent Rates for Multi-Distribution Learning : Abstract: To address the needs of modeling uncertainty in sensitive machine learning applications, the setup of distributionally robust optimization (DRO) seeks good performance uniformly across a var...
Myopically Verifiable Probabilistic Certificates for Safe Control and Learning : Abstract: This paper addresses the design of safety certificates for stochastic systems, with a focus on ensuring long-term safety through fast real-time control. In stochastic environments, set invar...
Symmetric Linear Bandits with Hidden Symmetry : Abstract: High-dimensional linear bandits with low-dimensional structure have received considerable attention in recent studies due to their practical significance. The most common structure in the li...
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs : Abstract: Spurious bias, a tendency to exploit spurious correlations between superficial input attributes and prediction targets, has revealed a severe robustness pitfall in classical machine learning...
Automatic identification of diagnosis from hospital discharge letters via weakly-supervised Natural Language Processing : Abstract: Identifying patient diagnoses from discharge letters is essential to enable large-scale cohort selection and epidemiological research, but traditional supervised approaches rely on extensive...
CAT: A Metric-Driven Framework for Analyzing the Consistency-Accuracy Relation of LLMs under Controlled Input Variations : Abstract: We introduce CAT, a framework designed to evaluate and visualize the interplay of accuracy and response consistency of Large Language Models (LLMs) under contro...
PharmaShip: An Entity-Centric, Reading-Order-Supervised Benchmark for Chinese Pharmaceutical Shipping Documents : Abstract: We present PharmaShip, a real-world Chinese dataset of scanned pharmaceutical shipping documents designed to stress-test pre-trained text-layout models under noisy OCR and heterogeneous temp...
Noise-Driven Persona Formation in Reflexive Neural Language Generation : Abstract: This paper introduces the Luca-Noise Reflex Protocol (LN-RP), a computational framework for analyzing noise-driven persona emergence in large language models. By injecting stochastic noise s...
Emergent World Beliefs: Exploring Transformers in Stochastic Games : Abstract: Transformer-based large language models (LLMs) have demonstrated strong reasoning abilities across diverse fields, from solving programming challenges to competing in strategy-intensive game...
MiMo-Audio: Audio Language Models are Few-Shot Learners : Abstract: Existing audio language models typically rely on task-specific fine-tuning to accomplish particular audio tasks. In contrast, humans are able to generalize to new audio tasks with only a few...
Disentangling Learning from Judgment: Representation Learning for Open Response Analytics : Abstract: Open-ended responses are central to learning, yet automated scoring often conflates what students wrote with how teachers grade. We present an analytics-first framework that separates conten...
CEC-Zero: Zero-Supervision Character Error Correction with Self-Generated Rewards : Abstract: Large-scale Chinese spelling correction (CSC) remains critical for real-world text processing, yet existing LLMs and supervised methods lack robustness to novel errors and rely on costly ann...
WISE: Web Information Satire and Fakeness Evaluation : Abstract: Distinguishing fake or untrue news from satire or humor poses a unique challenge due to their overlapping linguistic features and divergent intent. This study develops WISE (Web Information ...
HY-MT1.5 Technical Report : Abstract: In this report, we introduce our latest translation models, HY-MT1.5-1.8B and HY-MT1.5-7B, a new family of machine translation models developed through a holistic training framework tailored...
Activation Steering for Masked Diffusion Language Models : Abstract: Masked diffusion language models (MDLMs) generate text through an iterative denoising process. They have recently gained attention due to mask-parallel decoding and competitive performance w...
Large Emotional World Model : Abstract: World Models serve as tools for understanding the current state of the world and predicting its future dynamics, with broad application potential across numerous fields. As a key component o...
Training Report of TeleChat3-MoE : Abstract: TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, tr...
MedKGI: Iterative Differential Diagnosis with Medical Knowledge Graphs and Information-Guided Inquiring : Abstract: Recent advancements in Large Language Models (LLMs) have demonstrated significant promise in clinical diagnosis. However, current models struggle to emulate the iterative, diagnostic hypothe...
LAILA: A Large Trait-Based Dataset for Arabic Automated Essay Scoring : Abstract: Automated Essay Scoring (AES) has gained increasing attention in recent years, yet research on Arabic AES remains limited due to the lack of publicly available datasets. To address this, we ...
Tracing the Flow of Knowledge From Science to Technology Using Deep Learning : Abstract: We develop a language similarity model suitable for working with patents and scientific publications at the same time. In a horse race-style evaluation, we subject eight language (similarity...
Automated Analysis of Sustainability Reports: Using Large Language Models for the Extraction and Prediction of EU Taxonomy-Compliant KPIs : Abstract: The manual, resource-intensive process of complying with the EU Taxonomy presents a significant challenge for companies. While Large Language Models (LLMs) offer a path to automation, resear...
Figure It Out: Improving the Frontier of Reasoning with Active Visual Thinking : Abstract: Complex reasoning problems often involve implicit spatial, geometric, and structural relationships that are not explicitly encoded in text. While recent reasoning models have achieved strong...
QianfanHuijin Technical Report: A Novel Multi-Stage Training Paradigm for Finance Industrial LLMs : Abstract: Domain-specific enhancement of Large Language Models (LLMs) within the financial context has long been a focal point of industrial application. While previous models such as BloombergGPT and...
World model inspired sarcasm reasoning with large language model agents : Abstract: Sarcasm understanding is a challenging problem in natural language processing, as it requires capturing the discrepancy between the surface meaning of an utterance and the speaker's intentio...
Cleaning English Abstracts of Scientific Publications : Abstract: Scientific abstracts are often used as proxies for the content and thematic focus of research publications. However, a significant share of published abstracts contains extraneous informatio...
IELTS Writing Revision Platform with Automated Essay Scoring and Adaptive Feedback : Abstract: This paper presents the design, development, and evaluation of a proposed revision platform assisting candidates for the International English Language Testing System (IELTS) writing exam. T...
Paragraph Segmentation Revisited: Towards a Standard Task for Structuring Speech : Abstract: Automatic speech transcripts are often delivered as unstructured word streams that impede readability and repurposing. We recast paragraph segmentation as the missing structuring step and fi...
Safe in the Future, Dangerous in the Past: Dissecting Temporal and Linguistic Vulnerabilities in LLMs : Abstract: As Large Language Models (LLMs) integrate into critical global infrastructure, the assumption that safety alignment transfers zero-shot from English to other languages remains a dangerous bl...
HaluNet: Multi-Granular Uncertainty Modeling for Efficient Hallucination Detection in LLM Question Answering : Abstract: Large Language Models (LLMs) excel at question answering (QA) but often generate hallucinations, including factual errors or fabricated content. Detecting hallucinations from internal uncert...
AODDiff: Probabilistic Reconstruction of Aerosol Optical Depth via Diffusion-based Bayesian Inference : Abstract: High-quality reconstruction of Aerosol Optical Depth (AOD) fields is critical for atmospheric monitoring, yet current models remain constrained by the scarcity of complete training data and a...
Characterization of Transfer Using Multi-task Learning Curves : Abstract: Transfer effects manifest themselves both during training using a fixed data set and in inductive inference using accumulating data. We hypothesize that perturbing the data set by including ...
PRISM: A hierarchical multiscale approach for time series forecasting : Abstract: Forecasting is critical in areas such as finance, biology, and healthcare. Despite the progress in the field, making accurate forecasts remains challenging because real-world time series con...
Spectral Graph Neural Networks for Cognitive Task Classification in fMRI Connectomes : Abstract: Cognitive task classification using machine learning plays a central role in decoding brain states from neuroimaging data. By integrating machine learning with brain network analysis, comple...
Frequent subgraph-based persistent homology for graph classification : Abstract: Persistent homology (PH) has recently emerged as a powerful tool for extracting topological features. Integrating PH into machine learning and deep learning models enhances topology awarenes...
Attribution-Guided Distillation of Matryoshka Sparse Autoencoders : Abstract: Sparse autoencoders (SAEs) aim to disentangle model activations into monosemantic, human-interpretable features. In practice, learned features are often redundant and vary across training ru...
Efficiently Estimating Data Efficiency for Language Model Fine-tuning : Abstract: While large language models (LLMs) demonstrate reasonable zero-shot capability across many downstream tasks, fine-tuning is a common practice to improve their performance. However, a task's ...
Diffusion Language Models are Provably Optimal Parallel Samplers : Abstract: Diffusion language models (DLMs) have emerged as a promising alternative to autoregressive models for faster inference via parallel token generation. We provide a rigorous foundation for thi...
ResponseRank: Data-Efficient Reward Modeling through Preference Strength Learning : Abstract: Binary choices, as often used for reinforcement learning from human feedback (RLHF), convey only the direction of a preference. A person may choose apples over oranges and bananas over grape...
On the geometry and topology of representations: the manifolds of modular addition : Abstract: The Clock and Pizza interpretations, associated with architectures differing in either uniform or learnable attention, were introduced to argue that different architectural designs can yield...
Many Minds from One Model: Bayesian Transformers for Population Intelligence : Abstract: Despite their scale and success, modern transformers are almost universally trained as single-minded systems: optimization produces one deterministic set of parameters, representing a single...
Scaling Open-Ended Reasoning to Predict the Future : Abstract: High-stakes decision making involves reasoning under uncertainty about the future. In this work, we train language models to make predictions on open-ended forecasting questions. To scale up...
Spike-Timing-Dependent Plasticity for Bernoulli Message Passing : Abstract: Bayesian inference provides a principled framework for understanding brain function, while neural activity in the brain is inherently spike-based. This paper bridges these two perspectives b...
Governing Cloud Data Pipelines with Agentic AI : Abstract: Cloud data pipelines increasingly operate under dynamic workloads, evolving schemas, cost constraints, and strict governance requirements. Despite advances in cloud-native orchestration fram...
Fitted Q Evaluation Without Bellman Completeness via Stationary Weighting : Abstract: Fitted Q-evaluation (FQE) is a central method for off-policy evaluation in reinforcement learning, but it generally requires Bellman completeness: that the hypothesis class is closed under t...
Energy-Tweedie: Score meets Score, Energy meets Energy : Abstract: Denoising and score estimation have long been known to be linked via the classical Tweedie's formula. In this work, we first extend the latter to a wider range of distributions often called ...
Deep learning methods for inverse problems using connections between proximal operators and Hamilton-Jacobi equations : Abstract: Inverse problems are important mathematical problems that seek to recover model parameters from noisy data. Since inverse problems are often ill-posed, they require regularization or incorpo...
A Test of Lookahead Bias in LLM Forecasts : Abstract: We develop a statistical test to detect lookahead bias in economic forecasts generated by large language models (LLMs). Using state-of-the-art pre-training data detection techniques, we esti...
Integrating Domain Knowledge for Financial QA: A Multi-Retriever RAG Approach with LLMs : Abstract: This research project addresses the errors of financial numerical reasoning Question Answering (QA) tasks due to the lack of domain knowledge in finance. Despite recent advances in Large Lan...
Tensor Computing Interface: An Application-Oriented, Lightweight Interface for Portable High-Performance Tensor Network Applications : Abstract: Tensor networks (TNs) are a central computational tool in quantum science and artificial intelligence. However, the lack of unified software interface across tensor-computing frameworks seve...
Stationary Reweighting Yields Local Convergence of Soft Fitted Q-Iteration : Abstract: Fitted Q-iteration (FQI) and its entropy-regularized variant, soft FQI, are central tools for value-based model-free offline reinforcement learning, but can behave poorly under function appr...
Assessing generative modeling approaches for free energy estimates in condensed matter : Abstract: The accurate estimation of free energy differences between two states is a long-standing challenge in molecular simulations. Traditional approaches generally rely on sampling multiple interm...
Statistical Guarantees in the Search for Less Discriminatory Algorithms : Abstract: Recent scholarship has argued that firms building data-driven decision systems in high-stakes domains like employment, credit, and housing should search for "less discriminatory algorithms" ...
Implicit geometric regularization in flow matching via density weighted Stein operators : Abstract: Flow Matching (FM) has emerged as a powerful paradigm for continuous normalizing flows, yet standard FM implicitly performs an unweighted $L^2$ regression over the entire ambient space. In h...
Exploring the Potential of Spiking Neural Networks in UWB Channel Estimation : Abstract: Although existing deep learning-based Ultra-Wide Band (UWB) channel estimation methods achieve high accuracy, their computational intensity clashes sharply with the resource constraints of l...
Fundamental limits for weighted empirical approximations of tilted distributions : Abstract: Consider the task of generating samples from a tilted distribution of a random vector whose underlying distribution is unknown, but samples from it are available. This finds applications in ...
RepetitionCurse: Measuring and Understanding Router Imbalance in Mixture-of-Experts LLMs under DoS Stress : Abstract: Mixture-of-Experts architectures have become the standard for scaling large language models due to their superior parameter efficiency. To accommodate the growing number of experts in practi...
Policy Mirror Descent with Temporal Difference Learning: Sample Complexity under Online Markov Data : Abstract: This paper studies the policy mirror descent (PMD) method, which is a general policy optimization framework in reinforcement learning and can cover a wide range of policy gradient methods by...
Training a Huggingface Model on AWS Sagemaker (Without Tears) : Abstract: The development of Large Language Models (LLMs) has primarily been driven by resource-rich research groups and industry partners. Due to the lack of on-premise computing resources required f...
Constructive Approximation of Random Process via Stochastic Interpolation Neural Network Operators : Abstract: In this paper, we construct a class of stochastic interpolation neural network operators (SINNOs) with random coefficients activated by sigmoidal functions. We establish their boundedness, i...
Quantitative Understanding of PDF Fits and their Uncertainties : Abstract: Parton Distribution Functions (PDFs) play a central role in describing experimental data at colliders and provide insight into the structure of nucleons. As the LHC enters an era of high-pre...
Score-based sampling without diffusions: Guidance from a simple and modular scheme : Abstract: Sampling based on score diffusions has led to striking empirical results, and has attracted considerable attention from various research communities. It depends on availability of (approxima...
Deep Global Clustering for Hyperspectral Image Segmentation: Concepts, Applications, and Open Challenges : Abstract: Hyperspectral imaging (HSI) analysis faces computational bottlenecks due to massive data volumes that exceed available memory. While foundation models pre-trained on large remote sensing dat...
Variational Quantum Brushes : Abstract: Quantum brushes are computational arts software introduced by Ferreira et al. (2025) that leverage quantum behavior to generate novel artistic effects. In this outreach paper, we introduce th...
Guiding a Diffusion Transformer with the Internal Dynamics of Itself : Abstract: The diffusion model presents a powerful ability to capture the entire (conditional) data distribution. However, due to the lack of sufficient training and data to learn to cover low-probabil...
Medical Image Classification on Imbalanced Data Using ProGAN and SMA-Optimized ResNet: Application to COVID-19 : Abstract: The challenge of imbalanced data is prominent in medical image classification. This challenge arises when there is a significant disparity in the number of images belonging to a particular c...
MotivNet: Evolving Meta-Sapiens into an Emotionally Intelligent Foundation Model : Abstract: In this paper, we introduce MotivNet, a generalizable facial emotion recognition model for robust real-world application. Current state-of-the-art FER models tend to have weak generalization...
Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning : Abstract: A fine-grained data recipe is crucial for pre-training large language models, as it can significantly enhance training efficiency and model performance. One important ingredient in the recip...
Fast reconstruction-based ROI triggering via anomaly detection in the CYGNO optical TPC : Abstract: Optical-readout Time Projection Chambers (TPCs) produce megapixel-scale images whose fine-grained topological information is essential for rare-event searches, but whose size challenges real...
MaRCA: Multi-Agent Reinforcement Learning for Dynamic Computation Allocation in Large-Scale Recommender Systems : Abstract: Modern recommender systems face significant computational challenges due to growing model complexity and traffic scale, making efficient computation allocation critical for maximizing busine...
Topological Spatial Graph Coarsening : Abstract: Spatial graphs are particular graphs for which the nodes are localized in space (e.g., public transport network, molecules, branching biological structures). In this work, we consider the pr...
OptiVote: Non-Coherent FSO Over-the-Air Majority Vote for Communication-Efficient Distributed Federated Learning in Space Data Centers : Abstract: The rapid deployment of mega-constellations is driving the long-term vision of space data centers (SDCs), where interconnected satellites form in-orbit distributed computing and learning inf...
Deep Learning in Geotechnical Engineering: A Critical Assessment of PINNs and Operator Learning : Abstract: Deep learning methods -- physics-informed neural networks (PINNs), deep operator networks (DeepONet), and graph network simulators (GNS) -- are increasingly proposed for geotechnical problem...
Implicit score matching meets denoising score matching: improved rates of convergence and log-density Hessian estimation : Abstract: We study the problem of estimating the score function using both implicit score matching and denoising score matching. Assuming that the data distribution exhibits a low-dimensional struct...
Virasoro Symmetry in Neural Network Field Theories : Abstract: Neural Network Field Theories (NN-FTs) can realize global conformal symmetries via embedding space architectures. These models describe Generalized Free Fields (GFFs) in the infinite width l...
Towards mechanistic understanding in a data-driven weather model: internal activations reveal interpretable physical features : Abstract: Large data-driven physics models like DeepMind's weather model GraphCast have empirically succeeded in parameterizing time operators for complex dynamical systems with an accuracy reaching o...
Spectral and Spatial Graph Learning for Multispectral Solar Image Compression : Abstract: High-fidelity compression of multispectral solar imagery remains challenging for space missions, where limited bandwidth must be balanced against preserving fine spectral and spatial details...
Improving the stability of the covariance-controlled adaptive Langevin thermostat for large-scale Bayesian sampling : Abstract: Stochastic gradient Langevin dynamics and its variants approximate the likelihood of an entire dataset, via random (and typically much smaller) subsets, in the setting of Bayesian sampling. ...
A Graph Neural Network with Auxiliary Task Learning for Missing PMU Data Reconstruction : Abstract: In wide-area measurement systems (WAMS), phasor measurement unit (PMU) measurement is prone to data missingness due to hardware failures, communication delays, and cyber-attacks. Existing da...
Probabilistic Computers for Neural Quantum States : Abstract: Neural quantum states efficiently represent many-body wavefunctions with neural networks, but the cost of Monte Carlo sampling limits their scaling to large system sizes. Here we address thi...
Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning : Abstract: We propose a novel framework for risk-sensitive reinforcement learning (RSRL) that incorporates robustness against transition uncertainty. We define two distinct yet coupled risk measures: a...
MultiRisk: Multiple Risk Control via Iterative Score Thresholding : Abstract: As generative AI systems are increasingly deployed in real-world applications, regulating multiple dimensions of model behavior has become essential. We focus on test-time filtering: a light...
3D Semantic Segmentation for Post-Disaster Assessment : Abstract: The increasing frequency of natural disasters poses severe threats to human lives and leads to substantial economic losses. While 3D semantic segmentation is crucial for post-disaster assess...
Soliton profiles: Classical Numerical Schemes vs. Neural Network - Based Solvers : Abstract: We present a comparative study of classical numerical solvers, such as Petviashvili's method or finite difference with Newton iterations, and neural network-based methods for computing groun...
A New Decomposition Paradigm for Graph-structured Nonlinear Programs via Message Passing : Abstract: We study finite-sum nonlinear programs whose decision variables interact locally according to a graph or hypergraph. We propose MP-Jacobi (Message Passing-Jacobi), a graph-compliant decentra...
Fairness-Aware Insurance Pricing: A Multi-Objective Optimization Approach : Abstract: Machine learning improves predictive accuracy in insurance pricing but exacerbates trade-offs between competing fairness criteria across different discrimination measures, challenging regula...
Sparse Offline Reinforcement Learning with Corruption Robustness : Abstract: We investigate robustness to strong data corruption in offline sparse reinforcement learning (RL). In our setting, an adversary may arbitrarily perturb a fraction of the collected trajectori...
Projection-based Adversarial Attack using Physics-in-the-Loop Optimization for Monocular Depth Estimation : Abstract: Deep neural networks (DNNs) remain vulnerable to adversarial attacks that cause misclassification when specific perturbations are added to input images. This vulnerability also threatens the...
Nonlinear Noise2Noise for Efficient Monte Carlo Denoiser Training : Abstract: The Noise2Noise method allows for training machine learning-based denoisers with pairs of input and target images where both the input and target can be noisy. This removes the need for trai...
Limits of quantum generative models with classical sampling hardness : Abstract: Sampling tasks have been successful in establishing quantum advantages both in theory and experiments. This has fueled the use of quantum computers for generative modeling to create samples ...
Learning Temporally Consistent Turbulence Between Sparse Snapshots via Diffusion Models : Abstract: We investigate the statistical accuracy of temporally interpolated spatiotemporal flow sequences between sparse, decorrelated snapshots of turbulent flow fields using conditional Denoising D...
Are First-Order Diffusion Samplers Really Slower? A Fast Forward-Value Approach : Abstract: Higher-order ODE solvers have become a standard tool for accelerating diffusion probabilistic model (DPM) sampling, motivating the widespread view that first-order methods are inherently slo...
Adaptive Dependency-aware Prompt Optimization Framework for Multi-Step LLM Pipeline : Abstract: Multi-step LLM pipelines invoke large language models multiple times in a structured sequence and can effectively solve complex tasks, but their performance heavily depends on the prompts us...
ProDM: Synthetic Reality-driven Property-aware Progressive Diffusion Model for Coronary Calcium Motion Correction in Non-gated Chest CT : Abstract: Coronary artery calcium (CAC) scoring from chest CT is a well-established tool to stratify and refine clinical cardiovascular disease risk estimation. CAC quantification relies on the accura...
Basic Inequalities for First-Order Optimization with Applications to Statistical Risk Analysis : Abstract: We introduce basic inequalities for first-order iterative optimization algorithms, forming a simple and versatile framework that connects implicit and explicit regularization. While...
Convergence of the generalization error for deep gradient flow methods for PDEs : Abstract: The aim of this article is to provide a firm mathematical foundation for the application of deep gradient flow methods (DGFMs) for the solution of (high-dimensional) partial differential equ...
Reliable and Resilient Collective Communication Library for LLM Training and Serving : Abstract: Modern ML training and inference now span tens to tens of thousands of GPUs, where network faults can waste 10-15% of GPU hours due to slow recovery. Common network errors and link fluctua...
Optimal Approximation -- Smoothness Tradeoffs for Soft-Max Functions : Abstract: A soft-max function has two main efficiency measures: (1) approximation - which corresponds to how well it approximates the maximum function, (2) smoothness - which shows how sensitive it is...
Active Learning with Neural Networks: Insights from Nonparametric Statistics : Abstract: Deep neural networks have great representation power, but typically require large numbers of training examples. This motivates deep active learning methods that can significantly reduce the ...
The Power of Preconditioning in Overparameterized Low-Rank Matrix Sensing : Abstract: We propose ScaledGD(λ), a preconditioned gradient descent method to tackle the low-rank matrix sensing problem when the true rank is unknown, and when the matrix is possibly ill...
HiGen: Hierarchical Graph Generative Networks : Abstract: Most real-world graphs exhibit a hierarchical structure, which is often overlooked by existing graph generation methods. To address this limitation, we propose a novel graph generative netwo...
Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling : Abstract: In the rapidly evolving field of deep learning, the demand for models that are both expressive and computationally efficient has never been more critical. This paper introduces Orchid, a nov...
Benchmarking LLMs for Fine-Grained Code Review with Enriched Context in Practice : Abstract: Code review is a cornerstone of software quality assurance, and recent advances in Large Language Models (LLMs) have shown promise in its automation. However, existing benchmarks for LLM-bas...
Network Traffic Analysis with Process Mining: The UPSIDE Case Study : Abstract: Online gaming is a popular activity involving the adoption of complex systems and network infrastructures. The relevance of gaming, which generates large amounts of market revenue, drove res...
A Comprehensive Study of Deep Learning Model Fixing Approaches : Abstract: Deep Learning (DL) has been widely adopted in diverse industrial domains, including autonomous driving, intelligent healthcare, and aided programming. Like traditional software, DL systems a...
A Review of Diffusion-based Simulation-Based Inference: Foundations and Applications in Non-Ideal Data Scenarios : Abstract: For complex simulation problems, inferring parameters of scientific interest often precludes the use of classical likelihood-based techniques due to intractable likelihood functions. Simulat...
Learning Coupled System Dynamics under Incomplete Physical Constraints and Missing Data : Abstract: Advances in data acquisition and computational methods have accelerated the use of differential equation based modelling for complex systems. Such systems are often described by coupled (or ...
Neural Optimal Design of Experiment for Inverse Problems : Abstract: We introduce Neural Optimal Design of Experiments, a learning-based framework for optimal experimental design in inverse problems that avoids classical bilevel optimization and indirect spar...
Exploring Cumulative Effects in Survival Data Using Deep Learning Networks : Abstract: In epidemiological research, modeling the cumulative effects of time-dependent exposures on survival outcomes presents a challenge due to their intricate temporal dynamics. Conventional spli...
A Granular Grassmannian Clustering Framework via the Schubert Variety of Best Fit : Abstract: In many classification and clustering tasks, it is useful to compute a geometric representative for a dataset or a cluster, such as a mean or median. When datasets are represented by subspac...
TabMixNN: A Unified Deep Learning Framework for Structural Mixed Effects Modeling on Tabular Data : Abstract: We present TabMixNN, a flexible PyTorch-based deep learning framework that synthesizes classical mixed-effects modeling with modern neural network architectures for tabular data analysis. Ta...
MS-SSM: A Multi-Scale State Space Model for Efficient Sequence Modeling : Abstract: State-space models (SSMs) have recently gained attention as an efficient alternative to computationally expensive attention-based models for sequence modeling. They rely on linear recurrences to in...
Exploiting the Prior of Generative Time Series Imputation : Abstract: Time series imputation, i.e., filling the missing values of a time recording, finds various applications in electricity, finance, and weather modelling. Previous methods have introduced gene...
Trellis: Learning to Compress Key-Value Memory in Attention Models : Abstract: Transformers, while powerful, suffer from quadratic computational complexity and the ever-growing Key-Value (KV) cache of the attention mechanism. This paper introduces Trellis, a novel Tran...
Flow Matching Neural Processes : Abstract: Neural processes (NPs) are a class of models that learn stochastic processes directly from data and can be used for inference, sampling and conditional sampling. We introduce a new NP model ...
Yggdrasil: Bridging Dynamic Speculation and Static Runtime for Latency-Optimal Tree-Based LLM Decoding : Abstract: Speculative decoding improves LLM inference by generating and verifying multiple tokens in parallel, but existing systems suffer from suboptimal performance due to a mismatch between dynamic...
Max-Entropy Reinforcement Learning with Flow Matching and A Case Study on LQR : Abstract: Soft actor-critic (SAC) is a popular algorithm for max-entropy reinforcement learning. In practice, the energy-based policies in SAC are often approximated using simple policy classes for ef...
Rethinking Dense Linear Transformations: Stagewise Pairwise Mixing (SPM) for Near-Linear Training in Neural Networks : Abstract: Dense linear layers are a dominant source of computational and parametric cost in modern machine learning models, despite their quadratic complexity and often being misaligned with the compo...
Constraint Breeds Generalization: Temporal Dynamics as an Inductive Bias : Abstract: Conventional deep learning prioritizes unconstrained optimization, yet biological systems operate under strict metabolic constraints. We propose that these physical constraints shape dynamic...
Improved Balanced Classification with Theoretically Grounded Loss Functions : Abstract: The balanced loss is a widely adopted objective for multi-class classification under class imbalance. By assigning equal importance to all classes, regardless of their frequency, it promotes...
DivQAT: Enhancing Robustness of Quantized Convolutional Neural Networks against Model Extraction Attacks : Abstract: Convolutional Neural Networks (CNNs) and their quantized counterparts are vulnerable to extraction attacks, posing a significant threat of IP theft. Yet, the robustness of quantized models a...
Assured Autonomy: How Operations Research Powers and Orchestrates Generative AI Systems : Abstract: Generative artificial intelligence (GenAI) is shifting from conversational assistants toward agentic systems -- autonomous decision-making systems that sense, decide, and act within operatio...
Information-Theoretic Quality Metric of Low-Dimensional Embeddings : Abstract: In this work we study the quality of low-dimensional embeddings from an explicitly information-theoretic perspective. We begin by noting that classical evaluation metrics such as stress, ran...
Hyperspherical Graph Representation Learning via Adaptive Neighbor-Mean Alignment and Uniformity : Abstract: Graph representation learning (GRL) aims to encode structural and semantic dependencies of graph-structured data into low-dimensional embeddings. However, existing GRL methods often rely on ...
How and Why LLMs Generalize: A Fine-Grained Analysis of LLM Reasoning from Cognitive Behaviors to Low-Level Patterns : Abstract: Large Language Models (LLMs) display strikingly different generalization behaviors: supervised fine-tuning (SFT) often narrows capability, whereas reinforcement-learning (RL) tuning tends to...
Time-varying Mixing Matrix Design for Energy-efficient Decentralized Federated Learning : Abstract: We consider the design of mixing matrices to minimize the operation cost for decentralized federated learning (DFL) in wireless networks, with focus on minimizing the maximum per-node energy...
Multi-Scenario Highway Lane-Change Intention Prediction: A Temporal Physics-Informed Multi-Modal Framework : Abstract: Lane-change intention prediction is safety-critical for autonomous driving and ADAS, but remains difficult in naturalistic traffic due to noisy kinematics, severe class imbalance, and limite...
Autoregressivity in the Latent Space of a GP-VAE Language Model: An Empirical Ablation Study : Abstract: This paper provides an ablation-based analysis of latent autoregression in GP-VAE models, building upon our previous work introducing the architecture. Language models typically rely on an a...
Colorful Pinball: Density-Weighted Quantile Regression for Conditional Guarantee of Conformal Prediction : Abstract: While conformal prediction provides robust marginal coverage guarantees, achieving reliable conditional coverage for specific inputs remains challenging. Although exact distribution-free con...
Paired Seed Evaluation: Statistical Reliability for Learning-Based Simulators : Abstract: Machine learning systems appear stochastic but are deterministically random, as seeded pseudorandom number generators produce identical realisations across executions. Learning-based simulat...
Micro-Macro Tensor Neural Surrogates for Uncertainty Quantification in Collisional Plasma : Abstract: Plasma kinetic equations exhibit pronounced sensitivity to microscopic perturbations in model parameters and data, making reliable and efficient uncertainty quantification (UQ) essential for...
Early Prediction of Sepsis using Heart Rate Signals and Genetic Optimized LSTM Algorithm : Abstract: Sepsis, characterized by a dysregulated immune response to infection, results in significant mortality, morbidity, and healthcare costs. The timely prediction of sepsis progression is crucia...
Lifting Vision: Ground to Aerial Localization with Reasoning Guided Planning : Abstract: Multimodal intelligence development has recently shown strong progress in visual understanding and high-level reasoning. However, most reasoning systems still rely on textual information as the ma...
Efficient Inference for Inverse Reinforcement Learning and Dynamic Discrete Choice Models : Abstract: Inverse reinforcement learning (IRL) and dynamic discrete choice (DDC) models explain sequential decision-making by recovering reward functions that rationalize observed behavior. Flexible I...
Sparse classification with positive-confidence data in high dimensions : Abstract: High-dimensional learning problems, where the number of features exceeds the sample size, often require sparse regularization for effective prediction and variable selection. While establish...
Adaptive Learning Guided by Bias-Noise-Alignment Diagnostics : Abstract: Learning systems deployed in nonstationary and safety-critical environments often suffer from instability, slow convergence, or brittle adaptation when learning dynamics evolve over time. Wh...
Generative forecasting with joint probability models : Abstract: Chaotic dynamical systems exhibit strong sensitivity to initial conditions and often contain unresolved multiscale processes, making deterministic forecasting fundamentally limited. Generati...
Generalising E-prop to Deep Networks : Abstract: Recurrent networks are typically trained with backpropagation through time (BPTT). However, BPTT requires storing the history of all states in the network and then replaying them sequentiall...
From Perception to Punchline: Empowering VLM with the Art of In-the-wild Meme : Abstract: Generating humorous memes is a challenging multimodal task that moves beyond direct image-to-caption supervision. It requires nuanced reasoning over visual content, contextual cues, and su...
CPR: Causal Physiological Representation Learning for Robust ECG Analysis under Distribution Shifts : Abstract: Deep learning models for Electrocardiogram (ECG) diagnosis have achieved remarkable accuracy but exhibit fragility against adversarial perturbations, particularly Smooth Adversarial Perturba...
A Scalable Framework for logP Prediction: From Terabyte-Scale Data Integration to Interpretable Ensemble Modeling : Abstract: This study presents a large-scale predictive modeling framework for logP prediction using 426850 bioactive compounds rigorously curated from the intersection of three authoritative chemical ...
HeteroHBA: A Generative Structure-Manipulating Backdoor Attack on Heterogeneous Graphs : Abstract: Heterogeneous graph neural networks (HGNNs) have achieved strong performance in many real-world applications, yet targeted backdoor poisoning on heterogeneous graphs remains less studied. We...
Mobility-Assisted Decentralized Federated Learning: Convergence Analysis and A Data-Driven Approach : Abstract: Decentralized Federated Learning (DFL) has emerged as a privacy-preserving machine learning paradigm that enables collaborative training among users without relying on a central server. Howe...
Causal Discovery with Mixed Latent Confounding via Precision Decomposition : Abstract: We study causal discovery from observational data in linear Gaussian systems affected by mixed latent confounding, where some unobserved factors act broadly across many variables whil...
FPGA Co-Design for Efficient N:M Sparse and Quantized Model Inference : Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of language processing tasks. However, this success comes at the cost of substantial computation and...
From Trial to Deployment: A SEM Analysis of Traveler Adoptions to Fully Operational Autonomous Taxis : Abstract: Autonomous taxi services represent a transformative advancement in urban mobility, offering safety, efficiency, and round-the-clock operations. While existing literature has explored user ac...
Gradient Descent as Implicit EM in Distance-Based Neural Models : Abstract: Neural networks trained with standard objectives exhibit behaviors characteristic of probabilistic inference: soft clustering, prototype specialization, and Bayesian uncertainty tracking. Th...
Self-Supervised Neural Architecture Search for Multimodal Deep Neural Networks : Abstract: Neural architecture search (NAS), which automates the architectural design process of deep neural networks (DNN), has attracted increasing attention. Multimodal DNNs that necessitate feature...
DTI-GP: Bayesian operations for drug-target interactions using deep kernel Gaussian processes : Abstract: Precise probabilistic information about drug-target interaction (DTI) predictions is vital for understanding limitations and boosting predictive performance. Gaussian processes (GP) offer a ...
Unregularized Linear Convergence in Zero-Sum Game from Preference Feedback : Abstract: Aligning large language models (LLMs) with human preferences has proven effective for enhancing model capabilities, yet standard preference modeling using the Bradley-Terry model assumes tra...
Discovering Coordinated Joint Options via Inter-Agent Relative Dynamics : Abstract: Temporally extended actions improve the ability to explore and plan in single-agent settings. In multi-agent settings, the exponential growth of the joint state space with the number of agen...
Renormalization Group Guided Tensor Network Structure Search : Abstract: Tensor network structure search (TN-SS) aims to automatically discover optimal network topologies and rank configurations for efficient tensor decomposition in high-dimensional data represen...
VLA-RAIL: A Real-Time Asynchronous Inference Linker for VLA Models and Robots : Abstract: Vision-Language-Action (VLA) models have achieved remarkable breakthroughs in robotics, with the action chunk playing a dominant role in these advances. Given the real-time and continuous na...
An Adaptive, Disentangled Representation for Multidimensional MRI Reconstruction : Abstract: We present a new approach for representing and reconstructing multidimensional magnetic resonance imaging (MRI) data. Our method builds on a novel, learned feature-based image representation...
R-Debater: Retrieval-Augmented Debate Generation through Argumentative Memory : Abstract: We present R-Debater, an agentic framework for generating multi-turn debates built on argumentative memory. Grounded in rhetoric and memory studies, the system views debate as a process of r...
Nested Learning: The Illusion of Deep Learning Architectures : Abstract: Despite recent progress, particularly in developing Language Models, there are fundamental challenges and unanswered questions about how such models can continually learn/memorize, sel...
Evolving, Not Training: Zero-Shot Reasoning Segmentation via Evolutionary Prompting : Abstract: Reasoning Segmentation requires models to interpret complex, context-dependent linguistic queries to achieve pixel-level localization. Current dominant approaches rely heavily on Supervised ...
BandiK: Efficient Multi-Task Decomposition Using a Multi-Bandit Framework : Abstract: The challenge of effectively transferring knowledge across multiple tasks is of critical importance and is also present in downstream tasks with foundation models. However, the nature of tra...
LSRE: Latent Semantic Rule Encoding for Real-Time Semantic Risk Detection in Autonomous Driving : Abstract: Real-world autonomous driving must adhere to complex human social rules that extend beyond legally codified traffic regulations. Many of these semantic constraints, such as yielding to emerg...
AstroReview: An LLM-driven Multi-Agent Framework for Telescope Proposal Peer Review and Refinement : Abstract: Competitive access to modern observatories has intensified as proposal volumes outpace available telescope time, making timely, consistent, and transparent peer review a critical bottleneck ...
Dream2Flow: Bridging Video Generation and Open-World Manipulation with 3D Object Flow : Abstract: Generative video modeling has emerged as a compelling tool to zero-shot reason about plausible physical interactions for open-world manipulation. Yet, it remains a challenge to translate suc...
HiGR: Efficient Generative Slate Recommendation via Hierarchical Planning and Multi-Objective Preference Alignment : Abstract: Slate recommendation, where users are presented with a ranked list of items simultaneously, is widely adopted in online platforms. Recent advances in generative models have shown promise in ...
LeanCat: A Benchmark Suite for Formal Category Theory in Lean (Part I: 1-Categories) : Abstract: Large language models (LLMs) have made rapid progress in formal theorem proving, yet current benchmarks under-measure the kind of abstraction and library-mediated reasoning that organizes mo...
Practising responsibility: Ethics in NLP as a hands-on course : Abstract: As Natural Language Processing (NLP) systems become more pervasive, integrating ethical considerations into NLP education has become essential. However, this presents inherent challenges in ...
Video and Language Alignment in 2D Systems for 3D Multi-object Scenes with Multi-Information Derivative-Free Control : Abstract: Cross-modal systems trained on 2D visual inputs are presented with a dimensional shift when processing 3D scenes. An in-scene camera bridges the dimensionality gap but requires learning a co...
PrivacyBench: A Conversational Benchmark for Evaluating Privacy in Personalized AI : Abstract: Personalized AI agents rely on access to a user's digital footprint, which often includes sensitive data from private emails, chats and purchase histories. Yet this access creates a fundamen...
Big AI is accelerating the metacrisis: What can we do? : Abstract: The world is in the grip of ecological, meaning, and language crises which are converging into a metacrisis. Big AI is accelerating them all. Language engineers are playing a central role, p...
Encyclo-K: Evaluating LLMs with Dynamically Composed Knowledge Statements : Abstract: Benchmarks play a crucial role in tracking the rapid advancement of large language models (LLMs) and identifying their capability boundaries. However, existing benchmarks predominantly curat...
mHC: Manifold-Constrained Hyper-Connections : Abstract: Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and ...
AI-Driven Cloud Resource Optimization for Multi-Cluster Environments : Abstract: Modern cloud-native systems increasingly rely on multi-cluster deployments to support scalability, resilience, and geographic distribution. However, existing resource management approaches r...
RAIR: A Rule-Aware Benchmark Uniting Challenging Long-Tail and Visual Salience Subset for E-commerce Relevance Assessment : Abstract: Search relevance plays a central role in web e-commerce. While large language models (LLMs) have shown significant results on relevance tasks, existing benchmarks lack sufficient complexity f...
HaineiFRDM: Explore Diffusion to Restore Defects in Fast-Movement Films : Abstract: Existing open-source film restoration methods show limited performance compared to commercial methods due to training with low-quality synthetic data and employing noisy optical flows. In ad...
MSACL: Multi-Step Actor-Critic Learning with Lyapunov Certificates for Exponentially Stabilizing Control : Abstract: Achieving provable stability in model-free reinforcement learning (RL) remains a challenge, particularly in balancing exploration with rigorous safety. This article introduces MSACL, a frame...
Semi-overlapping Multi-bandit Best Arm Identification for Sequential Support Network Learning : Abstract: Many modern AI and ML problems require evaluating partners' contributions through shared yet asymmetric, computationally intensive processes and the simultaneous selection of the most benefi...
ShowUI-π: Flow-based Generative Models as GUI Dexterous Hands : Abstract: Building intelligent agents capable of dexterous manipulation is essential for achieving human-like automation in both robotics and digital environments. However, existing GUI agents rely on...
The Impact of LLMs on Online News Consumption and Production : Abstract: Large language models (LLMs) change how consumers acquire information online; their bots also crawl news publishers' websites for training data and to answer consumer queries; and they provi...
Evaluating the Impact of Compression Techniques on the Robustness of CNNs under Natural Corruptions : Abstract: Compressed deep learning models are crucial for deploying computer vision systems on resource-constrained devices. However, model compression may affect robustness, especially under natural ...
SymSeqBench: a unified framework for the generation and analysis of rule-based symbolic sequences and datasets : Abstract: Sequential structure is a key feature of multiple domains of natural cognition and behavior, such as language, movement and decision-making. Likewise, it is also a central property of tasks ...
A Modal Logic for Possibilistic Reasoning with Fuzzy Formal Contexts : Abstract: We introduce a two-sort weighted modal logic for possibilistic reasoning with fuzzy formal contexts. The syntax of the logic includes two types of weighted modal operators corresponding to c...
DarkEQA: Benchmarking Vision-Language Models for Embodied Question Answering in Low-Light Indoor Environments : Abstract: Vision Language Models (VLMs) are increasingly adopted as central reasoning modules for embodied agents. Existing benchmarks evaluate their capabilities under ideal, well-lit conditions, yet...
Classifying long legal documents using short random chunks : Abstract: Classifying legal documents is a challenge: besides their specialized vocabulary, they can sometimes be very long. This means that feeding full documents to Transformer-based models for c...
Modeling Language as a Sequence of Thoughts : Abstract: Transformer language models can generate strikingly natural text by modeling language as a sequence of tokens. Yet, by relying primarily on surface-level co-occurrence statistics, they fail ...
Generative Classifiers Avoid Shortcut Solutions : Abstract: Discriminative approaches to classification often learn shortcuts that hold in-distribution but fail even under minor distribution shift. This failure mode stems from an overreliance on feat...
AdaGReS: Adaptive Greedy Context Selection via Redundancy-Aware Scoring for Token-Budgeted RAG : Abstract: Retrieval-augmented generation (RAG) is highly sensitive to the quality of selected context, yet standard top-k retrieval often returns redundant or near-duplicate chunks that waste token bu...
Vulcan: Instance-Optimal Systems Heuristics Through LLM-Driven Search : Abstract: Resource-management tasks in modern operating and distributed systems continue to rely primarily on hand-designed heuristics for tasks such as scheduling, caching, or active queue management...
Coordinated Humanoid Manipulation with Choice Policies : Abstract: Humanoid robots hold great promise for operating in human-centric environments, yet achieving robust whole-body coordination across the head, hands, and legs remains a major challenge. We pr...
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time : Abstract: We present SpaceTimePilot, a video diffusion model that disentangles space and time for controllable generative rendering. Given a monocular video, SpaceTimePilot can independently alter the...
FEDSTR: Money-In AI-Out | A Decentralized Marketplace for Federated Learning and LLM Training on the NOSTR Protocol : Abstract: NOSTR is a communication protocol for the social web, based on the W3C WebSockets standard. Although it is still in its infancy, it is well known as a social media protocol, with thousan...
LTLBench: Towards Benchmarks for Evaluating Temporal Logic Reasoning in Large Language Models : Abstract: Temporal Reasoning (TR) is a critical ability for LLMs to understand and reason over temporal information and relationships between events. To study the TR ability in LLMs, prior works provi...
Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems : Abstract: This paper presents a novel online transfer learning approach in state-based potential games (TL-SbPGs) for distributed self-optimization in manufacturing systems. The approach targets pract...
Model Merging in LLMs, MLLMs, and Beyond: Methods, Theories, Applications and Opportunities : Abstract: Model merging is an efficient empowerment technique in the machine learning community that does not require the collection of raw training data and does not require expensive computation. As...
A Systematic Survey on Large Language Models for Algorithm Design : Abstract: Algorithm design is crucial for effective problem-solving across various domains. The advent of Large Language Models (LLMs) has notably enhanced the automation and innovation within this fi...
Bielik 7B v0.1: A Polish Language Model -- Development, Insights, and Evaluation : Abstract: We introduce Bielik 7B v0.1, a 7-billion-parameter generative text model for Polish language processing. Trained on curated Polish corpora, this model addresses key challenges in language mo...
Probing the Limits of Compressive Memory: A Study of Infini-Attention in Small-Scale Pretraining : Abstract: This study investigates small-scale pretraining for Small Language Models (SLMs) to enable efficient use of limited data and compute, improve accessibility in low-resource settings and reduc...
Breaking Audio Large Language Models by Attacking Only the Encoder: A Universal Targeted Latent-Space Audio Attack : Abstract: Audio-language models combine audio encoders with large language models to enable multimodal reasoning, but they also introduce new security vulnerabilities. We propose a universal targeted ...
Autoregressive long-horizon prediction of plasma edge dynamics : Abstract: Accurate modeling of scrape-off layer (SOL) and divertor-edge dynamics is vital for designing plasma-facing components in fusion devices. High-fidelity edge fluid/neutral codes such as SOLPS...
How Large Language Models Systematically Misrepresent American Climate Opinions : Abstract: Federal agencies and researchers increasingly use large language models to analyze and simulate public opinion. When AI mediates between the public and policymakers, accuracy across intersec...
Efficient Deep Learning for Short-Term Solar Irradiance Time Series Forecasting: A Benchmark Study in Ho Chi Minh City : Abstract: Reliable forecasting of Global Horizontal Irradiance (GHI) is essential for mitigating the variability of solar energy in power grids. This study presents a comprehensive benchmark of ten de...
A multimodal Transformer for InSAR-based ground deformation forecasting with cross-site generalization across Europe : Abstract: Near-real-time regional-scale monitoring of ground deformation is increasingly required to support urban planning, critical infrastructure management, and natural hazard mitigation. While In...
Interactive Machine Learning: From Theory to Scale : Abstract: Machine learning has achieved remarkable success across a wide range of applications, yet many of its most effective methods rely on access to large amounts of labeled data or extensive onli...
Improving Multi-step RAG with Hypergraph-based Memory for Long-Context Complex Relational Modeling : Abstract: Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reason...
An Comparative Analysis about KYC on a Recommendation System Toward Agentic Recommendation System : Abstract: This research presents a cutting-edge recommendation system utilizing agentic AI for KYC (Know Your Customer in the financial domain), and its evaluation across five distinct content vertica...
Physics-informed Graph Neural Networks for Operational Flood Modeling : Abstract: Flood models inform strategic disaster management by simulating the spatiotemporal hydrodynamics of flooding. While physics-based numerical flood models are accurate, their substantial compu...
Efficient Context Scaling with LongCat ZigZag Attention : Abstract: We introduce LongCat ZigZag Attention (LoZA), which is a sparse attention scheme designed to transform any existing full-attention models into sparse versions with rather limited compute bud...
A Community-Aware Framework for Influence Maximization with Explicit Accounting for Inter-Community Influence : Abstract: Influence Maximization (IM) seeks to identify a small set of seed nodes in a social network to maximize expected information spread under a diffusion model. While community-based approaches ...
Causify DataFlow: A Framework For High-performance Machine Learning Stream Computing : Abstract: We present DataFlow, a computational framework for building, testing, and deploying high-performance machine learning systems on unbounded time-series data. Traditional data science workflow...
Coding With AI: From a Reflection on Industrial Practices to Future Computer Science and Software Engineering Education : Abstract: Recent advances in large language models (LLMs) have introduced new paradigms in software development, including vibe coding, AI-assisted coding, and agentic coding, fundamentally reshaping ...
MeLeMaD: Adaptive Malware Detection via Chunk-wise Feature Selection and Meta-Learning : Abstract: Confronting the substantial challenges of malware detection in cybersecurity necessitates solutions that are both robust and adaptable to the ever-evolving threat environment. The paper intr...
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process : Abstract: Despite the growing reasoning capabilities of recent large language models (LLMs), their internal mechanisms during the reasoning process remain underexplored. Prior approaches often rely on...
PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation : Abstract: Text-to-audio-video (T2AV) generation underpins a wide range of applications demanding realistic audio-visual content, including virtual reality, world modeling, gaming, and filmmaking. Howe...
Tracing the Heart's Pathways: ECG Representation Learning from a Cardiac Conduction Perspective : Abstract: The multi-lead electrocardiogram (ECG) stands as a cornerstone of cardiac diagnosis. Recent strides in electrocardiogram self-supervised learning (eSSL) have brightened prospects for enhanci...
TESO Tabu Enhanced Simulation Optimization for Noisy Black Box Problems : Abstract: Simulation optimization (SO) is frequently challenged by noisy evaluations, high computational costs, and complex, multimodal search landscapes. This paper introduces Tabu-Enhanced Simulatio...
iCLP: Large Language Model Reasoning with Implicit Cognition Latent Planning : Abstract: Large language models (LLMs), when guided by explicit textual plans, can perform reliable step-by-step reasoning during problem-solving. However, generating accurate and effective textual pl...
FUSE-RSVLM: Feature Fusion Vision-Language Model for Remote Sensing : Abstract: Large vision-language models (VLMs) exhibit strong performance across various tasks. However, these VLMs encounter significant challenges when applied to the remote sensing domain due to the...
RSAgent: Learning to Reason and Act for Text-Guided Segmentation via Multi-Turn Tool Invocations : Abstract: Text-guided object segmentation requires both cross-modal reasoning and pixel grounding abilities. Most recent methods treat text-guided segmentation as one-shot grounding, where the model p...
PipeFlow: Pipelined Processing and Motion-Aware Frame Selection for Long-Form Video Editing : Abstract: Long-form video editing poses unique challenges due to the exponential increase in the computational cost from joint editing and Denoising Diffusion Implicit Models (DDIM) inversion across e...
Kidney Exchange: Faster Parameterized Algorithms and Tighter Lower Bounds : Abstract: The kidney exchange mechanism allows many patient-donor pairs who are otherwise incompatible with each other to come together and exchange kidneys along a cycle. However, due to infrastructu...
Jailbreaking Attacks vs. Content Safety Filters: How Far Are We in the LLM Safety Arms Race? : Abstract: As large language models (LLMs) are increasingly deployed, ensuring their safe use is paramount. Jailbreaking, adversarial prompts that bypass model alignment to trigger harmful outputs, pre...
AHA: Aligning Large Audio-Language Models for Reasoning Hallucinations via Counterfactual Hard Negatives : Abstract: Although Large Audio-Language Models (LALMs) deliver state-of-the-art (SOTA) performance, they frequently suffer from hallucinations, e.g. generating text not grounded in the audio input. We...
Beyond Hallucinations: A Composite Score for Measuring Reliability in Open-Source Large Language Models : Abstract: Large Language Models (LLMs) like LLaMA, Mistral, and Gemma are increasingly used in decision-critical domains such as healthcare, law, and finance, yet their reliability remains uncertain. ...
Pathology Context Recalibration Network for Ocular Disease Recognition : Abstract: Pathology context and expert experience play significant roles in clinical ocular disease diagnosis. Although deep neural networks (DNNs) have good ocular disease recognition results, they o...
Random Multiplexing : Abstract: As wireless communication applications evolve from traditional multipath environments to high-mobility scenarios like unmanned aerial vehicles, multiplexing techniques have advanced accordin...
FedLiTeCAN : A Federated Lightweight Transformer for Fast and Robust CAN Bus Intrusion Detection : Abstract: This work implements a lightweight Transformer model for IDS in the domain of Connected and Autonomous Vehicles.
Factorized Learning for Temporally Grounded Video-Language Models : Abstract: Recent video-language models have shown great potential for video understanding, but still struggle with accurate temporal grounding for event-level perception. We observe that two main fact...
Enhancing LLM Planning Capabilities through Intrinsic Self-Critique : Abstract: We demonstrate an approach for LLMs to critique their own answers with the goal of enhancing their performance that leads to significant improvements over established planning benchma...
Multilevel Fair Allocation : Abstract: We introduce the concept of multilevel fair allocation of resources with tree-structured hierarchical relations among agents. While at each level it is possible to consider the problem local...
Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design : Abstract: Automated neural network architecture design remains a significant challenge in computer vision. Task diversity and computational constraints require both effective architectures and efficie...
OptRot: Mitigating Weight Outliers via Data-Free Rotations for Post-Training Quantization : Abstract: The presence of outliers in Large Language Model (LLM) weights and activations makes them difficult to quantize. Recent work has leveraged rotations to mitigate these outliers. In this wor...
Unified Embodied VLM Reasoning with Robotic Action via Autoregressive Discretized Pre-training : Abstract: General-purpose robotic systems operating in open-world environments must achieve both broad generalization and high-precision action execution, a combination that remains challenging for ex...
GARDO: Reinforcing Diffusion Models without Reward Hacking : Abstract: Fine-tuning diffusion models via online reinforcement learning (RL) has shown great potential for enhancing text-to-image alignment. However, since precisely specifying a ground-truth object...
Developing controlled natural language for formal specification patterns using AI assistants : Abstract: Using an AI assistant, we developed a method for systematically constructing controlled natural language for requirements based on formal specification patterns containing logical attributes...
PointRAFT: 3D deep learning for high-throughput prediction of potato tuber weight from partial point clouds : Abstract: Potato yield is a key indicator for optimizing cultivation practices in agriculture. Potato yield can be estimated on harvesters using RGB-D cameras, which capture three-dimensional (3D) inf...
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation : Abstract: Multimodal Large Language Models (MLLMs) have made remarkable progress in video understanding. However, they suffer from a critical vulnerability: an over-reliance on language priors, which ...
One-shot synthesis of rare gastrointestinal lesions improves diagnostic accuracy and clinical training : Abstract: Rare gastrointestinal lesions are infrequently encountered in routine endoscopy, restricting the data available for developing reliable artificial intelligence (AI) models and training novic...
DRL-TH: Jointly Utilizing Temporal Graph Attention and Hierarchical Fusion for UGV Navigation in Crowded Environments : Abstract: Deep reinforcement learning (DRL) methods have demonstrated potential for autonomous navigation and obstacle avoidance of unmanned ground vehicles (UGVs) in crowded environments. Most existi...
Virtual-Eyes: Quantitative Validation of a Lung CT Quality-Control Pipeline for Foundation-Model Cancer Risk Prediction : Abstract: Robust preprocessing is rarely quantified in deep-learning pipelines for low-dose CT (LDCT) lung cancer screening. We develop and validate Virtual-Eyes, a clinically motivated 16-bit CT qual...
Generative Video Compression: Towards 0.01% Compression Rate for Video Transmission : Abstract: Can a video be compressed at an extreme compression rate as low as 0.01%? To this end, we achieve a compression rate of 0.02% in some cases by introducing Generative Video Compress...
Empower Low-Altitude Economy: A Reliability-Aware Dynamic Weighting Allocation for Multi-modal UAV Beam Prediction : Abstract: The low-altitude economy (LAE) is rapidly expanding driven by urban air mobility, logistics drones, and aerial sensing, while fast and accurate beam prediction in uncrewed aerial vehicles (U...
DermaVQA-DAS: Dermatology Assessment Schema (DAS) & Datasets for Closed-Ended Question Answering & Segmentation in Patient-Generated Dermatology Images : Abstract: Recent advances in dermatological image analysis have been driven by large-scale annotated datasets; however, most existing benchmarks focus on dermatoscopic images and lack patient-authored...
FedSecureFormer: A Fast, Federated and Secure Transformer Framework for Lightweight Intrusion Detection in Connected and Autonomous Vehicles : Abstract: This work presents an encoder-only transformer built with minimum layers for intrusion detection in the domain of Connected and Autonomous Vehicles using Federated Learning.
Skim-Aware Contrastive Learning for Efficient Document Representation : Abstract: Although transformer-based models have shown strong performance in word- and sentence-level tasks, effectively representing long documents, especially in fields like law and medicine, remain...
Tubular Riemannian Laplace Approximations for Bayesian Neural Networks : Abstract: Laplace approximations are among the simplest and most practical methods for approximate Bayesian inference in neural networks, yet their Euclidean formulation struggles with the highly anis...
FAST-IDS: A Fast Two-Stage Intrusion Detection System with Hybrid Compression for Real-Time Threat Detection in Connected and Autonomous Vehicles : Abstract: We have implemented a multi-stage IDS for CAVs that can be deployed to resource-constrained environments after hybrid model compression.
Fast and Realistic Automated Scenario Simulations and Reporting for an Autonomous Racing Stack : Abstract: In this paper, we describe the automated simulation and reporting pipeline implemented for our autonomous racing stack, ur.autopilot. The backbone of the simulation is based on a high-fideli...
Comparing Approaches to Automatic Summarization in Less-Resourced Languages : Abstract: Automatic text summarization has achieved high performance in high-resourced languages like English, but comparatively less attention has been given to summarization in less-resourced langua...
PackKV: Reducing KV Cache Memory Footprint through LLM-Aware Lossy Compression : Abstract: Transformer-based large language models (LLMs) have demonstrated remarkable potential across a wide range of practical applications. However, long-context inference remains a significant cha...
Privacy-Preserving Semantic Communications via Multi-Task Learning and Adversarial Perturbations : Abstract: Semantic communications conveys task-relevant meaning rather than focusing solely on message reconstruction, improving bandwidth efficiency and robustness for next-generation wireless system...
Foundation models on the bridge: Semantic hazard detection and safety maneuvers for maritime autonomy with vision-language models : Abstract: The draft IMO MASS Code requires autonomous and remotely supervised maritime vessels to detect departures from their operational design domain, enter a predefined fallback that notifies the ...
F2IDiff: Real-world Image Super-resolution using Feature to Image Diffusion Foundation Model : Abstract: With the advent of Generative AI, Single Image Super-Resolution (SISR) quality has seen substantial improvement, as the strong priors learned by Text-2-Image Diffusion (T2IDiff) Foundation M...
HOLOGRAPH: Active Causal Discovery via Sheaf-Theoretic Alignment of Large Language Model Priors : Abstract: Causal discovery from observational data remains fundamentally limited by identifiability constraints. Recent work has explored leveraging Large Language Models (LLMs) as sources of prior ca...
Automated Classification of First-Trimester Fetal Heart Views Using Ultrasound-Specific Self-Supervised Learning : Abstract: Congenital heart disease remains the most common congenital anomaly and a leading cause of neonatal morbidity and mortality. Although first-trimester fetal echocardiography offers an opportu...
Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice : Abstract: Data teams at frontier AI companies routinely train small proxy models to make critical decisions about pretraining data recipes for full-scale training runs. However, the community has a li...
Generative AI-enhanced Sector-based Investment Portfolio Construction : Abstract: This paper investigates how Large Language Models (LLMs) from leading providers (OpenAI, Google, Anthropic, DeepSeek, and xAI) can be applied to quantitative sector-based portfolio construct...
More Than Bits: Multi-Envelope Double Binary Factorization for Extreme Quantization : Abstract: For extreme low-bit quantization of large language models (LLMs), Double Binary Factorization (DBF) is attractive as it enables efficient inference without sacrificing accuracy. However, the...
Localized Calibrated Uncertainty in Code Language Models : Abstract: Large language models (LLMs) can generate complicated source code from natural language prompts. However, LLMs can generate output that deviates from what the user wants, requiring supervisi...
SynRAG: A Large Language Model Framework for Executable Query Generation in Heterogeneous SIEM System : Abstract: Security Information and Event Management (SIEM) systems are essential for large enterprises to monitor their IT infrastructure by ingesting and analyzing millions of logs and events daily. ...
Understanding and Steering the Cognitive Behaviors of Reasoning Models at Test-Time : Abstract: Large Language Models (LLMs) often rely on long chain-of-thought (CoT) reasoning to solve complex tasks. While effective, these trajectories are frequently inefficient, leading to high laten...
Chat-Driven Optimal Management for Virtual Network Services : Abstract: This paper proposes a chat-driven network management framework that integrates natural language processing (NLP) with optimization-based virtual network allocation, enabling intuitive and re...
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space : Abstract: Large Language Models (LLMs) apply uniform computation to all tokens, despite language exhibiting highly non-uniform information density. This token-uniform regime wastes capacity on locally...
AutoFed: Manual-Free Federated Traffic Prediction via Personalized Prompt : Abstract: Accurate traffic prediction is essential for Intelligent Transportation Systems, including ride-hailing, urban road planning, and vehicle fleet management. However, due to significant privac...
AI-Driven Acoustic Voice Biomarker-Based Hierarchical Classification of Benign Laryngeal Voice Disorders from Sustained Vowels : Abstract: Benign laryngeal voice disorders affect nearly one in five individuals and often manifest as dysphonia, while also serving as non-invasive indicators of broader physiological dysfunction. We...
DynaFix: Iterative Automated Program Repair Driven by Execution-Level Dynamic Information : Abstract: Automated Program Repair (APR) aims to automatically generate correct patches for buggy programs. Recent approaches leveraging large language models (LLMs) have shown promise but face limita...
Hybrid Motion Planning with Deep Reinforcement Learning for Mobile Robot Navigation : Abstract: Autonomous mobile robots operating in complex, dynamic environments face the dual challenge of navigating large-scale, structurally diverse spaces with static obstacles while safely interact...
Do Large Language Models Know What They Are Capable Of? : Abstract: We investigate whether large language models (LLMs) can predict whether they will succeed on a given task and whether their predictions improve as they progress through multi-step tasks. We ...
The Drill-Down and Fabricate Test (DDFT): A Protocol for Measuring Epistemic Robustness in Language Models : Abstract: Current language model evaluations measure what models know under ideal conditions but not how robustly they know it under realistic stress. Static benchmarks like MMLU and TruthfulQA cannot...
CASCADE: Cumulative Agentic Skill Creation through Autonomous Development and Evolution : Abstract: Large language model (LLM) agents currently depend on predefined tools or brittle tool generation, constraining their capability and adaptability to complex scientific tasks. We introduce CA...
A Proof-of-Concept for Explainable Disease Diagnosis Using Large Language Models and Answer Set Programming : Abstract: Accurate disease prediction is vital for timely intervention, effective treatment, and reducing medical complications. While symbolic AI has been applied in healthcare, its adoption remains ...
SPARK: Search Personalization via Agent-Driven Retrieval and Knowledge-sharing : Abstract: Personalized search demands the ability to model users' evolving, multi-dimensional information needs; a challenge for systems constrained by static profiles or monolithic retrieval pipeline...
ROAD: Reflective Optimization via Automated Debugging for Zero-Shot Agent Alignment : Abstract: Automatic Prompt Optimization (APO) has emerged as a critical technique for enhancing Large Language Model (LLM) performance, yet current state-of-the-art methods typically rely on large, la...
LoongFlow: Directed Evolutionary Search via a Cognitive Plan-Execute-Summarize Paradigm : Abstract: The transition from static Large Language Models (LLMs) to self-improving agents is hindered by the lack of structured reasoning in traditional evolutionary approaches. Existing methods ofte...
CogRec: A Cognitive Recommender Agent Fusing Large Language Models and Soar for Explainable Recommendation : Abstract: Large Language Models (LLMs) have demonstrated a remarkable capacity in understanding user preferences for recommendation systems. However, they are constrained by several critical challenge...
Graph-Based Exploration for ARC-AGI-3 Interactive Reasoning Tasks : Abstract: We present a training-free graph-based approach for solving interactive reasoning tasks in the ARC-AGI-3 benchmark. ARC-AGI-3 comprises game-like tasks where agents must infer task mechanics...
SCP: Accelerating Discovery with a Global Web of Autonomous Scientific Agents : Abstract: We introduce SCP: the Science Context Protocol, an open-source standard designed to accelerate discovery by enabling a global network of autonomous scientific agents. SCP is built on two fou...
Deep Reinforcement Learning for Solving the Fleet Size and Mix Vehicle Routing Problem : Abstract: The Fleet Size and Mix Vehicle Routing Problem (FSMVRP) is a prominent variant of the Vehicle Routing Problem (VRP), extensively studied in operations research and computational science. FSM...
Constrained Language Model Policy Optimization via Risk-aware Stepwise Alignment : Abstract: When fine-tuning pre-trained Language Models (LMs) to exhibit desired behaviors, maintaining control over risk is critical for ensuring both safety and trustworthiness. Most existing safety ...
Align While Search: Belief-Guided Exploratory Inference for World-Grounded Embodied Agents : Abstract: In this paper, we propose a test-time adaptive agent that performs exploratory inference through posterior-guided belief refinement without relying on gradient-based updates or additional tr...
What Drives Success in Physical Planning with Joint-Embedding Predictive World Models? : Abstract: A long-standing challenge in AI is to develop agents capable of solving a wide range of physical tasks and generalizing to new, unseen tasks and environments. A popular recent approach invol...
Thinking on Maps: How Foundation Model Agents Explore, Remember, and Reason Map Environments : Abstract: Map environments provide a fundamental medium for representing spatial structure. Understanding how foundation model (FM) agents understand and act in such environments is therefore critical...
Evaluating the Reasoning Abilities of LLMs on Underrepresented Mathematics Competition Problems : Abstract: Understanding the limitations of Large Language Models, or LLMs, in mathematical reasoning has been the focus of several recent studies. However, the majority of these studies use the same d...
From Building Blocks to Planning: Multi-Step Spatial Reasoning in LLMs with Reinforcement Learning : Abstract: Spatial reasoning in large language models (LLMs) has gained increasing attention due to applications in navigation and planning. Despite strong general language capabilities, LLMs still str...
MCPAgentBench: A Real-world Task Benchmark for Evaluating LLM Agent MCP Tool Use : Abstract: Large Language Models (LLMs) are increasingly serving as autonomous agents, and their utilization of external tools via the Model Context Protocol (MCP) is considered a future trend. Current...
Recursive Language Models : Abstract: We study allowing large language models (LLMs) to process arbitrarily long prompts through the lens of inference-time scaling. We propose Recursive Language Models (RLMs), a general inferenc...
Reinforcement Learning-Augmented LLM Agents for Collaborative Decision Making and Performance Optimization : Abstract: Large Language Models (LLMs) perform well in language tasks but often lack collaborative awareness and struggle to optimize global performance in multi-agent settings. We present a reinforce...
Group Deliberation Oriented Multi-Agent Conversational Model for Complex Reasoning : Abstract: This paper proposes a group deliberation oriented multi-agent conversational model to address the limitations of single large language models in complex reasoning tasks. The model adopts a t...
Youtu-Agent: Scaling Agent Productivity with Automated Generation and Hybrid Policy Optimization : Abstract: Existing Large Language Model (LLM) agent frameworks face two significant challenges: high configuration costs and static capabilities. Building a high-quality agent often requires extensive...
Multi-modal cross-domain mixed fusion model with dual disentanglement for fault diagnosis under unseen working conditions : Abstract: Intelligent fault diagnosis has become an indispensable technique for ensuring machinery reliability. However, existing methods suffer significant performance decline in real-world scenarios...
BatteryAgent: Synergizing Physics-Informed Interpretation with LLM Reasoning for Intelligent Battery Fault Diagnosis : Abstract: Fault diagnosis of lithium-ion batteries is critical for system safety. While existing deep learning methods exhibit superior detection accuracy, their "black-box" nature hinders interpretab...
Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences : Abstract: Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insigh...
GenZ: Foundational models as latent variable generators within traditional statistical models : Abstract: We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge,...
A study on constraint extraction and exception exclusion in care worker scheduling : Abstract: Technologies for automatically generating work schedules have been extensively studied; however, in long-term care facilities, the conditions vary between facilities, making it essential to ...
Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem : Abstract: Agentic crafting requires LLMs to operate in real-world environments over multiple turns by taking actions, observing outcomes, and iteratively refining artifacts. Despite its importance, th...
Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing : Abstract: This report presents the design and implementation of a semi-automated data annotation pipeline developed within the DARTS project, whose goal is to create a large-scale, multimodal dataset ...
Iterative Deployment Improves Planning Skills in LLMs : Abstract: We show that iterative deployment of large language models (LLMs), each fine-tuned on data carefully curated by users from the previous models' deployment, can significantly change the prope...
AMAP Agentic Planning Technical Report : Abstract: We present STAgent, an agentic large language model tailored for spatio-temporal understanding, designed to solve complex tasks such as constrained point-of-interest discovery and itinerary ...
Context-aware LLM-based AI Agents for Human-centered Energy Management Systems in Smart Buildings : Abstract: This study presents a conceptual framework and a prototype assessment for Large Language Model (LLM)-based Building Energy Management System (BEMS) AI agents to facilitate context-aware ener...
Enriching Historical Records: An OCR and AI-Driven Approach for Database Integration : Abstract: This research digitizes and analyzes the Leidse hoogleraren en lectoren 1575-1815 books written between 1983 and 1985, which contain biographic data about professors and curators of Leiden U...
STED and Consistency Scoring: A Framework for Evaluating LLM Structured Output Reliability : Abstract: Large Language Models (LLMs) are increasingly deployed for structured data generation, yet output consistency remains critical for production applications. We introduce a comprehensive frame...
PyBangla at BLP-2025 Task 2: Enhancing Bangla-to-Python Code Generation with Iterative Self-Correction and Multilingual Agents : Abstract: LLMs excel at code generation from English prompts, but this progress has not extended to low-resource languages. We address Bangla-to-Python code generation by introducing BanglaCodeAct, an...
HarmTransform: Transforming Explicit Harmful Queries into Stealthy via Multi-Agent Debate : Abstract: Large language models (LLMs) are equipped with safety mechanisms to detect and block harmful queries, yet current alignment approaches primarily focus on overtly dangerous content and overlo...
A Survey of AI Methods for Geometry Preparation and Mesh Generation in Engineering Simulation : Abstract: Artificial intelligence is beginning to ease long-standing bottlenecks in the CAD-to-mesh pipeline. This survey reviews recent advances where machine learning aids part classification, mesh ...
q3-MuPa: Quick, Quiet, Quantitative Multi-Parametric MRI using Physics-Informed Diffusion Models : Abstract: The 3D fast silent multi-parametric mapping sequence with zero echo time (MuPa-ZTE) is a novel quantitative MRI (qMRI) acquisition that enables nearly silent scanning by using a 3D phyllotax...
When in Doubt, Deliberate: Confidence-Based Routing to Expert Debate for Sexism Detection : Abstract: Sexist content online increasingly appears in subtle, context-dependent forms that evade traditional detection methods. Its interpretation often depends on overlapping linguistic, psychologi...
Enforcing Temporal Constraints for LLM Agents : Abstract: LLM-based agents are deployed in safety-critical applications, yet current guardrail systems fail to prevent violations of temporal safety policies, requirements that govern the ordering and...
Break Out the Silverware -- Semantic Understanding of Stored Household Items : Abstract: "Bring me a plate." For domestic service robots, this simple command reveals a complex challenge: inferring where everyday items are stored, often out of sight in drawers, cabinets, or clo...
Towards representation agnostic probabilistic programming : Abstract: Current probabilistic programming languages and tools tightly couple model representations with specific inference algorithms, preventing experimentation with novel representations or mixed ...
AgenticTCAD: A LLM-based Multi-Agent Framework for Automated TCAD Code Generation and Device Optimization : Abstract: With the continued scaling of advanced technology nodes, the design-technology co-optimization (DTCO) paradigm has become increasingly critical, rendering efficient device design and optimiz...
Hybrid-Code: A Privacy-Preserving, Redundant Multi-Agent Framework for Reliable Local Clinical Coding : Abstract: Clinical coding automation using cloud-based Large Language Models (LLMs) poses privacy risks and latency bottlenecks, rendering them unsuitable for on-premise healthcare deployment. We intr...
State-of-the-art Small Language Coder Model: Mify-Coder : Abstract: We present Mify-Coder, a 2.5B-parameter code model trained on 4.2T tokens using a compute-optimal strategy built on the Mify-2.5B foundation model. Mify-Coder achieves comparable accuracy an...
Coordinate Matrix Machine: A Human-level Concept Learning to Classify Very Similar Documents : Abstract: Human-level concept learning argues that humans typically learn new concepts from a single example, whereas machine learning algorithms typically require hundreds of samples to learn a singl...
Geometric Scaling of Bayesian Inference in LLMs : Abstract: Recent work has shown that small transformers trained in controlled "wind-tunnel" settings can implement exact Bayesian inference, and that their training dynamics produce a geometric subst...
Generalized Regularized Evidential Deep Learning Models: Theory and Comprehensive Evaluation : Abstract: Evidential deep learning (EDL) models, based on Subjective Logic, introduce a principled and computationally efficient way to make deterministic neural networks uncertainty-aware. The result...
HINTS: Extraction of Human Insights from Time-Series Without External Sources : Abstract: Human decision-making, emotions, and collective psychology are complex factors that shape the temporal dynamics observed in financial and economic systems. Many recent time series forecastin...
Leveraging Machine Learning for Early Detection of Lung Diseases : Abstract: Combining traditional image processing methods with advanced neural networks makes a predictive and preventive healthcare paradigm concrete. This study offers rapid, accurate, and non-inva...
Audited Skill-Graph Self-Improvement for Agentic LLMs via Verifiable Rewards, Experience Synthesis, and Continual Memory : Abstract: Reinforcement learning is increasingly used to transform large language models into agentic systems that act over long horizons, invoke tools, and manage memory under partial observability. ...
Drift-Based Dataset Stability Benchmark : Abstract: Machine learning (ML) represents an efficient and popular approach for network traffic classification. However, network traffic classification is a challenging domain, and trained models may...
Entropy-Aware Speculative Decoding Toward Improved LLM Reasoning : Abstract: Speculative decoding (SD) accelerates large language model (LLM) reasoning by using a small draft model to generate candidate tokens, which the target LLM either accepts directly or regenera...
Enabling Physical AI at the Edge: Hardware-Accelerated Recovery of System Dynamics : Abstract: Physical AI at the edge -- enabling autonomous systems to understand and predict real-world dynamics in real time -- requires hardware-efficient learning and inference. Model recovery (MR), ...
Uncovering Discrimination Clusters: Quantifying and Explaining Systematic Fairness Violations : Abstract: Fairness in algorithmic decision-making is often framed in terms of individual fairness, which requires that similar individuals receive similar outcomes. A system violates individual fairne...
Safety-Biased Policy Optimisation: Towards Hard-Constrained Reinforcement Learning via Trust Regions : Abstract: Reinforcement learning (RL) in safety-critical domains requires agents to maximise rewards while strictly adhering to safety constraints. Existing approaches, such as Lagrangian and projecti...
FineFT: Efficient and Risk-Aware Ensemble Reinforcement Learning for Futures Trading : Abstract: Futures are contracts obligating the exchange of an asset at a predetermined date and price. They are notable for their high leverage and liquidity and therefore thrive in the crypto market. RL has...
A Survey on Graph Neural Networks for Fraud Detection in Ride Hailing Platforms : Abstract: This study investigates fraud detection in ride hailing platforms through Graph Neural Networks (GNNs), focusing on the effectiveness of various models. By analyzing prevalent fraudulent acti...
Prompt-Induced Over-Generation as Denial-of-Service: A Black-Box Attack-Side Benchmark : Abstract: Large language models (LLMs) can be driven into over-generation, emitting thousands of tokens before producing an end-of-sequence (EOS) token. This degrades answer quality, inflates latency ...
Zero-Trust Agentic Federated Learning for Secure IIoT Defense Systems : Abstract: Recent attacks on critical infrastructure, including the 2021 Oldsmar water treatment breach and 2023 Danish energy sector compromises, highlight urgent security gaps in Industrial IoT (IIoT...
StressRoBERTa: Cross-Condition Transfer Learning from Depression, Anxiety, and PTSD to Stress Detection : Abstract: The prevalence of chronic stress represents a significant public health concern, with social media platforms like Twitter serving as important venues for individuals to share their experienc...
Improved Bounds for Private and Robust Alignment : Abstract: In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online se...
Quantum Error Mitigation with Attention Graph Transformers for Burgers Equation Solvers on NISQ Hardware : Abstract: We present a hybrid quantum-classical framework augmented with learned error mitigation for solving the viscous Burgers equation on noisy intermediate-scale quantum (NISQ) hardware. Using th...
Video-Based Performance Evaluation for ECR Drills in Synthetic Training Environments : Abstract: Effective urban warfare training requires situational awareness and muscle memory, developed through repeated practice in realistic yet controlled environments. A key drill, Enter and Clear ...
Artificial Intelligence for All? Brazilian Teachers on Ethics, Equity, and the Everyday Challenges of AI in Education : Abstract: This study examines the perceptions of Brazilian K-12 education teachers regarding the use of AI in education, specifically General Purpose AI. This investigation employs a quantitative anal...
Explaining News Bias Detection: A Comparative SHAP Analysis of Transformer Model Decision Mechanisms : Abstract: Automated bias detection in news text is heavily used to support journalistic analysis and media accountability, yet little is known about how bias detection models arrive at their decisions...
Retrieval Augmented Question Answering: When Should LLMs Admit Ignorance? : Abstract: The success of expanded context windows in Large Language Models (LLMs) has driven increased use of broader context in retrieval-augmented generation. We investigate the use of LLMs for retr...
Adversarial Lens: Exploiting Attention Layers to Generate Adversarial Examples for Evaluation : Abstract: Recent advances in mechanistic interpretability suggest that intermediate attention layers encode token-level hypotheses that are iteratively refined toward the final output. In this work, w...
From Correctness to Collaboration: Toward a Human-Centered Framework for Evaluating AI Agent Behavior in Software Engineering : Abstract: As Large Language Models (LLMs) evolve from code generators into collaborative partners for software engineers, our methods for evaluation are lagging. Current benchmarks, focused on code co...
Security Without Detection: Economic Denial as a Primitive for Edge and IoT Defense : Abstract: Detection-based security fails against sophisticated attackers using encryption, stealth, and low-rate techniques, particularly in IoT/edge environments where resource constraints preclude M...
Seeking Late Night Life Lines: Experiences of Conversational AI Use in Mental Health Crisis : Abstract: Online, people often recount their experiences turning to conversational AI agents (e.g., ChatGPT, Claude, Copilot) for mental health support -- going so far as to replace their therapists. ...
Lifelong Domain Adaptive 3D Human Pose Estimation : Abstract: 3D Human Pose Estimation (3D HPE) is vital in various applications, from person re-identification and action recognition to virtual reality. However, the reliance on annotated 3D data collec...

Research Sources: 450 | Generated: 1/1/2026