AI Research News Feeds for December 8th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition : Abstract: Facial recognition has become a widely used method for authentication and identification, with applications for secure access and locating missing persons. Its success is largely attributed ...
Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition : Abstract: In this paper, we present a synthesis pipeline and dataset for training / testing data in the task of traffic sign recognition that combines the advantages of data-driven and analytical mode...
EditThinker: Unlocking Iterative Reasoning for Any Image Editor : Abstract: Instruction-based image editing has emerged as a prominent research area, which, benefiting from image generation foundation models, has achieved high aesthetic quality, making instruction-...
ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety : Abstract: Vulnerable road users (VRUs) face high collision risks in mixed traffic, yet most existing safety systems prioritize driver or vehicle assistance over direct VRU support. This paper presents...
Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation : Abstract: Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individ...
Physically-Based Simulation of Automotive LiDAR : Abstract: We present an analytic model for simulating automotive time-of-flight (ToF) LiDAR that includes blooming, echo pulse width, and ambient light, along with steps to determine model parameters ...
SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models : Abstract: Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises fr...
Multi-Scale Direction-Aware Network for Infrared Small Target Detection : Abstract: Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on edge and shape fea...
iMotion-LLM: Instruction-Conditioned Trajectory Generation : Abstract: We introduce iMotion-LLM, a large language model (LLM) integrated with trajectory prediction modules for interactive motion generation. Unlike conventional approaches, it generates feasible,...
PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds : Abstract: Creation of new annotated public datasets is crucial in helping advances in 3D computer vision and machine learning meet their full potential for automatic interpretation of 3D plant models....
Neural Eulerian Scene Flow Fields : Abstract: We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior. Our method, EulerFlow, ...
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM : Abstract: Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to a...
Perspective-Invariant 3D Object Detection : Abstract: With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on veh...
Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4 : Abstract: In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technolog...
Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change : Abstract: This study demonstrates how hybrid neural-symbolic methods can yield significant new insights into the evolution of a morphologically rich, low-resource language. We challenge the naive assu...
Learning from Self Critique and Refinement for Faithful LLM Summarization : Abstract: Large Language Models (LLMs) often suffer from hallucinations: output content that is not grounded in the input context, when performing long-form text generation tasks such as summarization...
SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs : Abstract: Post-training quantization (PTQ) plays a crucial role in the democratization of large language models (LLMs). However, existing low-bit quantization and sparsification techniques are difficu...
LMSpell: Neural Spell Checking for Low-Resource Languages : Abstract: Spell correction is still a challenging problem for low-resource languages (LRLs). While pretrained language models (PLMs) have been employed for spell correction, their use is still limited...
SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures : Abstract: Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing mult...
Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches : Abstract: Objective: To evaluate large language models (LLMs) against supervised baselines for fine-grained, lesion-level detection of incidentalomas requiring follow-up, addressing the limitations of...
Structured Reasoning with Tree-of-Thoughts for Bengali Math Word Problems : Abstract: Mathematical Word Problems (MWPs) are among the most challenging tasks in natural language processing because they require both linguistic understanding and multi-step numerical reasoning. W...
A Greek Government Decisions Dataset for Public-Sector Analysis and Insight : Abstract: We introduce an open, machine-readable corpus of Greek government decisions sourced from the national transparency platform Diavgeia. The resource comprises 1 million decisions, featuring an...
Interleaved Latent Visual Reasoning with Selective Perceptual Modeling : Abstract: Interleaved reasoning paradigms enhance Multimodal Large Language Models (MLLMs) with visual feedback but are hindered by the prohibitive computational cost of repeatedly re-encoding pixel-d...
MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation : Abstract: The significant gap between rising demands for clinical training and the scarcity of expert instruction poses a major challenge to medical education. With powerful capabilities in personaliz...
Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning : Abstract: Recent advances in large language models (LLMs) show impressive performance in open-ended story generation, but fine-grained stylistic control remains limited. Existing methods often rely on...
Heard or Halted? Gender, Interruptions, and Emotional Tone in U.S. Supreme Court Oral Arguments : Abstract: This study examines how interruptions during U.S. Supreme Court oral arguments shape both the semantic content and emotional tone of advocates' speech, with a focus on gendered dynamics in j...
Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy : Abstract: This is the fourth in a series of short reports that help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. Here, we ask w...
Vague Knowledge: Information without Transitivity and Partitions : Abstract: I relax the standard assumptions of transitivity and partition structure in economic models of information to formalize vague knowledge: non-transitive indistinguishability over states. I sh...
Self-Improving VLM Judges Without Human Annotations : Abstract: Effective judges of Vision-Language Models (VLMs) are crucial for model development. Current methods for training VLM judges mainly rely on large-scale human preference annotations. However,...
TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows : Abstract: Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically b...
EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models : Abstract: Diffusion models are highly regarded for their controllability and the diversity of images they generate. However, class-conditional generation methods based on diffusion models often focus ...
DEAR: Dataset for Evaluating the Aesthetics of Rendering : Abstract: Traditional Image Quality Assessment (IQA) focuses on quantifying technical degradations such as noise, blur, or compression artifacts, using both full-reference and no-reference objective m...
IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction : Abstract: Continuous video monitoring in surveillance, robotics, and wearable systems faces a fundamental power constraint: conventional RGB cameras consume substantial energy through fixed-rate captu...
Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization : Abstract: While three-dimensional (3D) shape and pose estimation is a highly researched area that has yielded significant advances, the resulting methods, despite performing well for the adult populat...
CARD: Correlation Aware Restoration with Diffusion : Abstract: Denoising diffusion models have achieved state-of-the-art performance in image restoration by modeling the process as sequential denoising steps. However, most approaches assume independent ...
Inferring Compositional 4D Scenes without Ever Seeing One : Abstract: Scenes in the real world are often composed of several static and dynamic objects. Capturing their 4-dimensional structures, composition and spatio-temporal configuration in-the-wild, though...
SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training : Abstract: The rise of 3D Gaussian Splatting has revolutionized photorealistic 3D asset creation, yet a critical gap remains for their interactive refinement and editing. Existing approaches based on d...
Group Orthogonal Low-Rank Adaptation for RGB-T Tracking : Abstract: Parameter-efficient fine-tuning has emerged as a promising paradigm in RGB-T tracking, enabling downstream task adaptation by freezing pretrained parameters and fine-tuning only a small set ...
ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration : Abstract: Video Large Language Models (VLLMs) face the challenge of high computational load during the pre-filling stage due to the processing of an enormous number of visual tokens. Although attentio...
LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models : Abstract: Whole Slide Image (WSI) understanding is fundamentally challenging due to its gigapixel scale and the extreme sparsity of diagnostically relevant regions. Unlike human experts who primarily ...
Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability : Abstract: Latent diffusion models pair VAEs with diffusion backbones, and the structure of VAE latents strongly influences the difficulty of diffusion training. However, existing video VAEs typically ...
The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos : Abstract: Estimating accurate camera poses, 3D scene geometry, and object motion from in-the-wild videos is a long-standing challenge for classical structure from motion pipelines due to the presence ...
Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch Images : Abstract: Traditional stereo matching algorithms like Semi-Global Block Matching (SGBM) with Weighted Least Squares (WLS) filtering offer speed advantages over neural networks for UAV applications, ge...
YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications : Abstract: Manual pruning of radiata pine trees poses significant safety risks due to extreme working heights and challenging terrain. This paper presents a computer vision framework that integrates YO...
Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry Systems : Abstract: UAV-based autonomous forestry operations require rapid and precise tree branch segmentation for safe navigation and automated pruning across varying pixel resolutions and operational conditi...
ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction : Abstract: Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. However, existing methods struggle to fully balance suffic...
TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression : Abstract: Building on the success of 3D Gaussian Splatting (3DGS) in static 3D scene representation, its extension to dynamic scenes, commonly referred to as 4DGS or dynamic 3DGS, has attracted increa...
EmoStyle: Emotion-Driven Image Stylization : Abstract: Art has long been a profound medium for expressing emotions. While existing image stylization methods effectively transform visual appearance, they often overlook the emotional impact carrie...
Concept-based Explainable Data Mining with VLM for 3D Detection : Abstract: Rare-object detection remains a challenging task in autonomous driving systems, particularly when relying solely on point cloud data. Although Vision-Language Models (VLMs) exhibit strong ca...
WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field : Abstract: Underwater video pairs are fairly difficult to obtain due to the complex underwater imaging. In this case, most existing video underwater enhancement methods are performed by directly applyi...
Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation : Abstract: To address the limitations of Transformer decoders in capturing edge details, recognizing local textures and modeling spatial continuity, this paper proposes a novel decoder framework specif...
Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm : Abstract: While large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, their potential for single-frame infrared small target (SIRST) detection remain...
Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning : Abstract: Large Video-Language Models (Video-LMs) have achieved impressive progress in multimodal understanding, yet their reasoning remains weakly grounded in space and time. We present Know-Show, a ...
VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation : Abstract: Spatio-temporal scene graph generation (ST-SGG) aims to model objects and their evolving relationships across video frames, enabling interpretable representations for downstream reasoning ta...
Ideal Observer for Segmentation of Dead Leaves Images : Abstract: The human visual environment is comprised of different surfaces that are distributed in space. The parts of a scene that are visible at any one time are governed by the occlusion of overlapp...
ProPhy: Progressive Physical Alignment for Dynamic World Simulation : Abstract: Recent advances in video generation have shown remarkable potential for constructing world simulators. However, current models still struggle to produce physically consistent results, partic...
MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging : Abstract: Accurate spatial correspondence between medical images is essential for longitudinal analysis, lesion tracking, and image-guided interventions. Medical image registration methods rely on loc...
Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer : Abstract: We present a novel method for generating 3D garment deformations from given body poses, which is key to a wide range of applications, including virtual try-on and extended reality. To simpli...
Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction : Abstract: Recent perception-generalist approaches based on language models have achieved state-of-the-art results across diverse tasks, including 3D scene layout estimation, via unified architecture a...
NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections : Abstract: Laser scanning has proven to be an invaluable tool in assessing the decomposition of forest environments. Mobile laser scanning (MLS) has been shown to be highly promising for extremely accurate,...
DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model : Abstract: Cross-Domain Few-Shot Semantic Segmentation (CD-FSS) seeks to segment unknown classes in unseen domains using only a few annotated examples. This setting is inherently challenging: source an...
Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data : Abstract: Learned Image Signal Processing (ISP) pipelines offer powerful end-to-end performance but are critically dependent on large-scale paired raw-to-sRGB datasets. This reliance on costly-to-acqu...
Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective : Abstract: The proliferation of AI-generated imagery poses escalating challenges for multimedia forensics, yet many existing detectors depend on assumptions about the internals of specific generative m...
LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection : Abstract: Real-time monocular 3D object detection remains challenging due to severe depth ambiguity, viewpoint shifts, and the high computational cost of 3D reasoning. Existing approaches either rely ...
Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features : Abstract: Facial expression recognition is a crucial component in enhancing human-computer interaction and developing emotion-aware systems. Real-time detection and interpretation of facial expression...
Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization : Abstract: Hyperspectral unmixing (HSU) aims to separate each pixel into its constituent endmembers and estimate their corresponding abundance fractions. This work presents an algorithm-unrolling-based...
Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction : Abstract: Optical aberrations significantly degrade image quality in microscopy, particularly when imaging deeper into samples. These aberrations arise from distortions in the optical wavefront and ca...
OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning : Abstract: Unsupervised 3D object detection leverages heuristic algorithms to discover potential objects, offering a promising route to reduce annotation costs in autonomous driving. Existing approache...
Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning : Abstract: Point cloud completion seeks to recover geometrically consistent shapes from partial or sparse 3D observations. Although recent methods have achieved reasonable global shape reconstruction, ...
Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision : Abstract: Recently, Vision Large Language Models (VLMs) have demonstrated high potential in computer-aided diagnosis and decision-support. However, current VLMs show deficits in domain specific surgic...
HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models : Abstract: Diffusion models have demonstrated significant applications in the field of image generation. However, their high computational and memory costs pose challenges for deployment. Model quantiz...
USV: Unified Sparsification for Accelerating Video Diffusion Models : Abstract: The scalability of high-fidelity video diffusion models (VDMs) is constrained by two key sources of redundancy: the quadratic complexity of global spatio-temporal attention and the computati...
Label-Efficient Point Cloud Segmentation with Active Learning : Abstract: Semantic segmentation of 3D point cloud data often comes with high annotation costs. Active learning automates the process of selecting which data to annotate, reducing the total amount of a...
FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators : Abstract: We present FNOpt, a self-supervised cloth simulation framework that formulates time integration as an optimization problem and trains a resolution-agnostic neural optimizer parameterized by ...
Bring Your Dreams to Life: Continual Text-to-Video Customization : Abstract: Customized text-to-video generation (CTVG) has recently witnessed great progress in generating tailored videos from user-specific text. However, most CTVG methods assume that personalized co...
UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease Detection : Abstract: Alzheimer's disease (AD) is an irreversible neurodegenerative disorder, and early diagnosis is critical for timely intervention. However, most existing classification frameworks face challen...
VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack : Abstract: Multimodal Large Language Models (MLLMs) are widely used in various fields due to their powerful cross-modal comprehension and generation capabilities. However, more modalities bring more vu...
Edit-aware RAW Reconstruction : Abstract: Users frequently edit camera images post-capture to achieve their preferred photofinishing style. While editing in the RAW domain provides greater accuracy and flexibility, most edits are pe...
Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator : Abstract: Underwater imaging is essential for marine exploration, environmental monitoring, and infrastructure inspection. However, water causes severe image degradation through wavelength-dependent a...
SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations : Abstract: Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a refe...
LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology reduces pixel-level labeling by learning from image-level labels, but it is hindered by inter-class homogeneity, intra-class ...
LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction : Abstract: Deep residual networks (ResNets) have demonstrated outstanding success in computer vision tasks, attributed to their ability to maintain gradient flow through deep architectures. Simultaneou...
KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity : Abstract: The Key-Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length ...
On the Bayes Inconsistency of Disagreement Discrepancy Surrogates : Abstract: Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this...
Developing synthetic microdata through machine learning for firm-level business surveys : Abstract: Public-use microdata samples (PUMS) from the United States (US) Census Bureau on individuals have been available for decades. However, large increases in computing power and the greater avai...
Bayesian Optimization and Convolutional Neural Networks for Zernike-Based Wavefront Correction in High Harmonic Generation : Abstract: High harmonic generation (HHG) is a nonlinear process that enables table-top generation of tunable, high-energy, coherent, ultrashort radiation pulses in the extreme ultraviolet (EUV) to sof...
InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models : Abstract: Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present Inva...
Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models : Abstract: We present a transfer-learning generative downscaling framework to reconstruct fine resolution satellite images from coarse scale inputs. Our approach combines a lightweight U-Net transfer e...
Continuous-Time Homeostatic Dynamics for Reentrant Inference Models : Abstract: We formulate the Fast-Weights Homeostatic Reentry Network (FHRN) as a continuous-time neural-ODE system, revealing its role as a norm-regulated reentrant dynamical process. Starting from the...
Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models : Abstract: Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents...
Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem : Abstract: Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative...
STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings : Abstract: Accurate prediction of protein function is essential for elucidating molecular mechanisms and advancing biological and therapeutic discovery. Yet experimental annotation lags far behind the ...
One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow : Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps ...
Robust forecast aggregation via additional queries : Abstract: We study the problem of robust forecast aggregation: combining expert forecasts with provable accuracy guarantees compared to the best possible aggregation of the underlying information. Pri...
Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats : Abstract: The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism, low-quality, auto-generated articles that mimi...
Symmetric Linear Dynamical Systems are Learnable from Few Observations : Abstract: We consider the problem of learning the parameters of an $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce...
FieldSeer I: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability : Abstract: We introduce FieldSeer I, a geometry-aware world model that forecasts electromagnetic field dynamics from partial observations in 2-D TE waveguides. The model assimilates a short prefix of o...
PoolNet: Deep Learning for 2D to 3D Video Process Validation : Abstract: Lifting Structure-from-Motion (SfM) information from sequential and non-sequential image data is a time-consuming and computationally expensive task. In addition to this, the majority of pub...
EXR: An Interactive Immersive EHR Visualization in Extended Reality : Abstract: This paper presents the design and implementation of an Extended Reality (XR) platform for immersive, interactive visualization of Electronic Health Records (EHRs). The system extends beyond...
Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data : Abstract: As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), rese...
Model Gateway: Model Management Platform for Model-Driven Drug Discovery : Abstract: This paper presents the Model Gateway, a management platform for managing machine learning (ML) and scientific computational models in the drug discovery pipeline. The platform supports Larg...
SSDLabeler: Realistic semi-synthetic data generation for multi-label artifact classification in EEG : Abstract: EEG recordings are inherently contaminated by artifacts such as ocular, muscular, and environmental noise, which obscure neural activity and complicate preprocessing. Artifact classification...
DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis : Abstract: Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is chall...
Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement : Abstract: Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training exam...
Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening : Abstract: Art has long played a profound role in shaping human emotion, cognition, and behavior. While visual arts such as painting and architecture have been studied through eye tracking, revealing d...
Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches : Abstract: We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a desi...
Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces : Abstract: Semantic communication systems aim to transmit task-relevant information between devices capable of artificial intelligence, but their performance can degrade when heterogeneous transmitter-...
Comparing the latent features of universal machine-learning interatomic potentials : Abstract: The past few years have seen the development of "universal" machine-learning interatomic potentials (uMLIPs) capable of approximating the ground-state potential energy surface across a wid...
Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth : Abstract: When depth sensors provide only 5% of needed measurements, reconstructing complete 3D scenes becomes difficult. Autonomous vehicles and robots cannot tolerate the geometric errors that spars...
Machine-learning-enabled interpretation of tribological deformation patterns in large-scale MD data : Abstract: Molecular dynamics (MD) simulations have become indispensable for exploring tribological deformation patterns at the atomic scale. However, transforming the resulting high-dimensional data i...
Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models : Abstract: Modern extensible compiler frameworks, such as MLIR, enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure as the same exten...
NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction : Abstract: Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate postoperative facial appearan...
BalLOT: Balanced $k$-means clustering with optimal transport : Abstract: We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it...
Designing an Optimal Sensor Network via Minimizing Information Loss : Abstract: Optimal experimental design is a classic topic in statistics, with many well-studied problems, applications, and solutions. The design problem we study is the placement of sensors to monitor...
Consequences of Kernel Regularity for Bandit Optimization : Abstract: In this work we investigate the relationship between kernel regularity and algorithmic performance in the bandit optimization of RKHS functions. While reproducing kernel Hilbert space (RKHS)...
Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks : Abstract: Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is unclear whether these theories explain the performances of ac...
SPARTAN: A Sparse Transformer World Model Attending to What Matters : Abstract: Capturing the interactions between entities in a structured way plays a central role in world models that flexibly adapt to changes in the environment. Recent works motivate the benefits of ...
Second Maximum of a Gaussian Random Field and Exact (t-)Spacing test : Abstract: In this article, we introduce the novel concept of the second maximum of a Gaussian random field on a Riemannian submanifold. This second maximum serves as a powerful tool for characterizing...
Semantic Communication and Control Co-Design for Multi-Objective Distinct Dynamics : Abstract: This letter introduces a machine-learning approach to learning the semantic dynamics of correlated systems with different control rules and dynamics. By leveraging the Koopman operator in an...
Operator learning meets inverse problems: A probabilistic perspective : Abstract: Operator learning offers a robust framework for approximating mappings between infinite-dimensional function spaces. It has also become a powerful tool for solving inverse problems in the co...
Unveiling Affective Polarization Trends in Parliamentary Proceedings : Abstract: Recent years have seen an increase in polarized discourse worldwide, on various platforms. We propose a novel method for quantifying polarization, based on the emotional style of the discour...
Decoding the Black Box: Discerning AI Rhetorics About and Through Poetic Prompting : Abstract: Prompt engineering has emerged as a useful way of studying the algorithmic tendencies and biases of large language models. Meanwhile, creatives and academics have leveraged LLMs to develop creat...
Meta-Learning Multi-armed Bandits for Beam Tracking in 5G and 6G Networks : Abstract: Beamforming-capable antenna arrays with many elements enable higher data rates in next generation 5G and 6G networks. In current practice, analog beamforming uses a codebook of pre-configure...
BERTO: an Adaptive BERT-based Network Time Series Predictor with Operator Preferences in Natural Language : Abstract: We introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO delivers high prediction accuracy, w...
Teaching Language Models Mechanistic Explainability Through Arrow-Pushing : Abstract: Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computa...
Towards agent-based-model informed neural networks : Abstract: In this article, we present a framework for designing neural networks that remain consistent with the underlying principles of agent-based models. We begin by highlighting the limitations of...
Learnability Window in Gated Recurrent Neural Networks : Abstract: We develop a theoretical framework that explains how gating mechanisms determine the learnability window $\mathcal{H}_N$ of recurrent neural networks, defined as the largest temporal horizon...
Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws : Abstract: Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve comparable performance to full-data training while substantially reducing storage and comp...
Predicting Price Movements in High-Frequency Financial Data with Spiking Neural Networks : Abstract: Modern high-frequency trading (HFT) environments are characterized by sudden price spikes that present both risk and opportunity, but conventional financial models often fail to capture the ...
Computational Design of Low-Volatility Lubricants for Space Using Interpretable Machine Learning : Abstract: The function and lifetime of moving mechanical assemblies (MMAs) in space depend on the properties of lubricants. MMAs that experience high speeds or high cycles require liquid based lubrica...
DAE-HardNet: A Physics Constrained Neural Network Enforcing Differential-Algebraic Hard Constraints : Abstract: Traditional physics-informed neural networks (PINNs) do not always satisfy physics based constraints, especially when the constraints include differential operators. Rather, they minimize th...
NeuroMemFPP: A recurrent neural approach for memory-aware parameter estimation in fractional Poisson process : Abstract: In this paper, we propose a recurrent neural network (RNN)-based framework for estimating the parameters of the fractional Poisson process (FPP), which models event arrivals with memory and ...
Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding : Abstract: Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervision, they still face various c...
Impugan: Learning Conditional Generative Models for Robust Data Imputation : Abstract: Incomplete data are common in real-world applications. Sensors fail, records are inconsistent, and datasets collected from different sources often differ in scale, sampling rate, and quality...
Trusted AI Agents in the Cloud : Abstract: AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However...
MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution : Abstract: Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem,...
M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG : Abstract: Vision-language models (VLMs) have achieved strong performance in visual question answering (VQA), yet they remain constrained by static training data. Retrieval-Augmented Generation (RAG) m...
AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement : Abstract: Underwater images often suffer from severe color distortion, low contrast, and a hazy appearance due to wavelength-dependent light absorption and scattering. Simultaneously, existing deep le...
Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity : Abstract: Reinforcement Learning (RL) has become the de facto standard for tuning LLMs to solve tasks involving reasoning. However, growing evidence shows that models trained in such a way often suffer ...
Training-Time Action Conditioning for Efficient Real-Time Chunking : Abstract: Real-time chunking (RTC) enables vision-language-action models (VLAs) to generate smooth, reactive robot trajectories by asynchronously predicting action chunks and conditioning on previousl...
Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms : Abstract: In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable k...
Rolling in the deep of cognitive and AI biases : Abstract: Nowadays, we delegate many of our decisions to Artificial Intelligence (AI) that acts either solo or as a human companion in decisions made to support several sensitive domains, like heal...
Debate over Mixed-knowledge: A Robust Multi-Agent Reasoning Framework for Incomplete Knowledge Graph Question Answering : Abstract: Knowledge Graph Question Answering (KGQA) aims to improve factual accuracy by leveraging structured knowledge. However, real-world Knowledge Graphs (KGs) are often incomplete, leading to the...
ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset : Abstract: Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still h...
GTM: Simulating the World of Tools for AI Agents : Abstract: The integration of external tools is pivotal for empowering Large Language Model (LLM) agents with real-world capabilities. However, training these agents through direct, continuous interact...
Towards Data-efficient Customer Intent Recognition with Prompt-based Learning Paradigm : Abstract: Recognizing customer intent accurately with language models based on customer-agent conversational data is essential in today's digital customer service marketplace, but it is often hindered...
A Scene-aware Models Adaptation Scheme for Cross-scene Online Inference on Mobile Devices : Abstract: Emerging Artificial Intelligence of Things (AIoT) applications desire online prediction using deep neural network (DNN) models on mobile devices. However, due to the movement of devices, unf...
Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling : Abstract: Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility a...
Detecting the Future: All-at-Once Event Sequence Forecasting with Horizon Matching : Abstract: Long-horizon event forecasting is a crucial task across various domains, including retail, finance, healthcare, and social networks. Traditional models for event sequences often extend to f...
Image-Guided Semantic Pseudo-LiDAR Point Generation for 3D Object Detection : Abstract: In autonomous driving scenarios, accurate perception is becoming an even more critical task for safe navigation. While LiDAR provides precise spatial data, its inherent sparsity makes it dif...
Edge-Only Universal Adversarial Attacks in Distributed Learning : Abstract: Distributed learning frameworks, which partition neural network models across multiple computing nodes, enhance efficiency in collaborative edge-cloud systems, but may also introduce new vul...
Coefficient of Variation Masking: A Volatility-Aware Strategy for EHR Foundation Models : Abstract: Masked autoencoders (MAEs) are increasingly applied to electronic health records (EHR) for learning general-purpose representations that support diverse clinical tasks. However, existing app...
Rethinking Tokenization for Clinical Time Series: When Less is More : Abstract: Tokenization strategies shape how models process electronic health records, yet fair comparisons of their effectiveness remain limited. We present a systematic evaluation of tokenization app...
Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance : Abstract: Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-int...
Variance Matters: Improving Domain Adaptation via Stratified Sampling : Abstract: Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during...
Edged Weisfeiler-Lehman Algorithm : Abstract: As a classical approach to graph learning, the propagation-aggregation methodology is widely exploited by many Graph Neural Networks (GNNs), wherein the representation of a node is update...
Bridging quantum and classical computing for partial differential equations through multifidelity machine learning : Abstract: Quantum algorithms for partial differential equations (PDEs) face severe practical constraints on near-term hardware: limited qubit counts restrict spatial resolution to coarse grids, while ...
When unlearning is free: leveraging low influence points to reduce computational costs : Abstract: As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state of the art u...
DMAGT: Unveiling miRNA-Drug Associations by Integrating SMILES and RNA Sequence Structures through Graph Transformer Models : Abstract: MiRNAs, due to their role in gene regulation, have paved a new pathway for pharmacology, focusing on drug development that targets miRNAs. However, traditional wet lab experiments are limite...
Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces : Abstract: Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training...
Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) : Abstract: Kolmogorov-Arnold Networks have emerged as interpretable alternatives to traditional multi-layer perceptrons. However, standard implementations lack principled uncertainty quantification cap...
Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay : Abstract: Background: Deep Deterministic Policy Gradient-based reinforcement learning algorithms utilize Actor-Critic architectures, where both networks are typically trained using identical batches o...
Non-Convex Federated Optimization under Cost-Aware Client Selection : Abstract: Different federated optimization algorithms typically employ distinct client-selection strategies: some methods communicate only with a randomly sampled subset of clients at each round, whil...
PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering : Abstract: Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer. With the help of Large Language Models and their reas...
Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models : Abstract: Large Language Models (LLMs) are typically aligned for safety during the post-training phase; however, they may still generate inappropriate outputs that could potentially pose risks to user...
When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation : Abstract: Large Language Models (LLMs) have shown strong potential in accelerating digital hardware design through automated code generation. Yet, ensuring their reliability remains a critical challen...
Enhancing Dimensionality Prediction in Hybrid Metal Halides via Feature Engineering and Class-Imbalance Mitigation : Abstract: We present a machine learning framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, using a combination of chemicall...
RevoNAD: Reflective Evolutionary Exploration for Neural Architecture Design : Abstract: Recent progress in leveraging large language models (LLMs) has enabled Neural Architecture Design (NAD) systems to generate new architectures not limited to manually predefined search space...
Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets : Abstract: In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous na...
TS-HINT: Enhancing Semiconductor Time Series Regression Using Attention Hints From Large Language Model Reasoning : Abstract: Existing data-driven methods rely on the extraction of static features from time series to approximate the material removal rate (MRR) of semiconductor manufacturing processes such as chemic...
Turbulence Regression : Abstract: Air turbulence refers to the disordered and irregular motion state generated by drastic changes in velocity, pressure, or direction during airflow. Various complex factors lead to intricate ...
GRASP: Graph Reasoning Agents for Systems Pharmacology with Human-in-the-Loop : Abstract: Quantitative Systems Pharmacology (QSP) modeling is essential for drug development but it requires significant time investment that limits the throughput of domain experts. We present ...
Credal and Interval Deep Evidential Classifications : Abstract: Uncertainty Quantification (UQ) presents a pivotal challenge in the field of Artificial Intelligence (AI), profoundly impacting decision-making, risk assessment and model reliability. In thi...
IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection : Abstract: Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy among evolving distributions while ensuring real-time efficiency. Her...
SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection : Abstract: The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches consider a representation of local n...
Wasserstein distance based semi-supervised manifold learning and application to GNSS multi-path detection : Abstract: The main objective of this study is to propose an optimal transport based semi-supervised approach to learn from scarce labelled image data using deep convolutional networks. The principle l...
Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning : Abstract: Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift...
Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales : Abstract: Several recently introduced deep learning optimizers utilizing matrix-level preconditioning have shown promising speedups relative to the current dominant optimizer AdamW, particularly in re...
Bounded Graph Clustering with Graph Neural Networks : Abstract: In community detection, many methods require the user to specify the number of clusters in advance since an exhaustive search over all possible values is computationally infeasible. While so...
Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs : Abstract: Large Language Models increasingly possess capabilities that carry dual-use risks. While data filtering has emerged as a pretraining-time mitigation, it faces significant challenges: labelin...
Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes : Abstract: Coronary artery disease (CAD) remains a major global health burden. Accurate identification of the culprit vessel and assessment of stenosis severity are essential for guiding individualized...
ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images : Abstract: Vision-Language Models (VLMs) have advanced multimodal understanding, yet still struggle when targets are embedded in cluttered backgrounds requiring figure-ground segregation. To address th...
FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation : Abstract: The increasing availability of Earth observation data offers unprecedented opportunities for large-scale environmental monitoring and analysis. However, these datasets are inherently heterog...
How to Tame Your LLM: Semantic Collapse in Continuous Systems : Abstract: We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve und...
Advanced Unsupervised Learning: A Comprehensive Overview of Multi-View Clustering Techniques : Abstract: Machine learning techniques face numerous challenges to achieve optimal performance. These include computational constraints, the limitations of single-view learning algorithms and the compl...
Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning : Abstract: The growing exploration of Large Language Models (LLM) and Vision-Language Models (VLM) has opened avenues for enhancing the effectiveness of reinforcement learning (RL). However, existing L...
Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge : Abstract: Large language models (LLMs) have emerged as a powerful technology, and thus, we have seen widespread adoption and use on software engineering teams. Most often, LLMs are designed as "genera...
Fine-Tuning BERT for Domain-Specific Question Answering: Toward Educational NLP Resources at University Scale : Abstract: Prior work on scientific question answering has largely emphasized chatbot-style systems, with limited exploration of fine-tuning foundation models for domain-specific reasoning. In this stu...
Invariance Co-training for Robot Visual Generalization : Abstract: Reasoning from diverse observations is a fundamental capability for generalist robot policies to operate in a wide range of environments. Despite recent advancements, many large-scale roboti...
MAR-FL: A Communication Efficient Peer-to-Peer Federated Learning System : Abstract: The convergence of next-generation wireless systems and distributed Machine Learning (ML) demands Federated Learning (FL) methods that remain efficient and robust with wireless connected pee...
A Survey of Bugs in AI-Generated Code : Abstract: Developers are widely using AI code-generation models, aiming to increase productivity and efficiency. However, there are also quality concerns regarding the AI-generated code. The generated...
Learning to Code with Context: A Study-Based Approach : Abstract: The rapid emergence of generative AI tools is transforming the way software is developed. Consequently, software engineering education must adapt to ensure that students not only learn tradi...
Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective : Abstract: In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This scarcity introd...
XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots : Abstract: As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While ...
From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model : Abstract: Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets a...
Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification : Abstract: Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While t...
CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators : Abstract: Neural operator surrogates for time-dependent partial differential equations (PDEs) conventionally employ autoregressive prediction schemes, which accumulate error over long rollouts and req...
The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing? : Abstract: With the increasing reliance on LLMs as research agents, distinguishing between LLM and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research cap...
WhatsCode: Large-Scale GenAI Deployment for Developer Efficiency at WhatsApp : Abstract: The deployment of AI-assisted development tools in compliance-relevant, large-scale industrial environments represents significant gaps in academic literature, despite growing industry adopt...
To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples : Abstract: Chain-of-thought (CoT) prompting combined with few-shot in-context learning (ICL) has unlocked significant reasoning capabilities in large language models (LLMs). However, ICL with CoT examp...
Robustness Test for AI Forecasting of Hurricane Florence Using FourCastNetv2 and Random Perturbations of the Initial Condition : Abstract: Understanding the robustness of a weather forecasting model with respect to input noise or different uncertainties is important in assessing its output reliability, particularly for extreme ...
LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning : Abstract: Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough info...
The Effect of Document Summarization on LLM-Based Relevance Judgments : Abstract: Relevance judgments are central to the evaluation of Information Retrieval (IR) systems, but obtaining them from human annotators is costly and time-consuming. Large Language Models (LLMs) h...
Interaction Tensor Shap : Abstract: Machine learning models have grown increasingly deep and high dimensional, making it difficult to understand how individual and combined features influence their predictions. While Shapley v...
SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling : Abstract: Generative methods for 3D assets have recently achieved remarkable progress, yet providing intuitive and precise control over the object geometry remains a key challenge. Existing approaches...
Invisible Load: Uncovering the Challenges of Neurodivergent Women in Software Engineering : Abstract: Neurodivergent women in Software Engineering (SE) encounter distinctive challenges at the intersection of gender bias and neurological differences. To the best of our knowledge, no prior wor...
Text Rationalization for Robust Causal Effect Estimation : Abstract: Recent advances in natural language processing have enabled the increasing use of text data in causal inference, particularly for adjusting confounding factors in treatment effect estimation...
Please Don't Kill My Vibe: Empowering Agents with Data Flow Control : Abstract: The promise of Large Language Model (LLM) agents is to perform complex, stateful tasks. This promise is stunted by significant risks - policy violations, process corruption, and security fla...
China Regional 3km Downscaling Based on Residual Corrective Diffusion Model : Abstract: A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downsc...
Mitigating Self-Preference by Authorship Obfuscation : Abstract: Language model (LM) judges are widely used to evaluate the quality of LM outputs. Despite many advantages, LM judges display concerning biases that can impair their integrity in evaluation...
Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation : Abstract: Objective: Machine learning (ML) models are increasingly used to generate electrical stimulation patterns in neuroprosthetic devices such as visual prostheses. While these models promise pre...
Generalization Beyond Benchmarks: Evaluating Learnable Protein-Ligand Scoring Functions on Unseen Targets : Abstract: As machine learning becomes increasingly central to molecular design, it is vital to ensure the reliability of learnable protein-ligand scoring functions on novel protein targets. While many...
Simulating Life Paths with Digital Twins: AI-Generated Future Selves Influence Decision-Making and Expand Human Choice : Abstract: Major life transitions demand high-stakes decisions, yet people often struggle to imagine how their future selves will live with the consequences. To support this limited capacity for mental...
Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction : Abstract: Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a...
A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems : Abstract: In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This resear...
Moving object detection from multi-depth images with an attention-enhanced CNN : Abstract: One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other so...
ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering : Abstract: Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowle...
Building Capacity for Artificial Intelligence in Africa: A Cross-Country Survey of Challenges and Governance Pathways : Abstract: Artificial intelligence (AI) is transforming education and the workforce, but access to AI learning opportunities in Africa remains uneven. With rapid demographic shifts and growing labour m...
IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models? : Abstract: Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in predict...
Parajudica: An RDF-Based Reasoner and Metamodel for Multi-Framework Context-Dependent Data Compliance Assessments : Abstract: Motivated by the challenges of implementing policy-based data access control (PBAC) under multiple simultaneously applicable compliance frameworks, we present Parajudica, an open, modular, a...
Knowing Your Uncertainty -- On the application of LLM in social sciences : Abstract: Large language models (LLMs) are rapidly being integrated into computational social science research, yet their blackboxed training and designed stochastic elements in inference pose unique ...
Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framework for Open-Ended LLM Alignment : Abstract: Large Language Models (LLMs) are typically aligned with human values using preference data or predefined principles such as helpfulness, honesty, and harmlessness. However, as AI systems pro...
University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system : Abstract: Many industrial sectors have been using machine learning in inference mode on edge devices. Future directions show that training on edge devices is promising due to improvements in semico...
How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data : Abstract: Ensemble models often achieve higher accuracy than single learners, but their ability to maintain small generalization gaps is not always well understood. This study examines how ensembles b...
PERM EQ x GRAPH EQ: Equivariant Neural Networks for Quantum Molecular Learning : Abstract: In hierarchical order of molecular geometry, we compare the performances of Geometric Quantum Machine Learning models. Two molecular datasets are considered: the simplistic linear shaped LiH-m...
UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial Fusion : Abstract: Recently, Multi-Contrast MR Reconstruction (MCMR) has emerged as a hot research topic that leverages high-quality auxiliary modalities to reconstruct undersampled target modalities of intere...
Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction : Abstract: Accurately predicting music popularity is a critical challenge in the music industry, offering benefits to artists, producers, and streaming platforms. Prior research has largely focused on ...
Matching Ranks Over Probability Yields Truly Deep Safety Alignment : Abstract: A frustratingly easy technique known as the prefilling attack has been shown to effectively circumvent the safety alignment of frontier LLMs by simply prefilling the assistant response with ...
User Negotiations of Authenticity, Ownership, and Governance on AI-Generated Video Platforms: Evidence from Sora : Abstract: As AI-generated video platforms rapidly advance, ethical challenges such as copyright infringement emerge. This study examines how users make sense of AI-generated videos on OpenAI's Sora by...
See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors : Abstract: Pixel-wise segmentation of laparoscopic scenes is essential for computer-assisted surgery but difficult to scale due to the high cost of dense annotations. We propose depth-guided surgical s...
On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability : Abstract: As AI models achieve remarkable capabilities across diverse domains, understanding what representations they learn and how they process information has become increasingly important for both...
RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs : Abstract: Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a ...
Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models : Abstract: Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding st...
Improving Local Fidelity Through Sampling and Modeling Nonlinearity : Abstract: With the increasing complexity of black-box machine learning models and their adoption in high-stakes areas, it is critical to provide explanations for their predictions. Local Interpretable...
2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency : Abstract: Sequential identity consistency under precise transient attribute control remains a long-standing challenge in controllable visual storytelling. Existing datasets lack sufficient fidelity an...
A Comprehensive Framework for Automated Quality Control in the Automotive Industry : Abstract: This paper presents a cutting-edge robotic inspection solution designed to automate quality control in automotive manufacturing. The system integrates a pair of collaborative robots, each eq...
Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability : Abstract: Classical supervised learning evaluates models primarily via predictive risk on hold-out data. Such evaluations quantify how well a function behaves on a distribution, but they do not addres...
Grounded Multilingual Medical Reasoning for Question Answering with Large Language Models : Abstract: Large Language Models (LLMs) with reasoning capabilities have recently demonstrated strong potential in medical Question Answering (QA). Existing approaches are largely English-focused and p...
Feasibility of AI-Assisted Programming for End-User Development : Abstract: End-user development, where non-programmers create or adapt their own digital tools, can play a key role in driving digital transformation within organizations. Currently, low-code/no-code pl...
On Dynamic Programming Theory for Leader-Follower Stochastic Games : Abstract: Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yieldin...
InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem : Abstract: Recent approaches to controllable 4D video generation often rely on fine-tuning pre-trained Video Diffusion Models (VDMs). This dominant paradigm is computationally expensive, requiring larg...
Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robust Comparison of Embedding Methods : Abstract: Retrieving case law is a time-consuming task predominantly carried out by querying databases. We provide a comparison of two models in three different settings for Czech Constitutional Court...
HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies : Abstract: The development of foundation models for embodied intelligence critically depends on access to large-scale, high-quality robot demonstration data. Recent approaches have sought to address th...
Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains : Abstract: We present a methodology for improving the accuracy of faithfulness evaluation in Large Language Models (LLMs). The proposed methodology is based on the combination of elementary faithfulnes...
Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning : Abstract: This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-g...
Big Tech-Funded AI Papers Have Higher Citation Impact, Greater Insularity, and Larger Recency Bias : Abstract: Over the past four decades, artificial intelligence (AI) research has flourished at the nexus of academia and industry. However, Big Tech companies have increasingly acquired the edge in com...
Efficient Text Classification with Conformal In-Context Learning : Abstract: Large Language Models (LLMs) demonstrate strong in-context learning abilities, yet their effectiveness in text classification depends heavily on prompt design and incurs substantial computat...
Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding : Abstract: Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant conte...
Mechanistic Interpretability of Antibody Language Models Using SAEs : Abstract: Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ Top...
3D Path Planning for Robot-assisted Vertebroplasty from Arbitrary Bi-plane X-ray via Differentiable Rendering : Abstract: Robotic systems are transforming image-guided interventions by enhancing accuracy and minimizing radiation exposure. A significant challenge in robotic assistance lies in surgical path plann...
Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling : Abstract: Vision-Language Models (VLMs) remain limited in spatial reasoning tasks that require multi-view understanding and embodied perspective shifts. Recent approaches such as MindJourney attempt t...
Approximation of Box Decomposition Algorithm for Fast Hypervolume-Based Multi-Objective Optimization : Abstract: Hypervolume (HV)-based Bayesian optimization (BO) is one of the standard approaches for multi-objective decision-making. However, the computational cost of optimizing the acquisition functio...
Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning : Abstract: This study focuses on event detection in optical fibers, specifically classifying six events using the Phase-OTDR system. A novel approach is introduced to enhance Phase-OTDR data analysis b...
NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation : Abstract: Autoregressive models are a promising alternative to diffusion-based models for 3D molecular structure generation. However, a key limitation is the assumption of a token order: while text ha...
Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework : Abstract: Medical question-answering (QA) systems can benefit from advances in large language models (LLMs), but directly applying LLMs to the clinical domain poses challenges such as maintaining fact...
Sparse Attention Post-Training for Mechanistic Interpretability : Abstract: We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objec...
Neural Coherence : Find higher performance to out-of-distribution tasks from few samples : Abstract: To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question of how to best ...
Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures : Abstract: Bug localization in multi-repository microservice architectures is challenging due to the semantic gap between natural language bug reports and code, LLM context limitations, and the need to...
World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty : Abstract: Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is ...
Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception : Abstract: Common approaches to explainable AI (XAI) for deep learning focus on analyzing the importance of input features on the classification task in a given model: saliency methods like SHAP and Gr...
Documenting SME Processes with Conversational AI: From Tacit Knowledge to BPMN : Abstract: Small and medium-sized enterprises (SMEs) still depend heavily on tacit, experience-based know-how that rarely makes its way into formal documentation. This paper introduces a large-language...
Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations : Abstract: Evaluating faithfulness of Large Language Models (LLMs) to a given task is a complex challenge. We propose two new unsupervised metrics for faithfulness evaluation using insights from inform...
Bridging Traditional Machine Learning and Large Language Models: A Two-Part Course Design for Modern AI Education : Abstract: This paper presents an innovative pedagogical approach for teaching artificial intelligence and data science that systematically bridges traditional machine learning techniques with modern L...
On the Computability of Artificial General Intelligence : Abstract: In recent years we observed rapid and significant advancements in artificial intelligence (A.I.). So much so that many wonder how close humanity is to developing an A.I. model that can achie...
Resolving Zadeh's Paradox: Axiomatic Possibility Theory as a Foundation for Reliable Artificial Intelligence : Abstract: This work advances and substantiates the thesis that the resolution of this crisis lies in the domain of possibility theory, specifically in the axiomatic approach developed in Bychkov's arti...
AI & Human Co-Improvement for Safer Co-Superintelligence : Abstract: Self-improvement is a goal currently exciting the field of AI, but is fraught with danger, and may take time to fully achieve. We advocate that a more achievable and better goal for humanity...
MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare : Abstract: Healthcare AI systems have historically faced challenges in merging contextual reasoning, long-term state management, and human-verifiable workflows into a cohesive framework. This paper int...
ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications : Abstract: While Large Language Models (LLMs) demonstrate immense potential for automating integrated circuit (IC) development, their practical deployment is fundamentally limited by restricted context...
BEAVER: An Efficient Deterministic LLM Verifier : Abstract: As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints...
The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems : Abstract: Large language models display a peculiar form of inconsistency: they "know" the correct answer but fail to act on it. In human philosophy, this tension between global judgment and local impu...
MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models : Abstract: Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robu...
CureAgent: A Training-Free Executor-Analyst Framework for Clinical Reasoning : Abstract: Current clinical agents built on small LLMs, such as TxAgent, suffer from a Context Utilization Failure, where models successfully retrieve biomedical evidence due to supervised finet...
Ontology Learning with LLMs: A Benchmark Study on Axiom Identification : Abstract: Ontologies are an important tool for structuring domain knowledge, but their development is a complex task that requires significant modelling and domain expertise. Ontology learning, aimed ...
Enhancing Local Search for MaxSAT with Deep Differentiation Clause Weighting : Abstract: Partial Maximum Satisfiability (PMS) and Weighted Partial Maximum Satisfiability (WPMS) generalize Maximum Satisfiability (MaxSAT), with broad real-world applications. Recent advances in Sto...
KANFormer for Predicting Fill Probabilities via Survival Analysis in Limit Order Books : Abstract: This paper introduces KANFormer, a novel deep-learning-based model for predicting the time-to-fill of limit orders by leveraging both market- and agent-level information. KANFormer combines ...
A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning : Abstract: The fast deployment of cognitive radar to counter jamming remains a critical challenge in modern warfare, where more efficient deployment leads to quicker detection of targets. Existing meth...
Evolutionary System 2 Reasoning: An Empirical Proof : Abstract: Machine intelligence marks the ultimate dream of making machines' intelligence comparable to human beings. While recent progress in Large Language Models (LLMs) shows substantial specific ski...
The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics : Abstract: Influential critiques argue that Large Language Models (LLMs) are a dead end for AGI: "mere pattern matchers" structurally incapable of reasoning or planning. We argue this conclusion miside...
Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma : Abstract: Low-grade gliomas frequently present IDH1 mutations that define clinically distinct subgroups with specific prognostic and therapeutic implications. This work introduces a Multimodal Oncolog...
Using Large Language Models to Create Personalized Networks From Therapy Sessions : Abstract: Recent advances in psychotherapy have focused on treatment personalization, such as by selecting treatment modules based on personalized networks. However, estimating personalized networks t...
To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis : Abstract: How many mistakes do published AI papers contain? Peer-reviewed publications form the foundation upon which new research and knowledge are built. Errors that persist in the literature can pr...
PRiSM: An Agentic Multimodal Benchmark for Scientific Reasoning via Python-Grounded Evaluation : Abstract: Evaluating vision-language models (VLMs) in scientific domains like mathematics and physics poses unique challenges that go far beyond predicting final answers. These domains demand conceptu...
TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models : Abstract: Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent fail...
Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem : Abstract: Resource allocation remains NP-hard due to combinatorial complexity. While deep reinforcement learning (DRL) methods, such as the Rainbow Deep Q-Network (DQN), improve scalability through pr...
SymPyBench: A Dynamic Benchmark for Scientific Reasoning with Executable Python Code : Abstract: We introduce SymPyBench, a large-scale synthetic benchmark of 15,045 university-level physics problems (90/10% train/test split). Each problem is fully parameterized, supporting an effectively infinite...
EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search : Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven informatio...
RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering : Abstract: In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text gen...
PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles : Abstract: PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Du...
SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model : Abstract: Video dubbing aims to generate high-fidelity speech that is precisely temporally aligned with the visual content. Existing methods still suffer from limitations in speech naturalness and aud...
GNSS Jammer Direction Finding in Dynamic Scenarios Using an Inertial-based Multi-Antenna System : Abstract: Jamming devices disrupt signals from the global navigation satellite system (GNSS) and pose a significant threat by compromising the reliability of accurate positioning. Consequently, the de...
AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance : Abstract: Active 3D reconstruction enables an agent to autonomously select viewpoints to efficiently obtain accurate and complete scene geometry, rather than passively reconstructing scenes from pre-c...
Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training : Abstract: Zero-Shot Super-Resolution Spatiotemporal Forecasting requires a deep learning model to be trained on low-resolution data and deployed for inference on high-resolution. Existing studies cons...

Research Sources: 292 | Generated: 12/8/2025