AI RESEARCH PAPERS & ACADEMIC SOURCES
- A Comparative Study on Synthetic Facial Data Generation Techniques for Face Recognition : Abstract: Facial recognition has become a widely used method for authentication and identification, with applications for secure access and locating missing persons. Its success is largely attributed ...
- Synset Signset Germany: a Synthetic Dataset for German Traffic Sign Recognition : Abstract: In this paper, we present a synthesis pipeline and dataset for training / testing data in the task of traffic sign recognition that combines the advantages of data-driven and analytical mode...
- EditThinker: Unlocking Iterative Reasoning for Any Image Editor : Abstract: Instruction-based image editing has emerged as a prominent research area, which, benefiting from image generation foundation models, has achieved high aesthetic quality, making instruction-...
- ARCAS: An Augmented Reality Collision Avoidance System with SLAM-Based Tracking for Enhancing VRU Safety : Abstract: Vulnerable road users (VRUs) face high collision risks in mixed traffic, yet most existing safety systems prioritize driver or vehicle assistance over direct VRU support. This paper presents...
- Toward Efficient and Robust Behavior Models for Multi-Agent Driving Simulation : Abstract: Scalable multi-agent driving simulation requires behavior models that are both realistic and computationally efficient. We address this by optimizing the behavior model that controls individ...
- Physically-Based Simulation of Automotive LiDAR : Abstract: We present an analytic model for simulating automotive time-of-flight (ToF) LiDAR that includes blooming, echo pulse width, and ambient light, along with steps to determine model parameters ...
- SIMPACT: Simulation-Enabled Action Planning using Vision-Language Models : Abstract: Vision-Language Models (VLMs) exhibit remarkable common-sense and semantic reasoning capabilities. However, they lack a grounded understanding of physical dynamics. This limitation arises fr...
- Multi-Scale Direction-Aware Network for Infrared Small Target Detection : Abstract: Infrared small target detection faces the problem that it is difficult to effectively separate the background and the target. Existing deep learning-based methods focus on edge and shape fea...
- iMotion-LLM: Instruction-Conditioned Trajectory Generation : Abstract: We introduce iMotion-LLM, a large language model (LLM) integrated with trajectory prediction modules for interactive motion generation. Unlike conventional approaches, it generates feasible,...
- PLANesT-3D: A new annotated dataset for segmentation of 3D plant point clouds : Abstract: Creation of new annotated public datasets is crucial in helping advances in 3D computer vision and machine learning meet their full potential for automatic interpretation of 3D plant models....
- Neural Eulerian Scene Flow Fields : Abstract: We reframe scene flow as the task of estimating a continuous space-time ODE that describes motion for an entire observation sequence, represented with a neural prior. Our method, EulerFlow, ...
- AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM : Abstract: Video anomaly detection (VAD) is crucial for video analysis and surveillance in computer vision. However, existing VAD models rely on learned normal patterns, which makes them difficult to a...
- Perspective-Invariant 3D Object Detection : Abstract: With the rise of robotics, LiDAR-based 3D object detection has garnered significant attention in both academia and industry. However, existing datasets and methods predominantly focus on veh...
- Enhancing Clinical Note Generation with ICD-10, Clinical Ontology Knowledge Graphs, and Chain-of-Thought Prompting Using GPT-4 : Abstract: In the past decade a surge in the amount of electronic health record (EHR) data in the United States, attributed to a favorable policy environment created by the Health Information Technolog...
- Transformer-Enabled Diachronic Analysis of Vedic Sanskrit: Neural Methods for Quantifying Types of Language Change : Abstract: This study demonstrates how hybrid neural-symbolic methods can yield significant new insights into the evolution of a morphologically rich, low-resource language. We challenge the naive assu...
- Learning from Self Critique and Refinement for Faithful LLM Summarization : Abstract: Large Language Models (LLMs) often suffer from hallucinations: output content that is not grounded in the input context, when performing long-form text generation tasks such as summarization...
- SQ-format: A Unified Sparse-Quantized Hardware-friendly Data Format for LLMs : Abstract: Post-training quantization (PTQ) plays a crucial role in the democratization of large language models (LLMs). However, existing low-bit quantization and sparsification techniques are difficu...
- LMSpell: Neural Spell Checking for Low-Resource Languages : Abstract: Spell correction is still a challenging problem for low-resource languages (LRLs). While pretrained language models (PLMs) have been employed for spell correction, their use is still limited...
- SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures : Abstract: Safeguard models help large language models (LLMs) detect and block harmful content, but most evaluations remain English-centric and overlook linguistic and cultural diversity. Existing mult...
- Automated Identification of Incidentalomas Requiring Follow-Up: A Multi-Anatomy Evaluation of LLM-Based and Supervised Approaches : Abstract: Objective: To evaluate large language models (LLMs) against supervised baselines for fine-grained, lesion-level detection of incidentalomas requiring follow-up, addressing the limitations of...
- Structured Reasoning with Tree-of-Thoughts for Bengali Math Word Problems : Abstract: Mathematical Word Problems (MWPs) are among the most challenging tasks in natural language processing because they require both linguistic understanding and multi-step numerical reasoning. W...
- A Greek Government Decisions Dataset for Public-Sector Analysis and Insight : Abstract: We introduce an open, machine-readable corpus of Greek government decisions sourced from the national transparency platform Diavgeia. The resource comprises 1 million decisions, featuring an...
- Interleaved Latent Visual Reasoning with Selective Perceptual Modeling : Abstract: Interleaved reasoning paradigms enhance Multimodal Large Language Models (MLLMs) with visual feedback but are hindered by the prohibitive computational cost of repeatedly re-encoding pixel-d...
- MedTutor-R1: Socratic Personalized Medical Teaching with Multi-Agent Simulation : Abstract: The significant gap between rising demands for clinical training and the scarcity of expert instruction poses a major challenge to medical education. With powerful capabilities in personaliz...
- Capturing Classic Authorial Style in Long-Form Story Generation with GRPO Fine-Tuning : Abstract: Recent advances in large language models (LLMs) show impressive performance in open-ended story generation, but fine-grained stylistic control remains limited. Existing methods often rely on...
- Heard or Halted? Gender, Interruptions, and Emotional Tone in U.S. Supreme Court Oral Arguments : Abstract: This study examines how interruptions during U.S. Supreme Court oral arguments shape both the semantic content and emotional tone of advocates' speech, with a focus on gendered dynamics in j...
- Prompting Science Report 4: Playing Pretend: Expert Personas Don't Improve Factual Accuracy : Abstract: This is the fourth in a series of short reports that help business, education, and policy leaders understand the technical details of working with AI through rigorous testing. Here, we ask w...
- Vague Knowledge: Information without Transitivity and Partitions : Abstract: I relax the standard assumptions of transitivity and partition structure in economic models of information to formalize vague knowledge: non-transitive indistinguishability over states. I sh...
- Self-Improving VLM Judges Without Human Annotations : Abstract: Effective judges of Vision-Language Models (VLMs) are crucial for model development. Current methods for training VLM judges mainly rely on large-scale human preference annotations. However,...
- TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows : Abstract: Recent advances in large multi-modal generative models have demonstrated impressive capabilities in multi-modal generation, including image and video generation. These models are typically b...
- EFDiT: Efficient Fine-grained Image Generation Using Diffusion Transformer Models : Abstract: Diffusion models are highly regarded for their controllability and the diversity of images they generate. However, class-conditional generation methods based on diffusion models often focus ...
- DEAR: Dataset for Evaluating the Aesthetics of Rendering : Abstract: Traditional Image Quality Assessment (IQA) focuses on quantifying technical degradations such as noise, blur, or compression artifacts, using both full-reference and no-reference objective m...
- IE2Video: Adapting Pretrained Diffusion Models for Event-Based Video Reconstruction : Abstract: Continuous video monitoring in surveillance, robotics, and wearable systems faces a fundamental power constraint: conventional RGB cameras consume substantial energy through fixed-rate captu...
- Age-Inclusive 3D Human Mesh Recovery for Action-Preserving Data Anonymization : Abstract: While three-dimensional (3D) shape and pose estimation is a highly researched area that has yielded significant advances, the resulting methods, despite performing well for the adult populat...
- CARD: Correlation Aware Restoration with Diffusion : Abstract: Denoising diffusion models have achieved state-of-the-art performance in image restoration by modeling the process as sequential denoising steps. However, most approaches assume independent ...
- Inferring Compositional 4D Scenes without Ever Seeing One : Abstract: Scenes in the real world are often composed of several static and dynamic objects. Capturing their 4-dimensional structures, composition and spatio-temporal configuration in-the-wild, though...
- SplatPainter: Interactive Authoring of 3D Gaussians from 2D Edits via Test-Time Training : Abstract: The rise of 3D Gaussian Splatting has revolutionized photorealistic 3D asset creation, yet a critical gap remains for their interactive refinement and editing. Existing approaches based on d...
- Group Orthogonal Low-Rank Adaptation for RGB-T Tracking : Abstract: Parameter-efficient fine-tuning has emerged as a promising paradigm in RGB-T tracking, enabling downstream task adaptation by freezing pretrained parameters and fine-tuning only a small set ...
- ShaRP: SHAllow-LayeR Pruning for Video Large Language Models Acceleration : Abstract: Video Large Language Models (VLLMs) face the challenge of high computational load during the pre-filling stage due to the processing of an enormous number of visual tokens. Although attentio...
- LoC-Path: Learning to Compress for Pathology Multimodal Large Language Models : Abstract: Whole Slide Image (WSI) understanding is fundamentally challenging due to its gigapixel scale and the extreme sparsity of diagnostically relevant regions. Unlike human experts who primarily ...
- Delving into Latent Spectral Biasing of Video VAEs for Superior Diffusability : Abstract: Latent diffusion models pair VAEs with diffusion backbones, and the structure of VAE latents strongly influences the difficulty of diffusion training. However, existing video VAEs typically ...
- The Dynamic Prior: Understanding 3D Structures for Casual Dynamic Videos : Abstract: Estimating accurate camera poses, 3D scene geometry, and object motion from in-the-wild videos is a long-standing challenge for classical structure from motion pipelines due to the presence ...
- Genetic Algorithms For Parameter Optimization for Disparity Map Generation of Radiata Pine Branch Images : Abstract: Traditional stereo matching algorithms like Semi-Global Block Matching (SGBM) with Weighted Least Squares (WLS) filtering offer speed advantages over neural networks for UAV applications, ge...
- YOLO and SGBM Integration for Autonomous Tree Branch Detection and Depth Estimation in Radiata Pine Pruning Applications : Abstract: Manual pruning of radiata pine trees poses significant safety risks due to extreme working heights and challenging terrain. This paper presents a computer vision framework that integrates YO...
- Performance Evaluation of Deep Learning for Tree Branch Segmentation in Autonomous Forestry Systems : Abstract: UAV-based autonomous forestry operations require rapid and precise tree branch segmentation for safe navigation and automated pruning across varying pixel resolutions and operational conditi...
- ParaUni: Enhance Generation in Unified Multimodal Model with Reinforcement-driven Hierarchical Parallel Information Interaction : Abstract: Unified multimodal models significantly improve visual generation by combining vision-language models (VLMs) with diffusion models. However, existing methods struggle to fully balance suffic...
- TED-4DGS: Temporally Activated and Embedding-based Deformation for 4DGS Compression : Abstract: Building on the success of 3D Gaussian Splatting (3DGS) in static 3D scene representation, its extension to dynamic scenes, commonly referred to as 4DGS or dynamic 3DGS, has attracted increa...
- EmoStyle: Emotion-Driven Image Stylization : Abstract: Art has long been a profound medium for expressing emotions. While existing image stylization methods effectively transform visual appearance, they often overlook the emotional impact carrie...
- Concept-based Explainable Data Mining with VLM for 3D Detection : Abstract: Rare-object detection remains a challenging task in autonomous driving systems, particularly when relying solely on point cloud data. Although Vision-Language Models (VLMs) exhibit strong ca...
- WaterWave: Bridging Underwater Image Enhancement into Video Streams via Wavelet-based Temporal Consistency Field : Abstract: Underwater video pairs are fairly difficult to obtain due to the complex underwater imaging. In this case, most existing video underwater enhancement methods are performed by directly applyi...
- Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation : Abstract: To address the limitations of Transformer decoders in capturing edge details, recognizing local textures and modeling spatial continuity, this paper proposes a novel decoder framework specif...
- Rethinking Infrared Small Target Detection: A Foundation-Driven Efficient Paradigm : Abstract: While large-scale visual foundation models (VFMs) exhibit strong generalization across diverse visual domains, their potential for single-frame infrared small target (SIRST) detection remain...
- Know-Show: Benchmarking Video-Language Models on Spatio-Temporal Grounded Reasoning : Abstract: Large Video-Language Models (Video-LMs) have achieved impressive progress in multimodal understanding, yet their reasoning remains weakly grounded in space and time. We present Know-Show, a ...
- VOST-SGG: VLM-Aided One-Stage Spatio-Temporal Scene Graph Generation : Abstract: Spatio-temporal scene graph generation (ST-SGG) aims to model objects and their evolving relationships across video frames, enabling interpretable representations for downstream reasoning ta...
- Ideal Observer for Segmentation of Dead Leaves Images : Abstract: The human visual environment is comprised of different surfaces that are distributed in space. The parts of a scene that are visible at any one time are governed by the occlusion of overlapp...
- ProPhy: Progressive Physical Alignment for Dynamic World Simulation : Abstract: Recent advances in video generation have shown remarkable potential for constructing world simulators. However, current models still struggle to produce physically consistent results, partic...
- MedDIFT: Multi-Scale Diffusion-Based Correspondence in 3D Medical Imaging : Abstract: Accurate spatial correspondence between medical images is essential for longitudinal analysis, lesion tracking, and image-guided interventions. Medical image registration methods rely on loc...
- Learning High-Fidelity Cloth Animation via Skinning-Free Image Transfer : Abstract: We present a novel method for generating 3D garment deformations from given body poses, which is key to a wide range of applications, including virtual try-on and extended reality. To simpli...
- Fast SceneScript: Accurate and Efficient Structured Language Model via Multi-Token Prediction : Abstract: Recent perception-generalist approaches based on language models have achieved state-of-the-art results across diverse tasks, including 3D scene layout estimation, via unified architecture a...
- NormalView: sensor-agnostic tree species classification from backpack and aerial lidar data using geometric projections : Abstract: Laser scanning has proven to be an invaluable tool in assessing the decomposition of forest environments. Mobile laser scanning (MLS) has been shown to be highly promising for extremely accurate,...
- DistillFSS: Synthesizing Few-Shot Knowledge into a Lightweight Segmentation Model : Abstract: Cross-Domain Few-Shot Semantic Segmentation (CD-FSS) seeks to segment unknown classes in unseen domains using only a few annotated examples. This setting is inherently challenging: source an...
- Experts-Guided Unbalanced Optimal Transport for ISP Learning from Unpaired and/or Paired Data : Abstract: Learned Image Signal Processing (ISP) pipelines offer powerful end-to-end performance but are critically dependent on large-scale paired raw-to-sRGB datasets. This reliance on costly-to-acqu...
- Self-Supervised AI-Generated Image Detection: A Camera Metadata Perspective : Abstract: The proliferation of AI-generated imagery poses escalating challenges for multimedia forensics, yet many existing detectors depend on assumptions about the internals of specific generative m...
- LeAD-M3D: Leveraging Asymmetric Distillation for Real-time Monocular 3D Detection : Abstract: Real-time monocular 3D object detection remains challenging due to severe depth ambiguity, viewpoint shifts, and the high computational cost of 3D reasoning. Existing approaches either rely ...
- Deep Learning-Based Real-Time Sequential Facial Expression Analysis Using Geometric Features : Abstract: Facial expression recognition is a crucial component in enhancing human-computer interaction and developing emotion-aware systems. Real-time detection and interpretation of facial expression...
- Hyperspectral Unmixing with 3D Convolutional Sparse Coding and Projected Simplex Volume Maximization : Abstract: Hyperspectral unmixing (HSU) aims to separate each pixel into its constituent endmembers and estimate their corresponding abundance fractions. This work presents an algorithm-unrolling-based...
- Physics-Informed Graph Neural Network with Frequency-Aware Learning for Optical Aberration Correction : Abstract: Optical aberrations significantly degrade image quality in microscopy, particularly when imaging deeper into samples. These aberrations arise from distortions in the optical wavefront and ca...
- OWL: Unsupervised 3D Object Detection by Occupancy Guided Warm-up and Large Model Priors Reasoning : Abstract: Unsupervised 3D object detection leverages heuristic algorithms to discover potential objects, offering a promising route to reduce annotation costs in autonomous driving. Existing approache...
- Manifold-Aware Point Cloud Completion via Geodesic-Attentive Hierarchical Feature Learning : Abstract: Point cloud completion seeks to recover geometrically consistent shapes from partial or sparse 3D observations. Although recent methods have achieved reasonable global shape reconstruction, ...
- Distilling Expert Surgical Knowledge: How to train local surgical VLMs for anatomy explanation in Complete Mesocolic Excision : Abstract: Recently, Vision Large Language Models (VLMs) have demonstrated high potential in computer-aided diagnosis and decision-support. However, current VLMs show deficits in domain specific surgic...
- HQ-DM: Single Hadamard Transformation-Based Quantization-Aware Training for Low-Bit Diffusion Models : Abstract: Diffusion models have demonstrated significant applications in the field of image generation. However, their high computational and memory costs pose challenges for deployment. Model quantiz...
- USV: Unified Sparsification for Accelerating Video Diffusion Models : Abstract: The scalability of high-fidelity video diffusion models (VDMs) is constrained by two key sources of redundancy: the quadratic complexity of global spatio-temporal attention and the computati...
- Label-Efficient Point Cloud Segmentation with Active Learning : Abstract: Semantic segmentation of 3D point cloud data often comes with high annotation costs. Active learning automates the process of selecting which data to annotate, reducing the total amount of a...
- FNOPT: Resolution-Agnostic, Self-Supervised Cloth Simulation using Meta-Optimization with Fourier Neural Operators : Abstract: We present FNOpt, a self-supervised cloth simulation framework that formulates time integration as an optimization problem and trains a resolution-agnostic neural optimizer parameterized by ...
- Bring Your Dreams to Life: Continual Text-to-Video Customization : Abstract: Customized text-to-video generation (CTVG) has recently witnessed great progress in generating tailored videos from user-specific text. However, most CTVG methods assume that personalized co...
- UG-FedDA: Uncertainty-Guided Federated Domain Adaptation for Multi-Center Alzheimer's Disease Detection : Abstract: Alzheimer's disease (AD) is an irreversible neurodegenerative disorder, and early diagnosis is critical for timely intervention. However, most existing classification frameworks face challen...
- VRSA: Jailbreaking Multimodal Large Language Models through Visual Reasoning Sequential Attack : Abstract: Multimodal Large Language Models (MLLMs) are widely used in various fields due to their powerful cross-modal comprehension and generation capabilities. However, more modalities bring more vu...
- Edit-aware RAW Reconstruction : Abstract: Users frequently edit camera images post-capture to achieve their preferred photofinishing style. While editing in the RAW domain provides greater accuracy and flexibility, most edits are pe...
- Underwater Image Reconstruction Using a Swin Transformer-Based Generator and PatchGAN Discriminator : Abstract: Underwater imaging is essential for marine exploration, environmental monitoring, and infrastructure inspection. However, water causes severe image degradation through wavelength-dependent a...
- SCAIL: Towards Studio-Grade Character Animation via In-Context Learning of 3D-Consistent Pose Representations : Abstract: Achieving character animation that meets studio-grade production standards remains challenging despite recent progress. Existing approaches can transfer motion from a driving video to a refe...
- LPD: Learnable Prototypes with Diversity Regularization for Weakly Supervised Histopathology Segmentation : Abstract: Weakly supervised semantic segmentation (WSSS) in histopathology reduces pixel-level labeling by learning from image-level labels, but it is hindered by inter-class homogeneity, intra-class ...
- LDLT $\mathcal{L}$-Lipschitz Network: Generalized Deep End-To-End Lipschitz Network Construction : Abstract: Deep residual networks (ResNets) have demonstrated outstanding success in computer vision tasks, attributed to their ability to maintain gradient flow through deep architectures. Simultaneou...
- KQ-SVD: Compressing the KV Cache with Provable Guarantees on Attention Fidelity : Abstract: The Key-Value (KV) cache is central to the efficiency of transformer-based large language models (LLMs), storing previously computed vectors to accelerate inference. Yet, as sequence length ...
- On the Bayes Inconsistency of Disagreement Discrepancy Surrogates : Abstract: Deep neural networks often fail when deployed in real-world contexts due to distribution shift, a critical barrier to building safe and reliable systems. An emerging approach to address this...
- Developing synthetic microdata through machine learning for firm-level business surveys : Abstract: Public-use microdata samples (PUMS) from the United States (US) Census Bureau on individuals have been available for decades. However, large increases in computing power and the greater avai...
- Bayesian Optimization and Convolutional Neural Networks for Zernike-Based Wavefront Correction in High Harmonic Generation : Abstract: High harmonic generation (HHG) is a nonlinear process that enables table-top generation of tunable, high-energy, coherent, ultrashort radiation pulses in the extreme ultraviolet (EUV) to sof...
- InvarDiff: Cross-Scale Invariance Caching for Accelerated Diffusion Models : Abstract: Diffusion models deliver high-fidelity synthesis but remain slow due to iterative sampling. We empirically observe there exists feature invariance in deterministic sampling, and present Inva...
- Spatiotemporal Satellite Image Downscaling with Transfer Encoders and Autoregressive Generative Models : Abstract: We present a transfer-learning generative downscaling framework to reconstruct fine resolution satellite images from coarse scale inputs. Our approach combines a lightweight U-Net transfer e...
- Continuous-Time Homeostatic Dynamics for Reentrant Inference Models : Abstract: We formulate the Fast-Weights Homeostatic Reentry Network (FHRN) as a continuous-time neural-ODE system, revealing its role as a norm-regulated reentrant dynamical process. Starting from the...
- Your Latent Mask is Wrong: Pixel-Equivalent Latent Compositing for Diffusion Models : Abstract: Latent inpainting in diffusion models still relies almost universally on linearly interpolating VAE latents under a downsampled mask. We propose a key principle for compositing image latents...
- Hierarchical Reinforcement Learning for the Dynamic VNE with Alternatives Problem : Abstract: Virtual Network Embedding (VNE) is a key enabler of network slicing, yet most formulations assume that each Virtual Network Request (VNR) has a fixed topology. Recently, VNE with Alternative...
- STAR-GO: Improving Protein Function Prediction by Learning to Hierarchically Integrate Ontology-Informed Semantic Embeddings : Abstract: Accurate prediction of protein function is essential for elucidating molecular mechanisms and advancing biological and therapeutic discovery. Yet experimental annotation lags far behind the ...
- One-Step Diffusion Samplers via Self-Distillation and Deterministic Flow : Abstract: Sampling from unnormalized target distributions is a fundamental yet challenging task in machine learning and statistics. Existing sampling algorithms typically require many iterative steps ...
- Robust forecast aggregation via additional queries : Abstract: We study the problem of robust forecast aggregation: combining expert forecasts with provable accuracy guarantees compared to the best possible aggregation of the underlying information. Pri...
- Exposing Pink Slime Journalism: Linguistic Signatures and Robust Detection Against LLM-Generated Threats : Abstract: The local news landscape, a vital source of reliable information for 28 million Americans, faces a growing threat from Pink Slime Journalism: low-quality, auto-generated articles that mimi...
- Symmetric Linear Dynamical Systems are Learnable from Few Observations : Abstract: We consider the problem of learning the parameters of an $N$-dimensional stochastic linear dynamics under both full and partial observations from a single trajectory of time $T$. We introduce...
- FieldSeer I: Physics-Guided World Models for Long-Horizon Electromagnetic Dynamics under Partial Observability : Abstract: We introduce FieldSeer I, a geometry-aware world model that forecasts electromagnetic field dynamics from partial observations in 2-D TE waveguides. The model assimilates a short prefix of o...
- PoolNet: Deep Learning for 2D to 3D Video Process Validation : Abstract: Lifting Structure-from-Motion (SfM) information from sequential and non-sequential image data is a time-consuming and computationally expensive task. In addition to this, the majority of pub...
- EXR: An Interactive Immersive EHR Visualization in Extended Reality : Abstract: This paper presents the design and implementation of an Extended Reality (XR) platform for immersive, interactive visualization of Electronic Health Records (EHRs). The system extends beyond...
- Do We Really Even Need Data? A Modern Look at Drawing Inference with Predicted Data : Abstract: As artificial intelligence and machine learning tools become more accessible, and scientists face new obstacles to data collection (e.g., rising costs, declining survey response rates), rese...
- Model Gateway: Model Management Platform for Model-Driven Drug Discovery : Abstract: This paper presents the Model Gateway, a management platform for managing machine learning (ML) and scientific computational models in the drug discovery pipeline. The platform supports Larg...
- SSDLabeler: Realistic semi-synthetic data generation for multi-label artifact classification in EEG : Abstract: EEG recordings are inherently contaminated by artifacts such as ocular, muscular, and environmental noise, which obscure neural activity and complicate preprocessing. Artifact classification...
- DashFusion: Dual-stream Alignment with Hierarchical Bottleneck Fusion for Multimodal Sentiment Analysis : Abstract: Multimodal sentiment analysis (MSA) integrates various modalities, such as text, image, and audio, to provide a more comprehensive understanding of sentiment. However, effective MSA is chall...
- Poodle: Seamlessly Scaling Down Large Language Models with Just-in-Time Model Replacement : Abstract: Businesses increasingly rely on large language models (LLMs) to automate simple repetitive tasks instead of developing custom machine learning models. LLMs require few, if any, training exam...
- Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening : Abstract: Art has long played a profound role in shaping human emotion, cognition, and behavior. While visual arts such as painting and architecture have been studied through eye tracking, revealing d...
- Design-marginal calibration of Gaussian process predictive distributions: Bayesian and conformal approaches : Abstract: We study the calibration of Gaussian process (GP) predictive distributions in the interpolation setting from a design-marginal perspective. Conditioning on the data and averaging over a desi...
- Over-the-Air Semantic Alignment with Stacked Intelligent Metasurfaces : Abstract: Semantic communication systems aim to transmit task-relevant information between devices capable of artificial intelligence, but their performance can degrade when heterogeneous transmitter-...
- Comparing the latent features of universal machine-learning interatomic potentials : Abstract: The past few years have seen the development of "universal" machine-learning interatomic potentials (uMLIPs) capable of approximating the ground-state potential energy surface across a wid...
- Curvature-Regularized Variational Autoencoder for 3D Scene Reconstruction from Sparse Depth : Abstract: When depth sensors provide only 5% of needed measurements, reconstructing complete 3D scenes becomes difficult. Autonomous vehicles and robots cannot tolerate the geometric errors that spars...
- Machine-learning-enabled interpretation of tribological deformation patterns in large-scale MD data : Abstract: Molecular dynamics (MD) simulations have become indispensable for exploring tribological deformation patterns at the atomic scale. However, transforming the resulting high-dimensional data i...
- Bootstrapping Fuzzers for Compilers of Low-Resource Language Dialects Using Language Models : Abstract: Modern extensible compiler frameworks, such as MLIR, enable rapid creation of domain-specific language dialects. This flexibility, however, makes correctness harder to ensure as the same exten...
- NICE: Neural Implicit Craniofacial Model for Orthognathic Surgery Prediction : Abstract: Orthognathic surgery is a crucial intervention for correcting dentofacial skeletal deformities to enhance occlusal functionality and facial aesthetics. Accurate postoperative facial appearan...
- BalLOT: Balanced $k$-means clustering with optimal transport : Abstract: We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce an optimal transport approach to alternating minimization called BalLOT, and we show that it...
- Designing an Optimal Sensor Network via Minimizing Information Loss : Abstract: Optimal experimental design is a classic topic in statistics, with many well-studied problems, applications, and solutions. The design problem we study is the placement of sensors to monitor...
- Consequences of Kernel Regularity for Bandit Optimization : Abstract: In this work we investigate the relationship between kernel regularity and algorithmic performance in the bandit optimization of RKHS functions. While reproducing kernel Hilbert space (RKHS)...
- Statistical Guarantees for Approximate Stationary Points of Shallow Neural Networks : Abstract: Since statistical guarantees for neural networks are usually restricted to global optima of intricate objective functions, it is unclear whether these theories explain the performances of ac...
- SPARTAN: A Sparse Transformer World Model Attending to What Matters : Abstract: Capturing the interactions between entities in a structured way plays a central role in world models that flexibly adapt to changes in the environment. Recent works motivate the benefits of ...
- Second Maximum of a Gaussian Random Field and Exact (t-)Spacing test : Abstract: In this article, we introduce the novel concept of the second maximum of a Gaussian random field on a Riemannian submanifold. This second maximum serves as a powerful tool for characterizing...
- Semantic Communication and Control Co-Design for Multi-Objective Distinct Dynamics : Abstract: This letter introduces a machine-learning approach to learning the semantic dynamics of correlated systems with different control rules and dynamics. By leveraging the Koopman operator in an...
- Operator learning meets inverse problems: A probabilistic perspective : Abstract: Operator learning offers a robust framework for approximating mappings between infinite-dimensional function spaces. It has also become a powerful tool for solving inverse problems in the co...
- Unveiling Affective Polarization Trends in Parliamentary Proceedings : Abstract: Recent years have seen an increase in polarized discourse worldwide, on various platforms. We propose a novel method for quantifying polarization, based on the emotional style of the discour...
- Decoding the Black Box: Discerning AI Rhetorics About and Through Poetic Prompting : Abstract: Prompt engineering has emerged as a useful way of studying the algorithmic tendencies and biases of large language models. Meanwhile, creatives and academics have leveraged LLMs to develop creat...
- Meta-Learning Multi-armed Bandits for Beam Tracking in 5G and 6G Networks : Abstract: Beamforming-capable antenna arrays with many elements enable higher data rates in next generation 5G and 6G networks. In current practice, analog beamforming uses a codebook of pre-configure...
- BERTO: an Adaptive BERT-based Network Time Series Predictor with Operator Preferences in Natural Language : Abstract: We introduce BERTO, a BERT-based framework for traffic prediction and energy optimization in cellular networks. Built on transformer architectures, BERTO delivers high prediction accuracy, w...
- Teaching Language Models Mechanistic Explainability Through Arrow-Pushing : Abstract: Chemical reaction mechanisms provide crucial insight into synthesizability, yet current Computer-Assisted Synthesis Planning (CASP) systems lack mechanistic grounding. We introduce a computa...
- Towards agent-based-model informed neural networks : Abstract: In this article, we present a framework for designing neural networks that remain consistent with the underlying principles of agent-based models. We begin by highlighting the limitations of...
- Learnability Window in Gated Recurrent Neural Networks : Abstract: We develop a theoretical framework that explains how gating mechanisms determine the learnability window $\mathcal{H}_N$ of recurrent neural networks, defined as the largest temporal horizon...
- Utility Boundary of Dataset Distillation: Scaling and Configuration-Coverage Laws : Abstract: Dataset distillation (DD) aims to construct compact synthetic datasets that allow models to achieve comparable performance to full-data training while substantially reducing storage and comp...
- Predicting Price Movements in High-Frequency Financial Data with Spiking Neural Networks : Abstract: Modern high-frequency trading (HFT) environments are characterized by sudden price spikes that present both risk and opportunity, but conventional financial models often fail to capture the ...
- Computational Design of Low-Volatility Lubricants for Space Using Interpretable Machine Learning : Abstract: The function and lifetime of moving mechanical assemblies (MMAs) in space depend on the properties of lubricants. MMAs that experience high speeds or high cycles require liquid based lubrica...
- DAE-HardNet: A Physics Constrained Neural Network Enforcing Differential-Algebraic Hard Constraints : Abstract: Traditional physics-informed neural networks (PINNs) do not always satisfy physics based constraints, especially when the constraints include differential operators. Rather, they minimize th...
- NeuroMemFPP: A recurrent neural approach for memory-aware parameter estimation in fractional Poisson process : Abstract: In this paper, we propose a recurrent neural network (RNN)-based framework for estimating the parameters of the fractional Poisson process (FPP), which models event arrivals with memory and ...
- Zoom in, Click out: Unlocking and Evaluating the Potential of Zooming for GUI Grounding : Abstract: Grounding is a fundamental capability for building graphical user interface (GUI) agents. Although existing approaches rely on large-scale bounding box supervision, they still face various c...
- Impugan: Learning Conditional Generative Models for Robust Data Imputation : Abstract: Incomplete data are common in real-world applications. Sensors fail, records are inconsistent, and datasets collected from different sources often differ in scale, sampling rate, and quality...
- Trusted AI Agents in the Cloud : Abstract: AI agents powered by large language models are increasingly deployed as cloud services that autonomously access sensitive data, invoke external tools, and interact with other agents. However...
- MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution : Abstract: Generative search engines based on large language models (LLMs) are replacing traditional search, fundamentally changing how information providers are compensated. To sustain this ecosystem,...
- M4-RAG: A Massive-Scale Multilingual Multi-Cultural Multimodal RAG : Abstract: Vision-language models (VLMs) have achieved strong performance in visual question answering (VQA), yet they remain constrained by static training data. Retrieval-Augmented Generation (RAG) m...
- AQUA-Net: Adaptive Frequency Fusion and Illumination Aware Network for Underwater Image Enhancement : Abstract: Underwater images often suffer from severe color distortion, low contrast, and a hazy appearance due to wavelength-dependent light absorption and scattering. Simultaneously, existing deep le...
- Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity : Abstract: Reinforcement Learning (RL) has become the de facto standard for tuning LLMs to solve tasks involving reasoning. However, growing evidence shows that models trained in such a way often suffer ...
- Training-Time Action Conditioning for Efficient Real-Time Chunking : Abstract: Real-time chunking (RTC) enables vision-language-action models (VLAs) to generate smooth, reactive robot trajectories by asynchronously predicting action chunks and conditioning on previousl...
- Enhancing Retrieval-Augmented Generation with Entity Linking for Educational Platforms : Abstract: In the era of Large Language Models (LLMs), Retrieval-Augmented Generation (RAG) architectures are gaining significant attention for their ability to ground language generation in reliable k...
- Rolling in the deep of cognitive and AI biases : Abstract: Nowadays, we delegate many of our decisions to Artificial Intelligence (AI) that acts either in solo or as a human companion in decisions made to support several sensitive domains, like heal...
- Debate over Mixed-knowledge: A Robust Multi-Agent Reasoning Framework for Incomplete Knowledge Graph Question Answering : Abstract: Knowledge Graph Question Answering (KGQA) aims to improve factual accuracy by leveraging structured knowledge. However, real-world Knowledge Graphs (KGs) are often incomplete, leading to the...
- ToolMind Technical Report: A Large-Scale, Reasoning-Enhanced Tool-Use Dataset : Abstract: Large Language Model (LLM) agents have developed rapidly in recent years to solve complex real-world problems using external tools. However, the scarcity of high-quality trajectories still h...
- GTM: Simulating the World of Tools for AI Agents : Abstract: The integration of external tools is pivotal for empowering Large Language Model (LLM) agents with real-world capabilities. However, training these agents through direct, continuous interact...
- Towards Data-efficient Customer Intent Recognition with Prompt-based Learning Paradigm : Abstract: Recognizing customer intent accurately with language models based on customer-agent conversational data is essential in today's digital customer service marketplace, but it is often hindered...
- A Scene-aware Models Adaptation Scheme for Cross-scene Online Inference on Mobile Devices : Abstract: Emerging Artificial Intelligence of Things (AIoT) applications desire online prediction using deep neural network (DNN) models on mobile devices. However, due to the movement of devices, unf...
- Variational Learning of Gaussian Process Latent Variable Models through Stochastic Gradient Annealed Importance Sampling : Abstract: Gaussian Process Latent Variable Models (GPLVMs) have become increasingly popular for unsupervised tasks such as dimensionality reduction and missing data recovery due to their flexibility a...
- Detecting the Future: All-at-Once Event Sequence Forecasting with Horizon Matching : Abstract: Long-horizon events forecasting is a crucial task across various domains, including retail, finance, healthcare, and social networks. Traditional models for event sequences often extend to f...
- Image-Guided Semantic Pseudo-LiDAR Point Generation for 3D Object Detection : Abstract: In autonomous driving scenarios, accurate perception is becoming an even more critical task for safe navigation. While LiDAR provides precise spatial data, its inherent sparsity makes it dif...
- Edge-Only Universal Adversarial Attacks in Distributed Learning : Abstract: Distributed learning frameworks, which partition neural network models across multiple computing nodes, enhance efficiency in collaborative edge-cloud systems, but may also introduce new vul...
- Coefficient of Variation Masking: A Volatility-Aware Strategy for EHR Foundation Models : Abstract: Masked autoencoders (MAEs) are increasingly applied to electronic health records (EHR) for learning general-purpose representations that support diverse clinical tasks. However, existing app...
- Rethinking Tokenization for Clinical Time Series: When Less is More : Abstract: Tokenization strategies shape how models process electronic health records, yet fair comparisons of their effectiveness remain limited. We present a systematic evaluation of tokenization app...
- Mitigating the Antigenic Data Bottleneck: Semi-supervised Learning with Protein Language Models for Influenza A Surveillance : Abstract: Influenza A viruses (IAVs) evolve antigenically at a pace that requires frequent vaccine updates, yet the haemagglutination inhibition (HI) assays used to quantify antigenicity are labor-int...
- Variance Matters: Improving Domain Adaptation via Stratified Sampling : Abstract: Domain shift remains a key challenge in deploying machine learning models to the real world. Unsupervised domain adaptation (UDA) aims to address this by minimising domain discrepancy during...
- Edged Weisfeiler-Lehman Algorithm : Abstract: As a classical approach on graph learning, the propagation-aggregation methodology is widely exploited by many of Graph Neural Networks (GNNs), wherein the representation of a node is update...
- Bridging quantum and classical computing for partial differential equations through multifidelity machine learning : Abstract: Quantum algorithms for partial differential equations (PDEs) face severe practical constraints on near-term hardware: limited qubit counts restrict spatial resolution to coarse grids, while ...
- When unlearning is free: leveraging low influence points to reduce computational costs : Abstract: As concerns around data privacy in machine learning grow, the ability to unlearn, or remove, specific data points from trained models becomes increasingly important. While state of the art u...
- DMAGT: Unveiling miRNA-Drug Associations by Integrating SMILES and RNA Sequence Structures through Graph Transformer Models : Abstract: MiRNAs, due to their role in gene regulation, have paved a new pathway for pharmacology, focusing on drug development that targets miRNAs. However, traditional wet lab experiments are limite...
- Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces : Abstract: Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training...
- Uncertainty Quantification for Scientific Machine Learning using Sparse Variational Gaussian Process Kolmogorov-Arnold Networks (SVGP KAN) : Abstract: Kolmogorov-Arnold Networks have emerged as interpretable alternatives to traditional multi-layer perceptrons. However, standard implementations lack principled uncertainty quantification cap...
- Enhancing Deep Deterministic Policy Gradients on Continuous Control Tasks with Decoupled Prioritized Experience Replay : Abstract: Background: Deep Deterministic Policy Gradient-based reinforcement learning algorithms utilize Actor-Critic architectures, where both networks are typically trained using identical batches o...
- Non-Convex Federated Optimization under Cost-Aware Client Selection : Abstract: Different federated optimization algorithms typically employ distinct client-selection strategies: some methods communicate only with a randomly sampled subset of clients at each round, whil...
- PathFinder: MCTS and LLM Feedback-based Path Selection for Multi-Hop Question Answering : Abstract: Multi-hop question answering is a challenging task in which language models must reason over multiple steps to reach the correct answer. With the help of Large Language Models and their reas...
- Taxonomy-Adaptive Moderation Model with Robust Guardrails for Large Language Models : Abstract: Large Language Models (LLMs) are typically aligned for safety during the post-training phase; however, they may still generate inappropriate outputs that could potentially pose risks to user...
- When Forgetting Builds Reliability: LLM Unlearning for Reliable Hardware Code Generation : Abstract: Large Language Models (LLMs) have shown strong potential in accelerating digital hardware design through automated code generation. Yet, ensuring their reliability remains a critical challen...
- Enhancing Dimensionality Prediction in Hybrid Metal Halides via Feature Engineering and Class-Imbalance Mitigation : Abstract: We present a machine learning framework for predicting the structural dimensionality of hybrid metal halides (HMHs), including organic-inorganic perovskites, using a combination of chemicall...
- RevoNAD: Reflective Evolutionary Exploration for Neural Architecture Design : Abstract: Recent progress in leveraging large language models (LLMs) has enabled Neural Architecture Design (NAD) systems to generate new architecture not limited from manually predefined search space...
- Sepsis Prediction Using Graph Convolutional Networks over Patient-Feature-Value Triplets : Abstract: In the intensive care setting, sepsis continues to be a major contributor to patient illness and death; however, its timely detection is hindered by the complex, sparse, and heterogeneous na...
- TS-HINT: Enhancing Semiconductor Time Series Regression Using Attention Hints From Large Language Model Reasoning : Abstract: Existing data-driven methods rely on the extraction of static features from time series to approximate the material removal rate (MRR) of semiconductor manufacturing processes such as chemic...
- Turbulence Regression : Abstract: Air turbulence refers to the disordered and irregular motion state generated by drastic changes in velocity, pressure, or direction during airflow. Various complex factors lead to intricate ...
- GRASP: Graph Reasoning Agents for Systems Pharmacology with Human-in-the-Loop : Abstract: Quantitative Systems Pharmacology (QSP) modeling is essential for drug development but it requires significant time investment that limits the throughput of domain experts. We present \textb...
- Credal and Interval Deep Evidential Classifications : Abstract: Uncertainty Quantification (UQ) presents a pivotal challenge in the field of Artificial Intelligence (AI), profoundly impacting decision-making, risk assessment and model reliability. In thi...
- IDK-S: Incremental Distributional Kernel for Streaming Anomaly Detection : Abstract: Anomaly detection on data streams presents significant challenges, requiring methods to maintain high detection accuracy among evolving distributions while ensuring real-time efficiency. Her...
- SCoNE: Spherical Consistent Neighborhoods Ensemble for Effective and Efficient Multi-View Anomaly Detection : Abstract: The core problem in multi-view anomaly detection is to represent local neighborhoods of normal instances consistently across all views. Recent approaches consider a representation of local n...
- Wasserstein distance based semi-supervised manifold learning and application to GNSS multi-path detection : Abstract: The main objective of this study is to propose an optimal transport based semi-supervised approach to learn from scarce labelled image data using deep convolutional networks. The principle l...
- Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning : Abstract: Large language model post-training relies on reinforcement learning to improve model capability and alignment quality. However, the off-policy training paradigm introduces distribution shift...
- Hyperparameter Transfer Enables Consistent Gains of Matrix-Preconditioned Optimizers Across Scales : Abstract: Several recently introduced deep learning optimizers utilizing matrix-level preconditioning have shown promising speedups relative to the current dominant optimizer AdamW, particularly in re...
- Bounded Graph Clustering with Graph Neural Networks : Abstract: In community detection, many methods require the user to specify the number of clusters in advance since an exhaustive search over all possible values is computationally infeasible. While so...
- Beyond Data Filtering: Knowledge Localization for Capability Removal in LLMs : Abstract: Large Language Models increasingly possess capabilities that carry dual-use risks. While data filtering has emerged as a pretraining-time mitigation, it faces significant challenges: labelin...
- Fine-tuning an ECG Foundation Model to Predict Coronary CT Angiography Outcomes : Abstract: Coronary artery disease (CAD) remains a major global health burden. Accurate identification of the culprit vessel and assessment of stenosis severity are essential for guiding individualized...
- ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images : Abstract: Vision-Language Models (VLMs) have advanced multimodal understanding, yet still struggle when targets are embedded in cluttered backgrounds requiring figure-ground segregation. To address th...
- FlowEO: Generative Unsupervised Domain Adaptation for Earth Observation : Abstract: The increasing availability of Earth observation data offers unprecedented opportunities for large-scale environmental monitoring and analysis. However, these datasets are inherently heterog...
- How to Tame Your LLM: Semantic Collapse in Continuous Systems : Abstract: We develop a general theory of semantic dynamics for large language models by formalizing them as Continuous State Machines (CSMs): smooth dynamical systems whose latent manifolds evolve und...
- Advanced Unsupervised Learning: A Comprehensive Overview of Multi-View Clustering Techniques : Abstract: Machine learning techniques face numerous challenges to achieve optimal performance. These include computational constraints, the limitations of single-view learning algorithms and the compl...
- Semore: VLM-guided Enhanced Semantic Motion Representations for Visual Reinforcement Learning : Abstract: The growing exploration of Large Language Models (LLM) and Vision-Language Models (VLM) has opened avenues for enhancing the effectiveness of reinforcement learning (RL). However, existing L...
- Towards A Cultural Intelligence and Values Inferences Quality Benchmark for Community Values and Common Knowledge : Abstract: Large language models (LLMs) have emerged as a powerful technology, and thus, we have seen widespread adoption and use on software engineering teams. Most often, LLMs are designed as "genera...
- Fine-Tuning BERT for Domain-Specific Question Answering: Toward Educational NLP Resources at University Scale : Abstract: Prior work on scientific question answering has largely emphasized chatbot-style systems, with limited exploration of fine-tuning foundation models for domain-specific reasoning. In this stu...
- Invariance Co-training for Robot Visual Generalization : Abstract: Reasoning from diverse observations is a fundamental capability for generalist robot policies to operate in a wide range of environments. Despite recent advancements, many large-scale roboti...
- MAR-FL: A Communication Efficient Peer-to-Peer Federated Learning System : Abstract: The convergence of next-generation wireless systems and distributed Machine Learning (ML) demands Federated Learning (FL) methods that remain efficient and robust with wireless connected pee...
- A Survey of Bugs in AI-Generated Code : Abstract: Developers are widely using AI code-generation models, aiming to increase productivity and efficiency. However, there are also quality concerns regarding the AI-generated code. The generated...
- Learning to Code with Context: A Study-Based Approach : Abstract: The rapid emergence of generative AI tools is transforming the way software is developed. Consequently, software engineering education must adapt to ensure that students not only learn tradi...
- Uncertainty-Aware Data-Efficient AI: An Information-Theoretic Perspective : Abstract: In context-specific applications such as robotics, telecommunications, and healthcare, artificial intelligence systems often face the challenge of limited training data. This scarcity introd...
- XR-DT: Extended Reality-Enhanced Digital Twin for Agentic Mobile Robots : Abstract: As mobile robots increasingly operate alongside humans in shared workspaces, ensuring safe, efficient, and interpretable Human-Robot Interaction (HRI) has become a pressing challenge. While ...
- From Segments to Scenes: Temporal Understanding in Autonomous Driving via Vision-Language Model : Abstract: Temporal understanding in autonomous driving (AD) remains a significant challenge, even for recent state-of-the-art (SoTA) Vision-Language Models (VLMs). Prior work has introduced datasets a...
- Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification : Abstract: Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While t...
- CFO: Learning Continuous-Time PDE Dynamics via Flow-Matched Neural Operators : Abstract: Neural operator surrogates for time-dependent partial differential equations (PDEs) conventionally employ autoregressive prediction schemes, which accumulate error over long rollouts and req...
- The Erosion of LLM Signatures: Can We Still Distinguish Human and LLM-Generated Scientific Ideas After Iterative Paraphrasing? : Abstract: With the increasing reliance on LLMs as research agents, distinguishing between LLM and human-generated ideas has become crucial for understanding the cognitive nuances of LLMs' research cap...
- WhatsCode: Large-Scale GenAI Deployment for Developer Efficiency at WhatsApp : Abstract: The deployment of AI-assisted development tools in compliance-relevant, large-scale industrial environments represents significant gaps in academic literature, despite growing industry adopt...
- To Think or Not to Think: The Hidden Cost of Meta-Training with Excessive CoT Examples : Abstract: Chain-of-thought (CoT) prompting combined with few-shot in-context learning (ICL) has unlocked significant reasoning capabilities in large language models (LLMs). However, ICL with CoT examp...
- Robustness Test for AI Forecasting of Hurricane Florence Using FourCastNetv2 and Random Perturbations of the Initial Condition : Abstract: Understanding the robustness of a weather forecasting model with respect to input noise or different uncertainties is important in assessing its output reliability, particularly for extreme ...
- LYNX: Learning Dynamic Exits for Confidence-Controlled Reasoning : Abstract: Large reasoning models achieve strong performance on complex tasks by generating extended chains of thought, but they often "overthink": continuing to reason long after they have enough info...
- The Effect of Document Summarization on LLM-Based Relevance Judgments : Abstract: Relevance judgments are central to the evaluation of Information Retrieval (IR) systems, but obtaining them from human annotators is costly and time-consuming. Large Language Models (LLMs) h...
- Interaction Tensor Shap : Abstract: Machine learning models have grown increasingly deep and high dimensional, making it difficult to understand how individual and combined features influence their predictions. While Shapley v...
- SpaceControl: Introducing Test-Time Spatial Control to 3D Generative Modeling : Abstract: Generative methods for 3D assets have recently achieved remarkable progress, yet providing intuitive and precise control over the object geometry remains a key challenge. Existing approaches...
- Invisible Load: Uncovering the Challenges of Neurodivergent Women in Software Engineering : Abstract: Neurodivergent women in Software Engineering (SE) encounter distinctive challenges at the intersection of gender bias and neurological differences. To the best of our knowledge, no prior wor...
- Text Rationalization for Robust Causal Effect Estimation : Abstract: Recent advances in natural language processing have enabled the increasing use of text data in causal inference, particularly for adjusting confounding factors in treatment effect estimation...
- Please Don't Kill My Vibe: Empowering Agents with Data Flow Control : Abstract: The promise of Large Language Model (LLM) agents is to perform complex, stateful tasks. This promise is stunted by significant risks: policy violations, process corruption, and security fla...
- China Regional 3km Downscaling Based on Residual Corrective Diffusion Model : Abstract: A fundamental challenge in numerical weather prediction is to efficiently produce high-resolution forecasts. A common solution is applying downscaling methods, which include dynamical downsc...
- Mitigating Self-Preference by Authorship Obfuscation : Abstract: Language model (LM) judges are widely used to evaluate the quality of LM outputs. Despite many advantages, LM judges display concerning biases that can impair their integrity in evaluation...
- Fuzzing the brain: Automated stress testing for the safety of ML-driven neurostimulation : Abstract: Objective: Machine learning (ML) models are increasingly used to generate electrical stimulation patterns in neuroprosthetic devices such as visual prostheses. While these models promise pre...
- Generalization Beyond Benchmarks: Evaluating Learnable Protein-Ligand Scoring Functions on Unseen Targets : Abstract: As machine learning becomes increasingly central to molecular design, it is vital to ensure the reliability of learnable protein-ligand scoring functions on novel protein targets. While many...
- Simulating Life Paths with Digital Twins: AI-Generated Future Selves Influence Decision-Making and Expand Human Choice : Abstract: Major life transitions demand high-stakes decisions, yet people often struggle to imagine how their future selves will live with the consequences. To support this limited capacity for mental...
- Smart Timing for Mining: A Deep Learning Framework for Bitcoin Hardware ROI Prediction : Abstract: Bitcoin mining hardware acquisition requires strategic timing due to volatile markets, rapid technological obsolescence, and protocol-driven revenue cycles. Despite mining's evolution into a...
- A Systematic Framework for Enterprise Knowledge Retrieval: Leveraging LLM-Generated Metadata to Enhance RAG Systems : Abstract: In enterprise settings, efficiently retrieving relevant information from large and complex knowledge bases is essential for operational productivity and informed decision-making. This resear...
- Moving object detection from multi-depth images with an attention-enhanced CNN : Abstract: One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other so...
- ArtistMus: A Globally Diverse, Artist-Centric Benchmark for Retrieval-Augmented Music Question Answering : Abstract: Recent advances in large language models (LLMs) have transformed open-domain question answering, yet their effectiveness in music-related reasoning remains limited due to sparse music knowle...
- Building Capacity for Artificial Intelligence in Africa: A Cross-Country Survey of Challenges and Governance Pathways : Abstract: Artificial intelligence (AI) is transforming education and the workforce, but access to AI learning opportunities in Africa remains uneven. With rapid demographic shifts and growing labour m...
- IdealTSF: Can Non-Ideal Data Contribute to Enhancing the Performance of Time Series Forecasting Models? : Abstract: Deep learning has shown strong performance in time series forecasting tasks. However, issues such as missing values and anomalies in sequential data hinder its further development in predict...
- Parajudica: An RDF-Based Reasoner and Metamodel for Multi-Framework Context-Dependent Data Compliance Assessments : Abstract: Motivated by the challenges of implementing policy-based data access control (PBAC) under multiple simultaneously applicable compliance frameworks, we present Parajudica, an open, modular, a...
- Knowing Your Uncertainty -- On the application of LLM in social sciences : Abstract: Large language models (LLMs) are rapidly being integrated into computational social science research, yet their blackboxed training and designed stochastic elements in inference pose unique ...
- Dynamic Alignment for Collective Agency: Toward a Scalable Self-Improving Framework for Open-Ended LLM Alignment : Abstract: Large Language Models (LLMs) are typically aligned with human values using preference data or predefined principles such as helpfulness, honesty, and harmlessness. However, as AI systems pro...
- University Building Recognition Dataset in Thailand for the mission-oriented IoT sensor system : Abstract: Many industrial sectors have been using machine learning in inference mode on edge devices. Future directions show that training on edge devices is promising due to improvements in semico...
- How Ensemble Learning Balances Accuracy and Overfitting: A Bias-Variance Perspective on Tabular Data : Abstract: Ensemble models often achieve higher accuracy than single learners, but their ability to maintain small generalization gaps is not always well understood. This study examines how ensembles b...
- PERM EQ x GRAPH EQ: Equivariant Neural Networks for Quantum Molecular Learning : Abstract: In hierarchal order of molecular geometry, we compare the performances of Geometric Quantum Machine Learning models. Two molecular datasets are considered: the simplistic linear shaped LiH-m...
- UniFS: Unified Multi-Contrast MRI Reconstruction via Frequency-Spatial Fusion : Abstract: Recently, Multi-Contrast MR Reconstruction (MCMR) has emerged as a hot research topic that leverages high-quality auxiliary modalities to reconstruct undersampled target modalities of intere...
- Lyrics Matter: Exploiting the Power of Learnt Representations for Music Popularity Prediction : Abstract: Accurately predicting music popularity is a critical challenge in the music industry, offering benefits to artists, producers, and streaming platforms. Prior research has largely focused on ...
- Matching Ranks Over Probability Yields Truly Deep Safety Alignment : Abstract: A frustratingly easy technique known as the prefilling attack has been shown to effectively circumvent the safety alignment of frontier LLMs by simply prefilling the assistant response with ...
- User Negotiations of Authenticity, Ownership, and Governance on AI-Generated Video Platforms: Evidence from Sora : Abstract: As AI-generated video platforms rapidly advance, ethical challenges such as copyright infringement emerge. This study examines how users make sense of AI-generated videos on OpenAI's Sora by...
- See in Depth: Training-Free Surgical Scene Segmentation with Monocular Depth Priors : Abstract: Pixel-wise segmentation of laparoscopic scenes is essential for computer-assisted surgery but difficult to scale due to the high cost of dense annotations. We propose depth-guided surgical s...
- On the Theoretical Foundation of Sparse Dictionary Learning in Mechanistic Interpretability : Abstract: As AI models achieve remarkable capabilities across diverse domains, understanding what representations they learn and how they process information has become increasingly important for both...
- RoBoN: Routed Online Best-of-n for Test-Time Scaling with Multiple LLMs : Abstract: Best-of-$n$ is a widely used test-time scaling approach for LLM inference. Yet despite evidence that LLMs exhibit complementary strengths across tasks, traditionally best-of-$n$ relies on a ...
- Conscious Gaze: Adaptive Attention Mechanisms for Hallucination Mitigation in Vision-Language Models : Abstract: Large Vision-Language Models (VLMs) often exhibit text inertia, where attention drifts from visual evidence toward linguistic priors, resulting in object hallucinations. Existing decoding st...
- Improving Local Fidelity Through Sampling and Modeling Nonlinearity : Abstract: With the increasing complexity of black-box machine learning models and their adoption in high-stakes areas, it is critical to provide explanations for their predictions. Local Interpretable...
- 2K-Characters-10K-Stories: A Quality-Gated Stylized Narrative Dataset with Disentangled Control and Sequence Consistency : Abstract: Sequential identity consistency under precise transient attribute control remains a long-standing challenge in controllable visual storytelling. Existing datasets lack sufficient fidelity an...
- A Comprehensive Framework for Automated Quality Control in the Automotive Industry : Abstract: This paper presents a cutting-edge robotic inspection solution designed to automate quality control in automotive manufacturing. The system integrates a pair of collaborative robots, each eq...
- Modular Jets for Supervised Pipelines: Diagnosing Mirage vs Identifiability : Abstract: Classical supervised learning evaluates models primarily via predictive risk on hold-out data. Such evaluations quantify how well a function behaves on a distribution, but they do not addres...
- Grounded Multilingual Medical Reasoning for Question Answering with Large Language Models : Abstract: Large Language Models (LLMs) with reasoning capabilities have recently demonstrated strong potential in medical Question Answering (QA). Existing approaches are largely English-focused and p...
- Feasibility of AI-Assisted Programming for End-User Development : Abstract: End-user development, where non-programmers create or adapt their own digital tools, can play a key role in driving digital transformation within organizations. Currently, low-code/no-code pl...
- On Dynamic Programming Theory for Leader-Follower Stochastic Games : Abstract: Leader-follower general-sum stochastic games (LF-GSSGs) model sequential decision-making under asymmetric commitment, where a leader commits to a policy and a follower best responds, yieldin...
- InverseCrafter: Efficient Video ReCapture as a Latent Domain Inverse Problem : Abstract: Recent approaches to controllable 4D video generation often rely on fine-tuning pre-trained Video Diffusion Models (VDMs). This dominant paradigm is computationally expensive, requiring larg...
- Retrieving Semantically Similar Decisions under Noisy Institutional Labels: Robust Comparison of Embedding Methods : Abstract: Retrieving case law is a time-consuming task predominantly carried out by querying databases. We provide a comparison of two models in three different settings for Czech Constitutional Court...
- HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies : Abstract: The development of foundation models for embodied intelligence critically depends on access to large-scale, high-quality robot demonstration data. Recent approaches have sought to address th...
- Faithfulness metric fusion: Improving the evaluation of LLM trustworthiness across domains : Abstract: We present a methodology for improving the accuracy of faithfulness evaluation in Large Language Models (LLMs). The proposed methodology is based on the combination of elementary faithfulnes...
- Bayesian Active Inference for Intelligent UAV Anti-Jamming and Adaptive Trajectory Planning : Abstract: This paper proposes a hierarchical trajectory planning framework for UAVs operating under adversarial jamming conditions. Leveraging Bayesian Active Inference, the approach combines expert-g...
- Big Tech-Funded AI Papers Have Higher Citation Impact, Greater Insularity, and Larger Recency Bias : Abstract: Over the past four decades, artificial intelligence (AI) research has flourished at the nexus of academia and industry. However, Big Tech companies have increasingly acquired the edge in com...
- Efficient Text Classification with Conformal In-Context Learning : Abstract: Large Language Models (LLMs) demonstrate strong in-context learning abilities, yet their effectiveness in text classification depends heavily on prompt design and incurs substantial computat...
- Active Video Perception: Iterative Evidence Seeking for Agentic Long Video Understanding : Abstract: Long video understanding (LVU) is challenging because answering real-world queries often depends on sparse, temporally dispersed cues buried in hours of mostly redundant and irrelevant conte...
- Mechanistic Interpretability of Antibody Language Models Using SAEs : Abstract: Sparse autoencoders (SAEs) are a mechanistic interpretability technique that has been used to provide insight into learned concepts within large protein language models. Here, we employ Top...
- 3D Path Planning for Robot-assisted Vertebroplasty from Arbitrary Bi-plane X-ray via Differentiable Rendering : Abstract: Robotic systems are transforming image-guided interventions by enhancing accuracy and minimizing radiation exposure. A significant challenge in robotic assistance lies in surgical path plann...
- Probing the effectiveness of World Models for Spatial Reasoning through Test-time Scaling : Abstract: Vision-Language Models (VLMs) remain limited in spatial reasoning tasks that require multi-view understanding and embodied perspective shifts. Recent approaches such as MindJourney attempt t...
- Approximation of Box Decomposition Algorithm for Fast Hypervolume-Based Multi-Objective Optimization : Abstract: Hypervolume (HV)-based Bayesian optimization (BO) is one of the standard approaches for multi-objective decision-making. However, the computational cost of optimizing the acquisition functio...
- Phase-OTDR Event Detection Using Image-Based Data Transformation and Deep Learning : Abstract: This study focuses on event detection in optical fibers, specifically classifying six events using the Phase-OTDR system. A novel approach is introduced to enhance Phase-OTDR data analysis b...
- NEAT: Neighborhood-Guided, Efficient, Autoregressive Set Transformer for 3D Molecular Generation : Abstract: Autoregressive models are a promising alternative to diffusion-based models for 3D molecular structure generation. However, a key limitation is the assumption of a token order: while text ha...
- Optimizing Medical Question-Answering Systems: A Comparative Study of Fine-Tuned and Zero-Shot Large Language Models with RAG Framework : Abstract: Medical question-answering (QA) systems can benefit from advances in large language models (LLMs), but directly applying LLMs to the clinical domain poses challenges such as maintaining fact...
- Sparse Attention Post-Training for Mechanistic Interpretability : Abstract: We introduce a simple post-training method that makes transformer attention sparse without sacrificing performance. Applying a flexible sparsity regularisation under a constrained-loss objec...
- Neural Coherence : Find higher performance to out-of-distribution tasks from few samples : Abstract: To create state-of-the-art models for many downstream tasks, it has become common practice to fine-tune a pre-trained large vision model. However, it remains an open question of how to best ...
- Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures : Abstract: Bug localization in multi-repository microservice architectures is challenging due to the semantic gap between natural language bug reports and code, LLM context limitations, and the need to...
- World Models That Know When They Don't Know: Controllable Video Generation with Calibrated Uncertainty : Abstract: Recent advances in generative video models have led to significant breakthroughs in high-fidelity video synthesis, specifically in controllable video generation where the generated video is ...
- Measuring the Effect of Background on Classification and Feature Importance in Deep Learning for AV Perception : Abstract: Common approaches to explainable AI (XAI) for deep learning focus on analyzing the importance of input features on the classification task in a given model: saliency methods like SHAP and Gr...
- Documenting SME Processes with Conversational AI: From Tacit Knowledge to BPMN : Abstract: Small and medium-sized enterprises (SMEs) still depend heavily on tacit, experience-based know-how that rarely makes its way into formal documentation. This paper introduces a large-language...
- Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations : Abstract: Evaluating faithfulness of Large Language Models (LLMs) to a given task is a complex challenge. We propose two new unsupervised metrics for faithfulness evaluation using insights from inform...
- Bridging Traditional Machine Learning and Large Language Models: A Two-Part Course Design for Modern AI Education : Abstract: This paper presents an innovative pedagogical approach for teaching artificial intelligence and data science that systematically bridges traditional machine learning techniques with modern L...
- On the Computability of Artificial General Intelligence : Abstract: In recent years we observed rapid and significant advancements in artificial intelligence (A.I.). So much so that many wonder how close humanity is to developing an A.I. model that can achie...
- Resolving Zadeh's Paradox: Axiomatic Possibility Theory as a Foundation for Reliable Artificial Intelligence : Abstract: This work advances and substantiates the thesis that the resolution of this crisis lies in the domain of possibility theory, specifically in the axiomatic approach developed in Bychkov's arti...
- AI & Human Co-Improvement for Safer Co-Superintelligence : Abstract: Self-improvement is a goal currently exciting the field of AI, but is fraught with danger, and may take time to fully achieve. We advocate that a more achievable and better goal for humanity...
- MCP-AI: Protocol-Driven Intelligence Framework for Autonomous Reasoning in Healthcare : Abstract: Healthcare AI systems have historically faced challenges in merging contextual reasoning, long-term state management, and human-verifiable workflows into a cohesive framework. This paper int...
- ChipMind: Retrieval-Augmented Reasoning for Long-Context Circuit Design Specifications : Abstract: While Large Language Models (LLMs) demonstrate immense potential for automating integrated circuit (IC) development, their practical deployment is fundamentally limited by restricted context...
- BEAVER: An Efficient Deterministic LLM Verifier : Abstract: As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints...
- The Seeds of Scheming: Weakness of Will in the Building Blocks of Agentic Systems : Abstract: Large language models display a peculiar form of inconsistency: they "know" the correct answer but fail to act on it. In human philosophy, this tension between global judgment and local impu...
- MIND: Multi-rationale INtegrated Discriminative Reasoning Framework for Multi-modal Large Models : Abstract: Recently, multimodal large language models (MLLMs) have been widely applied to reasoning tasks. However, they suffer from limited multi-rationale semantic modeling, insufficient logical robu...
- CureAgent: A Training-Free Executor-Analyst Framework for Clinical Reasoning : Abstract: Current clinical agents built on small LLMs, such as TxAgent, suffer from a Context Utilization Failure, where models successfully retrieve biomedical evidence due to supervised finet...
- Ontology Learning with LLMs: A Benchmark Study on Axiom Identification : Abstract: Ontologies are an important tool for structuring domain knowledge, but their development is a complex task that requires significant modelling and domain expertise. Ontology learning, aimed ...
- Enhancing Local Search for MaxSAT with Deep Differentiation Clause Weighting : Abstract: Partial Maximum Satisfiability (PMS) and Weighted Partial Maximum Satisfiability (WPMS) generalize Maximum Satisfiability (MaxSAT), with broad real-world applications. Recent advances in Sto...
- KANFormer for Predicting Fill Probabilities via Survival Analysis in Limit Order Books : Abstract: This paper introduces KANFormer, a novel deep-learning-based model for predicting the time-to-fill of limit orders by leveraging both market- and agent-level information. KANFormer combines ...
- A Fast Anti-Jamming Cognitive Radar Deployment Algorithm Based on Reinforcement Learning : Abstract: The fast deployment of cognitive radar to counter jamming remains a critical challenge in modern warfare, where more efficient deployment leads to quicker detection of targets. Existing meth...
- Evolutionary System 2 Reasoning: An Empirical Proof : Abstract: Machine intelligence marks the ultimate dream of making machines' intelligence comparable to human beings. While recent progress in Large Language Models (LLMs) shows substantial specific ski...
- The Missing Layer of AGI: From Pattern Alchemy to Coordination Physics : Abstract: Influential critiques argue that Large Language Models (LLMs) are a dead end for AGI: "mere pattern matchers" structurally incapable of reasoning or planning. We argue this conclusion miside...
- Multimodal Oncology Agent for IDH1 Mutation Prediction in Low-Grade Glioma : Abstract: Low-grade gliomas frequently present IDH1 mutations that define clinically distinct subgroups with specific prognostic and therapeutic implications. This work introduces a Multimodal Oncolog...
- Using Large Language Models to Create Personalized Networks From Therapy Sessions : Abstract: Recent advances in psychotherapy have focused on treatment personalization, such as by selecting treatment modules based on personalized networks. However, estimating personalized networks t...
- To Err Is Human: Systematic Quantification of Errors in Published AI Papers via LLM Analysis : Abstract: How many mistakes do published AI papers contain? Peer-reviewed publications form the foundation upon which new research and knowledge are built. Errors that persist in the literature can pr...
- PRiSM: An Agentic Multimodal Benchmark for Scientific Reasoning via Python-Grounded Evaluation : Abstract: Evaluating vision-language models (VLMs) in scientific domains like mathematics and physics poses unique challenges that go far beyond predicting final answers. These domains demand conceptu...
- TRACE: A Framework for Analyzing and Enhancing Stepwise Reasoning in Vision-Language Models : Abstract: Reliable mathematical and scientific reasoning remains an open challenge for large vision-language models. Standard final-answer evaluation often masks reasoning errors, allowing silent fail...
- Variational Quantum Rainbow Deep Q-Network for Optimizing Resource Allocation Problem : Abstract: Resource allocation remains NP-hard due to combinatorial complexity. While deep reinforcement learning (DRL) methods, such as the Rainbow Deep Q-Network (DQN), improve scalability through pr...
- SymPyBench: A Dynamic Benchmark for Scientific Reasoning with Executable Python Code : Abstract: We introduce SymPyBench, a large-scale synthetic benchmark of 15,045 university-level physics problems (90/10% train/test split). Each problem is fully parameterized, supporting an effectively infinite...
- EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search : Abstract: Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven informatio...
- RAG-IGBench: Innovative Evaluation for RAG-based Interleaved Generation in Open-domain Question Answering : Abstract: In real-world scenarios, providing user queries with visually enhanced responses can considerably benefit understanding and memory, underscoring the great value of interleaved image-text gen...
- PESTalk: Speech-Driven 3D Facial Animation with Personalized Emotional Styles : Abstract: PESTalk is a novel method for generating 3D facial animations with personalized emotional styles directly from speech. It overcomes key limitations of existing approaches by introducing a Du...
- SyncVoice: Towards Video Dubbing with Vision-Augmented Pretrained TTS Model : Abstract: Video dubbing aims to generate high-fidelity speech that is precisely temporally aligned with the visual content. Existing methods still suffer from limitations in speech naturalness and aud...
- GNSS Jammer Direction Finding in Dynamic Scenarios Using an Inertial-based Multi-Antenna System : Abstract: Jamming devices disrupt signals from the global navigation satellite system (GNSS) and pose a significant threat by compromising the reliability of accurate positioning. Consequently, the de...
- AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance : Abstract: Active 3D reconstruction enables an agent to autonomously select viewpoints to efficiently obtain accurate and complete scene geometry, rather than passively reconstructing scenes from pre-c...
- Breaking Scale Anchoring: Frequency Representation Learning for Accurate High-Resolution Inference from Low-Resolution Training : Abstract: Zero-Shot Super-Resolution Spatiotemporal Forecasting requires a deep learning model to be trained on low-resolution data and deployed for inference on high-resolution. Existing studies cons...
Research Sources: 292 | Generated: 12/8/2025
