AI Research News Feeds for December 24th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

TropNNC: Structured Neural Network Compression Using Tropical Geometry : Abstract: We present TropNNC, a framework for compressing neural networks with linear and convolutional layers and ReLU activations using tropical geometry. By representing a network's output as a tro...
UniMPR: A Unified Framework for Multimodal Place Recognition with Heterogeneous Sensor Configurations : Abstract: Place recognition is a critical component of autonomous vehicles and robotics, enabling global localization in GPS-denied environments. Recent advances have spurred significant interest in m...
Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation : Abstract: Frame-level autoregressive (frame-AR) models have achieved significant progress, enabling real-time video generation comparable to bidirectional diffusion models and serving as a foundation ...
Neural Implicit Heart Coordinates: 3D cardiac shape reconstruction from sparse segmentations : Abstract: Accurate reconstruction of cardiac anatomy from sparse clinical images remains a major challenge in patient-specific modeling. While neural implicit functions have previously been applied to...
SLIM: Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion : Abstract: In recent years, the demand for image compression models for machine vision has increased dramatically. However, the training frameworks of image compression still focus on the vision of huma...
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry : Abstract: Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across ...
Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits : Abstract: Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, mul...
MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts : Abstract: We present MoE-DiffuSeq, a mixture-of-experts-based framework for enhancing diffusion models in long-document generation. Existing diffusion-based text generation models, such as DiffuSeq, s...
Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark : Abstract: Document image retrieval (DIR) aims to retrieve document images from a gallery according to a given query. Existing DIR methods are primarily based on image queries that retrieve documents w...
Coherence in the brain unfolds across separable temporal regimes : Abstract: Coherence in language requires the brain to satisfy two competing temporal demands: gradual accumulation of meaning across extended context and rapid reconfiguration of representations at ev...
Making Large Language Models Efficient Dense Retrievers : Abstract: Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter counts make them computationally i...
Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Object-Oriented Programming : Abstract: Object-Oriented Programming (OOP) has become a crucial paradigm for managing the growing complexity of modern software systems, particularly in fields like machine learning, deep learning, l...
Generating the Past, Present and Future from a Motion-Blurred Image : Abstract: We seek to answer the question: what can a motion-blurred image reveal about a scene's past, present, and future? Although motion blur obscures image details and degrades visual quality, it ...
Learning to Refocus with Video Diffusion Models : Abstract: Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method ...
RANSAC Scoring Functions: Analysis and Reality Check : Abstract: We revisit the problem of assigning a score (a quality of fit) to candidate geometric models -- one of the key components of RANSAC for robust geometric fitting. In a non-robust setting, the...
HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction : Abstract: 3D Panoptic Occupancy Prediction aims to reconstruct a dense volumetric scene map by predicting the semantic class and instance identity of every occupied region in 3D space. Achieving such ...
Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs : Abstract: User interface to code (UI2Code) aims to generate executable code that can faithfully reconstruct a given input UI. Prior work focuses largely on web pages and mobile screens, leaving app wi...
SE360: Semantic Edit in 360° Panoramas via Hierarchical Data Construction : Abstract: While instruction-based image editing is emerging, extending it to 360° panoramas introduces additional challenges. Existing methods often produce implausible results in both equirect...
HistoWAS: A Pathomics Framework for Large-Scale Feature-Wide Association Studies of Tissue Topology and Patient Outcomes : Abstract: High-throughput "pathomic" analysis of Whole Slide Images (WSIs) offers new opportunities to study tissue characteristics and for biomarker discovery. However, the clinical relevance of the ...
WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification : Abstract: In recent years, the integration of pre-trained foundational models with multiple instance learning (MIL) has improved diagnostic accuracy in computational pathology. However, existing MIL m...
A Dual-Branch Local-Global Framework for Cross-Resolution Land Cover Mapping : Abstract: Cross-resolution land cover mapping aims to produce high-resolution semantic predictions from coarse or low-resolution supervision, yet the severe resolution mismatch makes effective learnin...
Few-Shot-Based Modular Image-to-Video Adapter for Diffusion Models : Abstract: Diffusion models (DMs) have recently achieved impressive photorealism in image and video generation. However, their application to image animation remains limited, even when trained on large...
PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification : Abstract: Automated pavement defect detection often struggles to generalize across diverse real-world conditions due to the lack of standardized datasets. Existing datasets differ in annotation styles...
SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images : Abstract: Effectively grounding complex language to pixels in remote sensing (RS) images is a critical challenge for applications like disaster response and environmental monitoring. Current models ca...
A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments : Abstract: Despite increasing interest in computer vision-based distracted driving detection, most existing models rely exclusively on driver-facing views and overlook crucial environmental context tha...
MAPI-GNN: Multi-Activation Plane Interaction Graph Neural Network for Multimodal Medical Diagnosis : Abstract: Graph neural networks are increasingly applied to multimodal medical diagnosis for their inherent relational modeling capabilities. However, their efficacy is often compromised by the prevai...
H²em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning : Abstract: Compositional zero-shot learning (CZSL) aims to recognize unseen state-object compositions by generalizing from a training set of their primitives (state and object). Current methods often o...
VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement : Abstract: Visual Speech Recognition aims to transcribe spoken words from silent lip-motion videos. This task is particularly challenging for Mandarin, as visemes are highly ambiguous and homophones ar...
FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs : Abstract: We present FlashLips, a two-stage, mask-free lip-sync system that decouples lip control from rendering and achieves real-time performance running at over 100 FPS on a single GPU, while matc...
Progressive Learned Image Compression for Machine Perception : Abstract: Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS), which ...
Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models : Abstract: Human Activity Recognition (HAR) plays a vital role in healthcare, surveillance, and innovative environments, where reliable action recognition supports timely decision-making and automation...
LiDARDraft: Generating LiDAR Point Cloud from Versatile Inputs : Abstract: Generating realistic and diverse LiDAR point clouds is crucial for autonomous driving simulation. Although previous methods achieve LiDAR point cloud generation from user inputs, they strugg...
UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis : Abstract: Novel view synthesis (NVS) seeks to render photorealistic, 3D-consistent images of a scene from unseen camera poses given only a sparse set of posed views. Existing deterministic networks re...
Multi Modal Attention Networks with Uncertainty Quantification for Automated Concrete Bridge Deck Delamination Detection : Abstract: Deteriorating civil infrastructure requires automated inspection techniques overcoming limitations of visual assessment. While Ground Penetrating Radar and Infrared Thermography enable subsu...
DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation : Abstract: Audio-Visual Segmentation (AVS) aims to localize sound-producing objects at the pixel level by jointly leveraging auditory and visual information. However, existing methods often suffer from...
HEART-VIT: Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer : Abstract: Vision Transformers (ViTs) deliver state-of-the-art accuracy but their quadratic attention cost and redundant computations severely hinder deployment on latency- and resource-constrained plat...
milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion : Abstract: Millimeter-wave radar offers a privacy-preserving and lighting-invariant alternative to RGB sensors for the Human Pose Estimation (HPE) task. However, the radar signals are often sparse due to s...
Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS) : Abstract: Automating tasks in orchards is challenging because of the large amount of variation in the environment and occlusions. One of the challenges is apple pose estimation, where key points, such...
CoDi -- an exemplar-conditioned diffusion model for low-shot counting : Abstract: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only a few or no annotated test-time exemplars. A considerable challenge for modern ...
AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model : Abstract: Vision foundation models trained via multi-teacher distillation offer a promising path toward unified visual representations, yet the learning dynamics and data efficiency of such approaches...
Generative Latent Coding for Ultra-Low Bitrate Image Compression : Abstract: Most existing image compression approaches perform transform coding in the pixel space to reduce its spatial redundancy. However, they encounter difficulties in achieving both high-realism a...
JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement : Abstract: Given the complexity of underwater environments and the variability of water as a medium, underwater images are inevitably subject to various types of degradation. The degradations present n...
LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation : Abstract: 3D object detection is fundamental for safe and robust intelligent transportation systems. Current multi-modal 3D object detectors often rely on complex architectures and training strategies...
IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing : Abstract: Document layout analysis is essential for downstream tasks such as information retrieval, extraction, OCR, and digitization. However, existing large-scale datasets like PubLayNet and DocBank...
Degradation-Aware Metric Prompting for Hyperspectral Image Restoration : Abstract: Unified hyperspectral image (HSI) restoration aims to recover various degraded HSIs using a single model, offering great practical value. However, existing methods often depend on explicit d...
BiCoR-Seg: Bidirectional Co-Refinement Framework for High-Resolution Remote Sensing Image Segmentation : Abstract: High-resolution remote sensing image semantic segmentation (HRSS) is a fundamental yet critical task in the field of Earth observation. However, it has long faced the challenges of high inte...
LADLE-MM: Limited Annotation based Detector with Learned Ensembles for Multimodal Misinformation : Abstract: With the rise of easily accessible tools for generating and manipulating multimedia content, realistic synthetic alterations to digital media have become a widespread threat, often involving...
The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection : Abstract: Although diffusion transformer (DiT)-based video virtual try-on (VVT) has made significant progress in synthesizing realistic videos, existing methods still struggle to capture fine-grained ...
CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation : Abstract: Recent work has shown that inference-time reasoning and reflection can improve text-to-image generation without retraining. However, existing approaches often rely on implicit, holistic crit...
Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge : Abstract: Over half of the world's population is bilingual and people often communicate under multilingual scenarios. The Face-Voice Association in Multilingual Environments (FAME) 2026 Challenge, hel...
SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images : Abstract: Recent advances in generative AI have accelerated the production of ultra-high-resolution visual content, posing significant challenges for efficient compression and real-time decoding on en...
Chain-of-Anomaly Thoughts with Large Vision-Language Models : Abstract: Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, often failing to detect crimes. While Chain-of-Thought reasoning strategie...
Skin Lesion Classification Using a Soft Voting Ensemble of Convolutional Neural Networks : Abstract: Skin cancer can be identified by dermoscopic examination and ocular inspection, but early detection significantly increases survival chances. Artificial intelligence (AI), using annotated sk...
High Dimensional Data Decomposition for Anomaly Detection of Textured Images : Abstract: In the realm of diverse high-dimensional data, images play a significant role across various processes of manufacturing systems where efficient image anomaly detection has emerged as a core ...
Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding : Abstract: Human motion understanding has advanced rapidly through vision-based progress in recognition, tracking, and captioning. However, most existing methods overlook physical cues such as joint ac...
UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images : Abstract: AI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners, and advertisements. While diffusion-based text-...
Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems : Abstract: Landmines remain a persistent humanitarian threat, with 110 million actively deployed mines across 60 countries, claiming 26,000 casualties annually. This research evaluates adaptive Red-Gre...
Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition : Abstract: This manuscript explores multimodal alignment, translation, fusion, and transference to enhance machine understanding of complex inputs. We organize the work into five chapters, each address...
SirenPose: Dynamic Scene Reconstruction via Geometric Supervision : Abstract: We introduce SirenPose, a geometry-aware loss formulation that integrates the periodic activation properties of sinusoidal representation networks with keypoint-based geometric supervision, ...
AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment : Abstract: Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation...
Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios : Abstract: Image fusion aims to synthesize a single high-quality image from a pair of inputs captured under challenging conditions, such as differing exposure levels or focal depths. A core challenge l...
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models : Abstract: Vision-language models (VLMs) excel at general understanding yet remain weak at dynamic spatial reasoning (DSR), i.e., reasoning about the evolution of object geometry and relationships in 3D...
FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models : Abstract: Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and substantial redundancy. Existing...
Repurposing Video Diffusion Transformers for Robust Point Tracking : Abstract: Point tracking aims to localize corresponding points across video frames, serving as a fundamental task for 4D reconstruction, robotics, and video editing. Existing methods commonly rely on ...
Active Intelligence in Video Avatars via Closed-loop World Modeling : Abstract: Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term goals through adaptive environm...
SpatialTree: How Spatial Abilities Branch Out in MLLMs : Abstract: Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood,...
SemanticGen: Video Generation in Semantic Space : Abstract: State-of-the-art video generative models typically learn the distribution of video latents in the VAE space and map them to pixels using a VAE decoder. While this approach can generate high-...
SAM Audio: Segment Anything in Audio : Abstract: General audio source separation is a key capability for multimodal AI systems that can perceive and reason about sound. Despite substantial progress in recent years, existing separation mode...
Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs : Abstract: Authoring 3D scenes is a central task for spatial computing applications. Competing visions for lowering existing barriers are to (1) focus on immersive, direct manipulation of 3D content or (2...
CLIP Based Region-Aware Feature Fusion for Automated BBPS Scoring in Colonoscopy Images : Abstract: Accurate assessment of bowel cleanliness is essential for effective colonoscopy procedures. The Boston Bowel Preparation Scale (BBPS) offers a standardized scoring system but suffers from su...
Snapshot 3D image projection using a diffractive decoder : Abstract: 3D image display is essential for next-generation volumetric imaging; however, dense depth multiplexing for 3D image projection remains challenging because diffraction-induced cross-talk rap...
Machine Learning to Predict Digital Frustration from Clickstream Data : Abstract: Many businesses depend on their mobile apps and websites, so user frustration while trying to complete a task on these channels can cause lost sales and complaints. In this research, I use c...
Recurrent Off-Policy Deep Reinforcement Learning Doesn't Have to be Slow : Abstract: Recurrent off-policy deep reinforcement learning models achieve state-of-the-art performance but are often sidelined due to their high computational demands. In response, we introduce RISE (...
Explainable time-series forecasting with sampling-free SHAP for Transformers : Abstract: Time-series forecasts are essential for planning and decision-making in many domains. Explainability is key to building user trust and meeting transparency requirements. Shapley Additive Exp...
Improving ML Training Data with Gold-Standard Quality Metrics : Abstract: Hand-tagged training data is essential to many machine learning tasks. However, training data quality control has received little attention in the literature, despite data quality varying co...
Relu and softplus neural nets as zero-sum turn-based games : Abstract: We show that the output of a ReLU neural network can be interpreted as the value of a zero-sum, turn-based, stopping game, which we call the ReLU net game. The game runs in the direction opp...
Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures : Abstract: Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being widely observed across architectu...
ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval : Abstract: The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. This process, however, is inherently limited by what we formalize as the Gradient Locali...
Chemically-Informed Machine Learning Approach for Prediction of Reactivity Ratios in Radical Copolymerization : Abstract: Predicting monomer reactivity ratios is crucial for controlling monomer sequence distribution in copolymers and their properties. Traditional experimental methods of determining reactivity r...
NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra : Abstract: Molecular structure elucidation from spectroscopic data is a long-standing challenge in chemistry, traditionally requiring expert interpretation. We introduce NMIRacle, a two-stage generativ...
Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise : Abstract: In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for robust...
Fundamentals of quantum Boltzmann machine learning with visible and hidden units : Abstract: One of the primary applications of classical Boltzmann machines is generative modeling, wherein the goal is to tune the parameters of a model distribution so that it closely approximates a t...
Efficient Learning of Lattice Gauge Theories with Fermions : Abstract: We introduce a learning method for recovering action parameters in lattice field theories. Our method is based on the minimization of a convex loss function constructed using the Schwinger-D...
Detecting cyberbullying in Spanish texts through deep learning techniques : Abstract: Recently collected data suggest that it is possible to automatically detect events that may negatively affect the most vulnerable parts of our society, by using any communication technology...
Quasiprobabilistic Density Ratio Estimation with a Reverse Engineered Classification Loss Function : Abstract: We consider a generalization of the classifier-based density-ratio estimation task to a quasiprobabilistic setting where probability densities can be negative. The problem with most loss fun...
GIMLET: Generalizable and Interpretable Model Learning through Embedded Thermodynamics : Abstract: We develop a data-driven framework for discovering constitutive relations in models of fluid flow and scalar transport. Our approach infers unknown closure terms in the governing equations (...
Covariance-Aware Simplex Projection for Cardinality-Constrained Portfolio Optimization : Abstract: Metaheuristic algorithms for cardinality-constrained portfolio optimization require repair operators to map infeasible candidates onto the feasible region. Standard Euclidean projection trea...
A Novel CNN Gradient Boosting Ensemble for Guava Disease Detection : Abstract: As a significant agricultural country, Bangladesh utilizes its fertile land for guava cultivation and dedicated labor to boost its economic development. In a nation like Bangladesh, enhancin...
Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing : Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending ...
Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems : Abstract: Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and n...
Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models : Abstract: Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, obje...
Optimal Anytime-Valid Tests for Composite Nulls : Abstract: We consider the problem of designing optimal level-α power-one tests for composite nulls. Given a parameter α ∈ (0,1) and a stream of 𝒳-valued observations {X_n: n ≥ 1...
Deep Eigenspace Network and Its Application to Parametric Non-selfadjoint Eigenvalue Problems : Abstract: We consider operator learning for efficiently solving parametric non-selfadjoint eigenvalue problems. To overcome the spectral instability and mode switching inherent in non-selfadjoint oper...
DS-HGCN: A Dual-Stream Hypergraph Convolutional Network for Predicting Student Engagement via Social Contagion : Abstract: Student engagement is a critical factor influencing academic success and learning outcomes. Accurately predicting student engagement is essential for optimizing teaching strategies and provi...
Optimality-Informed Neural Networks for Solving Parametric Optimization Problems : Abstract: Many engineering tasks require solving families of nonlinear constrained optimization problems, parametrized in setting-specific variables. This is computationally demanding, particularly, i...
KAN-AFT: An Interpretable Nonlinear Survival Model Integrating Kolmogorov-Arnold Networks with Accelerated Failure Time Analysis : Abstract: Survival analysis relies fundamentally on the semi-parametric Cox Proportional Hazards (CoxPH) model and the parametric Accelerated Failure Time (AFT) model. CoxPH assumes constant hazard ra...
Algorithm for Interpretable Graph Features via Motivic Persistent Cohomology : Abstract: We present the Chromatic Persistence Algorithm (CPA), an event-driven method for computing persistent cohomological features of weighted graphs via graphic arrangements, a classical object i...
Top-K Exterior Power Persistent Homology: Algorithm, Structure, and Stability : Abstract: Exterior powers play important roles in persistent homology in computational geometry. In the present paper we study the problem of extracting the K longest intervals of the exterior-power...
Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability : Abstract: Statistical inference in contextual bandits is complicated by the adaptive, non-i.i.d. nature of the data. A growing body of work has shown that classical least-squares inference may fail un...
The Aligned Economic Index & The State Switching Model : Abstract: A growing empirical literature suggests that equity-premium predictability is state dependent, with much of the forecasting power concentrated around recessionary periods (Henkel20...
ScoreMatchingRiesz: Auto-DML with Infinitesimal Classification : Abstract: This study proposes Riesz representer estimation methods based on score matching. The Riesz representer is a key component in debiased machine learning for constructing √n-consistent...
Over-the-Air Goal-Oriented Communications : Abstract: Goal-oriented communications offer an attractive alternative to the Shannon-based communication paradigm, where the data is never reconstructed at the Receiver (RX) side. Rather, focusing on...
Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Learnable Channel Attention : Abstract: We study the problem of learning a low-degree spherical polynomial of degree ℓ₀ = Θ(1) ≥ 1 defined on the unit sphere in ℝ^d by training an over-parameterized two-layer neural ne...
FedPOD: the deployable units of training for federated learning : Abstract: This paper proposes FedPOD (Proportionally Orchestrated Derivative) for optimizing learning efficiency and communication cost in federated learning among multiple clients. Inspired by FedPID...
Deep Learning and Machine Learning -- Python Data Structures and Mathematics Fundamental: From Theory to Practice : Abstract: This book provides a comprehensive introduction to the foundational concepts of machine learning (ML) and deep learning (DL). It bridges the gap between theoretical mathematics and practical...
FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation : Abstract: Parameter-efficient fine-tuning (PEFT) adapts large pre-trained models by updating only a small subset of parameters. Recently, Representation Fine-Tuning (ReFT) has emerged as an effective ...
Estimating Graph Dimension with Cross-validated Eigenvalues : Abstract: In applied multivariate statistics, estimating the number of latent dimensions or the number of clusters, k, is a fundamental and recurring problem. We study a sequence of statistics calle...
Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler : Abstract: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do ...
Boosted Control Functions: Distribution generalization and invariance in confounded models : Abstract: Modern machine learning methods and the availability of large-scale data have significantly advanced our ability to predict target quantities from large sets of covariates. However, these me...
Deep Learning and Machine Learning: Advancing Big Data Analytics and Management with Design Patterns : Abstract: This book, Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management, presents a comprehensive study of essential design patterns tailored for large-scal...
Non-Intrusive Parametrized-Background Data-Weak Reconstruction of Cardiac Displacement Fields from Sparse MRI-like Observations : Abstract: Personalized cardiac diagnostics require accurate reconstruction of myocardial displacement fields from sparse clinical imaging data, yet current methods often demand intrusive access to com...
How well do Large Language Models Recognize Instructional Moves? Establishing Baselines for Foundation Models in Educational Discourse : Abstract: Large language models (LLMs) are increasingly adopted in educational technologies for a variety of tasks, from generating instructional materials and assisting with assessment design to tuto...
Counterfactual LLM-based Framework for Measuring Rhetorical Style : Abstract: The rise of AI has fueled growing concerns about "hype" in machine learning papers, yet a reliable way to quantify rhetorical style independently of substantive content has remained elusiv...
PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation : Abstract: Traditional agent-based models (ABMs) of opinion dynamics often fail to capture the psychological heterogeneity driving online polarization due to simplistic homogeneity assumptions. This li...
Bias Beneath the Tone: Empirical Characterisation of Tone Bias in LLM-Driven UX Systems : Abstract: Large Language Models are increasingly used in conversational systems such as digital personal assistants, shaping how people interact with technology through language. While their responses...
Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents : Abstract: Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories gr...
A Novel Graph-Sequence Learning Model for Inductive Text Classification : Abstract: Text classification plays an important role in various downstream text-related tasks, such as sentiment analysis, fake news detection, and public opinion analysis. Recently, text classificat...
Multi-hop Reasoning via Early Knowledge Alignment : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for Large Language Models (LLMs) to address knowledge-intensive queries requiring domain-specific or up-to-date inform...
AprielGuard : Abstract: Safeguarding large language models (LLMs) against unsafe or adversarial behavior is critical as they are increasingly deployed in conversational and agentic settings. Existing moderation too...
SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision : Abstract: The parallel advances in language modeling and speech representation learning have raised the prospect of learning language directly from speech without textual intermediates. This requires ...
Can LLMs Solve My Grandma's Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles : Abstract: Large Language Models (LLMs) show impressive performance on many NLP benchmarks, yet their ability to reason in figurative, culturally grounded, and low-resource settings remains underexplor...
Sentiment-Aware Extractive and Abstractive Summarization for Unstructured Text Mining : Abstract: With the rapid growth of unstructured data from social media, reviews, and forums, text mining has become essential in Information Systems (IS) for extracting actionable insights. Summarizat...
Step-DeepResearch Technical Report : Abstract: As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-en...
Reducing Label Dependency in Human Activity Recognition with Wearables: From Supervised Learning to Novel Weakly Self-Supervised Approaches : Abstract: Human activity recognition (HAR) using wearable sensors has advanced through various machine learning paradigms, each with inherent trade-offs between performance and labeling requirements. ...
Synthetic Data Blueprint (SDB): A modular framework for the statistical, structural, and graph-based evaluation of synthetic tabular data : Abstract: In the rapidly evolving era of Artificial Intelligence (AI), synthetic data are widely used to accelerate innovation while preserving privacy and enabling broader data accessibility. However...
Per-Axis Weight Deltas for Frequent Model Updates : Abstract: Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base ...
Sign-Aware Multistate Jaccard Kernels and Geometry for Real and Complex-Valued Signals : Abstract: We introduce a sign-aware, multistate Jaccard/Tanimoto framework that extends overlap-based distances from nonnegative vectors and measures to arbitrary real- and complex-valued signals whil...
Node-Level Financial Optimization in Demand Forecasting Through Dynamic Cost Asymmetry and Feedback Mechanism : Abstract: This work introduces a methodology to adjust forecasts based on node-specific cost function asymmetry. The proposed model generates savings by dynamically incorporating the cost asymmetry in...
End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment : Abstract: This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While...
Out-of-Distribution Detection for Continual Learning: Design Principles and Benchmarking : Abstract: Recent years have witnessed significant progress in the development of machine learning models across a wide range of fields, fueled by increased computational resources, large-scale dataset...
Trend Extrapolation for Technology Forecasting: Leveraging LSTM Neural Networks for Trend Analysis of Space Exploration Vessels : Abstract: Forecasting technological advancement in complex domains such as space exploration presents significant challenges due to the intricate interaction of technical, economic, and policy-related...
Hard Negative Sample-Augmented DPO Post-Training for Small Language Models : Abstract: Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-training pipelines often reduce each generated solution to a binary outcome: correct or incorre...
ArcGen: Generalizing Neural Backdoor Detection Across Diverse Architectures : Abstract: Backdoor attacks pose a significant threat to the security and reliability of deep learning models. To mitigate such attacks, one promising approach is to learn to extract features from the ...
Exploring Deep-to-Shallow Transformable Neural Networks for Intelligent Embedded Systems : Abstract: Thanks to the evolving network depth, convolutional neural networks (CNNs) have achieved remarkable success across various embedded scenarios, paving the way for ubiquitous embedded intellig...
Leakage-Aware Bandgap Prediction on the JARVIS-DFT Dataset: A Phase-Wise Feature Analysis : Abstract: In this study, we perform a systematic analysis of the JARVIS-DFT bandgap dataset and identify and remove descriptors that may inadvertently encode band-structure information, such as effect...
The Deleuzian Representation Hypothesis : Abstract: We propose an alternative to sparse autoencoders (SAEs) as a simple and effective unsupervised method for extracting interpretable concepts from neural networks. The core idea is to cluster ...
Case Prompting to Mitigate Large Language Model Bias for ICU Mortality Prediction : Abstract: Accurate mortality risk prediction for intensive care unit (ICU) patients is essential for clinical decision-making. Although large language models (LLMs) show promise in predicting outcomes...
OpComm: A Reinforcement Learning Framework for Adaptive Buffer Control in Warehouse Volume Forecasting : Abstract: Accurate forecasting of package volumes at delivery stations is critical for last-mile logistics, where errors lead to inefficient resource allocation, higher costs, and delivery delays. We ...
OASI: Objective-Aware Surrogate Initialization for Multi-Objective Bayesian Optimization in TinyML Keyword Spotting : Abstract: Voice assistants utilize Keyword Spotting (KWS) to enable efficient, privacy-friendly activation. However, realizing accurate KWS models on ultra-low-power TinyML devices (often with less th...
Asia Cup 2025: A Structured T20 Match-Level Dataset and Exploratory Analysis for Cricket Analytics : Abstract: This paper presents a structured and comprehensive dataset corresponding to the 2025 Asia Cup T20 cricket tournament, designed to facilitate data-driven research in sports analytics. The dat...
EdgeFlex-Transformer: Transformer Inference for Edge Devices : Abstract: Deploying large-scale transformer models on edge devices presents significant challenges due to strict constraints on memory, compute, and latency. In this work, we propose a lightweight yet...
On-device Large Multi-modal Agent for Human Activity Recognition : Abstract: Human Activity Recognition (HAR) has been an active area of research, with applications ranging from healthcare to smart environments. The recent advancements in Large Language Models (LLMs)...
DeepBridge: A Unified and Production-Ready Framework for Multi-Dimensional Machine Learning Validation : Abstract: We present DeepBridge, an 80K-line Python library that unifies multi-dimensional validation, automatic compliance verification, knowledge distillation, and synthetic data generation. DeepBri...
Learning to Design City-scale Transit Routes : Abstract: Designing efficient transit route networks is an NP-hard problem with exponentially large solution spaces that traditionally relies on manual planning processes. We present an end-to-end rei...
Reduced Order Modeling for Tsunami Forecasting with Bayesian Hierarchical Pooling : Abstract: Reduced order models (ROM) can represent spatiotemporal processes in significantly fewer dimensions and can be solved many orders faster than their governing partial differential equations (...
Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy : Abstract: This paper introduces a marketing decision framework that converts heterogeneous-treatment uplift into constrained targeting strategies to maximize revenue and retention while honoring busin...
The Seismic Wavefield Common Task Framework : Abstract: Seismology faces fundamental challenges in state forecasting and reconstruction (e.g., earthquake early warning and ground motion prediction) and managing the parametric variability of sourc...
Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis : Abstract: This study introduces a novel data-driven framework and the first-ever county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices ...
Bloom Filter Encoding for Machine Learning : Abstract: We present a method that uses the Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact, privacy-preserving bit array. This reduces memory use...
LoFT-LLM: Low-Frequency Time-Series Forecasting with Large Language Models : Abstract: Time-series forecasting in real-world applications such as finance and energy often faces challenges due to limited training data and complex, noisy temporal dynamics. Existing deep forecast...
Control Variate Score Matching for Diffusion Models : Abstract: Diffusion models offer a robust framework for sampling from unnormalized probability densities, which requires accurately estimating the score of the noise-perturbed target distribution. Whi...
Orthogonal Activation with Implicit Group-Aware Bias Learning for Class Imbalance : Abstract: Class imbalance is a common challenge in machine learning and data mining, often leading to suboptimal performance in classifiers. While deep learning excels in feature extraction, its perfo...
PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models : Abstract: We introduce PairFlow, a lightweight preprocessing step for training Discrete Flow Models (DFMs) to achieve few-step sampling without requiring a pretrained teacher. DFMs have rec...
Jensen-Shannon Divergence Message-Passing for Rich-Text Graph Representation Learning : Abstract: In this paper, we investigate how the widely existing contextual and structural divergence may influence the representation learning in rich-text graphs. To this end, we propose Jensen-Shann...
Information-directed sampling for bandits: a primer : Abstract: The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directe...
Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering : Abstract: Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected return using a given static dataset of transitions. However, offline RL faces the distribution shift pr...
Learning to Reason in LLMs by Expectation Maximization : Abstract: Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive an expectation-maximiza...
NeuralCrop: Combining physics and machine learning for improved crop yield predictions : Abstract: Global gridded crop models (GGCMs) simulate daily crop growth by explicitly representing key biophysical processes and project end-of-season yield time series. They are a primary tool to qua...
Cost-TrustFL: Cost-Aware Hierarchical Federated Learning with Lightweight Reputation Evaluation across Multi-Cloud : Abstract: Federated learning across multi-cloud environments faces critical challenges, including non-IID data distributions, malicious participant detection, and substantial cross-cloud communication...
Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning : Abstract: We study offline multitask reinforcement learning in settings where multiple tasks share a low-rank representation of their action-value functions. In this regime, a learner is provided with...
Adaptive Multi-task Learning for Probabilistic Load Forecasting : Abstract: Simultaneous load forecasting across multiple entities (e.g., regions, buildings) is crucial for the efficient, reliable, and cost-effective operation of power systems. Accurate load forecas...
How I Met Your Bias: Investigating Bias Amplification in Diffusion Models : Abstract: Diffusion-based generative models demonstrate state-of-the-art performance across various image synthesis tasks, yet their tendency to replicate and amplify dataset biases remains poorly und...
Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion : Abstract: Multimodal brain decoding aims to reconstruct semantic information that is consistent with visual stimuli from brain activity signals such as fMRI, and then generate readable natural languag...
DeepONet-accelerated Bayesian inversion for moving boundary problems : Abstract: This work demonstrates that neural operator learning provides a powerful and flexible framework for building fast, accurate emulators of moving boundary systems, enabling their integration i...
HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training : Abstract: Neural Stochastic Differential Equations (Neural SDEs) provide a principled framework for modeling continuous-time stochastic processes and have been widely adopted in fields ranging from ph...
Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity : Abstract: Mixture-of-Experts (MoE) architectures achieve parameter efficiency through conditional computation, yet contemporary designs suffer from two fundamental limitations: structural parameter is...
FedDPC : Handling Data Heterogeneity and Partial Client Participation in Federated Learning : Abstract: Data heterogeneity is a significant challenge in modern federated learning (FL) as it creates variance in local model updates, causing the aggregated global model to shift away from the true...
Inverse Autoregressive Flows for Zero Degree Calorimeter fast simulation : Abstract: Physics-based machine learning blends traditional science with modern data-driven techniques. Rather than relying exclusively on empirical data or predefined equations, this methodology embe...
Physics-guided Neural Network-based Shaft Power Prediction for Vessels : Abstract: Optimizing maritime operations, particularly fuel consumption for vessels, is crucial, considering its significant share in global trade. As fuel consumption is closely related to the shaft ...
Field-Space Attention for Structure-Preserving Earth System Transformers : Abstract: Accurate and physically consistent modeling of Earth system dynamics requires machine-learning architectures that operate directly on continuous geophysical fields and preserve their underly...
GeoTransolver: Learning Physics on Irregular Domains Using Multi-scale Geometry Aware Physics Attention Transformer : Abstract: We present GeoTransolver, a Multiscale Geometry-Aware Physics Attention Transformer for CAE that replaces standard attention with GALE, coupling physics-aware self-attention on learned state...
BRIDGE: Budget-aware Reasoning via Intermediate Distillation with Guided Examples : Abstract: Distilling knowledge from large proprietary models (e.g., GPT-4) to tiny deployable models (less than 1B parameters) faces a critical capacity-budget trap: the 1000x capacity gap between tea...
DecoKAN: Interpretable Decomposition for Forecasting Cryptocurrency Market Dynamics : Abstract: Accurate and interpretable forecasting of multivariate time series is crucial for understanding the complex dynamics of cryptocurrency markets in digital asset systems. Advanced deep learnin...
Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval : Abstract: Real-world image captions often lack contextual depth, omitting crucial details such as event background, temporal cues, outcomes, and named entities that are not visually discernible. This ...
An Optimal Policy for Learning Controllable Dynamics by Exploration : Abstract: Controllable Markov chains describe the dynamics of sequential decision making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the ge...
On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities : Abstract: Large Language Models (LLMs) show significant promise in automating software vulnerability analysis, a critical task given the impact of security failure of modern software systems. However,...
CBA: Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks : Abstract: We propose a communication-bound-aware cross-domain resource assignment framework for pipeline-parallel distributed training over multi-datacenter optical networks, which lowers iteration ti...
QE-Catalytic: A Graph-Language Multimodal Base Model for Relaxed-Energy Prediction in Catalytic Adsorption : Abstract: Adsorption energy is a key descriptor of catalytic reactivity. It is fundamentally defined as the difference between the relaxed total energy of the adsorbate-surface system and that of an a...
Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection : Abstract: Spatio-temporal graph neural networks (ST-GNNs) have achieved notable success in structured domains such as road traffic and public transportation, where spatial entities can be naturally re...
Item Region-based Style Classification Network (IRSN): A Fashion Style Classifier Based on Domain Knowledge of Fashion Experts : Abstract: Fashion style classification is a challenging task because of the large visual variation within the same style and the existence of visually similar styles. Styles are expressed not only b...
ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language : Abstract: As the length of sequential decision-making tasks increases, it becomes computationally impractical to keep full interaction histories in context. We introduce a general framework for LLM ag...
Evolutionary Neural Architecture Search with Dual Contrastive Learning : Abstract: Evolutionary Neural Architecture Search (ENAS) has gained attention for automatically designing neural network architectures. Recent studies use a neural predictor to guide the process, but ...
M³KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has recently been extended to multimodal settings, connecting multimodal large language models (MLLMs) with vast corpora of external knowledge such as mu...
Retrieval-augmented Prompt Learning for Pre-trained Foundation Models : Abstract: The pre-trained foundation models (PFMs) have become essential for facilitating large-scale multimodal learning. Researchers have effectively employed the "pre-train, prompt, and predict" ...
Fun-Audio-Chat Technical Report : Abstract: Recent advancements in joint speech-text models show great potential for seamless voice interactions. However, existing models face critical challenges: temporal resolution mismatch between ...
AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration : Abstract: Large language models (LLMs) have been increasingly deployed in real-world software engineering, fostering the development of code evaluation metrics to study the quality of LLM-generated co...
AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications : Abstract: Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vul...
Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography : Abstract: By integrating language understanding with perceptual modalities such as images, multimodal large language models (MLLMs) constitute a critical substrate for modern AI systems, particularly ...
FaithLens: Detecting and Explaining Faithfulness Hallucination : Abstract: Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarizatio...
Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation : Abstract: Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at...
Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings : Abstract: Speech processing and translation technology have the potential to facilitate meetings of individuals who do not share any common language. To evaluate automatic systems for such a task, a v...
Memory as Resonance: A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds : Abstract: The memory of contemporary Large Language Models is bound by a physical paradox: as they learn, they fill up. The linear accumulation (O(N)) of Key-Value states treats context as a warehouse...
D³ETOR: Debate-Enhanced Pseudo Labeling and Frequency-Aware Progressive Debiasing for Weakly-Supervised Camouflaged Object Detection with Scribble Annotations : Abstract: Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision su...
UbiQVision: Quantifying Uncertainty in XAI for Image Recognition : Abstract: Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectu...
SlideTailor: Personalized Presentation Slide Generation for Scientific Papers : Abstract: Automatic presentation slide generation can greatly streamline content creation. However, since preferences of each user may vary, existing under-specified formulations often lead to subopti...
TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation : Abstract: The objective of this paper is to jointly synthesize interactive videos and conversational speech from text and reference images. With the ultimate goal of building human-like conversational...
Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives : Abstract: Growing reliance on LLMs for psychiatric self-assessment raises questions about their ability to interpret qualitative patient narratives. We present the first direct comparison between stat...
KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System : Abstract: Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches largely rely on data-driven learning, ma...
TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning : Abstract: Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improve...
Deep Learning Classification of EEG Responses to Multi-Dimensional Transcranial Electrical Stimulation : Abstract: A major shortcoming of medical practice is the lack of an objective measure of conscious level. Impairment of consciousness is common, e.g. following brain injury and seizures, which can als...
Toward Explaining Large Language Models in Software Engineering Tasks : Abstract: Recent progress in Large Language Models (LLMs) has substantially advanced the automation of software engineering (SE) tasks, enabling complex activities such as code generation and code sum...
Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation : Abstract: Qualitative research faces a critical reliability challenge: traditional inter-rater agreement methods require multiple human coders, are time-intensive, and often yield moderate consistency...
Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning : Abstract: Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distribute...
Identifying Appropriately-Sized Services with Deep Reinforcement Learning : Abstract: Service-based architecture (SBA) has gained attention in industry and academia as a means to modernize legacy systems. It refers to a design style that enables systems to be developed as sui...
AUDRON: A Deep Learning Framework with Fused Acoustic Signatures for Drone Type Recognition : Abstract: Unmanned aerial vehicles (UAVs), commonly known as drones, are increasingly used across diverse domains, including logistics, agriculture, surveillance, and defense. While these systems prov...
DETACH : Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning : Abstract: Aligning egocentric video with wearable sensors has shown promise for human action recognition, but faces practical limitations in user discomfort, privacy concerns, and scalability. We expl...
Simplifying Multi-Task Architectures Through Task-Specific Normalization : Abstract: Multi-task learning (MTL) aims to leverage shared knowledge across tasks to improve generalization and parameter efficiency, yet balancing resources and mitigating interference remain open c...
Evasion-Resilient Detection of DNS-over-HTTPS Data Exfiltration: A Practical Evaluation and Toolkit : Abstract: The purpose of this project is to assess how well defenders can detect DNS-over-HTTPS (DoH) file exfiltration, and which evasion strategies can be used by attackers. While providing a reprod...
Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI : Abstract: Accurate segmentation of ischemic stroke lesions from diffusion magnetic resonance imaging (MRI) is essential for clinical decision-making and outcome assessment. Diffusion-Weighted Imaging ...
SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization : Abstract: Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be m...
LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving : Abstract: Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance. Motivated by this gap, we em...
Distilling to Hybrid Attention Models via KL-Guided Layer Selection : Abstract: Distilling pretrained softmax attention Transformers into more efficient hybrid architectures that interleave softmax and linear attention layers is a promising approach for improving the in...
Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs : Abstract: Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully appl...
Performative Policy Gradient: Optimality in Performative Reinforcement Learning : Abstract: Post-deployment machine learning algorithms often influence the environments they act in, and thus shift the underlying dynamics that the standard reinforcement learning (RL) methods ignore....
Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information : Abstract: As systems engineering (SE) objectives evolve from design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engineering (ME) has emerged which...
Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs : Abstract: We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark decomposes performance into five ...
Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning : Abstract: Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, t...
cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution : Abstract: Optimizing CUDA kernels is a challenging and labor-intensive task, given the need for hardware-software co-design expertise and the proprietary nature of high-performance kernel libraries. W...
Reduced-order autoregressive dynamics of a complex financial system: a PCA-based approach : Abstract: This study analyzes the dynamic interactions among the NASDAQ index, crude oil, gold, and the US dollar using a reduced-order modeling approach. Time-delay embedding and principal component ...
Improving Local Training in Federated Learning via Temperature Scaling : Abstract: Federated learning is inherently hampered by data heterogeneity: non-i.i.d. training data over local clients. We propose a novel model training approach for federated learning, FLex&Chill, w...
Enhancing Topological Dependencies in Spatio-Temporal Graphs with Cycle Message Passing Blocks : Abstract: Graph Neural Networks (GNNs) and Transformer-based models have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-tem...
Tactile-based Object Retrieval From Granular Media : Abstract: We introduce GEOTACT, the first robotic system capable of grasping and retrieving objects of potentially unknown shapes buried in a granular environment. While important in many applications...
Reinforcement Learning for Unsupervised Video Summarization with Reward Generator Training : Abstract: This paper presents a novel approach for unsupervised video summarization using reinforcement learning (RL), addressing limitations like unstable adversarial training and reliance on heurist...
Explainable deep learning improves human mental models of self-driving cars : Abstract: Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging for the human behind the wheel to accur...
FP=xINT: Representing Neural Networks via Low-Bit Series Basis Functions : Abstract: Post-Training Quantization (PTQ) converts pre-trained Full-Precision (FP) models into quantized versions without training. While existing methods reduce size and computational costs, they al...
Lossless Model Compression via Joint Low-Rank Factorization Optimization : Abstract: Low-rank factorization is a popular model compression technique that minimizes the error $\delta$ between approximated and original weight matrices. Despite achieving performances close to the or...
Compression for Better: A General and Stable Lossless Compression Framework : Abstract: This work focuses on how to achieve stable and lossless model compression, aiming to reduce model complexity and enhance efficiency without sacrificing performance due to compression errors. A key ...
Deep Learning for Spatio-Temporal Fusion in Land Surface Temperature Estimation: A Comprehensive Survey, Experimental Analysis, and Future Trends : Abstract: Land Surface Temperature (LST) plays a key role in climate monitoring, urban heat assessment, and land-atmosphere interactions. However, current thermal infrared satellite sensors cannot sim...
PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research : Abstract: Advances in LLMs have produced agents with knowledge and operational capabilities comparable to human scientists, suggesting potential to assist, accelerate, and automate research. However, ...
A Branch-and-Price Algorithm for Fast and Equitable Last-Mile Relief Aid Distribution : Abstract: The distribution of relief supplies to shelters is a critical aspect of post-disaster humanitarian logistics. In major disasters, prepositioned supplies often fall short of meeting all deman...
Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs : Abstract: Recent research has explored using very large language models (LLMs) as proxies for humans in tasks such as simulation, surveys, and studies. While LLMs do not possess a human psychology, th...
Zero-Shot Segmentation through Prototype-Guidance for Multi-Label Plant Species Identification : Abstract: This paper presents an approach developed to address the PlantClef 2025 challenge, which consists of fine-grained, multi-label species identification over high-resolution images. Our solut...
FGDCC: Fine-Grained Deep Cluster Categorization -- A Framework for Intra-Class Variability Problems in Plant Classification : Abstract: Intra-class variability is determined by the degree of dissimilarity between images within a class. In that sense, depending on its intensity, intra-class variabilit...
S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test : Abstract: The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints. However, existing evaluations fail...
Discovering Lie Groups with Flow Matching : Abstract: Symmetry is fundamental to understanding physical systems, and at the same time, can improve performance and sample efficiency in machine learning. Both pursuits require knowledge of the und...
Learning Skills from Action-Free Videos : Abstract: Learning from videos offers a promising path toward generalist robots by providing rich visual and temporal priors beyond what real robot datasets contain. While existing video generative mo...
Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach : Abstract: As Earth's climate changes, it is impacting disasters and extreme weather events across the planet. Record-breaking heat waves, drenching rainfalls, extreme wildfires, and widespread floodin...
Scaling Reinforcement Learning for Content Moderation with Large Language Models : Abstract: Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for po...
Reason2Decide: Rationale-Driven Multi-Task Learning : Abstract: Despite the wide adoption of Large Language Models (LLMs), clinical decision support systems face a critical challenge: achieving high predictive accuracy while generating explanations align...
Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches : Abstract: Financial sentiment analysis plays a crucial role in informing investment decisions, assessing market risk, and predicting stock price trends. Existing works in financial sentiment analysis ...
MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization : Abstract: Molecular editing and optimization are multi-step problems that require iteratively improving properties while keeping molecules chemically valid and structurally similar. We frame both task...
Enhancing Zero-Shot Time Series Forecasting in Off-the-Shelf LLMs via Noise Injection : Abstract: Large Language Models (LLMs) have demonstrated effectiveness as zero-shot time series (TS) forecasters. The key challenge lies in tokenizing TS data into textual representations that align w...
A Bidirectional Gated Recurrent Unit Model for PUE Prediction in Data Centers : Abstract: Data centers account for a significant share of global energy consumption and carbon emissions. The recent increasing demand for edge computing and AI advancements drives the growth of data center st...
Concept Generalization in Humans and Large Language Models: Insights from the Number Game : Abstract: We compare human and large language model (LLM) generalization in the number game, a concept inference task. Using a Bayesian model as an analytical framework, we examined the inductive bias...
Offline Safe Policy Optimization From Heterogeneous Feedback : Abstract: Offline Preference-based Reinforcement Learning (PbRL) learns rewards and policies aligned with human preferences without the need for extensive reward engineering and direct interaction wit...
TongSIM: A General Platform for Simulating Intelligent Machines : Abstract: As artificial intelligence (AI) rapidly advances, especially in multimodal large language models (MLLMs), research focus is shifting from single-modality text processing to the more complex ...
MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents : Abstract: Memory systems have been designed to leverage past experiences in Large Language Model (LLM) agents. However, many deployed memory systems primarily optimize compression and storage, with co...
Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks : Abstract: As networks evolve toward 5G Standalone and 6G, operators face orchestration challenges that exceed the limits of static automation and Deep Reinforcement Learning. Although Large Language M...
ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge : Abstract: Vision-Language-Action (VLA) models have emerged as a unified paradigm for robotic perception and control, enabling emergent generalization and long-horizon task execution. However, their de...
Synthesizing Procedural Memory: Challenges and Architectures in Automated Workflow Generation : Abstract: While CodeMem establishes executable code as the optimal representation for agentic procedural memory, the mechanism for autonomously synthesizing this memory from a blank slate remains unde...
SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization : Abstract: Generative artificial intelligence has revolutionized the exploration of chemical space, yet a critical bottleneck remains: a substantial fraction of generated molecules is synthetically...
A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice : Abstract: A global shortage of radiologists has been exacerbated by the significant volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promis...
Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems : Abstract: We propose a Vision-Language Simulation Model (VLSM) that unifies visual and textual understanding to synthesize executable FlexScript from layout sketches and natural-language prompts, enab...
Bohrium + SciMaster: Building the Infrastructure and Ecosystem for Agentic Science at Scale : Abstract: AI agents are emerging as a practical way to run multi-step scientific workflows that interleave reasoning with tool use and verification, pointing to a shift from isolated AI-assisted steps...
Benchmarking LLMs for Predictive Applications in the Intensive Care Units : Abstract: With the advent of LLMs, various tasks across the natural language processing domain have been transformed. However, their application in predictive tasks remains less researched. This study...
Advancing Multimodal Teacher Sentiment Analysis: The Large-Scale T-MED Dataset & The Effective AAM-TSA Model : Abstract: Teachers' emotional states are critical in educational scenarios, profoundly impacting teaching efficacy, student engagement, and learning achievements. However, existing studies often fail ...
Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent : Abstract: Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have limited clinical adoption due to opacity concerns. We tested whether ch...
LongVideoAgent: Multi-Agent Reasoning with Long Videos : Abstract: Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into l...
QoS-Aware Dynamic CU Selection in O-RAN with Graph-Based Reinforcement Learning : Abstract: Open Radio Access Network (O-RAN) disaggregates conventional RAN into interoperable components, enabling flexible resource allocation, energy savings, and agile architectural design. In lega...
Automated Fault Detection in 5G Core Networks Using Large Language Models : Abstract: With the rapid growth of data volume in modern telecommunication networks and the continuous expansion of their scale, maintaining high reliability has become a critical requirement. These n...
Large Language Models for EDA Cloud Job Resource and Lifetime Prediction : Abstract: The rapid growth of cloud computing in the Electronic Design Automation (EDA) industry has created a critical need for resource and job lifetime prediction to achieve optimal scheduling. Tra...
Generative AI for Analysts : Abstract: We study how generative artificial intelligence (AI) transforms the work of financial analysts. Using the 2023 launch of FactSet's AI platform as a natural experiment, we find that adoption ...
Bidirectional human-AI collaboration in brain tumour assessments improves both expert human and AI agent performance : Abstract: The benefits of artificial intelligence (AI)-human partnerships, evaluating how AI agents enhance expert human performance, are increasingly studied. Though rarely evaluated in healthcare, an ...
PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility : Abstract: Connected autonomous vehicles (CAVs) rely on vision-based deep neural networks (DNNs) and low-latency Vehicle-to-Everything (V2X) communication to navigate safely and efficiently. Despite th...
Development and external validation of a multimodal artificial intelligence mortality prediction model of critically ill patients using multicenter data : Abstract: Early prediction of in-hospital mortality in critically ill patients can aid clinicians in optimizing treatment. The objective was to develop a multimodal deep learning model, using structur...
Thermodynamic Focusing for Inference-Time Search: Practical Methods for Target-Conditioned Sampling and Prompted Inference : Abstract: Finding rare but useful solutions in very large candidate spaces is a recurring practical challenge across language generation, planning, and reinforcement learning. We present a practical f...
Multiscale Dual-path Feature Aggregation Network for Remaining Useful Life Prediction of Lithium-Ion Batteries : Abstract: Targeted maintenance strategies, ensuring the dependability and safety of industrial machinery. However, current modeling techniques for assessing both local and global correlation of batter...
Tiny, On-Device Decision Makers with the MiniConv Library : Abstract: Reinforcement learning (RL) has achieved strong results, but deploying visual policies on resource-constrained edge devices remains challenging due to computational cost and communication la...
High-Performance Self-Supervised Learning by Joint Training of Flow Matching : Abstract: Diffusion models can learn rich representations during data generation, showing potential for Self-Supervised Learning (SSL), but they face a trade-off between generative quality and discrim...
CoPHo: Classifier-guided Conditional Topology Generation with Persistent Homology : Abstract: The structure of topology underpins much of the research on performance and robustness, yet available topology data are typically scarce, necessitating the generation of synthetic graphs wit...
Simulation-Driven Railway Delay Prediction: An Imitation Learning Approach : Abstract: Reliable prediction of train delays is essential for enhancing the robustness and efficiency of railway transportation systems. In this work, we reframe delay forecasting as a stochastic sim...
From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning : Abstract: Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-...
QMBench: A Research Level Benchmark for Quantum Materials Research : Abstract: We introduce QMBench, a comprehensive benchmark designed to evaluate the capability of large language model agents in quantum materials research. This specialized benchmark assesses the mode...
Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models : Abstract: In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization and excellent detection performance. Howev...
How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts : Abstract: Finding the optimal configuration of Sparse Mixture-of-Experts (SMoE) that maximizes semantic differentiation among experts is essential for exploiting the full potential of MoE architectures...
A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows : Abstract: Building deployment-ready LLM agents requires complex orchestration of tools, data sources, and control flow logic, yet existing systems tightly couple agent logic to specific programming la...
A K-Means, Ward and DBSCAN repeatability study : Abstract: Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For specific algorithms, repeatability with bitwise identical...
Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning : Abstract: Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communi...
UCCL-EP: Portable Expert-Parallel Communication : Abstract: Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but...
HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data : Abstract: Unstructured notes within the electronic health record (EHR) contain rich clinical information vital for cancer treatment decision making and research, yet reliably extracting structured onc...
Fine-Tuned In-Context Learners for Efficient Adaptation : Abstract: When adapting large language models (LLMs) to a specific downstream task, two primary approaches are commonly employed: (1) prompt engineering, often with in-context few-shot learning, lever...
Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling : Abstract: Recent developments in large language models have shown advantages in reallocating a notable share of computational resources from training time to inference time. However, the principles beh...
Modeling Non-Ergodic Path Effects Using Conditional Generative Model for Fourier Amplitude Spectra : Abstract: Recent developments in non-ergodic ground-motion models (GMMs) explicitly model systematic spatial variations in source, site, and path effects, reducing standard deviation to 30-40% of ergo...
A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones : Abstract: Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existi...
Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning : Abstract: LLM deployment in critical domains is currently impeded by persistent hallucinations (generating plausible but factually incorrect assertions). While scaling laws drove significant improvemen...
Unified Brain Surface and Volume Registration : Abstract: Accurate registration of brain MRI scans is fundamental for cross-subject analysis in neuroscientific studies. This involves aligning both the cortical surface of the brain and the interior ...
Vehicle-centric Perception via Multimodal Structured Pre-training : Abstract: Vehicle-centric perception plays a crucial role in many intelligent systems, including large-scale surveillance systems, intelligent transportation, and autonomous driving. Existing approach...
Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress : Abstract: Machine learning models used in financial decision systems operate in nonstationary economic environments, yet adversarial robustness is typically evaluated under static assumptions. This wo...
Block-Recurrent Dynamics in Vision Transformers : Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical str...
How Much 3D Do Video Foundation Models Encode? : Abstract: Videos are continuous 2D projections of 3D worlds. After training on large video data, will global 3D understanding naturally emerge? We study this by quantifying the 3D understanding of exi...
Regression of Functions by Quantum Neural Networks Circuits : Abstract: The performance of quantum neural network models depends strongly on architectural decisions, including circuit depth, placement of parametrized operations, and data-encoding strategies. Sel...
Neuron-Guided Interpretation of Code LLMs: Where, Why, and How? : Abstract: Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source co...
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models : Abstract: Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We ad...
IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense : Abstract: Since the Internet of Things (IoT) is widely adopted using Android applications, detecting malicious Android apps is essential. In recent years, Android graph-based deep learning research ha...
Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting : Abstract: While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup", where the robot must act on one specific i...

Research Sources: 295 | Generated: 12/25/2025