AI RESEARCH PAPERS & ACADEMIC SOURCES
- TropNNC: Structured Neural Network Compression Using Tropical Geometry : Abstract: We present TropNNC, a framework for compressing neural networks with linear and convolutional layers and ReLU activations using tropical geometry. By representing a network's output as a tro...
- UniMPR: A Unified Framework for Multimodal Place Recognition with Heterogeneous Sensor Configurations : Abstract: Place recognition is a critical component of autonomous vehicles and robotics, enabling global localization in GPS-denied environments. Recent advances have spurred significant interest in m...
- Memorize-and-Generate: Towards Long-Term Consistency in Real-Time Video Generation : Abstract: Frame-level autoregressive (frame-AR) models have achieved significant progress, enabling real-time video generation comparable to bidirectional diffusion models and serving as a foundation ...
- Neural Implicit Heart Coordinates: 3D cardiac shape reconstruction from sparse segmentations : Abstract: Accurate reconstruction of cardiac anatomy from sparse clinical images remains a major challenge in patient-specific modeling. While neural implicit functions have previously been applied to...
- SLIM: Semantic-based Low-bitrate Image compression for Machines by leveraging diffusion : Abstract: In recent years, the demand for image compression models for machine vision has increased dramatically. However, the training frameworks of image compression still focus on the vision of huma...
- LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry : Abstract: Trajectory planning in unstructured environments is a fundamental and challenging capability for mobile robots. Traditional modular pipelines suffer from latency and cascading errors across ...
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits : Abstract: Large language models (LLMs) generate fluent and complex outputs but often fail to recognize their own mistakes and hallucinations. Existing approaches typically rely on external judges, mul...
- MoE-DiffuSeq: Enhancing Long-Document Diffusion Models with Sparse Attention and Mixture of Experts : Abstract: We present MoE-DiffuSeq, a mixture of experts based framework for enhancing diffusion models in long document generation. Existing diffusion based text generation models, such as DiffuSeq, s...
- Towards Natural Language-Based Document Image Retrieval: New Dataset and Benchmark : Abstract: Document image retrieval (DIR) aims to retrieve document images from a gallery according to a given query. Existing DIR methods are primarily based on image queries that retrieve documents w...
- Coherence in the brain unfolds across separable temporal regimes : Abstract: Coherence in language requires the brain to satisfy two competing temporal demands: gradual accumulation of meaning across extended context and rapid reconfiguration of representations at ev...
- Making Large Language Models Efficient Dense Retrievers : Abstract: Recent work has shown that directly fine-tuning large language models (LLMs) for dense retrieval yields strong performance, but their substantial parameter counts make them computationally i...
- Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Object-Oriented Programming : Abstract: Object-Oriented Programming (OOP) has become a crucial paradigm for managing the growing complexity of modern software systems, particularly in fields like machine learning, deep learning, l...
- Generating the Past, Present and Future from a Motion-Blurred Image : Abstract: We seek to answer the question: what can a motion-blurred image reveal about a scene's past, present, and future? Although motion blur obscures image details and degrades visual quality, it ...
- Learning to Refocus with Video Diffusion Models : Abstract: Focus is a cornerstone of photography, yet autofocus systems often fail to capture the intended subject, and users frequently wish to adjust focus after capture. We introduce a novel method ...
- RANSAC Scoring Functions: Analysis and Reality Check : Abstract: We revisit the problem of assigning a score (a quality of fit) to candidate geometric models -- one of the key components of RANSAC for robust geometric fitting. In a non-robust setting, the...
- HyGE-Occ: Hybrid View-Transformation with 3D Gaussian and Edge Priors for 3D Panoptic Occupancy Prediction : Abstract: 3D Panoptic Occupancy Prediction aims to reconstruct a dense volumetric scene map by predicting the semantic class and instance identity of every occupied region in 3D space. Achieving such ...
- Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs : Abstract: User interface to code (UI2Code) aims to generate executable code that can faithfully reconstruct a given input UI. Prior work focuses largely on web pages and mobile screens, leaving app wi...
- SE360: Semantic Edit in 360$^\circ$ Panoramas via Hierarchical Data Construction : Abstract: While instruction-based image editing is emerging, extending it to 360$^\circ$ panoramas introduces additional challenges. Existing methods often produce implausible results in both equirect...
- HistoWAS: A Pathomics Framework for Large-Scale Feature-Wide Association Studies of Tissue Topology and Patient Outcomes : Abstract: High-throughput "pathomic" analysis of Whole Slide Images (WSIs) offers new opportunities to study tissue characteristics and for biomarker discovery. However, the clinical relevance of the ...
- WSD-MIL: Window Scale Decay Multiple Instance Learning for Whole Slide Image Classification : Abstract: In recent years, the integration of pre-trained foundational models with multiple instance learning (MIL) has improved diagnostic accuracy in computational pathology. However, existing MIL m...
- A Dual-Branch Local-Global Framework for Cross-Resolution Land Cover Mapping : Abstract: Cross-resolution land cover mapping aims to produce high-resolution semantic predictions from coarse or low-resolution supervision, yet the severe resolution mismatch makes effective learnin...
- Few-Shot-Based Modular Image-to-Video Adapter for Diffusion Models : Abstract: Diffusion models (DMs) have recently achieved impressive photorealism in image and video generation. However, their application to image animation remains limited, even when trained on large...
- PaveSync: A Unified and Comprehensive Dataset for Pavement Distress Analysis and Classification : Abstract: Automated pavement defect detection often struggles to generalize across diverse real-world conditions due to the lack of standardized datasets. Existing datasets differ in annotation styles...
- SegEarth-R2: Towards Comprehensive Language-guided Segmentation for Remote Sensing Images : Abstract: Effectively grounding complex language to pixels in remote sensing (RS) images is a critical challenge for applications like disaster response and environmental monitoring. Current models ca...
- A Contextual Analysis of Driver-Facing and Dual-View Video Inputs for Distraction Detection in Naturalistic Driving Environments : Abstract: Despite increasing interest in computer vision-based distracted driving detection, most existing models rely exclusively on driver-facing views and overlook crucial environmental context tha...
- MAPI-GNN: Multi-Activation Plane Interaction Graph Neural Network for Multimodal Medical Diagnosis : Abstract: Graph neural networks are increasingly applied to multimodal medical diagnosis for their inherent relational modeling capabilities. However, their efficacy is often compromised by the prevai...
- $\text{H}^2$em: Learning Hierarchical Hyperbolic Embeddings for Compositional Zero-Shot Learning : Abstract: Compositional zero-shot learning (CZSL) aims to recognize unseen state-object compositions by generalizing from a training set of their primitives (state and object). Current methods often o...
- VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement : Abstract: Visual Speech Recognition aims to transcribe spoken words from silent lip-motion videos. This task is particularly challenging for Mandarin, as visemes are highly ambiguous and homophones ar...
- FlashLips: 100-FPS Mask-Free Latent Lip-Sync using Reconstruction Instead of Diffusion or GANs : Abstract: We present FlashLips, a two-stage, mask-free lip-sync system that decouples lips control from rendering and achieves real-time performance running at over 100 FPS on a single GPU, while matc...
- Progressive Learned Image Compression for Machine Perception : Abstract: Recent advances in learned image codecs have been extended from human perception toward machine perception. However, progressive image compression with fine granular scalability (FGS)-which ...
- Effect of Activation Function and Model Optimizer on the Performance of Human Activity Recognition System Using Various Deep Learning Models : Abstract: Human Activity Recognition (HAR) plays a vital role in healthcare, surveillance, and innovative environments, where reliable action recognition supports timely decision-making and automation...
- LiDARDraft: Generating LiDAR Point Cloud from Versatile Inputs : Abstract: Generating realistic and diverse LiDAR point clouds is crucial for autonomous driving simulation. Although previous methods achieve LiDAR point cloud generation from user inputs, they strugg...
- UMAMI: Unifying Masked Autoregressive Models and Deterministic Rendering for View Synthesis : Abstract: Novel view synthesis (NVS) seeks to render photorealistic, 3D-consistent images of a scene from unseen camera poses given only a sparse set of posed views. Existing deterministic networks re...
- Multi Modal Attention Networks with Uncertainty Quantification for Automated Concrete Bridge Deck Delamination Detection : Abstract: Deteriorating civil infrastructure requires automated inspection techniques overcoming limitations of visual assessment. While Ground Penetrating Radar and Infrared Thermography enable subsu...
- DDAVS: Disentangled Audio Semantics and Delayed Bidirectional Alignment for Audio-Visual Segmentation : Abstract: Audio-Visual Segmentation (AVS) aims to localize sound-producing objects at the pixel level by jointly leveraging auditory and visual information. However, existing methods often suffer from...
- HEART-VIT: Hessian-Guided Efficient Dynamic Attention and Token Pruning in Vision Transformer : Abstract: Vision Transformers (ViTs) deliver state-of-the-art accuracy but their quadratic attention cost and redundant computations severely hinder deployment on latency and resource-constrained plat...
- milliMamba: Specular-Aware Human Pose Estimation via Dual mmWave Radar with Multi-Frame Mamba Fusion : Abstract: Millimeter-wave radar offers a privacy-preserving and lighting-invariant alternative to RGB sensors for the Human Pose Estimation (HPE) task. However, the radar signals are often sparse due to s...
- Enhancing annotations for 5D apple pose estimation through 3D Gaussian Splatting (3DGS) : Abstract: Automating tasks in orchards is challenging because of the large amount of variation in the environment and occlusions. One of the challenges is apple pose estimation, where key points, such...
- CoDi -- an exemplar-conditioned diffusion model for low-shot counting : Abstract: Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only a few or no annotated test-time exemplars. A considerable challenge for modern ...
- AMoE: Agglomerative Mixture-of-Experts Vision Foundation Model : Abstract: Vision foundation models trained via multi-teacher distillation offer a promising path toward unified visual representations, yet the learning dynamics and data efficiency of such approaches...
- Generative Latent Coding for Ultra-Low Bitrate Image Compression : Abstract: Most existing image compression approaches perform transform coding in the pixel space to reduce its spatial redundancy. However, they encounter difficulties in achieving both high-realism a...
- JDPNet: A Network Based on Joint Degradation Processing for Underwater Image Enhancement : Abstract: Given the complexity of underwater environments and the variability of water as a medium, underwater images are inevitably subject to various types of degradation. The degradations present n...
- LiteFusion: Taming 3D Object Detectors from Vision-Based to Multi-Modal with Minimal Adaptation : Abstract: 3D object detection is fundamental for safe and robust intelligent transportation systems. Current multi-modal 3D object detectors often rely on complex architectures and training strategies...
- IndicDLP: A Foundational Dataset for Multi-Lingual and Multi-Domain Document Layout Parsing : Abstract: Document layout analysis is essential for downstream tasks such as information retrieval, extraction, OCR, and digitization. However, existing large-scale datasets like PubLayNet and DocBank...
- Degradation-Aware Metric Prompting for Hyperspectral Image Restoration : Abstract: Unified hyperspectral image (HSI) restoration aims to recover various degraded HSIs using a single model, offering great practical value. However, existing methods often depend on explicit d...
- BiCoR-Seg: Bidirectional Co-Refinement Framework for High-Resolution Remote Sensing Image Segmentation : Abstract: High-resolution remote sensing image semantic segmentation (HRSS) is a fundamental yet critical task in the field of Earth observation. However, it has long faced the challenges of high inte...
- LADLE-MM: Limited Annotation based Detector with Learned Ensembles for Multimodal Misinformation : Abstract: With the rise of easily accessible tools for generating and manipulating multimedia content, realistic synthetic alterations to digital media have become a widespread threat, often involving...
- The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection : Abstract: Although diffusion transformer (DiT)-based video virtual try-on (VVT) has made significant progress in synthesizing realistic videos, existing methods still struggle to capture fine-grained ...
- CRAFT: Continuous Reasoning and Agentic Feedback Tuning for Multimodal Text-to-Image Generation : Abstract: Recent work has shown that inference-time reasoning and reflection can improve text-to-image generation without retraining. However, existing approaches often rely on implicit, holistic crit...
- Linking Faces and Voices Across Languages: Insights from the FAME 2026 Challenge : Abstract: Over half of the world's population is bilingual and people often communicate under multilingual scenarios. The Face-Voice Association in Multilingual Environments (FAME) 2026 Challenge, hel...
- SmartSplat: Feature-Smart Gaussians for Scalable Compression of Ultra-High-Resolution Images : Abstract: Recent advances in generative AI have accelerated the production of ultra-high-resolution visual content, posing significant challenges for efficient compression and real-time decoding on en...
- Chain-of-Anomaly Thoughts with Large Vision-Language Models : Abstract: Automated video surveillance with Large Vision-Language Models is limited by their inherent bias towards normality, often failing to detect crimes. While Chain-of-Thought reasoning strategie...
- Skin Lesion Classification Using a Soft Voting Ensemble of Convolutional Neural Networks : Abstract: Skin cancer can be identified by dermoscopic examination and ocular inspection, but early detection significantly increases survival chances. Artificial intelligence (AI), using annotated sk...
- High Dimensional Data Decomposition for Anomaly Detection of Textured Images : Abstract: In the realm of diverse high-dimensional data, images play a significant role across various processes of manufacturing systems where efficient image anomaly detection has emerged as a core ...
- Beyond Motion Pattern: An Empirical Study of Physical Forces for Human Motion Understanding : Abstract: Human motion understanding has advanced rapidly through vision-based progress in recognition, tracking, and captioning. However, most existing methods overlook physical cues such as joint ac...
- UTDesign: A Unified Framework for Stylized Text Editing and Generation in Graphic Design Images : Abstract: AI-assisted graphic design has emerged as a powerful tool for automating the creation and editing of design elements such as posters, banners, and advertisements. While diffusion-based text-...
- Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems : Abstract: Landmines remain a persistent humanitarian threat, with 110 million actively deployed mines across 60 countries, claiming 26,000 casualties annually. This research evaluates adaptive Red-Gre...
- Bridging Modalities and Transferring Knowledge: Enhanced Multimodal Understanding and Recognition : Abstract: This manuscript explores multimodal alignment, translation, fusion, and transference to enhance machine understanding of complex inputs. We organize the work into five chapters, each address...
- SirenPose: Dynamic Scene Reconstruction via Geometric Supervision : Abstract: We introduce SirenPose, a geometry-aware loss formulation that integrates the periodic activation properties of sinusoidal representation networks with keypoint-based geometric supervision, ...
- AlignPose: Generalizable 6D Pose Estimation via Multi-view Feature-metric Alignment : Abstract: Single-view RGB model-based object pose estimation methods achieve strong generalization but are fundamentally limited by depth ambiguity, clutter, and occlusions. Multi-view pose estimation...
- Multi-Grained Text-Guided Image Fusion for Multi-Exposure and Multi-Focus Scenarios : Abstract: Image fusion aims to synthesize a single high-quality image from a pair of inputs captured under challenging conditions, such as differing exposure levels or focal depths. A core challenge l...
- Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models : Abstract: Vision-language models (VLMs) excel at general understanding yet remain weak at dynamic spatial reasoning (DSR), i.e., reasoning about the evolution of object geometry and relationship in 3D...
- FlashVLM: Text-Guided Visual Token Selection for Large Multimodal Models : Abstract: Large vision-language models (VLMs) typically process hundreds or thousands of visual tokens per image or video frame, incurring quadratic attention cost and substantial redundancy. Existing...
- Repurposing Video Diffusion Transformers for Robust Point Tracking : Abstract: Point tracking aims to localize corresponding points across video frames, serving as a fundamental task for 4D reconstruction, robotics, and video editing. Existing methods commonly rely on ...
- Active Intelligence in Video Avatars via Closed-loop World Modeling : Abstract: Current video avatar generation methods excel at identity preservation and motion alignment but lack genuine agency: they cannot autonomously pursue long-term goals through adaptive environm...
- SpatialTree: How Spatial Abilities Branch Out in MLLMs : Abstract: Cognitive science suggests that spatial ability develops progressively, from perception to reasoning and interaction. Yet in multimodal LLMs (MLLMs), this hierarchy remains poorly understood,...
- SemanticGen: Video Generation in Semantic Space : Abstract: State-of-the-art video generative models typically learn the distribution of video latents in the VAE space and map them to pixels using a VAE decoder. While this approach can generate high-...
- SAM Audio: Segment Anything in Audio : Abstract: General audio source separation is a key capability for multimodal AI systems that can perceive and reason about sound. Despite substantial progress in recent years, existing separation mode...
- Dreamcrafter: Immersive Editing of 3D Radiance Fields Through Flexible, Generative Inputs and Outputs : Abstract: Authoring 3D scenes is a central task for spatial computing applications. Competing visions for lowering existing barriers are (1) focus on immersive, direct manipulation of 3D content or (2...
- CLIP Based Region-Aware Feature Fusion for Automated BBPS Scoring in Colonoscopy Images : Abstract: Accurate assessment of bowel cleanliness is essential for effective colonoscopy procedures. The Boston Bowel Preparation Scale (BBPS) offers a standardized scoring system but suffers from su...
- Snapshot 3D image projection using a diffractive decoder : Abstract: 3D image display is essential for next-generation volumetric imaging; however, dense depth multiplexing for 3D image projection remains challenging because diffraction-induced cross-talk rap...
- Machine Learning to Predict Digital Frustration from Clickstream Data : Abstract: Many businesses depend on their mobile apps and websites, so user frustration while trying to complete a task on these channels can cause lost sales and complaints. In this research, I use c...
- Recurrent Off-Policy Deep Reinforcement Learning Doesn't Have to be Slow : Abstract: Recurrent off-policy deep reinforcement learning models achieve state-of-the-art performance but are often sidelined due to their high computational demands. In response, we introduce RISE (...
- Explainable time-series forecasting with sampling-free SHAP for Transformers : Abstract: Time-series forecasts are essential for planning and decision-making in many domains. Explainability is key to building user trust and meeting transparency requirements. Shapley Additive Exp...
- Improving ML Training Data with Gold-Standard Quality Metrics : Abstract: Hand-tagged training data is essential to many machine learning tasks. However, training data quality control has received little attention in the literature, despite data quality varying co...
- Relu and softplus neural nets as zero-sum turn-based games : Abstract: We show that the output of a ReLU neural network can be interpreted as the value of a zero-sum, turn-based, stopping game, which we call the ReLU net game. The game runs in the direction opp...
- Saddle-to-Saddle Dynamics Explains A Simplicity Bias Across Neural Network Architectures : Abstract: Neural networks trained with gradient descent often learn solutions of increasing complexity over time, a phenomenon known as simplicity bias. Despite being widely observed across architectu...
- ASK: Adaptive Self-improving Knowledge Framework for Audio Text Retrieval : Abstract: The dominant paradigm for Audio-Text Retrieval (ATR) relies on mini-batch-based contrastive learning. This process, however, is inherently limited by what we formalize as the Gradient Locali...
- Chemically-Informed Machine Learning Approach for Prediction of Reactivity Ratios in Radical Copolymerization : Abstract: Predicting monomer reactivity ratios is crucial for controlling monomer sequence distribution in copolymers and their properties. Traditional experimental methods of determining reactivity r...
- NMIRacle: Multi-modal Generative Molecular Elucidation from IR and NMR Spectra : Abstract: Molecular structure elucidation from spectroscopic data is a long-standing challenge in Chemistry, traditionally requiring expert interpretation. We introduce NMIRacle, a two-stage generativ...
- Robust Causal Directionality Inference in Quantum Inference under MNAR Observation and High-Dimensional Noise : Abstract: In quantum mechanics, observation actively shapes the system, paralleling the statistical notion of Missing Not At Random (MNAR). This study introduces a unified framework for robust...
- Fundamentals of quantum Boltzmann machine learning with visible and hidden units : Abstract: One of the primary applications of classical Boltzmann machines is generative modeling, wherein the goal is to tune the parameters of a model distribution so that it closely approximates a t...
- Efficient Learning of Lattice Gauge Theories with Fermions : Abstract: We introduce a learning method for recovering action parameters in lattice field theories. Our method is based on the minimization of a convex loss function constructed using the Schwinger-D...
- Detecting cyberbullying in Spanish texts through deep learning techniques : Abstract: Recently collected data suggests that it is possible to automatically detect events that may negatively affect the most vulnerable parts of our society, by using any communication technology...
- Quasiprobabilistic Density Ratio Estimation with a Reverse Engineered Classification Loss Function : Abstract: We consider a generalization of the classifier-based density-ratio estimation task to a quasiprobabilistic setting where probability densities can be negative. The problem with most loss fun...
- GIMLET: Generalizable and Interpretable Model Learning through Embedded Thermodynamics : Abstract: We develop a data-driven framework for discovering constitutive relations in models of fluid flow and scalar transport. Our approach infers unknown closure terms in the governing equations (...
- Covariance-Aware Simplex Projection for Cardinality-Constrained Portfolio Optimization : Abstract: Metaheuristic algorithms for cardinality-constrained portfolio optimization require repair operators to map infeasible candidates onto the feasible region. Standard Euclidean projection trea...
- A Novel CNN Gradient Boosting Ensemble for Guava Disease Detection : Abstract: As a significant agricultural country, Bangladesh utilizes its fertile land for guava cultivation and dedicated labor to boost its economic development. In a nation like Bangladesh, enhancin...
- Semiparametric KSD test: unifying score and distance-based approaches for goodness-of-fit testing : Abstract: Goodness-of-fit (GoF) tests are fundamental for assessing model adequacy. Score-based tests are appealing because they require fitting the model only once under the null. However, extending ...
- Reliable LLM-Based Edge-Cloud-Expert Cascades for Telecom Knowledge Systems : Abstract: Large language models (LLMs) are emerging as key enablers of automation in domains such as telecommunications, assisting with tasks including troubleshooting, standards interpretation, and n...
- Gaussian Process Assisted Meta-learning for Image Classification and Object Detection Models : Abstract: Collecting operationally realistic data to inform machine learning models can be costly. Before collecting new data, it is helpful to understand where a model is deficient. For example, obje...
- Optimal Anytime-Valid Tests for Composite Nulls : Abstract: We consider the problem of designing optimal level-$\alpha$ power-one tests for composite nulls. Given a parameter $\alpha \in (0,1)$ and a stream of $\mathcal{X}$-valued observations $\{X_n: n \geq 1\...
- Deep Eigenspace Network and Its Application to Parametric Non-selfadjoint Eigenvalue Problems : Abstract: We consider operator learning for efficiently solving parametric non-selfadjoint eigenvalue problems. To overcome the spectral instability and mode switching inherent in non-selfadjoint oper...
- DS-HGCN: A Dual-Stream Hypergraph Convolutional Network for Predicting Student Engagement via Social Contagion : Abstract: Student engagement is a critical factor influencing academic success and learning outcomes. Accurately predicting student engagement is essential for optimizing teaching strategies and provi...
- Optimality-Informed Neural Networks for Solving Parametric Optimization Problems : Abstract: Many engineering tasks require solving families of nonlinear constrained optimization problems, parametrized in setting-specific variables. This is computationally demanding, particularly, i...
- KAN-AFT: An Interpretable Nonlinear Survival Model Integrating Kolmogorov-Arnold Networks with Accelerated Failure Time Analysis : Abstract: Survival analysis relies fundamentally on the semi-parametric Cox Proportional Hazards (CoxPH) model and the parametric Accelerated Failure Time (AFT) model. CoxPH assumes constant hazard ra...
- Algorithm for Interpretable Graph Features via Motivic Persistent Cohomology : Abstract: We present the Chromatic Persistence Algorithm (CPA), an event-driven method for computing persistent cohomological features of weighted graphs via graphic arrangements, a classical object i...
- Top-K Exterior Power Persistent Homology: Algorithm, Structure, and Stability : Abstract: Exterior powers play important roles in persistent homology in computational geometry. In the present paper we study the problem of extracting the $K$ longest intervals of the exterior-power...
- Avoiding the Price of Adaptivity: Inference in Linear Contextual Bandits via Stability : Abstract: Statistical inference in contextual bandits is complicated by the adaptive, non-i.i.d. nature of the data. A growing body of work has shown that classical least-squares inference may fail un...
- The Aligned Economic Index & The State Switching Model : Abstract: A growing empirical literature suggests that equity-premium predictability is state dependent, with much of the forecasting power concentrated around recessionary periods \parencite{Henkel20...
- ScoreMatchingRiesz: Auto-DML with Infinitesimal Classification : Abstract: This study proposes Riesz representer estimation methods based on score matching. The Riesz representer is a key component in debiased machine learning for constructing $\sqrt{n}$-consistent...
- Over-the-Air Goal-Oriented Communications : Abstract: Goal-oriented communications offer an attractive alternative to the Shannon-based communication paradigm, where the data is never reconstructed at the Receiver (RX) side. Rather, focusing on...
- Shallow Neural Networks Learn Low-Degree Spherical Polynomials with Learnable Channel Attention : Abstract: We study the problem of learning a low-degree spherical polynomial of degree $\ell_0 = \Theta(1) \ge 1$ defined on the unit sphere in $\mathbb{R}^d$ by training an over-parameterized two-layer neural ne...
- FedPOD: the deployable units of training for federated learning : Abstract: This paper proposes FedPOD (Proportionally Orchestrated Derivative) for optimizing learning efficiency and communication cost in federated learning among multiple clients. Inspired by FedPID...
- Deep Learning and Machine Learning -- Python Data Structures and Mathematics Fundamental: From Theory to Practice : Abstract: This book provides a comprehensive introduction to the foundational concepts of machine learning (ML) and deep learning (DL). It bridges the gap between theoretical mathematics and practical...
- FedReFT: Federated Representation Fine-Tuning with All-But-Me Aggregation : Abstract: Parameter-efficient fine-tuning (PEFT) adapts large pre-trained models by updating only a small subset of parameters. Recently, Representation Fine-Tuning (ReFT) has emerged as an effective ...
- Estimating Graph Dimension with Cross-validated Eigenvalues : Abstract: In applied multivariate statistics, estimating the number of latent dimensions or the number of clusters, $k$, is a fundamental and recurring problem. We study a sequence of statistics calle...
- Algorithmic Aspects of the Log-Laplace Transform and a Non-Euclidean Proximal Sampler : Abstract: The development of efficient sampling algorithms catering to non-Euclidean geometries has been a challenging endeavor, as discretization techniques which succeed in the Euclidean setting do ...
- Boosted Control Functions: Distribution generalization and invariance in confounded models : Abstract: Modern machine learning methods and the availability of large-scale data have significantly advanced our ability to predict target quantities from large sets of covariates. However, these me...
- Deep Learning and Machine Learning: Advancing Big Data Analytics and Management with Design Patterns : Abstract: This book, Design Patterns in Machine Learning and Deep Learning: Advancing Big Data Analytics Management, presents a comprehensive study of essential design patterns tailored for large-scal...
- Non-Intrusive Parametrized-Background Data-Weak Reconstruction of Cardiac Displacement Fields from Sparse MRI-like Observations : Abstract: Personalized cardiac diagnostics require accurate reconstruction of myocardial displacement fields from sparse clinical imaging data, yet current methods often demand intrusive access to com...
- How well do Large Language Models Recognize Instructional Moves? Establishing Baselines for Foundation Models in Educational Discourse : Abstract: Large language models (LLMs) are increasingly adopted in educational technologies for a variety of tasks, from generating instructional materials and assisting with assessment design to tuto...
- Counterfactual LLM-based Framework for Measuring Rhetorical Style : Abstract: The rise of AI has fueled growing concerns about "hype" in machine learning papers, yet a reliable way to quantify rhetorical style independently of substantive content has remained elusiv...
- PRISM: A Personality-Driven Multi-Agent Framework for Social Media Simulation : Abstract: Traditional agent-based models (ABMs) of opinion dynamics often fail to capture the psychological heterogeneity driving online polarization due to simplistic homogeneity assumptions. This li...
- Bias Beneath the Tone: Empirical Characterisation of Tone Bias in LLM-Driven UX Systems : Abstract: Large Language Models are increasingly used in conversational systems such as digital personal assistants, shaping how people interact with technology through language. While their responses...
- Memory-T1: Reinforcement Learning for Temporal Reasoning in Multi-session Agents : Abstract: Temporal reasoning over long, multi-session dialogues is a critical capability for conversational agents. However, existing works and our pilot study have shown that as dialogue histories gr...
- A Novel Graph-Sequence Learning Model for Inductive Text Classification : Abstract: Text classification plays an important role in various downstream text-related tasks, such as sentiment analysis, fake news detection, and public opinion analysis. Recently, text classificat...
- Multi-hop Reasoning via Early Knowledge Alignment : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for Large Language Models (LLMs) to address knowledge-intensive queries requiring domain-specific or up-to-date inform...
- AprielGuard : Abstract: Safeguarding large language models (LLMs) against unsafe or adversarial behavior is critical as they are increasingly deployed in conversational and agentic settings. Existing moderation too...
- SpidR: Learning Fast and Stable Linguistic Units for Spoken Language Models Without Supervision : Abstract: The parallel advances in language modeling and speech representation learning have raised the prospect of learning language directly from speech without textual intermediates. This requires ...
- Can LLMs Solve My Grandma's Riddle? Evaluating Multilingual Large Language Models on Reasoning Traditional Bangla Tricky Riddles : Abstract: Large Language Models (LLMs) show impressive performance on many NLP benchmarks, yet their ability to reason in figurative, culturally grounded, and low-resource settings remains underexplor...
- Sentiment-Aware Extractive and Abstractive Summarization for Unstructured Text Mining : Abstract: With the rapid growth of unstructured data from social media, reviews, and forums, text mining has become essential in Information Systems (IS) for extracting actionable insights. Summarizat...
- Step-DeepResearch Technical Report : Abstract: As LLMs shift toward autonomous agents, Deep Research has emerged as a pivotal metric. However, existing academic benchmarks like BrowseComp often fail to meet real-world demands for open-en...
- Reducing Label Dependency in Human Activity Recognition with Wearables: From Supervised Learning to Novel Weakly Self-Supervised Approaches : Abstract: Human activity recognition (HAR) using wearable sensors has advanced through various machine learning paradigms, each with inherent trade-offs between performance and labeling requirements. ...
- Synthetic Data Blueprint (SDB): A modular framework for the statistical, structural, and graph-based evaluation of synthetic tabular data : Abstract: In the rapidly evolving era of Artificial Intelligence (AI), synthetic data are widely used to accelerate innovation while preserving privacy and enabling broader data accessibility. However...
- Per-Axis Weight Deltas for Frequent Model Updates : Abstract: Serving many task-specialized LLM variants is often limited by the large size of fine-tuned checkpoints and the resulting cold-start latency. Since fine-tuned weights differ from their base ...
- Sign-Aware Multistate Jaccard Kernels and Geometry for Real and Complex-Valued Signals : Abstract: We introduce a sign-aware, multistate Jaccard/Tanimoto framework that extends overlap-based distances from nonnegative vectors and measures to arbitrary real- and complex-valued signals whil...
- Node-Level Financial Optimization in Demand Forecasting Through Dynamic Cost Asymmetry and Feedback Mechanism : Abstract: This work introduces a methodology to adjust forecasts based on node-specific cost function asymmetry. The proposed model generates savings by dynamically incorporating the cost asymmetry in...
- End-to-End Data Quality-Driven Framework for Machine Learning in Production Environment : Abstract: This paper introduces a novel end-to-end framework that efficiently integrates data quality assessment with machine learning (ML) model operations in real-time production environments. While...
- Out-of-Distribution Detection for Continual Learning: Design Principles and Benchmarking : Abstract: Recent years have witnessed significant progress in the development of machine learning models across a wide range of fields, fueled by increased computational resources, large-scale dataset...
- Trend Extrapolation for Technology Forecasting: Leveraging LSTM Neural Networks for Trend Analysis of Space Exploration Vessels : Abstract: Forecasting technological advancement in complex domains such as space exploration presents significant challenges due to the intricate interaction of technical, economic, and policy-related...
- Hard Negative Sample-Augmented DPO Post-Training for Small Language Models : Abstract: Large language models (LLMs) continue to struggle with mathematical reasoning, and common post-training pipelines often reduce each generated solution to a binary outcome: correct or incorre...
- ArcGen: Generalizing Neural Backdoor Detection Across Diverse Architectures : Abstract: Backdoor attacks pose a significant threat to the security and reliability of deep learning models. To mitigate such attacks, one promising approach is to learn to extract features from the ...
- Exploring Deep-to-Shallow Transformable Neural Networks for Intelligent Embedded Systems : Abstract: Thanks to the evolving network depth, convolutional neural networks (CNNs) have achieved remarkable success across various embedded scenarios, paving the way for ubiquitous embedded intellig...
- Leakage-Aware Bandgap Prediction on the JARVIS-DFT Dataset: A Phase-Wise Feature Analysis : Abstract: In this study, we perform a systematic analysis of the JARVIS-DFT bandgap dataset and identify and remove descriptors that may inadvertently encode band-structure information, such as effect...
- The Deleuzian Representation Hypothesis : Abstract: We propose an alternative to sparse autoencoders (SAEs) as a simple and effective unsupervised method for extracting interpretable concepts from neural networks. The core idea is to cluster ...
- Case Prompting to Mitigate Large Language Model Bias for ICU Mortality Prediction : Abstract: Accurate mortality risk prediction for intensive care unit (ICU) patients is essential for clinical decision-making. Although large language models (LLMs) show promise in predicting outcomes...
- OpComm: A Reinforcement Learning Framework for Adaptive Buffer Control in Warehouse Volume Forecasting : Abstract: Accurate forecasting of package volumes at delivery stations is critical for last-mile logistics, where errors lead to inefficient resource allocation, higher costs, and delivery delays. We ...
- OASI: Objective-Aware Surrogate Initialization for Multi-Objective Bayesian Optimization in TinyML Keyword Spotting : Abstract: Voice assistants utilize Keyword Spotting (KWS) to enable efficient, privacy-friendly activation. However, realizing accurate KWS models on ultra-low-power TinyML devices (often with less th...
- Asia Cup 2025: A Structured T20 Match-Level Dataset and Exploratory Analysis for Cricket Analytics : Abstract: This paper presents a structured and comprehensive dataset corresponding to the 2025 Asia Cup T20 cricket tournament, designed to facilitate data-driven research in sports analytics. The dat...
- EdgeFlex-Transformer: Transformer Inference for Edge Devices : Abstract: Deploying large-scale transformer models on edge devices presents significant challenges due to strict constraints on memory, compute, and latency. In this work, we propose a lightweight yet...
- On-device Large Multi-modal Agent for Human Activity Recognition : Abstract: Human Activity Recognition (HAR) has been an active area of research, with applications ranging from healthcare to smart environments. The recent advancements in Large Language Models (LLMs)...
- DeepBridge: A Unified and Production-Ready Framework for Multi-Dimensional Machine Learning Validation : Abstract: We present DeepBridge, an 80K-line Python library that unifies multi-dimensional validation, automatic compliance verification, knowledge distillation, and synthetic data generation. DeepBri...
- Learning to Design City-scale Transit Routes : Abstract: Designing efficient transit route networks is an NP-hard problem with exponentially large solution spaces that traditionally relies on manual planning processes. We present an end-to-end rei...
- Reduced Order Modeling for Tsunami Forecasting with Bayesian Hierarchical Pooling : Abstract: Reduced order models (ROM) can represent spatiotemporal processes in significantly fewer dimensions and can be solved many orders faster than their governing partial differential equations (...
- Guardrailed Uplift Targeting: A Causal Optimization Playbook for Marketing Strategy : Abstract: This paper introduces a marketing decision framework that converts heterogeneous-treatment uplift into constrained targeting strategies to maximize revenue and retention while honoring busin...
- The Seismic Wavefield Common Task Framework : Abstract: Seismology faces fundamental challenges in state forecasting and reconstruction (e.g., earthquake early warning and ground motion prediction) and managing the parametric variability of sourc...
- Spatio-Temporal Graph Neural Networks for Dairy Farm Sustainability Forecasting and Counterfactual Policy Analysis : Abstract: This study introduces a novel data-driven framework and the first-ever county-scale application of Spatio-Temporal Graph Neural Networks (STGNN) to forecast composite sustainability indices ...
- Bloom Filter Encoding for Machine Learning : Abstract: We present a method that uses the Bloom filter transform to preprocess data for machine learning. Each sample is encoded into a compact, privacy-preserving bit array. This reduces memory use...
- LoFT-LLM: Low-Frequency Time-Series Forecasting with Large Language Models : Abstract: Time-series forecasting in real-world applications such as finance and energy often faces challenges due to limited training data and complex, noisy temporal dynamics. Existing deep forecast...
- Control Variate Score Matching for Diffusion Models : Abstract: Diffusion models offer a robust framework for sampling from unnormalized probability densities, which requires accurately estimating the score of the noise-perturbed target distribution. Whi...
- Orthogonal Activation with Implicit Group-Aware Bias Learning for Class Imbalance : Abstract: Class imbalance is a common challenge in machine learning and data mining, often leading to suboptimal performance in classifiers. While deep learning excels in feature extraction, its perfo...
- PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models : Abstract: We introduce $\texttt{PairFlow}$, a lightweight preprocessing step for training Discrete Flow Models (DFMs) to achieve few-step sampling without requiring a pretrained teacher. DFMs have rec...
- Jensen-Shannon Divergence Message-Passing for Rich-Text Graph Representation Learning : Abstract: In this paper, we investigate how the widely existing contextual and structural divergence may influence the representation learning in rich-text graphs. To this end, we propose Jensen-Shann...
- Information-directed sampling for bandits: a primer : Abstract: The Multi-Armed Bandit problem provides a fundamental framework for analyzing the tension between exploration and exploitation in sequential learning. This paper explores Information Directe...
- Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering : Abstract: Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected return using a given static dataset of transitions. However, offline RL faces the distribution shift pr...
- Learning to Reason in LLMs by Expectation Maximization : Abstract: Large language models (LLMs) solve reasoning problems by first generating a rationale and then answering. We formalize reasoning as a latent variable model and derive an expectation-maximiza...
- NeuralCrop: Combining physics and machine learning for improved crop yield predictions : Abstract: Global gridded crop models (GGCMs) simulate daily crop growth by explicitly representing key biophysical processes and project end-of-season yield time series. They are a primary tool to qua...
- Cost-TrustFL: Cost-Aware Hierarchical Federated Learning with Lightweight Reputation Evaluation across Multi-Cloud : Abstract: Federated learning across multi-cloud environments faces critical challenges, including non-IID data distributions, malicious participant detection, and substantial cross-cloud communication...
- Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning : Abstract: We study offline multitask reinforcement learning in settings where multiple tasks share a low-rank representation of their action-value functions. In this regime, a learner is provided with...
- Adaptive Multi-task Learning for Probabilistic Load Forecasting : Abstract: Simultaneous load forecasting across multiple entities (e.g., regions, buildings) is crucial for the efficient, reliable, and cost-effective operation of power systems. Accurate load forecas...
- How I Met Your Bias: Investigating Bias Amplification in Diffusion Models : Abstract: Diffusion-based generative models demonstrate state-of-the-art performance across various image synthesis tasks, yet their tendency to replicate and amplify dataset biases remains poorly und...
- Unified Multimodal Brain Decoding via Cross-Subject Soft-ROI Fusion : Abstract: Multimodal brain decoding aims to reconstruct semantic information that is consistent with visual stimuli from brain activity signals such as fMRI, and then generate readable natural languag...
- DeepONet-accelerated Bayesian inversion for moving boundary problems : Abstract: This work demonstrates that neural operator learning provides a powerful and flexible framework for building fast, accurate emulators of moving boundary systems, enabling their integration i...
- HGAN-SDEs: Learning Neural Stochastic Differential Equations with Hermite-Guided Adversarial Training : Abstract: Neural Stochastic Differential Equations (Neural SDEs) provide a principled framework for modeling continuous-time stochastic processes and have been widely adopted in fields ranging from ph...
- Mixture-of-Experts with Gradient Conflict-Driven Subspace Topology Pruning for Emergent Modularity : Abstract: Mixture-of-Experts (MoE) architectures achieve parameter efficiency through conditional computation, yet contemporary designs suffer from two fundamental limitations: structural parameter is...
- FedDPC : Handling Data Heterogeneity and Partial Client Participation in Federated Learning : Abstract: Data heterogeneity is a significant challenge in modern federated learning (FL) as it creates variance in local model updates, causing the aggregated global model to shift away from the true...
- Inverse Autoregressive Flows for Zero Degree Calorimeter fast simulation : Abstract: Physics-based machine learning blends traditional science with modern data-driven techniques. Rather than relying exclusively on empirical data or predefined equations, this methodology embe...
- Physics-guided Neural Network-based Shaft Power Prediction for Vessels : Abstract: Optimizing maritime operations, particularly fuel consumption for vessels, is crucial, considering its significant share in global trade. As fuel consumption is closely related to the shaft ...
- Field-Space Attention for Structure-Preserving Earth System Transformers : Abstract: Accurate and physically consistent modeling of Earth system dynamics requires machine-learning architectures that operate directly on continuous geophysical fields and preserve their underly...
- GeoTransolver: Learning Physics on Irregular Domains Using Multi-scale Geometry Aware Physics Attention Transformer : Abstract: We present GeoTransolver, a Multiscale Geometry-Aware Physics Attention Transformer for CAE that replaces standard attention with GALE, coupling physics-aware self-attention on learned state...
- BRIDGE: Budget-aware Reasoning via Intermediate Distillation with Guided Examples : Abstract: Distilling knowledge from large proprietary models (e.g., GPT-4) to tiny deployable models (less than 1B parameters) faces a critical capacity-budget trap: the 1000x capacity gap between tea...
- DecoKAN: Interpretable Decomposition for Forecasting Cryptocurrency Market Dynamics : Abstract: Accurate and interpretable forecasting of multivariate time series is crucial for understanding the complex dynamics of cryptocurrency markets in digital asset systems. Advanced deep learnin...
- Beyond Vision: Contextually Enriched Image Captioning with Multi-Modal Retrieval : Abstract: Real-world image captions often lack contextual depth, omitting crucial details such as event background, temporal cues, outcomes, and named entities that are not visually discernible. This ...
- An Optimal Policy for Learning Controllable Dynamics by Exploration : Abstract: Controllable Markov chains describe the dynamics of sequential decision making tasks and are the central component in optimal control and reinforcement learning. In this work, we give the ge...
- On the Effectiveness of Instruction-Tuning Local LLMs for Identifying Software Vulnerabilities : Abstract: Large Language Models (LLMs) show significant promise in automating software vulnerability analysis, a critical task given the impact of security failure of modern software systems. However,...
- CBA: Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks : Abstract: We propose a communication-bound-aware cross-domain resource assignment framework for pipeline-parallel distributed training over multi-datacenter optical networks, which lowers iteration ti...
- QE-Catalytic: A Graph-Language Multimodal Base Model for Relaxed-Energy Prediction in Catalytic Adsorption : Abstract: Adsorption energy is a key descriptor of catalytic reactivity. It is fundamentally defined as the difference between the relaxed total energy of the adsorbate-surface system and that of an a...
- Spatio-Temporal Graphs Beyond Grids: Benchmark for Maritime Anomaly Detection : Abstract: Spatio-temporal graph neural networks (ST-GNNs) have achieved notable success in structured domains such as road traffic and public transportation, where spatial entities can be naturally re...
- Item Region-based Style Classification Network (IRSN): A Fashion Style Classifier Based on Domain Knowledge of Fashion Experts : Abstract: Fashion style classification is a challenging task because of the large visual variation within the same style and the existence of visually similar styles. Styles are expressed not only b...
- ABBEL: LLM Agents Acting through Belief Bottlenecks Expressed in Language : Abstract: As the length of sequential decision-making tasks increases, it becomes computationally impractical to keep full interaction histories in context. We introduce a general framework for LLM ag...
- Evolutionary Neural Architecture Search with Dual Contrastive Learning : Abstract: Evolutionary Neural Architecture Search (ENAS) has gained attention for automatically designing neural network architectures. Recent studies use a neural predictor to guide the process, but ...
- M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has recently been extended to multimodal settings, connecting multimodal large language models (MLLMs) with vast corpora of external knowledge such as mu...
- Retrieval-augmented Prompt Learning for Pre-trained Foundation Models : Abstract: The pre-trained foundation models (PFMs) have become essential for facilitating large-scale multimodal learning. Researchers have effectively employed the "pre-train, prompt, and predict" ...
- Fun-Audio-Chat Technical Report : Abstract: Recent advancements in joint speech-text models show great potential for seamless voice interactions. However, existing models face critical challenges: temporal resolution mismatch between ...
- AXIOM: Benchmarking LLM-as-a-Judge for Code via Rule-Based Perturbation and Multisource Quality Calibration : Abstract: Large language models (LLMs) have been increasingly deployed in real-world software engineering, fostering the development of code evaluation metrics to study the quality of LLM-generated co...
- AI Security Beyond Core Domains: Resume Screening as a Case Study of Adversarial Vulnerabilities in Specialized LLM Applications : Abstract: Large Language Models (LLMs) excel at text comprehension and generation, making them ideal for automated tasks like code review and content moderation. However, our research identifies a vul...
- Odysseus: Jailbreaking Commercial Multimodal LLM-integrated Systems via Dual Steganography : Abstract: By integrating language understanding with perceptual modalities such as images, multimodal large language models (MLLMs) constitute a critical substrate for modern AI systems, particularly ...
- FaithLens: Detecting and Explaining Faithfulness Hallucination : Abstract: Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarizatio...
- Asynchronous Fast-Slow Vision-Language-Action Policies for Whole-Body Robotic Manipulation : Abstract: Most Vision-Language-Action (VLA) systems integrate a Vision-Language Model (VLM) for semantic reasoning with an action expert generating continuous action signals, yet both typically run at...
- Corpus of Cross-lingual Dialogues with Minutes and Detection of Misunderstandings : Abstract: Speech processing and translation technology have the potential to facilitate meetings of individuals who do not share any common language. To evaluate automatic systems for such a task, a v...
- Memory as Resonance: A Biomimetic Architecture for Infinite Context Memory on Ergodic Phonetic Manifolds : Abstract: The memory of contemporary Large Language Models is bound by a physical paradox: as they learn, they fill up. The linear accumulation (O(N)) of Key-Value states treats context as a warehouse...
- ${D}^{3}${ETOR}: ${D}$ebate-Enhanced Pseudo Labeling and Frequency-Aware Progressive ${D}$ebiasing for Weakly-Supervised Camouflaged Object ${D}$etection with Scribble Annotations : Abstract: Weakly-Supervised Camouflaged Object Detection (WSCOD) aims to locate and segment objects that are visually concealed within their surrounding scenes, relying solely on sparse supervision su...
- UbiQVision: Quantifying Uncertainty in XAI for Image Recognition : Abstract: Recent advances in deep learning have led to its widespread adoption across diverse domains, including medical imaging. This progress is driven by increasingly sophisticated model architectu...
- SlideTailor: Personalized Presentation Slide Generation for Scientific Papers : Abstract: Automatic presentation slide generation can greatly streamline content creation. However, since preferences of each user may vary, existing under-specified formulations often lead to subopti...
- TAVID: Text-Driven Audio-Visual Interactive Dialogue Generation : Abstract: The objective of this paper is to jointly synthesize interactive videos and conversational speech from text and reference images. With the ultimate goal of building human-like conversational...
- Patterns vs. Patients: Evaluating LLMs against Mental Health Professionals on Personality Disorder Diagnosis through First-Person Narratives : Abstract: Growing reliance on LLMs for psychiatric self-assessment raises questions about their ability to interpret qualitative patient narratives. We present the first direct comparison between stat...
- KnowVal: A Knowledge-Augmented and Value-Guided Autonomous Driving System : Abstract: Visual-language reasoning, driving knowledge, and value alignment are essential for advanced autonomous driving systems. However, existing approaches largely rely on data-driven learning, ma...
- TableGPT-R1: Advancing Tabular Reasoning Through Reinforcement Learning : Abstract: Tabular data serves as the backbone of modern data analysis and scientific research. While Large Language Models (LLMs) fine-tuned via Supervised Fine-Tuning (SFT) have significantly improve...
- Deep Learning Classification of EEG Responses to Multi-Dimensional Transcranial Electrical Stimulation : Abstract: A major shortcoming of medical practice is the lack of an objective measure of conscious level. Impairment of consciousness is common, e.g. following brain injury and seizures, which can als...
- Toward Explaining Large Language Models in Software Engineering Tasks : Abstract: Recent progress in Large Language Models (LLMs) has substantially advanced the automation of software engineering (SE) tasks, enabling complex activities such as code generation and code sum...
- Multi-LLM Thematic Analysis with Dual Reliability Metrics: Combining Cohen's Kappa and Semantic Similarity for Qualitative Research Validation : Abstract: Qualitative research faces a critical reliability challenge: traditional inter-rater agreement methods require multiple human coders, are time-intensive, and often yield moderate consistency...
- Clust-PSI-PFL: A Population Stability Index Approach for Clustered Non-IID Personalized Federated Learning : Abstract: Federated learning (FL) supports privacy-preserving, decentralized machine learning (ML) model training by keeping data on client devices. However, non-independent and identically distribute...
- Identifying Appropriately-Sized Services with Deep Reinforcement Learning : Abstract: Service-based architecture (SBA) has gained attention in industry and academia as a means to modernize legacy systems. It refers to a design style that enables systems to be developed as sui...
- AUDRON: A Deep Learning Framework with Fused Acoustic Signatures for Drone Type Recognition : Abstract: Unmanned aerial vehicles (UAVs), commonly known as drones, are increasingly used across diverse domains, including logistics, agriculture, surveillance, and defense. While these systems prov...
- DETACH : Decomposed Spatio-Temporal Alignment for Exocentric Video and Ambient Sensors with Staged Learning : Abstract: Aligning egocentric video with wearable sensors has shown promise for human action recognition, but faces practical limitations in user discomfort, privacy concerns, and scalability. We expl...
- Simplifying Multi-Task Architectures Through Task-Specific Normalization : Abstract: Multi-task learning (MTL) aims to leverage shared knowledge across tasks to improve generalization and parameter efficiency, yet balancing resources and mitigating interference remain open c...
- Evasion-Resilient Detection of DNS-over-HTTPS Data Exfiltration: A Practical Evaluation and Toolkit : Abstract: The purpose of this project is to assess how well defenders can detect DNS-over-HTTPS (DoH) file exfiltration, and which evasion strategies can be used by attackers. While providing a reprod...
- Dual-Encoder Transformer-Based Multimodal Learning for Ischemic Stroke Lesion Segmentation Using Diffusion MRI : Abstract: Accurate segmentation of ischemic stroke lesions from diffusion magnetic resonance imaging (MRI) is essential for clinical decision-making and outcome assessment. Diffusion-Weighted Imaging ...
- SweRank+: Multilingual, Multi-Turn Code Ranking for Software Issue Localization : Abstract: Maintaining large-scale, multilingual codebases hinges on accurately localizing issues, which requires mapping natural-language error descriptions to the relevant functions that need to be m...
- LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving : Abstract: Simulators can generate virtually unlimited driving data, yet imitation learning policies in simulation still struggle to achieve robust closed-loop performance. Motivated by this gap, we em...
- Distilling to Hybrid Attention Models via KL-Guided Layer Selection : Abstract: Distilling pretrained softmax attention Transformers into more efficient hybrid architectures that interleave softmax and linear attention layers is a promising approach for improving the in...
- Fail Fast, Win Big: Rethinking the Drafting Strategy in Speculative Decoding via Diffusion LLMs : Abstract: Diffusion Large Language Models (dLLMs) offer fast, parallel token generation, but their standalone use is plagued by an inherent efficiency-quality tradeoff. We show that, if carefully appl...
- Performative Policy Gradient: Optimality in Performative Reinforcement Learning : Abstract: Post-deployment machine learning algorithms often influence the environments they act in, and thus shift the underlying dynamics that the standard reinforcement learning (RL) methods ignore....
- Leveraging High-Fidelity Digital Models and Reinforcement Learning for Mission Engineering: A Case Study of Aerial Firefighting Under Perfect Information : Abstract: As systems engineering (SE) objectives evolve from design and operation of monolithic systems to complex System of Systems (SoS), the discipline of Mission Engineering (ME) has emerged which...
- Cube Bench: A Benchmark for Spatial Visual Reasoning in MLLMs : Abstract: We introduce Cube Bench, a Rubik's-cube benchmark for evaluating spatial and sequential reasoning in multimodal large language models (MLLMs). The benchmark decomposes performance into five ...
- Emergent temporal abstractions in autoregressive models enable hierarchical reinforcement learning : Abstract: Large-scale autoregressive models pretrained on next-token prediction and finetuned with reinforcement learning (RL) have achieved unprecedented success on many problem domains. During RL, t...
- cuPilot: A Strategy-Coordinated Multi-agent Framework for CUDA Kernel Evolution : Abstract: Optimizing CUDA kernels is a challenging and labor-intensive task, given the need for hardware-software co-design expertise and the proprietary nature of high-performance kernel libraries. W...
- Reduced-order autoregressive dynamics of a complex financial system: a PCA-based approach : Abstract: This study analyzes the dynamic interactions among the NASDAQ index, crude oil, gold, and the US dollar using a reduced-order modeling approach. Time-delay embedding and principal component ...
- Improving Local Training in Federated Learning via Temperature Scaling : Abstract: Federated learning is inherently hampered by data heterogeneity: non-i.i.d. training data over local clients. We propose a novel model training approach for federated learning, FLex&Chill, w...
- Enhancing Topological Dependencies in Spatio-Temporal Graphs with Cycle Message Passing Blocks : Abstract: Graph Neural Networks (GNNs) and Transformer-based models have been increasingly adopted to learn the complex vector representations of spatio-temporal graphs, capturing intricate spatio-tem...
- Tactile-based Object Retrieval From Granular Media : Abstract: We introduce GEOTACT, the first robotic system capable of grasping and retrieving objects of potentially unknown shapes buried in a granular environment. While important in many applications...
- Reinforcement Learning for Unsupervised Video Summarization with Reward Generator Training : Abstract: This paper presents a novel approach for unsupervised video summarization using reinforcement learning (RL), addressing limitations like unstable adversarial training and reliance on heurist...
- Explainable deep learning improves human mental models of self-driving cars : Abstract: Self-driving cars increasingly rely on deep neural networks to achieve human-like driving. The opacity of such black-box planners makes it challenging for the human behind the wheel to accur...
- FP=xINT: Representing Neural Networks via Low-Bit Series Basis Functions : Abstract: Post-Training Quantization (PTQ) converts pre-trained Full-Precision (FP) models into quantized versions without training. While existing methods reduce size and computational costs, they al...
- Lossless Model Compression via Joint Low-Rank Factorization Optimization : Abstract: Low-rank factorization is a popular model compression technique that minimizes the error $δ$ between approximated and original weight matrices. Despite achieving performances close to the or...
- Compression for Better: A General and Stable Lossless Compression Framework : Abstract: This work focuses on stable, lossless model compression, aiming to reduce model complexity and enhance efficiency without sacrificing performance due to compression errors. A key ...
- Deep Learning for Spatio-Temporal Fusion in Land Surface Temperature Estimation: A Comprehensive Survey, Experimental Analysis, and Future Trends : Abstract: Land Surface Temperature (LST) plays a key role in climate monitoring, urban heat assessment, and land-atmosphere interactions. However, current thermal infrared satellite sensors cannot sim...
- PhysMaster: Building an Autonomous AI Physicist for Theoretical and Computational Physics Research : Abstract: Advances in LLMs have produced agents with knowledge and operational capabilities comparable to human scientists, suggesting potential to assist, accelerate, and automate research. However, ...
- A Branch-and-Price Algorithm for Fast and Equitable Last-Mile Relief Aid Distribution : Abstract: The distribution of relief supplies to shelters is a critical aspect of post-disaster humanitarian logistics. In major disasters, prepositioned supplies often fall short of meeting all deman...
- Interpolative Decoding: Exploring the Spectrum of Personality Traits in LLMs : Abstract: Recent research has explored using very large language models (LLMs) as proxies for humans in tasks such as simulation, surveys, and studies. While LLMs do not possess a human psychology, th...
- Zero-Shot Segmentation through Prototype-Guidance for Multi-Label Plant Species Identification : Abstract: This paper presents an approach developed to address the PlantClef 2025 challenge, which consists of a fine-grained multi-label species identification, over high-resolution images. Our solut...
- FGDCC: Fine-Grained Deep Cluster Categorization -- A Framework for Intra-Class Variability Problems in Plant Classification : Abstract: Intra-class variability is given according to the significance in the degree of dissimilarity between images within a class. In that sense, depending on its intensity, intra-class variabilit...
- S$^3$IT: A Benchmark for Spatially Situated Social Intelligence Test : Abstract: The integration of embodied agents into human environments demands embodied social intelligence: reasoning over both social norms and physical constraints. However, existing evaluations fail...
- Discovering Lie Groups with Flow Matching : Abstract: Symmetry is fundamental to understanding physical systems, and at the same time, can improve performance and sample efficiency in machine learning. Both pursuits require knowledge of the und...
- Learning Skills from Action-Free Videos : Abstract: Learning from videos offers a promising path toward generalist robots by providing rich visual and temporal priors beyond what real robot datasets contain. While existing video generative mo...
- Towards Generative Location Awareness for Disaster Response: A Probabilistic Cross-view Geolocalization Approach : Abstract: As Earth's climate changes, it is impacting disasters and extreme weather events across the planet. Record-breaking heat waves, drenching rainfalls, extreme wildfires, and widespread floodin...
- Scaling Reinforcement Learning for Content Moderation with Large Language Models : Abstract: Content moderation at scale remains one of the most pressing challenges in today's digital ecosystem, where billions of user- and AI-generated artifacts must be continuously evaluated for po...
- Reason2Decide: Rationale-Driven Multi-Task Learning : Abstract: Despite the wide adoption of Large Language Models (LLMs), clinical decision support systems face a critical challenge: achieving high predictive accuracy while generating explanations align...
- Adaptive Financial Sentiment Analysis for NIFTY 50 via Instruction-Tuned LLMs, RAG and Reinforcement Learning Approaches : Abstract: Financial sentiment analysis plays a crucial role in informing investment decisions, assessing market risk, and predicting stock price trends. Existing works in financial sentiment analysis ...
- MolAct: An Agentic RL Framework for Molecular Editing and Property Optimization : Abstract: Molecular editing and optimization are multi-step problems that require iteratively improving properties while keeping molecules chemically valid and structurally similar. We frame both task...
- Enhancing Zero-Shot Time Series Forecasting in Off-the-Shelf LLMs via Noise Injection : Abstract: Large Language Models (LLMs) have demonstrated effectiveness as zero-shot time series (TS) forecasters. The key challenge lies in tokenizing TS data into textual representations that align w...
- A Bidirectional Gated Recurrent Unit Model for PUE Prediction in Data Centers : Abstract: Data centers account for significant global energy consumption and a carbon footprint. The recent increasing demand for edge computing and AI advancements drives the growth of data center st...
- Concept Generalization in Humans and Large Language Models: Insights from the Number Game : Abstract: We compare human and large language model (LLM) generalization in the number game, a concept inference task. Using a Bayesian model as an analytical framework, we examined the inductive bias...
- Offline Safe Policy Optimization From Heterogeneous Feedback : Abstract: Offline Preference-based Reinforcement Learning (PbRL) learns rewards and policies aligned with human preferences without the need for extensive reward engineering and direct interaction wit...
- TongSIM: A General Platform for Simulating Intelligent Machines : Abstract: As artificial intelligence (AI) rapidly advances, especially in multimodal large language models (MLLMs), research focus is shifting from single-modality text processing to the more complex ...
- MemR$^3$: Memory Retrieval via Reflective Reasoning for LLM Agents : Abstract: Memory systems have been designed to leverage past experiences in Large Language Model (LLM) agents. However, many deployed memory systems primarily optimize compression and storage, with co...
- Graph-Symbolic Policy Enforcement and Control (G-SPEC): A Neuro-Symbolic Framework for Safe Agentic AI in 5G Autonomous Networks : Abstract: As networks evolve toward 5G Standalone and 6G, operators face orchestration challenges that exceed the limits of static automation and Deep Reinforcement Learning. Although Large Language M...
- ActionFlow: A Pipelined Action Acceleration for Vision Language Models on Edge : Abstract: Vision-Language-Action (VLA) models have emerged as a unified paradigm for robotic perception and control, enabling emergent generalization and long-horizon task execution. However, their de...
- Synthesizing Procedural Memory: Challenges and Architectures in Automated Workflow Generation : Abstract: While CodeMem establishes executable code as the optimal representation for agentic procedural memory, the mechanism for autonomously synthesizing this memory from a blank slate remains unde...
- SynCraft: Guiding Large Language Models to Predict Edit Sequences for Molecular Synthesizability Optimization : Abstract: Generative artificial intelligence has revolutionized the exploration of chemical space, yet a critical bottleneck remains that a substantial fraction of generated molecules is synthetically...
- A DeepSeek-Powered AI System for Automated Chest Radiograph Interpretation in Clinical Practice : Abstract: A global shortage of radiologists has been exacerbated by the significant volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promis...
- Generative Digital Twins: Vision-Language Simulation Models for Executable Industrial Systems : Abstract: We propose a Vision-Language Simulation Model (VLSM) that unifies visual and textual understanding to synthesize executable FlexScript from layout sketches and natural-language prompts, enab...
- Bohrium + SciMaster: Building the Infrastructure and Ecosystem for Agentic Science at Scale : Abstract: AI agents are emerging as a practical way to run multi-step scientific workflows that interleave reasoning with tool use and verification, pointing to a shift from isolated AI-assisted steps...
- Benchmarking LLMs for Predictive Applications in the Intensive Care Units : Abstract: With the advent of LLMs, various tasks across the natural language processing domain have been transformed. However, their application in predictive tasks remains less researched. This study...
- Advancing Multimodal Teacher Sentiment Analysis: The Large-Scale T-MED Dataset & The Effective AAM-TSA Model : Abstract: Teachers' emotional states are critical in educational scenarios, profoundly impacting teaching efficacy, student engagement, and learning achievements. However, existing studies often fail ...
- Automated stereotactic radiosurgery planning using a human-in-the-loop reasoning large language model agent : Abstract: Stereotactic radiosurgery (SRS) demands precise dose shaping around critical structures, yet black-box AI systems have limited clinical adoption due to opacity concerns. We tested whether ch...
- LongVideoAgent: Multi-Agent Reasoning with Long Videos : Abstract: Recent advances in multimodal LLMs and systems that use tools for long-video QA point to the promise of reasoning over hour-long episodes. However, many methods still compress content into l...
- QoS-Aware Dynamic CU Selection in O-RAN with Graph-Based Reinforcement Learning : Abstract: Open Radio Access Network (O-RAN) disaggregates conventional RAN into interoperable components, enabling flexible resource allocation, energy savings, and agile architectural design. In lega...
- Automated Fault Detection in 5G Core Networks Using Large Language Models : Abstract: With the rapid growth of data volume in modern telecommunication networks and the continuous expansion of their scale, maintaining high reliability has become a critical requirement. These n...
- Large Language Models for EDA Cloud Job Resource and Lifetime Prediction : Abstract: The rapid growth of cloud computing in the Electronic Design Automation (EDA) industry has created a critical need for resource and job lifetime prediction to achieve optimal scheduling. Tra...
- Generative AI for Analysts : Abstract: We study how generative artificial intelligence (AI) transforms the work of financial analysts. Using the 2023 launch of FactSet's AI platform as a natural experiment, we find that adoption ...
- Bidirectional human-AI collaboration in brain tumour assessments improves both expert human and AI agent performance : Abstract: The benefits of human-AI partnerships, specifically how artificial intelligence (AI) agents enhance expert human performance, are increasingly studied. Though rarely evaluated in healthcare, an ...
- PHANTOM: PHysical ANamorphic Threats Obstructing Connected Vehicle Mobility : Abstract: Connected autonomous vehicles (CAVs) rely on vision-based deep neural networks (DNNs) and low-latency Vehicle-to-Everything (V2X) communication to navigate safely and efficiently. Despite th...
- Development and external validation of a multimodal artificial intelligence mortality prediction model of critically ill patients using multicenter data : Abstract: Early prediction of in-hospital mortality in critically ill patients can aid clinicians in optimizing treatment. The objective was to develop a multimodal deep learning model, using structur...
- Thermodynamic Focusing for Inference-Time Search: Practical Methods for Target-Conditioned Sampling and Prompted Inference : Abstract: Finding rare but useful solutions in very large candidate spaces is a recurring practical challenge across language generation, planning, and reinforcement learning. We present a practical f...
- Multiscale Dual-path Feature Aggregation Network for Remaining Useful Life Prediction of Lithium-Ion Batteries : Abstract: Targeted maintenance strategies, ensuring the dependability and safety of industrial machinery. However, current modeling techniques for assessing both local and global correlation of batter...
- Tiny, On-Device Decision Makers with the MiniConv Library : Abstract: Reinforcement learning (RL) has achieved strong results, but deploying visual policies on resource-constrained edge devices remains challenging due to computational cost and communication la...
- High-Performance Self-Supervised Learning by Joint Training of Flow Matching : Abstract: Diffusion models can learn rich representations during data generation, showing potential for Self-Supervised Learning (SSL), but they face a trade-off between generative quality and discrim...
- CoPHo: Classifier-guided Conditional Topology Generation with Persistent Homology : Abstract: The structure of topology underpins much of the research on performance and robustness, yet available topology data are typically scarce, necessitating the generation of synthetic graphs wit...
- Simulation-Driven Railway Delay Prediction: An Imitation Learning Approach : Abstract: Reliable prediction of train delays is essential for enhancing the robustness and efficiency of railway transportation systems. In this work, we reframe delay forecasting as a stochastic sim...
- From Theory to Throughput: CUDA-Optimized APML for Large-Batch 3D Learning : Abstract: Loss functions are fundamental to learning accurate 3D point cloud models, yet common choices trade geometric fidelity for computational cost. Chamfer Distance is efficient but permits many-...
- QMBench: A Research Level Benchmark for Quantum Materials Research : Abstract: We introduce QMBench, a comprehensive benchmark designed to evaluate the capability of large language model agents in quantum materials research. This specialized benchmark assesses the mode...
- Attention Distance: A Novel Metric for Directed Fuzzing with Large Language Models : Abstract: In the domain of software security testing, Directed Grey-Box Fuzzing (DGF) has garnered widespread attention for its efficient target localization and excellent detection performance. Howev...
- How Many Experts Are Enough? Towards Optimal Semantic Specialization for Mixture-of-Experts : Abstract: Finding the optimal configuration of Sparse Mixture-of-Experts (SMoE) that maximizes semantic differentiation among experts is essential for exploiting the full potential of MoE architectures...
- A Declarative Language for Building And Orchestrating LLM-Powered Agent Workflows : Abstract: Building deployment-ready LLM agents requires complex orchestration of tools, data sources, and control flow logic, yet existing systems tightly couple agent logic to specific programming la...
- A K-Means, Ward and DBSCAN repeatability study : Abstract: Reproducibility is essential in machine learning because it ensures that a model or experiment yields the same scientific conclusion. For specific algorithms repeatability with bitwise ident...
- Learned Digital Codes for Over-the-Air Computation in Federated Edge Learning : Abstract: Federated edge learning (FEEL) enables wireless devices to collaboratively train a centralised model without sharing raw data, but repeated uplink transmission of model updates makes communi...
- UCCL-EP: Portable Expert-Parallel Communication : Abstract: Mixture-of-Experts (MoE) workloads rely on expert parallelism (EP) to achieve high GPU efficiency. State-of-the-art EP communication systems such as DeepEP demonstrate strong performance but...
- HARMON-E: Hierarchical Agentic Reasoning for Multimodal Oncology Notes to Extract Structured Data : Abstract: Unstructured notes within the electronic health record (EHR) contain rich clinical information vital for cancer treatment decision making and research, yet reliably extracting structured onc...
- Fine-Tuned In-Context Learners for Efficient Adaptation : Abstract: When adapting large language models (LLMs) to a specific downstream task, two primary approaches are commonly employed: (1) prompt engineering, often with in-context few-shot learning, lever...
- Demystifying LLM-as-a-Judge: Analytically Tractable Model for Inference-Time Scaling : Abstract: Recent developments in large language models have shown advantages in reallocating a notable share of computational resource from training time to inference time. However, the principles beh...
- Modeling Non-Ergodic Path Effects Using Conditional Generative Model for Fourier Amplitude Spectra : Abstract: Recent developments in non-ergodic ground-motion models (GMMs) explicitly model systematic spatial variations in source, site, and path effects, reducing standard deviation to 30-40% of ergo...
- A Time-efficient Prioritised Scheduling Algorithm to Optimise Initial Flock Formation of Drones : Abstract: Drone applications continue to expand across various domains, with flocking offering enhanced cooperative capabilities but introducing significant challenges during initial formation. Existi...
- Mitigating LLM Hallucination via Behaviorally Calibrated Reinforcement Learning : Abstract: LLM deployment in critical domains is currently impeded by persistent hallucinations--generating plausible but factually incorrect assertions. While scaling laws drove significant improvemen...
- Unified Brain Surface and Volume Registration : Abstract: Accurate registration of brain MRI scans is fundamental for cross-subject analysis in neuroscientific studies. This involves aligning both the cortical surface of the brain and the interior ...
- Vehicle-centric Perception via Multimodal Structured Pre-training : Abstract: Vehicle-centric perception plays a crucial role in many intelligent systems, including large-scale surveillance systems, intelligent transportation, and autonomous driving. Existing approach...
- Conditional Adversarial Fragility in Financial Machine Learning under Macroeconomic Stress : Abstract: Machine learning models used in financial decision systems operate in nonstationary economic environments, yet adversarial robustness is typically evaluated under static assumptions. This wo...
- Block-Recurrent Dynamics in Vision Transformers : Abstract: As Vision Transformers (ViTs) become standard vision backbones, a mechanistic account of their computational phenomenology is essential. Despite architectural cues that hint at dynamical str...
- How Much 3D Do Video Foundation Models Encode? : Abstract: Videos are continuous 2D projections of 3D worlds. After training on large video data, will global 3D understanding naturally emerge? We study this by quantifying the 3D understanding of exi...
- Regression of Functions by Quantum Neural Networks Circuits : Abstract: The performance of quantum neural network models depends strongly on architectural decisions, including circuit depth, placement of parametrized operations, and data-encoding strategies. Sel...
- Neuron-Guided Interpretation of Code LLMs: Where, Why, and How? : Abstract: Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source co...
- Schoenfeld's Anatomy of Mathematical Reasoning by Language Models : Abstract: Large language models increasingly expose reasoning traces, yet their underlying cognitive structure and steps remain difficult to identify and analyze beyond surface-level statistics. We ad...
- IoT-based Android Malware Detection Using Graph Neural Network With Adversarial Defense : Abstract: Since the Internet of Things (IoT) is widely adopted using Android applications, detecting malicious Android apps is essential. In recent years, Android graph-based deep learning research ha...
- Bring My Cup! Personalizing Vision-Language-Action Models with Visual Attentive Prompting : Abstract: While Vision-Language-Action (VLA) models generalize well to generic instructions, they struggle with personalized commands such as "bring my cup", where the robot must act on one specific i...
Research Sources: 295 | Generated: 12/25/2025
