AI Research News Feeds for January 5th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

Unified Primitive Proxies for Structured Shape Completion : Abstract: Structured shape completion recovers missing geometry as primitives rather than as unstructured points, which enables primitive-based surface reconstruction. Instead of following the prevail...
Fusion-SSAT: Unleashing the Potential of Self-supervised Auxiliary Task by Feature Fusion for Generalized Deepfake Detection : Abstract: In this work, we attempted to unleash the potential of self-supervised learning as an auxiliary task that can optimise the primary task of generalised deepfake detection. To explore this, we...
AdaGaR: Adaptive Gabor Representation for Dynamic Scene Reconstruction : Abstract: Reconstructing dynamic 3D scenes from monocular videos requires simultaneously capturing high-frequency appearance details and temporally continuous motion. Existing methods using single Gau...
The Impact of Lesion Focus on the Performance of AI-Based Melanoma Classification : Abstract: Melanoma is the most lethal subtype of skin cancer, and early and accurate detection of this disease can greatly improve patients' outcomes. Although machine learning models, especially conv...
DefVINS: Visual-Inertial Odometry for Deformable Scenes : Abstract: Deformable scenes violate the rigidity assumptions underpinning classical visual-inertial odometry (VIO), often leading to over-fitting to local non-rigid motion or severe drift when deforma...
Investigating the Viability of Employing Multi-modal Large Language Models in the Context of Audio Deepfake Detection : Abstract: While Vision-Language Models (VLMs) and Multimodal Large Language Models (MLLMs) have shown strong generalisation in detecting image and video deepfakes, their use for audio deepfake detecti...
Efficient Multi-Task Scene Analysis with RGB-D Transformers : Abstract: Scene analysis is essential for enabling autonomous systems, such as mobile robots, to operate in real-world environments. However, obtaining a comprehensive understanding of the scene requi...
Test-time generative augmentation for medical image segmentation : Abstract: Medical image segmentation is critical for clinical diagnosis, treatment planning, and monitoring, yet segmentation models often struggle with uncertainties stemming from occlusions, ambiguo...
NeRF-VIO: Map-Based Visual-Inertial Odometry with Initialization Leveraging Neural Radiance Fields : Abstract: A prior map serves as a foundational reference for localization in context-aware applications such as augmented reality (AR). Providing valuable contextual information about the environment,...
EndoStreamDepth: Temporally Consistent Monocular Depth Estimation for Endoscopic Video Streams : Abstract: This work presents EndoStreamDepth, a monocular depth estimation framework for endoscopic video streams. It provides accurate depth maps with sharp anatomical boundaries for each frame, temp...
StyGazeTalk: Learning Stylized Generation of Gaze and Head Dynamics : Abstract: Gaze and head movements play a central role in expressive 3D media, human-agent interaction, and immersive communication. Existing works often model facial components in isolation and lack m...
TeleWorld: Towards Dynamic Multimodal Synthesis with a 4D World Model : Abstract: World models aim to endow AI systems with the ability to represent, generate, and interact with dynamic environments in a coherent and temporally consistent manner. While recent video genera...
Spatial4D-Bench: A Versatile 4D Spatial Intelligence Benchmark : Abstract: 4D spatial intelligence involves perceiving and processing how objects move or change over time. Humans naturally possess 4D spatial intelligence, supporting a broad spectrum of spatial reas...
A Spatially Masked Adaptive Gated Network for multimodal post-flood water extent mapping using SAR and incomplete multispectral data : Abstract: Mapping water extent during a flood event is essential for effective disaster management throughout all phases: mitigation, preparedness, response, and recovery. In particular, during the re...
Compressed Map Priors for 3D Perception : Abstract: Human drivers rarely travel where no person has gone before. After all, thousands of drivers use busy city roads every day, and only one can claim to be the first. The same holds for autonom...
Attention to Detail: Global-Local Attention for High-Resolution AI-Generated Image Detection : Abstract: The rapid development of generative AI has made AI-generated images increasingly realistic and high-resolution. Most AI-generated image detection architectures typically downsample images be...
Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions : Abstract: In this paper, we introduce an underexplored problem in facial analysis: generating and recognizing multi-attribute natural language descriptions, containing facial action units (AUs), emoti...
DichroGAN: Towards Restoration of in-air Colours of Seafloor from Satellite Imagery : Abstract: Recovering the in-air colours of seafloor from satellite imagery is a challenging task due to the exponential attenuation of light with depth in the water column. In this study, we present D...
MorphAny3D: Unleashing the Power of Structured Latent in 3D Morphing : Abstract: 3D morphing remains challenging due to the difficulty of generating semantically consistent and temporally smooth deformations, especially across categories. We present MorphAny3D, a trainin...
CropNeRF: A Neural Radiance Field-Based Framework for Crop Counting : Abstract: Rigorous crop counting is crucial for effective agricultural management and informed intervention strategies. However, in outdoor field environments, partial occlusions combined with inheren...
IntraStyler: Exemplar-based Style Synthesis for Cross-modality Domain Adaptation : Abstract: Image-level domain alignment is the de facto approach for unsupervised domain adaptation, where unpaired image translation is used to minimize the domain gap. Prior studies mainly focus on t...
LooC: Effective Low-Dimensional Codebook for Compositional Vector Quantization : Abstract: Vector quantization (VQ) is a prevalent and fundamental technique that discretizes continuous feature vectors by approximating them using a codebook. As the diversity and complexity of data ...
Towards Syn-to-Real IQA: A Novel Perspective on Reshaping Synthetic Data Distributions : Abstract: Blind Image Quality Assessment (BIQA) has advanced significantly through deep learning, but the scarcity of large-scale labeled datasets remains a challenge. While synthetic data offers a pr...
Context-Aware Pesticide Recommendation via Few-Shot Pest Recognition for Precision Agriculture : Abstract: Effective pest management is crucial for enhancing agricultural productivity, especially for crops such as sugarcane and wheat that are highly vulnerable to pest infestations. Traditional pe...
TotalFM: An Organ-Separated Framework for 3D-CT Vision Foundation Models : Abstract: While foundation models in radiology are expected to be applied to various clinical tasks, computational cost constraints remain a major challenge when training on 3D-CT volumetric data. In ...
S1-MMAlign: A Large-Scale, Multi-Disciplinary Dataset for Scientific Figure-Text Understanding : Abstract: Multimodal learning has revolutionized general domain tasks, yet its application in scientific discovery is hindered by the profound semantic gap between complex scientific imagery and spars...
ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Patching : Abstract: Recent advances in text-to-image diffusion models have demonstrated remarkable generation capabilities, yet they raise significant concerns regarding safety, copyright, and ethical implicati...
Disentangling Hardness from Noise: An Uncertainty-Driven Model-Agnostic Framework for Long-Tailed Remote Sensing Classification : Abstract: Long-tailed distributions are pervasive in remote sensing due to the inherently imbalanced occurrence of grounded objects. However, a critical challenge remains largely overlooked, i.e., dis...
SV-GS: Sparse View 4D Reconstruction with Skeleton-Driven Gaussian Splatting : Abstract: Reconstructing a dynamic target moving over a large area is challenging. Standard approaches for dynamic object reconstruction require dense coverage in both the viewing space and the tempor...
TimeColor: Flexible Reference Colorization via Temporal Concatenation : Abstract: Most colorization models condition only on a single reference, typically the first frame of the scene. However, this approach ignores other sources of conditional data, such as character she...
ReMA: A Training-Free Plug-and-Play Mixing Augmentation for Video Behavior Recognition : Abstract: Video behavior recognition demands stable and discriminative representations under complex spatiotemporal variations. However, prevailing data augmentation strategies for videos remain large...
Depth-Synergized Mamba Meets Memory Experts for All-Day Image Reflection Separation : Abstract: Image reflection separation aims to disentangle the transmission layer and the reflection layer from a blended image. Existing methods rely on limited information from a single image, tendin...
Joint Geometry-Appearance Human Reconstruction in a Unified Latent Space via Bridge Diffusion : Abstract: Achieving consistent and high-fidelity geometry and appearance reconstruction of 3D digital humans from a single RGB image is inherently a challenging task. Existing studies typically resort...
Intelligent Traffic Surveillance for Real-Time Vehicle Detection, License Plate Recognition, and Speed Estimation : Abstract: Speeding is a major contributor to road fatalities, particularly in developing countries such as Uganda, where road safety infrastructure is limited. This study proposes a real-time intellig...
OmniVaT: Single Domain Generalization for Multimodal Visual-Tactile Learning : Abstract: Visual-tactile learning (VTL) enables embodied agents to perceive the physical world by integrating visual (VIS) and tactile (TAC) sensors. However, VTL still suffers from modality discrepan...
Efficient Prediction of Dense Visual Embeddings via Distillation and RGB-D Transformers : Abstract: In domestic environments, robots require a comprehensive understanding of their surroundings to interact effectively and intuitively with untrained humans. In this paper, we propose DVEForme...
Mask-Conditioned Voxel Diffusion for Joint Geometry and Color Inpainting : Abstract: We present a lightweight two-stage framework for joint geometry and color inpainting of damaged 3D objects, motivated by the digital restoration of cultural heritage artifacts. The pipeline ...
BHaRNet: Reliability-Aware Body-Hand Modality Expertized Networks for Fine-grained Skeleton Action Recognition : Abstract: Skeleton-based human action recognition (HAR) has achieved remarkable progress with graph-based architectures. However, most existing methods remain body-centric, focusing on large-scale mot...
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos : Abstract: In this paper, we propose NeoVerse, a versatile 4D world model that is capable of 4D reconstruction, novel-trajectory video generation, and rich downstream applications. We first identify a ...
RoLID-11K: A Dashcam Dataset for Small-Object Roadside Litter Detection : Abstract: Roadside litter poses environmental, safety and economic challenges, yet current monitoring relies on labour-intensive surveys and public reporting, providing limited spatial coverage. Exist...
ABFR-KAN: Kolmogorov-Arnold Networks for Functional Brain Analysis : Abstract: Functional connectivity (FC) analysis, a valuable tool for computer-aided brain disorder diagnosis, traditionally relies on atlas-based parcellation. However, issues relating to selection bi...
Robust Assembly Progress Estimation via Deep Metric Learning : Abstract: In recent years, the advancement of AI technologies has accelerated the development of smart factories. In particular, the automatic monitoring of product assembly progress is crucial for im...
CPPO: Contrastive Perception for Vision Language Policy Optimization : Abstract: We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). While reinforcement learning (RL) has advanced reasoning in language mode...
All-in-One Video Restoration under Smoothly Evolving Unknown Weather Degradations : Abstract: All-in-one image restoration aims to recover clean images from diverse unknown degradations using a single model. But extending this task to videos faces unique challenges. Existing approach...
FreeText: Training-Free Text Rendering in Diffusion Transformers via Attention Localization and Spectral Glyph Injection : Abstract: Large-scale text-to-image (T2I) diffusion models excel at open-domain synthesis but still struggle with precise text rendering, especially for multi-line layouts, dense typography, and long-...
Boosting Segment Anything Model to Generalize Visually Non-Salient Scenarios : Abstract: Segment Anything Model (SAM), known for its remarkable zero-shot segmentation capabilities, has garnered significant attention in the community. Nevertheless, its performance is challenged w...
DynaDrag: Dynamic Drag-Style Image Editing by Motion Prediction : Abstract: To achieve pixel-level image manipulation, drag-style image editing, which edits images using points or trajectories as conditions, is attracting widespread attention. Most previous methods fo...
SingBAG Pro: Accelerating point cloud-based iterative reconstruction for 3D photoacoustic imaging under arbitrary array : Abstract: High-quality three-dimensional (3D) photoacoustic imaging (PAI) is gaining increasing attention in clinical applications. To address the challenges of limited space and high costs, irregular...
AEGIS: Exploring the Limit of World Knowledge Capabilities for Unified Multimodal Models : Abstract: The capability of Unified Multimodal Models (UMMs) to apply world knowledge across diverse tasks remains a critical, unresolved challenge. Existing benchmarks fall short, offering only siloe...
A Cascaded Information Interaction Network for Precise Image Segmentation : Abstract: Visual perception plays a pivotal role in enabling autonomous behavior, offering a cost-effective and efficient alternative to complex multi-sensor systems. However, robust segmentation rema...
GranAlign: Granularity-Aware Alignment Framework for Zero-Shot Video Moment Retrieval : Abstract: Zero-shot video moment retrieval (ZVMR) is the task of localizing a temporal moment within an untrimmed video using a natural language query without relying on task-specific training data. T...
SafeMo: Linguistically Grounded Unlearning for Trustworthy Text-to-Motion Generation : Abstract: Text-to-motion (T2M) generation with diffusion backbones achieves strong realism and alignment. Safety concerns in T2M methods have been raised in recent years; existing methods replace disc...
Modality Dominance-Aware Optimization for Embodied RGB-Infrared Perception : Abstract: RGB-Infrared (RGB-IR) multimodal perception is fundamental to embodied multimedia systems operating in complex physical environments. Although recent cross-modal fusion methods have advanced...
RePose: A Real-Time 3D Human Pose Estimation and Biomechanical Analysis Framework for Rehabilitation : Abstract: We propose a real-time 3D human pose estimation and motion analysis method termed RePose for rehabilitation training. It is capable of real-time monitoring and evaluation of patients' motion ...
Quality Detection of Stored Potatoes via Transfer Learning: A CNN and Vision Transformer Approach : Abstract: Image-based deep learning provides a non-invasive, scalable solution for monitoring potato quality during storage, addressing key challenges such as sprout detection, weight loss estimation,...
Reconstructing Building Height from Spaceborne TomoSAR Point Clouds Using a Dual-Topology Network : Abstract: Reliable building height estimation is essential for various urban applications. Spaceborne SAR tomography (TomoSAR) provides weather-independent, side-looking observations that capture faca...
CRoPS: A Training-Free Hallucination Mitigation Framework for Vision-Language Models : Abstract: Despite the rapid success of Large Vision-Language Models (LVLMs), a persistent challenge is their tendency to generate hallucinated content, undermining reliability in real-world use. Exist...
Pixel-to-4D: Camera-Controlled Image-to-Video Generation with Dynamic 3D Gaussians : Abstract: Humans excel at forecasting the future dynamics of a scene given just a single image. Video generation models that can mimic this ability are an essential component for intelligent systems. ...
Efficient Deep Demosaicing with Spatially Downsampled Isotropic Networks : Abstract: In digital imaging, image demosaicing is a crucial first step which recovers the RGB information from a color filter array (CFA). Oftentimes, deep learning is utilized to perform image demos...
RGS-SLAM: Robust Gaussian Splatting SLAM with One-Shot Dense Initialization : Abstract: We introduce RGS-SLAM, a robust Gaussian-splatting SLAM framework that replaces the residual-driven densification stage of GS-SLAM with a training-free correspondence-to-Gaussian initializat...
Multi-Level Feature Fusion for Continual Learning in Visual Quality Inspection : Abstract: Deep neural networks show great potential for automating various visual quality inspection tasks in manufacturing. However, their applicability is limited in more volatile scenarios, such as...
Grading Handwritten Engineering Exams with Multimodal Large Language Models : Abstract: Handwritten STEM exams capture open-ended reasoning and diagrams, but manual grading is slow and difficult to scale. We present an end-to-end workflow for grading scanned handwritten enginee...
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float (DFloat11) : Abstract: Large-scale AI models, such as Large Language Models (LLMs) and Diffusion Models (DMs), have grown rapidly in size, creating significant challenges for efficient deployment on resource-const...
Infinite-Width Limit of a Single Attention Layer: Analysis via Tensor Programs : Abstract: In modern theoretical analyses of neural networks, the infinite-width limit is often invoked to justify Gaussian approximations of neuron preactivations (e.g., via neural network Gaussian pr...
Clustering by Denoising: Latent plug-and-play diffusion for single-cell data : Abstract: Single-cell RNA sequencing (scRNA-seq) enables the study of cellular heterogeneity. Yet, clustering accuracy, and with it downstream analyses based on cell labels, remain challenging due to ...
MCD: Marginal Contrastive Discrimination for conditional density estimation : Abstract: We consider the problem of conditional density estimation, which is a major topic of interest in the fields of statistical and machine learning. Our method, called Marginal Contrastive Discr...
Sparse-Input Neural Network using Group Concave Regularization : Abstract: Simultaneous feature selection and non-linear function estimation is challenging in modeling, especially in high-dimensional settings where the number of variables exceeds the available samp...
Real-Time Forecasting of Pathological Gait via IMU Navigation: A Few-Shot and Generative Learning Framework for Wearable Devices : Abstract: Current gait analysis faces challenges in various aspects, including limited and poorly labeled data within existing wearable electronics databases, difficulties in collecting patient data d...
Sorbet: A Neuromorphic Hardware-Compatible Transformer-Based Spiking Language Model : Abstract: For reasons such as privacy, there are use cases for language models at the edge. This has given rise to small language models targeted for deployment in resource-constrained devices where e...
Mitigating optimistic bias in entropic risk estimation and optimization : Abstract: The entropic risk measure is widely used in high-stakes decision-making across economics, management science, finance, and safety-critical control systems because it captures tail risks asso...
AutoTrust: Benchmarking Trustworthiness in Large Vision Language Models for Autonomous Driving : Abstract: Recent advancements in large vision language models (VLMs) tailored for autonomous driving (AD) have shown strong scene understanding and reasoning capabilities, making them undeniable candi...
RIMRULE: Improving Tool-Using Language Agents via MDL-Guided Rule Learning : Abstract: Large language models (LLMs) often struggle to use tools reliably in domain-specific settings, where APIs may be idiosyncratic, under-documented, or tailored to private workflows. This highl...
Universal Adaptive Constraint Propagation: Scaling Structured Inference for Large Language Models via Meta-Reinforcement Learning : Abstract: Large language models increasingly require structured inference, from JSON schema enforcement to multi-lingual parsing, where outputs must satisfy complex constraints. We introduce MetaJuLS,...
Pat-DEVAL: Chain-of-Legal-Thought Evaluation for Patent Description : Abstract: Patent descriptions must deliver comprehensive technical disclosure while meeting strict legal standards such as enablement and written description requirements. Although large language mode...
Knowledge Distillation for Temporal Knowledge Graph Reasoning with Large Language Models : Abstract: Reasoning over temporal knowledge graphs (TKGs) is fundamental to improving the efficiency and reliability of intelligent decision-making systems and has become a key technological foundatio...
From Evidence-Based Medicine to Knowledge Graph: Retrieval-Augmented Generation for Sports Rehabilitation and a Domain Benchmark : Abstract: In medicine, large language models (LLMs) increasingly rely on retrieval-augmented generation (RAG) to ground outputs in up-to-date external evidence. However, current RAG approaches focus p...
Talk Less, Verify More: Improving LLM Assistants with Semantic Checks and Execution Feedback : Abstract: As large language model (LLM) assistants become increasingly integrated into enterprise workflows, their ability to generate accurate, semantically aligned, and executable outputs is critica...
The Role of Mixed-Language Documents for Multilingual Large Language Model Pretraining : Abstract: Multilingual large language models achieve impressive cross-lingual performance despite largely monolingual pretraining. While bilingual data in pretraining corpora is widely believed to ena...
Vision-Language Reasoning for Geolocalization: A Reinforcement Learning Approach : Abstract: Recent advances in vision-language models have opened up new possibilities for reasoning-driven image geolocalization. However, existing approaches often rely on synthetic reasoning annotati...
Toward Better Temporal Structures for Geopolitical Events Forecasting : Abstract: Forecasting on geopolitical temporal knowledge graphs (TKGs) through the lens of large language models (LLMs) has recently gained traction. While TKGs and their generalization, hyper-relatio...
Comparative Efficiency Analysis of Lightweight Transformer Models: A Multi-Domain Empirical Benchmark for Enterprise NLP Deployment : Abstract: In the rapidly evolving landscape of enterprise natural language processing (NLP), the demand for efficient, lightweight models capable of handling multi-domain text automation tasks has int...
Rule-Based Approaches to Atomic Sentence Extraction : Abstract: Natural language often combines multiple ideas into complex sentences. Atomic sentence extraction, the task of decomposing complex sentences into simpler sentences that each express a single...
Retrieval-Reasoning Processes for Multi-hop Question Answering: A Four-Axis Design Framework and Empirical Trends : Abstract: Multi-hop question answering (QA) requires systems to iteratively retrieve evidence and reason across multiple hops. While recent RAG and agentic methods report strong results, the underlyin...
A Language-Agnostic Hierarchical LoRA-MoE Architecture for CTC-based Multilingual ASR : Abstract: Large-scale multilingual ASR (mASR) models such as Whisper achieve strong performance but incur high computational and latency costs, limiting their deployment on resource-constrained edge d...
InfoSynth: Information-Guided Benchmark Synthesis for LLMs : Abstract: Large language models (LLMs) have demonstrated significant advancements in reasoning and code generation. However, efficiently creating new benchmarks to evaluate these capabilities remains ...
CSSBench: Evaluating the Safety of Lightweight LLMs against Chinese-Specific Adversarial Patterns : Abstract: Large language models (LLMs) are increasingly deployed in cost-sensitive and on-device scenarios, and safety guardrails have advanced mainly in English. However, real-world Chinese malicious...
Beyond IVR: Benchmarking Customer Support LLM Agents for Business-Adherence : Abstract: Traditional customer support systems, such as Interactive Voice Response (IVR), rely on rigid scripts and lack the flexibility required for handling complex, policy-driven tasks. While large...
Probabilistic Guarantees for Reducing Contextual Hallucinations in LLMs : Abstract: Large language models (LLMs) frequently produce contextual hallucinations, where generated content contradicts or ignores information explicitly stated in the prompt. Such errors are particu...
Physio-DPO: Aligning Large Language Models with the Protein Energy Landscape to Eliminate Structural Hallucinations : Abstract: Large Protein Language Models have shown strong potential for generative protein design, yet they frequently produce structural hallucinations, generating sequences with high linguistic like...
Sigmoid Head for Quality Estimation under Language Ambiguity : Abstract: Language model (LM) probability is not a reliable quality estimator, as natural language is ambiguous. When multiple output options are valid, the model's probability distribution is spread ...
Adapting Natural Language Processing Models Across Jurisdictions: A Pilot Study in Canadian Cancer Registries : Abstract: Population-based cancer registries depend on pathology reports as their primary diagnostic source, yet manual abstraction is resource-intensive and contributes to delays in cancer data. Whil...
Learning Speech Representations with Variational Predictive Coding : Abstract: Despite being the best-known objective for learning speech representations, the HuBERT objective has not been further developed and improved. We argue that it is the lack of an underlying pr...
Overlooked Safety Vulnerability in LLMs: Malicious Intelligent Optimization Algorithm Request and its Jailbreak : Abstract: The widespread deployment of large language models (LLMs) has raised growing concerns about their misuse risks and associated safety issues. While prior studies have examined the safety of L...
From Sight to Insight: Improving Visual Reasoning Capabilities of Multimodal Models via Reinforcement Learning : Abstract: Reinforcement learning (RL) has emerged as a promising approach for eliciting reasoning chains before generating final answers. However, multimodal large language models (MLLMs) generate rea...
A Chain-of-Thought Approach to Semantic Query Categorization in e-Commerce Taxonomies : Abstract: Search in e-Commerce is powered at the core by a structured representation of the inventory, often formulated as a category taxonomy. An important capability in e-Commerce with hierarchical ...
EXAONE 3.5: Series of Large Language Models for Real-world Use Cases : Abstract: This technical report introduces the EXAONE 3.5 instruction-tuned language models, developed and released by LG AI Research. The EXAONE 3.5 language models are offered in three configuration...
Smart Fault Detection in Nanosatellite Electrical Power System : Abstract: This paper presents a new method for detecting faults in the electrical power system of nanosatellites without an Attitude Determination Control Subsystem (ADCS) in low Earth orbit (LEO). Each part of this system...
Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models : Abstract: Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on utilizing handcrafted features, which are problem-dependent and opt...
A Comparative Analysis of Interpretable Machine Learning Methods : Abstract: In recent years, Machine Learning (ML) has seen widespread adoption across a broad range of sectors, including high-stakes domains such as healthcare, finance, and law. This growing reliance...
A Comparative Study of Adaptation Strategies for Time Series Foundation Models in Anomaly Detection : Abstract: Time series anomaly detection is essential for the reliable operation of complex systems, but most existing methods require extensive task-specific training. We explore whether time series f...
Controllable Concept Bottleneck Models : Abstract: Concept Bottleneck Models (CBMs) have garnered much attention for their ability to elucidate the prediction process through a human-understandable concept layer. However, most previous studi...
Imitation from Observations with Trajectory-Level Generative Embeddings : Abstract: We consider offline imitation learning from observations (LfO), where expert demonstrations are scarce and the available offline suboptimal data are far from the expert behavior. Many...
Detecting Spike Wave Discharges (SWD) using 1-dimensional Residual UNet : Abstract: The manual labeling of events in electroencephalography (EEG) records is time-consuming. This is especially true when EEG recordings are taken continuously over weeks to months. Therefore, a...
Laplacian Kernelized Bandit : Abstract: We study multi-user contextual bandits where users are related by a graph and their reward functions exhibit both non-linear behavior and graph homophily. We introduce a principled joint pen...
When Small Models Are Right for Wrong Reasons: Process Verification for Trustworthy Agents : Abstract: Deploying small language models (7-9B parameters) as autonomous agents requires trust in their reasoning, not just their outputs. We reveal a critical reliability crisis: 50-69% of correct ...
A Sparse-Attention Deep Learning Model Integrating Heterogeneous Multimodal Features for Parkinson's Disease Severity Profiling : Abstract: Characterising the heterogeneous presentation of Parkinson's disease (PD) requires integrating biological and clinical markers within a unified predictive framework. While multimodal data pr...
Federated Customization of Large Models: Approaches, Experiments, and Insights : Abstract: In this article, we explore federated customization of large models and highlight the key challenges it poses within the federated learning framework. We review several popular large model c...
Cloud-Native Generative AI for Automated Planogram Synthesis: A Diffusion Model Approach for Multi-Store Retail Optimization : Abstract: Planogram creation is a significant challenge for retail, requiring an average of 30 hours per complex layout. This paper introduces a cloud-native architecture using diffusion models to aut...
Entropy Production in Machine Learning Under Fokker-Planck Probability Flow : Abstract: Machine learning models deployed in nonstationary environments experience performance degradation due to data drift. While many drift detection heuristics exist, most lack a principled dynam...
Adversarial Samples Are Not Created Equal : Abstract: Over the past decade, numerous theories have been proposed to explain the widespread vulnerability of deep neural networks to adversarial evasion attacks. Among these, the theory of non-robu...
Cycling Race Time Prediction: A Personalized Machine Learning Approach Using Route Topology and Training Load : Abstract: Predicting cycling duration for a given route is essential for training planning and event preparation. Existing solutions rely on physics-based models that require extensive parameterizatio...
Traffic-Aware Optimal Taxi Placement Using Graph Neural Network-Based Reinforcement Learning : Abstract: In the context of smart city transportation, efficient matching of taxi supply with passenger demand requires real-time integration of urban traffic network data and mobility patterns. Conve...
Do Chatbot LLMs Talk Too Much? The YapBench Benchmark : Abstract: Large Language Models (LLMs) such as ChatGPT, Claude, and Gemini increasingly act as general-purpose copilots, yet they often respond with unnecessary length on simple requests, adding redun...
TeleDoCTR: Domain-Specific and Contextual Troubleshooting for Telecommunications : Abstract: Ticket troubleshooting refers to the process of analyzing and resolving problems that are reported through a ticketing system. In large organizations offering a wide range of services, this ...
ARISE: Adaptive Reinforcement Integrated with Swarm Exploration : Abstract: Effective exploration remains a key challenge in RL, especially with non-stationary rewards or high-dimensional policies. We introduce ARISE, a lightweight framework that enhances reinforcem...
Bayesian Inverse Games with High-Dimensional Multi-Modal Observations : Abstract: Many multi-agent interaction scenarios can be naturally modeled as noncooperative games, where each agent's decisions depend on others' future actions. However, deploying game-theoretic plan...
BSAT: B-Spline Adaptive Tokenizer for Long-Term Time Series Forecasting : Abstract: Long-term time series forecasting using transformers is hampered by the quadratic complexity of self-attention and the rigidity of uniform patching, which may be misaligned with the data's s...
Precision Autotuning for Linear Solvers via Contextual Bandit-Based RL : Abstract: We propose a reinforcement learning (RL) framework for adaptive precision tuning of linear solvers that can be extended to general algorithms. The framework is formulated as a contextual ban...
The Reasoning-Creativity Trade-off: Toward Creativity-Driven Problem Solving : Abstract: State-of-the-art large language model (LLM) pipelines rely on bootstrapped reasoning loops: sampling diverse chains of thought and reinforcing the highest-scoring ones, mainly optimizing cor...
A Machine Learning Framework for Off Ball Defensive Role and Performance Evaluation in Football : Abstract: Evaluating off-ball defensive performance in football is challenging, as traditional metrics do not capture the nuanced coordinated movements that limit opponent action selection and success...
Memory Bank Compression for Continual Adaptation of Large Language Models : Abstract: Large Language Models (LLMs) have become a mainstay for many everyday applications. However, as data evolve, their knowledge quickly becomes outdated. Continual learning aims to update LLMs w...
Categorical Reparameterization with Denoising Diffusion models : Abstract: Gradient-based optimization with categorical variables typically relies on score-function estimators, which are unbiased but noisy, or on continuous relaxations that replace the discrete dis...
Active learning for data-driven reduced models of parametric differential systems with Bayesian operator inference : Abstract: This work develops an active learning framework to intelligently enrich data-driven reduced-order models (ROMs) of parametric dynamical systems, which can serve as the foundation of virtual ...
Deep Learning Approach for the Diagnosis of Pediatric Pneumonia Using Chest X-ray Imaging : Abstract: Pediatric pneumonia remains a leading cause of morbidity and mortality in children worldwide. Timely and accurate diagnosis is critical but often challenged by limited radiological expertise...
Group Cross-Correlations with Faintly Constrained Filters : Abstract: We provide a notion of group cross-correlations, where the associated filter is not as tightly constrained as in the previous literature. This resolves an incompatibility previous constraint...
Automated electrostatic characterization of quantum dot devices in single- and bilayer heterostructures : Abstract: As quantum dot (QD)-based spin qubits advance toward larger, more complex device architectures, rapid, automated device characterization and data analysis tools become critical. The orientat...
Cuffless, calibration-free hemodynamic monitoring with physics-informed machine learning models : Abstract: Wearable technologies have the potential to transform ambulatory and at-home hemodynamic monitoring by providing continuous assessments of cardiovascular health metrics and guiding clinical ...
Reinforcement learning with timed constraints for robotics motion planning : Abstract: Robotic systems operating in dynamic and uncertain environments increasingly require planners that satisfy complex task sequences while adhering to strict temporal constraints. Metric Interv...
It's Never Too Late: Noise Optimization for Collapse Recovery in Trained Diffusion Models : Abstract: Contemporary text-to-image models exhibit a surprising degree of mode collapse, as can be seen when sampling several images given the same text prompt. While previous work has attempted to a...
Combining datasets with different ground truths using Low-Rank Adaptation to generalize image-based CNN models for photometric redshift prediction : Abstract: In this work, we demonstrate how Low-Rank Adaptation (LoRA) can be used to combine different galaxy imaging datasets to improve redshift estimation with CNN models for cosmology. LoRA is an ...
StockBot 2.0: Vanilla LSTMs Outperform Transformer-based Forecasting for Stock Prices : Abstract: Accurate forecasting of financial markets remains a long-standing challenge due to complex temporal and often latent dependencies, non-linear dynamics, and high volatility. Building on our e...
Detecting Unobserved Confounders: A Kernelized Regression Approach : Abstract: Detecting unobserved confounders is crucial for reliable causal inference in observational studies. Existing methods require either linearity assumptions or multiple heterogeneous environmen...
Application Research of a Deep Learning Model Integrating CycleGAN and YOLO in PCB Infrared Defect Detection : Abstract: This paper addresses the critical bottleneck of infrared (IR) data scarcity in Printed Circuit Board (PCB) defect detection by proposing a cross-modal data augmentation framework integrating...
Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing : Abstract: The rapid growth of artificial intelligence (AI) has brought novel data processing and generative capabilities but also escalating energy requirements. This challenge motivates renewed inter...
Rectifying Adversarial Examples Using Their Vulnerabilities : Abstract: Deep neural network-based classifiers are prone to errors when processing adversarial examples (AEs). AEs are minimally perturbed input data that are undetectable to humans, posing significant risks t...
Solving nonlinear subsonic compressible flow in infinite domain via multi-stage neural networks : Abstract: In aerodynamics, accurately modeling subsonic compressible flow over airfoils is critical for aircraft design. However, solving the governing nonlinear perturbation velocity potential equati...
Deterministic Coreset for Lp Subspace : Abstract: We introduce the first iterative algorithm for constructing an ε-coreset that guarantees deterministic ℓp subspace embedding for any p ∈ [1,∞) and any ε ...
NOS-Gate: Queue-Aware Streaming IDS for Consumer Gateways under Timing-Controlled Evasion : Abstract: Timing and burst patterns can leak through encryption, and an adaptive adversary can exploit them. This undermines metadata-only detection in a stand-alone consumer gateway. Therefore, consu...
Revati: Transparent GPU-Free Time-Warp Emulation for LLM Serving : Abstract: Deploying LLMs efficiently requires testing hundreds of serving configurations, but evaluating each one on a GPU cluster takes hours and costs thousands of dollars. Discrete-event simulators...
Secure, Verifiable, and Scalable Multi-Client Data Sharing via Consensus-Based Privacy-Preserving Data Distribution : Abstract: We propose the Consensus-Based Privacy-Preserving Data Distribution (CPPDD) framework, a lightweight and post-setup autonomous protocol for secure multi-client data aggregation. The framewor...
Noise-Aware Named Entity Recognition for Historical VET Documents : Abstract: This paper addresses Named Entity Recognition (NER) in the domain of Vocational Education and Training (VET), focusing on historical, digitized documents that suffer from OCR-induced noise. ...
Interpretable Machine Learning for Quantum-Informed Property Predictions in Artificial Sensing Materials : Abstract: Digital sensing faces challenges in developing sustainable methods to extend the applicability of customized e-noses to complex body odor volatilome (BOV). To address this challenge, we deve...
Improving LLM-Assisted Secure Code Generation through Retrieval-Augmented-Generation and Multi-Tool Feedback : Abstract: Large Language Models (LLMs) can generate code but often introduce security vulnerabilities, logical inconsistencies, and compilation errors. Prior work demonstrates that LLMs benefit substa...
Generative Conditional Missing Imputation Networks : Abstract: In this study, we introduce a sophisticated generative conditional strategy designed to impute missing values within datasets, an area of considerable importance in statistical analysis. Spe...
AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules : Abstract: We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Fun...
HyperPriv-EPN: Hypergraph Learning with Privileged Knowledge for Ependymoma Prognosis : Abstract: Preoperative prognosis of Ependymoma is critical for treatment planning but challenging due to the lack of semantic insights in MRI compared to post-operative surgical reports. Existing mult...
Three factor delay learning rules for spiking neural networks : Abstract: Spiking Neural Networks (SNNs) are dynamical systems that operate on spatiotemporal data, yet their learnable parameters are often limited to synaptic weights, contributing little to tempora...
Sparse FEONet: A Low-Cost, Memory-Efficient Operator Network via Finite-Element Local Sparsity for Parametric PDEs : Abstract: In this paper, we study the finite element operator network (FEONet), an operator-learning method for parametric problems, originally introduced in J. Y. Lee, S. Ko, and Y. Hong, Finite Elem...
Cost Optimization in Production Line Using Genetic Algorithm : Abstract: This paper presents a genetic algorithm (GA) approach to cost-optimal task scheduling in a production line. The system consists of a set of serial processing tasks, each with a given duratio...
Two Deep Learning Approaches for Automated Segmentation of Left Ventricle in Cine Cardiac MRI : Abstract: Left ventricle (LV) segmentation is critical for clinical quantification and diagnosis of cardiac images. In this work, we propose two novel deep learning architectures called LNU-Net and IB...
Distributed Sparse Linear Regression under Communication Constraints : Abstract: In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the e...
LeanQuant: Accurate and Scalable Large Language Model Quantization with Loss-error-aware Grid : Abstract: Large language models (LLMs) have shown immense potential across various domains, but their high memory requirements and inference costs remain critical challenges for deployment. Post-train...
Parametrized Sharing for Multi-Agent Hybrid DRL for Multiple Multi-Functional RISs-Aided Downlink NOMA Networks : Abstract: Multi-functional reconfigurable intelligent surface (MF-RIS) is conceived to address the communication efficiency thanks to its extended signal coverage from its active RIS capability and se...
ECR: Manifold-Guided Semantic Cues for Compact Language Models : Abstract: Compact models often lose the structure of their embedding space. The issue shows up when the capacity is tight or the data spans several languages. Such collapse makes it difficult for down...
CoCo-Fed: A Unified Framework for Memory- and Communication-Efficient Federated Learning at the Wireless Edge : Abstract: The deployment of large-scale neural networks within the Open Radio Access Network (O-RAN) architecture is pivotal for enabling native edge intelligence. However, this paradigm faces two cri...
A Comprehensive Dataset for Human vs. AI Generated Image Detection : Abstract: Multimodal generative AI systems like Stable Diffusion, DALL-E, and MidJourney have fundamentally changed how synthetic images are created. These tools drive innovation but also enable the s...
Cracking IoT Security: Can LLMs Outsmart Static Analysis Tools? : Abstract: Smart home IoT platforms such as openHAB rely on Trigger Action Condition (TAC) rules to automate device behavior, but the interplay among these rules can give rise to interaction threats, u...
Improving Scientific Document Retrieval with Academic Concept Index : Abstract: Adapting general-domain retrievers to scientific domains is challenging due to the scarcity of large-scale domain-specific relevance annotations and the substantial mismatch in vocabulary an...
Learning to be Reproducible: Custom Loss Design for Robust Neural Networks : Abstract: To enhance the reproducibility and reliability of deep learning models, we address a critical gap in current training methodologies: the lack of mechanisms that ensure consistent and robust ...
Priority-Aware Multi-Robot Coverage Path Planning : Abstract: Multi-robot systems are widely used for coverage tasks that require efficient coordination across large environments. In Multi-Robot Coverage Path Planning (MCPP), the objective is typically...
HFedMoE: Resource-aware Heterogeneous Federated Learning with Mixture-of-Experts : Abstract: While federated learning (FL) enables fine-tuning of large language models (LLMs) without compromising data privacy, the substantial size of an LLM renders on-device training impractical for...
Stronger Approximation Guarantees for Non-Monotone γ-Weakly DR-Submodular Maximization : Abstract: Maximizing submodular objectives under constraints is a fundamental problem in machine learning and optimization. We study the maximization of a nonnegative, non-monotone γ-weakly DR-submo...
Noise-Robust Tiny Object Localization with Flows : Abstract: Despite significant advances in generic object detection, a persistent performance gap remains for tiny objects compared to normal-scale objects. We demonstrate that tiny objects are highly ...
Interpretability-Guided Bi-objective Optimization: Aligning Accuracy and Explainability : Abstract: This paper introduces Interpretability-Guided Bi-objective Optimization (IGBO), a framework that trains interpretable models by incorporating structured domain knowledge via a bi-objective f...
Avatar Forcing: Real-Time Interactive Head Avatar Generation for Natural Conversation : Abstract: Talking head generation creates lifelike avatars from static portraits for virtual communication and content creation. However, current models do not yet convey the feeling of truly interact...
Fast-weight Product Key Memory : Abstract: Sequence modeling layers in modern language models typically face a trade-off between storage capacity and computational efficiency. While Softmax attention offers unbounded storage at prohi...
IRPO: Scaling the Bradley-Terry Model via Reinforcement Learning : Abstract: Generative Reward Models (GRMs) have attracted considerable research interest in reward modeling due to their interpretability, inference-time scalability, and potential for refinement throu...
QSLM: A Performance- and Memory-aware Quantization Framework with Tiered Search Strategy for Spike-driven Language Models : Abstract: Large Language Models (LLMs) have been emerging as prominent AI models for solving many natural language tasks due to their high performance (e.g., accuracy) and capabilities in generating h...
Detecting Performance Degradation under Data Shift in Pathology Vision-Language Model : Abstract: Vision-Language Models have demonstrated strong potential in medical image analysis and disease diagnosis. However, after deployment, their performance may deteriorate when the input data di...
Exploring the Performance of Large Language Models on Subjective Span Identification Tasks : Abstract: Identifying relevant text spans is important for several downstream tasks in NLP, as it contributes to model explainability. While most span identification approaches rely on relatively smal...
Stochastic Actor-Critic: Mitigating Overestimation via Temporal Aleatoric Uncertainty : Abstract: Off-policy actor-critic methods in reinforcement learning train a critic with temporal-difference updates and use it as a learning signal for the policy (actor). This design typically achiev...
LLM Agents for Combinatorial Efficient Frontiers: Investment Portfolio Optimization : Abstract: Investment portfolio optimization is a task conducted in all major financial institutions. The Cardinality Constrained Mean-Variance Portfolio Optimization (CCPO) problem formulation is ubiq...
FedHypeVAE: Federated Learning with Hypernetwork Generated Conditional VAEs for Differentially Private Embedding Sharing : Abstract: Federated data sharing promises utility without centralizing raw data, yet existing embedding-level generators struggle under non-IID client heterogeneity and provide limited formal protecti...
Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning : Abstract: We present a training-free method for detecting valid mathematical reasoning in large language models through spectral analysis of attention patterns. By treating attention matrices as adjac...
From Transformers to LLMs: A Systematic Survey of Efficiency Considerations in NLP : Abstract: The emergence of Transformer-based Large Language Models (LLMs) has substantially augmented the capabilities of Natural Language Processing (NLP), thereby intensifying the demand for computa...
EXAONE 3.0 7.8B Instruction Tuned Language Model : Abstract: We introduce EXAONE 3.0 instruction-tuned language model, the first open model in the family of Large Language Models (LLMs) developed by LG AI Research. Among different model sizes, we publ...
Robust Molecular Property Prediction via Densifying Scarce Labeled Data : Abstract: A widely recognized limitation of molecular prediction models is their reliance on structures observed in the training data, resulting in poor generalization to out-of-distribution compounds...
The Trojan in the Vocabulary: Stealthy Sabotage of LLM Composition : Abstract: The open-weight LLM ecosystem is increasingly defined by model composition techniques (such as weight merging, speculative decoding, and vocabulary expansion) that remix capabilities from di...
IMBWatch -- a Spatio-Temporal Graph Neural Network approach to detect Illicit Massage Business : Abstract: Illicit Massage Businesses (IMBs) are a covert and persistent form of organized exploitation that operate under the facade of legitimate wellness services while facilitating human traffickin...
Exploration in the Limit : Abstract: In fixed-confidence best arm identification (BAI), the objective is to quickly identify the optimal option while controlling the probability of error below a desired threshold. Despite the p...
Dynamic Bayesian Optimization Framework for Instruction Tuning in Partial Differential Equation Discovery : Abstract: Large Language Models (LLMs) show promise for equation discovery, yet their outputs are highly sensitive to prompt phrasing, a phenomenon we term instruction brittleness. Static prompts cann...
GRL-SNAM: Geometric Reinforcement Learning with Path Differential Hamiltonians for Simultaneous Navigation and Mapping in Unknown Environments : Abstract: We present GRL-SNAM, a geometric reinforcement learning framework for Simultaneous Navigation and Mapping (SNAM) in unknown environments. A SNAM problem is challenging as it needs to design h...
Reinforcement Learning with Function Approximation for Non-Markov Processes : Abstract: We study reinforcement learning methods with linear function approximation under non-Markov state and cost processes. We first consider the policy evaluation method and show that the algorit...
The Weather Paradox: Why Precipitation Fails to Predict Traffic Accident Severity in Large-Scale US Data : Abstract: This study investigates the predictive capacity of environmental, temporal, and spatial factors on traffic accident severity in the United States. Using a dataset of 500,000 U.S. traffic acc...
Sequential Reservoir Computing for Efficient High-Dimensional Spatiotemporal Forecasting : Abstract: Forecasting high-dimensional spatiotemporal systems remains computationally challenging for recurrent neural networks (RNNs) and long short-term memory (LSTM) models due to gradient-based tr...
Early Prediction of Liver Cirrhosis Up to Three Years in Advance: A Machine Learning Study Benchmarking Against the FIB-4 Score : Abstract: Objective: Develop and evaluate machine learning (ML) models for predicting incident liver cirrhosis one, two, and three years prior to diagnosis using routinely collected electronic health ...
Reinforcement-Learned Unequal Error Protection for Quantized Semantic Embeddings : Abstract: This paper tackles the pressing challenge of preserving semantic meaning in communication systems constrained by limited bandwidth. We introduce a novel reinforcement learning framework that...
Optimized Hybrid Feature Engineering for Resource-Efficient Arrhythmia Detection in ECG Signals: An Optimization Framework : Abstract: Cardiovascular diseases, particularly arrhythmias, remain a leading global cause of mortality, necessitating continuous monitoring via the Internet of Medical Things (IoMT). However, state-o...
Unknown Aware AI-Generated Content Attribution : Abstract: The rapid advancement of photorealistic generative models has made it increasingly important to attribute the origin of synthetic content, moving beyond binary real or fake detection toward ...
Robust Graph Fine-Tuning with Adversarial Graph Prompting : Abstract: Parameter-Efficient Fine-Tuning (PEFT) has emerged as a dominant paradigm for adapting pre-trained GNN models to downstream tasks. However, existing PEFT methods usually exhibit signi...
Task-Driven Kernel Flows: Label Rank Compression and Laplacian Spectral Filtering : Abstract: We present a theory of feature learning in wide L2-regularized networks showing that supervised learning is inherently compressive. We derive a kernel ODE that predicts a "water-filling" spe...
Can Optimal Transport Improve Federated Inverse Reinforcement Learning? : Abstract: In robotics and multi-agent systems, fleets of autonomous agents often operate in subtly different environments while pursuing a common high-level objective. Directly pooling their data to l...
Quantum King-Ring Domination in Chess: A QAOA Approach : Abstract: The Quantum Approximate Optimization Algorithm (QAOA) is extensively benchmarked on synthetic random instances such as MaxCut, TSP, and SAT problems, but these lack semantic structure and hu...
An Agentic Framework for Neuro-Symbolic Programming : Abstract: Integrating symbolic constraints into deep learning models could make them more robust, interpretable, and data-efficient. Still, it remains a time-consuming and challenging task. Existing f...
Evaluating Anomaly Detectors for Simulated Highly Imbalanced Industrial Classification Problems : Abstract: Machine learning offers potential solutions to current issues in industrial systems in areas such as quality control and predictive maintenance, but also faces unique barriers in industrial ...
Yahtzee: Reinforcement Learning Techniques for Stochastic Combinatorial Games : Abstract: Yahtzee is a classic dice game with a stochastic, combinatorial structure and delayed rewards, making it an interesting mid-scale RL benchmark. While an optimal policy for solitaire Yahtzee ...
Neural Brain Fields: A NeRF-Inspired Approach for Generating Nonexistent EEG Electrodes : Abstract: Electroencephalography (EEG) data present unique modeling challenges because recordings vary in length, exhibit very low signal-to-noise ratios, differ significantly across participants, dri...
Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI : Abstract: Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intell...
Personalized Spiking Neural Networks with Ferroelectric Synapses for EEG Signal Processing : Abstract: Electroencephalography (EEG)-based brain-computer interfaces (BCIs) are strongly affected by non-stationary neural signals that vary across sessions and individuals, limiting the generalizat...
Large Empirical Case Study: Go-Explore adapted for AI Red Team Testing : Abstract: Production LLM agents with tool-using capabilities require security testing despite their safety training. We adapt Go-Explore to evaluate GPT-4o-mini across 28 experimental runs spanning si...
Toward Large-Scale Photonics-Empowered AI Systems: From Physical Design Automation to System-Algorithm Co-Exploration : Abstract: In this work, we identify three considerations that are essential for realizing practical photonic AI systems at scale: (1) dynamic tensor operation support for modern models rather than onl...
Democratizing Electronic-Photonic AI Systems: An Open-Source AI-Infused Cross-Layer Co-Design and Design Automation Toolflow : Abstract: Photonics is becoming a cornerstone technology for high-performance AI systems and scientific computing, offering unparalleled speed, parallelism, and energy efficiency. Despite this promise...
MethConvTransformer: A Deep Learning Framework for Cross-Tissue Alzheimer's Disease Detection : Abstract: Alzheimer's disease (AD) is a multifactorial neurodegenerative disorder characterized by progressive cognitive decline and widespread epigenetic dysregulation in the brain. DNA methylation, ...
FCMBench: A Comprehensive Financial Credit Multimodal Benchmark for Real-world Applications : Abstract: As multimodal AI becomes widely used for credit risk assessment and document review, a domain-specific benchmark is urgently needed that (1) reflects documents and workflows specific to fina...
Online Finetuning Decision Transformers with Pure RL Gradients : Abstract: Decision Transformers (DTs) have emerged as a powerful framework for sequential decision making by formulating offline reinforcement learning (RL) as a sequence modeling problem. However, ex...
Hear the Heartbeat in Phases: Physiologically Grounded Phase-Aware ECG Biometrics : Abstract: Electrocardiography (ECG) is adopted for identity authentication in wearable devices due to its individual-specific characteristics and inherent liveness. However, existing methods often tre...
Understanding Emotion in Discourse: Recognition Insights and Linguistic Patterns for Generation : Abstract: While Emotion Recognition in Conversation (ERC) has achieved high accuracy, two critical gaps remain: a limited understanding of which architectural choices actually matter, and a l...
SSI-GAN: Semi-Supervised Swin-Inspired Generative Adversarial Networks for Neuronal Spike Classification : Abstract: Mosquitos are the main transmissive agents of arboviral diseases. Manual classification of their neuronal spike patterns is very labor-intensive and expensive. Most available deep learning s...
Latent Flow Matching for Expressive Singing Voice Synthesis : Abstract: Conditional variational autoencoder (cVAE)-based singing voice synthesis provides efficient inference and strong audio quality by learning a score-conditioned prior and a recording-condition...
JP-TL-Bench: Anchored Pairwise LLM Evaluation for Bidirectional Japanese-English Translation : Abstract: We introduce JP-TL-Bench, a lightweight, open benchmark designed to guide the iterative development of Japanese-English translation systems. In this context, the challenge is often "which of...
GRIT -- Geometry-Aware PEFT with K-FAC Preconditioning, Fisher-Guided Reprojection, and Dynamic Rank Adaptation : Abstract: Parameter-efficient fine-tuning (PEFT) is the default way to adapt LLMs, but widely used LoRA and QLoRA are largely geometry-agnostic: they optimize in fixed, randomly oriented low-rank subs...
Neural Minimum Weight Perfect Matching for Quantum Error Codes : Abstract: Realizing the full potential of quantum computation requires Quantum Error Correction (QEC). QEC reduces error rates by encoding logical information across redundant physical qubits, enablin...
An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems : Abstract: The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in securing modern codebases. This paper prese...
Next Generation Intelligent Low-Altitude Economy Deployments: The O-RAN Perspective : Abstract: Despite the growing interest in low-altitude economy (LAE) applications, including UAV-based logistics and emergency response, fundamental challenges remain in orchestrating such missions ov...
Parallel Universes, Parallel Languages: A Comprehensive Study on LLM-based Multilingual Counterfactual Example Generation : Abstract: Counterfactuals refer to minimally edited inputs that cause a model's prediction to change, serving as a promising approach to explaining the model's behavior. Large language models (LLMs) e...
Beyond Perfect APIs: A Comprehensive Evaluation of LLM Agents Under Real-World API Complexity : Abstract: We introduce WildAGTEval, a benchmark designed to evaluate large language model (LLM) agents' function-calling capabilities under realistic API complexity. Unlike prior work that assumes an ...
FaithSCAN: Model-Driven Single-Pass Hallucination Detection for Faithful Visual Question Answering : Abstract: Faithfulness hallucinations in VQA occur when vision-language models produce fluent yet visually ungrounded answers, severely undermining their reliability in safety-critical applications. E...
Benchmarking Preprocessing and Integration Methods in Single-Cell Genomics : Abstract: Single-cell data analysis has the potential to revolutionize personalized medicine by characterizing disease-associated molecular changes at the single-cell level. Advanced single-cell multi...
Can Large Language Models Still Explain Themselves? Investigating the Impact of Quantization on Self-Explanations : Abstract: Quantization is widely used to accelerate inference and streamline the deployment of large language models (LLMs), yet its effects on self-explanations (SEs) remain unexplored. SEs, generate...
Towards Automated Differential Diagnosis of Skin Diseases Using Deep Learning and Imbalance-Aware Strategies : Abstract: As dermatological conditions become increasingly common and the availability of dermatologists remains limited, there is a growing need for intelligent tools to support both patients and cli...
DepFlow: Disentangled Speech Generation to Mitigate Semantic Bias in Depression Detection : Abstract: Speech is a scalable and non-invasive biomarker for early mental health screening. However, widely used depression datasets like DAIC-WOZ exhibit strong coupling between linguistic sentiment...
The Generative AI Paradox: GenAI and the Erosion of Trust, the Corrosion of Information Verification, and the Demise of Truth : Abstract: Generative AI (GenAI) now produces text, images, audio, and video that can be perceptually convincing at scale and at negligible marginal cost. While public debate often frames the associate...
VisNet: Efficient Person Re-Identification via Alpha-Divergence Loss, Feature Fusion and Dynamic Multi-Task Learning : Abstract: Person re-identification (ReID) is an extremely important area in both surveillance and mobile applications, requiring strong accuracy with minimal computational cost. State-of-the-art metho...
HarmoniAD: Harmonizing Local Structures and Global Semantics for Anomaly Detection : Abstract: Anomaly detection is crucial in industrial product quality inspection. Failing to detect tiny defects often leads to serious consequences. Existing methods face a structure-semantics trade-o...
Sparse Probabilistic Coalition Structure Generation: Bayesian Greedy Pursuit and $\ell_1$ Relaxations : Abstract: We study coalition structure generation (CSG) when coalition values are not given but must be learned from episodic observations. We model each episode as a sparse linear regression problem,...
Robust Uncertainty Quantification for Factual Generation of Large Language Models : Abstract: The rapid advancement of large language model (LLM) technology has facilitated its integration into various domains of professional and daily life. However, the persistent challenge of LLM ha...
Mapping Human Anti-collusion Mechanisms to Multi-agent AI : Abstract: As multi-agent AI systems become increasingly autonomous, evidence shows they can develop collusive strategies similar to those long observed in human markets and institutions. While human d...
BERT-JEPA: Reorganizing CLS Embeddings for Language-Invariant Semantics : Abstract: Joint Embedding Predictive Architectures (JEPA) are a novel self-supervised training technique that have shown recent promise across domains. We introduce BERT-JEPA (BEPA), a training paradi...
PatchBlock: A Lightweight Defense Against Adversarial Patches for Embedded EdgeAI Devices : Abstract: Adversarial attacks pose a significant challenge to the reliable deployment of machine learning models in EdgeAI applications, such as autonomous driving and surveillance, which rely on reso...
In Line with Context: Repository-Level Code Generation via Context Inlining : Abstract: Repository-level code generation has attracted growing attention in recent years. Unlike function-level code generation, it requires the model to understand the entire repository, reasoning ...
Word Frequency Counting Based on Serverless MapReduce : Abstract: With the increasing demand for high-performance and high-efficiency computing, cloud computing, especially serverless computing, has gradually become a research hotspot in recent years, attr...
Engineering Attack Vectors and Detecting Anomalies in Additive Manufacturing : Abstract: Additive manufacturing (AM) is rapidly integrating into critical sectors such as aerospace, automotive, and healthcare. However, this cyber-physical convergence introduces new attack surface...
Do LLMs Judge Distantly Supervised Named Entity Labels Well? Constructing the JudgeWEL Dataset : Abstract: We present judgeWEL, a dataset for named entity recognition (NER) in Luxembourgish, automatically labelled and subsequently verified using large language models (LLM) in a novel pipeline. Bu...
Deep Delta Learning : Abstract: The efficacy of deep residual networks is fundamentally predicated on the identity shortcut connection. While this mechanism effectively mitigates the vanishing gradient problem, it imposes ...
E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models : Abstract: Recent reinforcement learning has enhanced the flow matching models on human preference alignment. While stochastic sampling enables the exploration of denoising directions, existing methods...
RMAAT: Astrocyte-Inspired Memory Compression and Replay for Efficient Long-Context Transformers : Abstract: The quadratic complexity of the self-attention mechanism presents a significant impediment to applying Transformer models to long sequences. This work explores computational principles derived f...
Language as Mathematical Structure: Examining Semantic Field Theory Against Language Games : Abstract: Large language models (LLMs) offer a new empirical setting in which long-standing theories of linguistic meaning can be examined. This paper contrasts two broad approaches: social constructi...
Defensive M2S: Training Guardrail Models on Compressed Multi-turn Conversations : Abstract: Guardrail models are essential for ensuring the safety of Large Language Model (LLM) deployments, but processing full multi-turn conversation histories incurs significant computational cost....
Deep Networks Learn Deep Hierarchical Models : Abstract: We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence ...
Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations : Abstract: Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply orthogonality loss t...
Neural Chains and Discrete Dynamical Systems : Abstract: We inspect the analogy between machine-learning (ML) applications based on the transformer architecture without self-attention, "neural chains" hereafter, and discrete dynamical systems ...
MAESTRO: Multi-Agent Evaluation Suite for Testing, Reliability, and Observability : Abstract: We present MAESTRO, an evaluation suite for the testing, reliability, and observability of LLM-based MAS. MAESTRO standardizes MAS configuration and execution through a unified interface, su...
Multi-Agent Coordinated Rename Refactoring : Abstract: The primary value of AI agents in software development lies in their ability to extend the developer's capacity for reasoning and action, not to supplant human involvement. To showcase how t...
MotionPhysics: Learnable Motion Distillation for Text-Guided Simulation : Abstract: Accurately simulating existing 3D objects and a wide variety of materials often demands expert knowledge and time-consuming physical parameter tuning to achieve the desired dynamic behavior....
Trajectory Guard -- A Lightweight, Sequence-Aware Model for Real-Time Anomaly Detection in Agentic AI : Abstract: Autonomous LLM agents generate multi-step action plans that can fail due to contextual misalignment or structural incoherence. Existing anomaly detection methods are ill-suited for this chal...
Probability-Aware Parking Selection : Abstract: Current parking navigation systems often underestimate total travel time by failing to account for the time spent searching for a parking space, which significantly affects user experience, ...
Optimizing LSTM Neural Networks for Resource-Constrained Retail Sales Forecasting: A Model Compression Study : Abstract: Standard LSTM (Long Short-Term Memory) neural networks provide accurate predictions for sales data in the retail industry, but require a lot of computing power. It can be challenging especial...
Reasoning in Action: MCTS-Driven Knowledge Retrieval for Large Language Models : Abstract: Large language models (LLMs) typically enhance their performance through either the retrieval of semantically similar information or the improvement of their reasoning capabilities. However,...
Finetuning Large Language Models for Automated Depression Screening in Nigerian Pidgin English: GENSCORE Pilot Study : Abstract: Depression is a major contributor to the mental-health burden in Nigeria, yet screening coverage remains limited due to low access to clinicians, stigma, and language barriers. Traditional t...
Toward a Physical Theory of Intelligence : Abstract: We present a physical theory of intelligence grounded in irreversible information processing in systems constrained by conservation laws. An intelligent system is modelled as a coupled agent...
A multi-algorithm approach for operational human resources workload balancing in a last mile urban delivery system : Abstract: Efficient workload assignment to the workforce is critical in last-mile package delivery systems. In this context, traditional methods of assigning package deliveries to workers based on geo...
Quantitative Rule-Based Strategy modeling in Classic Indian Rummy: A Metric Optimization Approach : Abstract: The 13-card variant of Classic Indian Rummy is a sequential game of incomplete information that requires probabilistic reasoning and combinatorial decision-making. This paper proposes a rule...
From Clay to Code: Typological and Material Reasoning in AI Interpretations of Iranian Pigeon Towers : Abstract: This study investigates how generative AI systems interpret the architectural intelligence embedded in vernacular form. Using the Iranian pigeon tower as a case study, the research tests thr...
The Agentic Leash: Extracting Causal Feedback Fuzzy Cognitive Maps with LLMs : Abstract: We design a large-language-model (LLM) agent that extracts causal feedback fuzzy cognitive maps (FCMs) from raw text. The causal learning or extraction process is agentic both because of the...
Mortar: Evolving Mechanics for Automatic Game Design : Abstract: We present Mortar, a system for autonomously evolving game mechanics for automatic game design. Game mechanics define the rules and interactions that govern gameplay, and designing them manu...
Ask, Clarify, Optimize: Human-LLM Agent Collaboration for Smarter Inventory Control : Abstract: Inventory management remains a challenge for many small and medium-sized businesses that lack the expertise to deploy advanced optimization methods. This paper investigates whether Large Lan...
Constructing a Neuro-Symbolic Mathematician from First Principles : Abstract: Large Language Models (LLMs) exhibit persistent logical failures in complex reasoning due to the lack of an internal axiomatic framework. We propose Mathesis, a neuro-symbolic architecture t...
Explicit Abstention Knobs for Predictable Reliability in Video Question Answering : Abstract: High-stakes deployment of vision-language models (VLMs) requires selective prediction, where systems abstain when uncertain rather than risk costly errors. We investigate whether confidence-...
An AI Monkey Gets Grapes for Sure -- Sphere Neural Networks for Reliable Decision-Making : Abstract: This paper compares three methodological categories of neural reasoning: LLM reasoning, supervised learning-based reasoning, and explicit model-based reasoning. LLMs remain unreliable and st...
FlashInfer-Bench: Building the Virtuous Cycle for AI-driven LLM Systems : Abstract: Recent advances show that large language models (LLMs) can act as autonomous agents capable of generating GPU kernels, but integrating these AI-generated kernels into real-world inference sy...
Will LLM-powered Agents Bias Against Humans? Exploring the Belief-Dependent Vulnerability : Abstract: LLM-empowered agents can exhibit not only demographic bias (e.g., gender, religion) but also intergroup bias triggered by minimal "us" versus "them" cues. When this intergroup boundary align...
ClinicalReTrial: A Self-Evolving AI Agent for Clinical Trial Protocol Optimization : Abstract: Clinical trial failure remains a central bottleneck in drug development, where minor protocol design flaws can irreversibly compromise outcomes despite promising therapeutics. Although cutti...
Multiagent Reinforcement Learning for Liquidity Games : Abstract: Making use of swarm methods in financial market modeling of liquidity, and techniques from financial analysis in swarm analysis, holds the potential to advance both research areas. In swarm ...
Bio-inspired Agentic Self-healing Framework for Resilient Distributed Computing Continuum Systems : Abstract: Human biological systems sustain life through extraordinary resilience, continually detecting damage, orchestrating targeted responses, and restoring function through self-healing. Inspired ...
Adaptive Causal Coordination Detection for Social Media: A Memory-Guided Framework with Semi-Supervised Learning : Abstract: Detecting coordinated inauthentic behavior on social media remains a critical and persistent challenge, as most existing approaches rely on superficial correlation analysis, employ static pa...
Can Semantic Methods Enhance Team Sports Tactics? A Methodology for Football with Broader Applications : Abstract: This paper explores how semantic-space reasoning, traditionally used in computational linguistics, can be extended to tactical decision-making in team sports. Building on the analogy between...
Progressive Ideation using an Agentic AI Framework for Human-AI Co-Creation : Abstract: The generation of truly novel and diverse ideas is important for contemporary engineering design, yet it remains a significant cognitive challenge for novice designers. Current 'single-spurt...
The Illusion of Insight in Reasoning Models : Abstract: Do reasoning models have "Aha!" moments? Prior work suggests that models like DeepSeek-R1-Zero undergo sudden mid-trace realizations that lead to accurate outputs, implying an intrinsic capa...
DA-DPO: Cost-efficient Difficulty-aware Preference Optimization for Reducing MLLM Hallucinations : Abstract: Direct Preference Optimization (DPO) has shown strong potential for mitigating hallucinations in Multimodal Large Language Models (MLLMs). However, existing multimodal DPO approaches often s...
A Vision-and-Knowledge Enhanced Large Language Model for Generalizable Pedestrian Crossing Behavior Inference : Abstract: Existing paradigms for inferring pedestrian crossing behavior, ranging from statistical models to supervised learning methods, demonstrate limited generalizability and perform inadequately o...

Research Sources: 270 | Generated: 1/5/2026