AI Research News Feeds for January 12th, 2026

AI RESEARCH PAPERS & ACADEMIC SOURCES

FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time : Abstract: We present a real-time tracking SLAM system that unifies efficient camera tracking with photorealistic feature-enriched mapping using 3D Gaussian Splatting (3DGS). Our main contribution is i...
FlyPose: Towards Robust Human Pose Estimation From Aerial Views : Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure insp...
Boosting Latent Diffusion Models via Disentangled Representation Alignment : Abstract: Latent Diffusion Models (LDMs) generate high-quality images by operating in a compressed latent space, typically obtained through image tokenizers such as Variational Autoencoders (VAEs). In...
GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras : Abstract: Accurate surround-view depth estimation provides a competitive alternative to laser-based sensors and is essential for 3D scene understanding in autonomous driving. While prior studies have ...
Kidney Cancer Detection Using 3D-Based Latent Diffusion Models : Abstract: In this work, we present a novel latent diffusion-based pipeline for 3D kidney anomaly detection on contrast-enhanced abdominal CT. The method combines Denoising Diffusion Probabilistic Mode...
Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation : Abstract: Semi-supervised medical image segmentation is an effective method for addressing scenarios with limited labeled data. Existing methods mainly rely on frameworks such as mean teacher and dual...
Phase4DFD: Multi-Domain Phase-Aware Attention for Deepfake Detection : Abstract: Recent deepfake detection methods have increasingly explored frequency domain representations to reveal manipulation artifacts that are difficult to detect in the spatial domain. However, mo...
Adapting Vision Transformers to Ultra-High Resolution Semantic Segmentation with Relay Tokens : Abstract: Current approaches for segmenting ultra-high-resolution images either slide a window, thereby discarding global context, or downsample and lose fine detail. We propose a simple yet effective...
Context-Aware Decoding for Faithful Vision-Language Generation : Abstract: Hallucinations, generating responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs), especially in open-ended tasks such as image ca...
WaveRNet: Wavelet-Guided Frequency Learning for Multi-Source Domain-Generalized Retinal Vessel Segmentation : Abstract: Domain-generalized retinal vessel segmentation is critical for automated ophthalmic diagnosis, yet faces significant challenges from domain shift induced by non-uniform illumination and vary...
Adaptive Conditional Contrast-Agnostic Deformable Image Registration with Uncertainty Estimation : Abstract: Deformable multi-contrast image registration is a challenging yet crucial task due to the complex, non-linear intensity relationships across different imaging contrasts. Conventional registr...
Deepfake detectors are DUMB: A benchmark to assess adversarial training robustness under transferability constraints : Abstract: Deepfake detection systems deployed in real-world environments are subject to adversaries capable of crafting imperceptible perturbations that degrade model performance. While adversarial tr...
Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network : Abstract: Group-equivariant convolutional neural networks (G-CNN) heavily rely on parameter sharing to increase CNNs' data efficiency and performance. However, the parameter-sharing strategy greatly i...
RobustFormer: Noise-Robust Pre-training for images and videos : Abstract: While deep learning-based models like transformers have revolutionized time-series and vision tasks, they remain highly susceptible to noise and often overfit on noisy patterns rather than ...
Pyramidal Adaptive Cross-Gating for Multimodal Detection : Abstract: Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between different mo...
Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions : Abstract: As large language models (LLMs) transition from chat interfaces to integral components of stochastic pipelines across domains like educational assessment and synthetic data construction, the...
Towards Valid Student Simulation with Large Language Models : Abstract: This paper presents a conceptual and methodological framework for large language model (LLM) based student simulation in educational settings. The authors identify a core failure mode, terme...
The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence : Abstract: To reliably assist human decision-making, LLMs must maintain factual internal beliefs against misleading injections. While current models resist explicit misinformation, we uncover a fundame...
MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards : Abstract: Maintaining consistency in long-term dialogues remains a fundamental challenge for LLMs, as standard retrieval mechanisms often fail to capture the temporal evolution of historical states. W...
FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse : Abstract: The stateless architecture of Large Language Models inherently lacks the mechanism to preserve dynamic context, compelling agents to redundantly reprocess history to maintain long-horizon au...
CHisAgent: A Multi-Agent Framework for Event Taxonomy Construction in Ancient Chinese Cultural Systems : Abstract: Despite strong performance on many tasks, large language models (LLMs) show limited ability in historical and cultural reasoning, particularly in non-English contexts such as Chinese history...
Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism : Abstract: Parallel Speculative Decoding (PSD) accelerates traditional Speculative Decoding (SD) by overlapping draft generation with verification. However, it remains hampered by two fundamental chall...
Closing the Modality Reasoning Gap for Speech Large Language Models : Abstract: Although speech large language models have achieved notable progress, a substantial modality reasoning gap remains: their reasoning performance on speech inputs is markedly weaker than on te...
Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring : Abstract: This study addresses critical gaps in Automated Essay Scoring (AES) systems and Large Language Models (LLMs) with regard to their ability to effectively identify and score harmful essays. De...
Generation-Based and Emotion-Reflected Memory Update: Creating the KEEM Dataset for Better Long-Term Conversation : Abstract: In this work, we introduce the Keep Emotional and Essential Memory (KEEM) dataset, a novel generation-based dataset designed to enhance memory updates in long-term conversational systems. Un...
Can large language models interpret unstructured chat data on dynamic group decision-making processes? Evidence on joint destination choice : Abstract: Social activities result from complex joint activity-travel decisions between group members. While observing the decision-making process of these activities is difficult via traditional trav...
Data Augmented Pipeline for Legal Information Extraction and Reasoning : Abstract: In this paper, we propose a pipeline leveraging Large Language Models (LLMs) for data augmentation in Information Extraction tasks within the legal domain. The proposed method is both simple...
Text Detoxification in isiXhosa and Yorùbá: A Cross-Lingual Machine Learning Approach for Low-Resource African Languages : Abstract: Toxic language is one of the major barriers to safe online participation, yet robust mitigation tools are scarce for African languages. This study addresses this critical gap by investigating...
GIFT: Games as Informal Training for Generalizable LLMs : Abstract: While Large Language Models (LLMs) have achieved remarkable success in formal learning tasks such as mathematics and code generation, they still struggle with the "practical wisdom" and gene...
Afri-MCQA: Multimodal Cultural Question Answering for African Languages : Abstract: Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark c...
AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor : Abstract: We introduce AutoMonitor-Bench, the first benchmark designed to systematically evaluate the reliability of LLM-based misbehavior monitors across diverse tasks and failure modes. AutoMonitor-...
One Script Instead of Hundreds? On Pretraining Romanized Encoder Language Models : Abstract: Exposing latent lexical overlap, script romanization has emerged as an effective strategy for improving cross-lingual transfer (XLT) in multilingual language models (mLMs). Most prior work, ...
LLMs as Science Journalists: Supporting Early-stage Researchers in Communicating Their Science to the Public : Abstract: The scientific community needs tools that help early-stage researchers effectively communicate their findings and innovations to the public. Although existing general-purpose Large Language ...
Peek2: A Regex-free implementation of pretokenizers for Byte-level BPE : Abstract: Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers. Our proposed new implementation, Peek2, serves as a drop-in replacement for cl100k-like pretokenizers used in GPT-...
Left, Right, or Center? Evaluating LLM Framing in News Classification and Generation : Abstract: Large Language Model (LLM) based summarization and text generation are increasingly used for producing and rewriting text, raising concerns about political framing in journalism where subtle...
Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs : Abstract: Digital twins -- virtual replicas of physical entities -- are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generat...
What do the metrics mean? A critical analysis of the use of Automated Evaluation Metrics in Interpreting : Abstract: With the growth of interpreting technologies, from remote interpreting and Computer-Aided Interpreting to automated speech translation and interpreting avatars, there is now a high demand fo...
FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG : Abstract: Retrieval-Augmented Generation (RAG) models are critically undermined by citation hallucinations, a deceptive failure where a model confidently cites a source that fails to support its claim...
iReasoner: Trajectory-Aware Intrinsic Reasoning Supervision for Self-Evolving Large Multimodal Models : Abstract: Recent work shows that large multimodal models (LMMs) can self-improve from unlabeled data via self-play and intrinsic feedback. Yet existing self-evolving frameworks mainly reward final out...
HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search : Abstract: Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures ...
Pantagruel: Unified Self-Supervised Encoders for French Text and Speech : Abstract: We release Pantagruel models, a new family of self-supervised encoder models for French text and speech. Instead of predicting modality-tailored targets such as textual tokens or speech unit...
Distilling Feedback into Memory-as-a-Tool : Abstract: We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-control...
Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks : Abstract: Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with...
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards : Abstract: Reinforcement learning (RL) has emerged as a critical technique for enhancing LLM-based deep search agents. However, existing approaches primarily rely on binary outcome rewards, which fail ...
TagRAG: Tag-guided Hierarchical Knowledge Graph Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation enhances language models by retrieving external knowledge to support informed and grounded responses. However, traditional RAG methods rely on fragment-level r...
ROAP: A Reading-Order and Attention-Prior Pipeline for Optimizing Layout Transformers in Key Information Extraction : Abstract: The efficacy of Multimodal Transformers in visually-rich document understanding (VrDU) is critically constrained by two inherent limitations: the lack of explicit modeling for logical readin...
MMViR: A Multi-Modal and Multi-Granularity Representation for Long-range Video Understanding : Abstract: Long videos, ranging from minutes to hours, present significant challenges for current Multi-modal Large Language Models (MLLMs) due to their complex events, diverse scenes, and long-range d...
Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors : Abstract: Hieroglyphs, as logographic writing systems, encode rich semantic and cultural information within their internal structural composition. Yet, current advanced Large Language Models (LLMs) an...
The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era : Abstract: Driven by the rapid advancement of Large Language Models (LLMs), particularly Audio-LLMs and Omni-models, spoken dialogue systems have evolved significantly, progressively narrowing the gap ...
Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs : Abstract: Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step tow...
Expression Syntax Information Bottleneck for Math Word Problems : Abstract: Math Word Problem (MWP) solving aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original t...
Coding the Visual World: From Image to Simulation Using Vision Language Models : Abstract: The ability to construct mental models of the world is a central aspect of understanding. Similarly, visual understanding can be viewed as the ability to construct a representative model of ...
MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments : Abstract: We present MOSAIC-GS, a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos using Gaussian Splatting. Monocular...
EdgeLDR: Quaternion Low-Displacement Rank Neural Networks for Edge-Efficient Deep Learning : Abstract: Deploying deep neural networks on edge devices is often limited by the memory traffic and compute cost of dense linear operators. While quaternion neural networks improve parameter efficienc...
Sketch&Patch++: Efficient Structure-Aware 3D Gaussian Representation : Abstract: We observe that Gaussians exhibit distinct roles and characteristics analogous to traditional artistic techniques -- like how artists first sketch outlines before filling in broader areas wi...
TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection : Abstract: Infrared small target detection (ISTD) remains a long-standing challenge due to weak signal contrast, limited spatial extent, and cluttered backgrounds. Despite performance improvements from...
Multi-Image Super Resolution Framework for Detection and Analysis of Plant Roots : Abstract: Understanding plant root systems is critical for advancing research in soil-plant interactions, nutrient uptake, and overall plant health. However, accurate imaging of roots in subterranean ...
Hippocampal Atrophy Patterns Across the Alzheimer's Disease Spectrum: A Voxel-Based Morphometry Analysis : Abstract: Alzheimer's disease (AD) and mild cognitive impairment (MCI) are associated with progressive gray matter loss, particularly in medial temporal structures. In this study, CAT12/SPM12 voxel-ba...
GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting : Abstract: We introduce GaussianSwap, a novel video face swapping framework that constructs a 3D Gaussian Splatting based face avatar from a target video while transferring identity from a source image...
SAS-VPReID: A Scale-Adaptive Framework with Shape Priors for Video-based Person Re-Identification at Extreme Far Distances : Abstract: Video-based Person Re-IDentification (VPReID) aims to retrieve the same person from videos captured by non-overlapping cameras. At extreme far distances, VPReID is highly challenging due to ...
DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion : Abstract: Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. Although existing approaches base...
MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation : Abstract: Existing multi-object image generation methods face difficulties in achieving precise alignment between localized image generation regions and their corresponding semantics based on language...
One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection : Abstract: Universal visual anomaly detection (AD) aims to identify anomalous images and segment anomaly regions towards open and dynamic scenarios, following zero- and few-shot paradigms without any dat...
What's Left Unsaid? Detecting and Correcting Misleading Omissions in Multimodal News Previews : Abstract: Even when factually correct, social-media news previews (image-headline pairs) can induce interpretation drift: by selectively omitting crucial context, they lead readers to form judgments t...
Towards Generalized Multi-Image Editing for Unified Multimodal Models : Abstract: Unified Multimodal Models (UMMs) integrate multimodal understanding and generation, yet they are limited in maintaining visual consistency and disambiguating visual cues when referencing det...
Orient Anything V2: Unifying Orientation and Rotation Understanding : Abstract: This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anythin...
Generalizable and Adaptive Continual Learning Framework for AI-generated Image Detection : Abstract: The malicious misuse and widespread dissemination of AI-generated images pose a significant threat to the authenticity of online information. Current detection methods often struggle to gene...
Learning Geometric Invariance for Gait Recognition : Abstract: The goal of gait recognition is to extract identity-invariant features of an individual under various gait conditions, e.g., cross-view and cross-clothing. Most gait models strive to implici...
LatentVLA: Efficient Vision-Language Models for Autonomous Driving via Latent Action Prediction : Abstract: End-to-end autonomous driving models trained on large-scale datasets perform well in common scenarios but struggle with rare, long-tail situations due to limited scenario diversity. Recent Vi...
SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving : Abstract: Recent end-to-end autonomous driving approaches have leveraged Vision-Language Models (VLMs) to enhance planning capabilities in complex driving scenarios. However, VLMs are inherently train...
SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More : Abstract: Charts are high-density visual carriers of complex data and a medium for information extraction and analysis. Due to the need for precise and complex visual reasoning, automated chart understa...
Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation : Abstract: Generating high-quality 3D characters from single images remains a significant challenge in digital content creation, particularly due to complex body poses and self-occlusion. In this paper...
TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment : Abstract: Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation....
Transforming User Defined Criteria into Explainable Indicators with an Integrated LLM AHP System : Abstract: Evaluating complex texts across domains requires converting user defined criteria into quantitative, explainable indicators, which is a persistent challenge in search and recommendation syst...
Studying Illustrations in Manuscripts: An Efficient Deep-Learning Approach : Abstract: The recent Artificial Intelligence (AI) revolution has opened transformative possibilities for the humanities, particularly in unlocking the visual content embedded in historical manuscripts...
Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings : Abstract: The ubiquity of payment networks generates vast transactional data encoding rich consumer and merchant behavioral patterns. Recent foundation models for transaction analysis process tabular ...
On the use of case estimate and transactional payment data in neural networks for individual loss reserving : Abstract: The use of neural networks trained on individual claims data has become increasingly popular in the actuarial reserving literature. We consider how to best input historical payment data in n...
Channel Selected Stratified Nested Cross Validation for Clinically Relevant EEG Based Parkinson's Disease Detection : Abstract: The early detection of Parkinson's disease remains a critical challenge in clinical neuroscience, with electroencephalography offering a noninvasive and scalable pathway toward population lev...
A universal vision transformer for fast calorimeter simulations : Abstract: The high-dimensional complex nature of detectors makes fast calorimeter simulations a prime application for modern generative machine learning. Vision transformers (ViTs) can emulate the Gea...
Machine learning assisted state prediction of misspecified linear dynamical system via modal reduction : Abstract: Accurate prediction of structural dynamics is imperative for preserving digital twin fidelity throughout operational lifetimes. Parametric models with fixed nominal parameters often omit cri...
Optimizing Digital Adjudication through Social Network Analysis: An Empirical Study of Credit Card Disputes in Beijing : Abstract: Amid the rapid digitalization of judicial systems, the integration of big data into adjudication remains underexplored, particularly in uncovering the structural logic of legal applications....
Generalized Canonical Polyadic Tensor Decompositions with General Symmetry : Abstract: Canonical Polyadic (CP) tensor decomposition is a workhorse algorithm for discovering underlying low-dimensional structure in tensor data. This is accomplished in conventional CP decompositi...
Archetypal cases for questionnaires with nominal multiple choice questions : Abstract: Archetypal analysis serves as an exploratory tool that interprets a collection of observations as convex combinations of pure (extreme) patterns. When these patterns correspond to actual obs...
Dynamic Inclusion and Bounded Multi-Factor Tilts for Robust Portfolio Construction : Abstract: This paper proposes a portfolio construction framework designed to remain robust under estimation error, non-stationarity, and realistic trading constraints. The methodology combines dynamic...
A brief note on learning problem with global perspectives : Abstract: This brief note considers the problem of learning in a dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e...
What Functions Does XGBoost Learn? : Abstract: This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understandin...
Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models : Abstract: Large Language Models (LLMs) face a significant threat from multi-turn jailbreak attacks, where adversaries progressively steer conversations to elicit harmful outputs. However, the practica...
Autonomous Probe Microscopy with Robust Bag-of-Features Multi-Objective Bayesian Optimization: Pareto-Front Mapping of Nanoscale Structure-Property Trade-Offs : Abstract: Combinatorial materials libraries are an efficient route to generate large families of candidate compositions, but their impact is often limited by the speed and depth of characterization an...
DNATokenizer: A GPU-First Byte-to-Identifier Tokenizer for High-Throughput DNA Language Models : Abstract: Tokenization sits at the boundary between high-throughput genomic input and GPU compute, posing challenges in both algorithm design and system throughput. Overlapping k-mer tokenization can ...
Autonomous Discovery of the Ising Model's Critical Parameters with Reinforcement Learning : Abstract: Traditional methods for determining critical parameters are often influenced by human factors. This research introduces a physics-inspired adaptive reinforcement learning framework that enab...
Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation : Abstract: Convolutional Neural Networks (CNNs) are known to exhibit a strong texture bias, favoring local patterns over global shape information--a tendency inherent to their convolutional architectur...
SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes : Abstract: Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This r...
Compressing image encoders via latent distillation : Abstract: Deep learning models for image compression often face practical limitations in hardware-constrained applications. Although these models achieve high-quality reconstructions, they are typical...
Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs : Abstract: As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on ma...
Tracing Stereotypes in Pre-trained Transformers: From Biased Neurons to Fairer Models : Abstract: The advent of transformer-based language models has reshaped how AI systems process and generate text. In software engineering (SE), these models now support diverse activities, accelerating...
ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers : Abstract: Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current approaches primarily exploit only final-layer representations, while training-free methods re...
Simplify-This: A Comparative Analysis of Prompt-Based and Fine-Tuned LLMs : Abstract: Large language models (LLMs) enable strong text generation, and in general there is a practical tradeoff between fine-tuning and prompt engineering. We introduce Simplify-This, a comparative...
Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning : Abstract: Sequential Bayesian optimal experimental design (SBOED) for PDE-governed inverse problems is computationally challenging, especially for infinite-dimensional random field parameters. High-fi...
Multi-task Modeling for Engineering Applications with Sparse Data : Abstract: Modern engineering and scientific workflows often require simultaneous predictions across related tasks and fidelity levels, where high-fidelity data is scarce and expensive, while low-fidel...
A Critical Examination of Active Learning Workflows in Materials Science : Abstract: Active learning (AL) plays a critical role in materials science, enabling applications such as the construction of machine-learning interatomic potentials for atomistic simulations and the o...
DeePM: Regime-Robust Deep Learning for Systematic Macro Portfolio Management : Abstract: We propose DeePM (Deep Portfolio Manager), a structured deep-learning macro portfolio manager trained end-to-end to maximize a robust, risk-adjusted utility. DeePM addresses three fundamenta...
AWaRe-SAC: Proactive Slice Admission Control under Weather-Induced Capacity Uncertainty : Abstract: As emerging applications demand higher throughput and lower latencies, operators are increasingly deploying millimeter-wave (mmWave) links within x-haul transport networks, spanning fronthau...
CyberGFM: Graph Foundation Models for Lateral Movement Detection in Enterprise Networks : Abstract: Representing networks as a graph and training a link prediction model using benign connections is an effective method of anomaly-based intrusion detection. Existing works using this techniqu...
Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem : Abstract: We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical exc...
Manifold limit for the training of shallow graph convolutional neural networks : Abstract: We study the discrete-to-continuum consistency of the training of shallow graph convolutional neural networks (GCNNs) on proximity graphs of sampled point clouds under a manifold assumption....
Utilising physics-guided deep learning to overcome data scarcity : Abstract: Deep learning (DL) relies heavily on data, and the quality of data influences its performance significantly. However, obtaining high-quality, well-annotated datasets can be challenging or ev...
Simple Mechanisms for Representing, Indexing and Manipulating Concepts : Abstract: Supervised and unsupervised learning using deep neural networks typically aims to exploit the underlying structure in the training data; this structure is often explained using a latent gene...
FedScalar: A Communication efficient Federated Learning : Abstract: Federated learning (FL) has gained considerable popularity for distributed machine learning due to its ability to preserve the privacy of participating agents by eliminating the need for dat...
Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models : Abstract: In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider ...
Data-Driven Approach to Capitation Reform in Rwanda : Abstract: As part of Rwanda's transition toward universal health coverage, the national Community-Based Health Insurance (CBHI) scheme is moving from retrospective fee-for-service reimbursements to pr...
The Table of Media Bias Elements: A sentence-level taxonomy of media bias types and propaganda techniques : Abstract: Public debates about "left-" or "right-wing" news overlook the fact that bias is usually conveyed by concrete linguistic manoeuvres that transcend any single political spectrum. We therefore...
Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection : Abstract: Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range o...
Glitter: Visualizing Lexical Surprisal for Readability in Administrative Texts : Abstract: This work investigates how measuring information entropy of text can be used to estimate its readability. We propose a visualization framework that can be used to approximate information ent...
LookAroundNet: Extending Temporal Context with Transformers for Clinically Viable EEG Seizure Detection : Abstract: Automated seizure detection from electroencephalography (EEG) remains difficult due to the large variability of seizure dynamics across patients, recording conditions, and clinical settings....
Rapid Adaptation of SpO2 Estimation to Wearable Devices via Transfer Learning on Low-Sampling-Rate PPG : Abstract: Blood oxygen saturation (SpO2) is a vital marker for healthcare monitoring. Traditional SpO2 estimation methods often rely on complex clinical calibration, making them unsuitable for low-pow...
Generalizable Blood Pressure Estimation from Multi-Wavelength PPG Using Curriculum-Adversarial Learning : Abstract: Accurate and generalizable blood pressure (BP) estimation is vital for the early detection and management of cardiovascular diseases. In this study, we enforce subject-level data splitting o...
Improving User Experience with Personalized Review Ranking and Summarization : Abstract: Online consumer reviews play a crucial role in guiding purchase decisions by offering insights into product quality, usability, and performance. However, the increasing volume of user-genera...
Simulating Multi-Stakeholder Decision-Making with Generative Agents in Urban Planning : Abstract: Reaching consensus in urban planning is a complex process often hindered by prolonged negotiations, trade-offs, power dynamics, and competing stakeholder interests, resulting in inefficienci...
Explainable AI needs formalization : Abstract: The field of "explainable artificial intelligence" (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current sta...
Towards AI-Native Software Engineering (SE 3.0): A Vision and a Challenge Roadmap : Abstract: The rise of AI-assisted software engineering (SE 2.0), powered by Foundation Models (FMs) and FM-powered coding assistants, has shown promise in improving developer productivity. However, it...
Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection : Abstract: We present a novel approach to automatically generate non-trivial task-specific synthetic datasets for hallucination detection. Our approach features a two-step generation-selection pipeline...
TIME: Temporally Intelligent Meta-reasoning Engine for Context Triggered Explicit Reasoning : Abstract: Reasoning oriented large language models often expose explicit "thinking" as long, turn-global traces at the start of every response, either always on or toggled externally at inference time...
Ontology Neural Networks for Topologically Conditioned Constraint Satisfaction : Abstract: Neuro-symbolic reasoning systems face fundamental challenges in maintaining semantic coherence while satisfying physical and logical constraints. Building upon our previous work on Ontology ...
When the Server Steps In: Calibrated Updates for Fair Federated Learning : Abstract: Federated learning (FL) has emerged as a transformative distributed learning paradigm, enabling multiple clients to collaboratively train a global model under the coordination of a central s...
GlyRAG: Context-Aware Retrieval-Augmented Framework for Blood Glucose Forecasting : Abstract: Accurate forecasting of blood glucose from CGM is essential for preventing dysglycemic events, thus enabling proactive diabetes management. However, current forecasting models treat blood gl...
The Kernel Manifold: A Geometric Approach to Gaussian Process Model Selection : Abstract: Gaussian Process (GP) regression is a powerful nonparametric Bayesian framework, but its performance depends critically on the choice of covariance kernel. Selecting an appropriate kernel is...
Inverting Non-Injective Functions with Twin Neural Network Regression : Abstract: Non-injective functions are not invertible. However, non-injective functions can be restricted to sub-domains on which they are locally injective and surjective and thus invertible if the di...
Imitation Learning for Combinatorial Optimisation under Uncertainty : Abstract: Imitation learning (IL) provides a data-driven framework for approximating policies for large-scale combinatorial optimisation problems formulated as sequential decision problems (SDPs), whe...
DynaSTy: A Framework for SpatioTemporal Node Attribute Prediction in Dynamic Graphs : Abstract: Accurate multistep forecasting of node-level attributes on dynamic graphs is critical for applications ranging from financial trust networks to biological networks. Existing spatiotemporal g...
Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning : Abstract: Knowledge distillation (KD) has the potential to accelerate MARL by employing a centralized teacher for decentralized students but faces key bottlenecks. Specifically, there are (1) challeng...
Efficient Inference for Noisy LLM-as-a-Judge Evaluation : Abstract: Large language models (LLMs) are increasingly used as automatic evaluators of generative AI outputs, a paradigm often referred to as "LLM-as-a-judge." In practice, LLM judges are imperfect p...
Prediction of Fault Slip Tendency in CO₂ Storage using Data-space Inversion : Abstract: Accurately assessing the potential for fault slip is essential in many subsurface operations. Conventional model-based history matching methods, which entail the generation of posterior geom...
RingSQL: Generating Synthetic Data with Schema-Independent Templates for Text-to-SQL Reasoning Models : Abstract: Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creat...
MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization : Abstract: Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such...
Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection : Abstract: Fine-tuning large language models (LLMs) using standard first-order (FO) optimization often drives training toward sharp, poorly generalizing minima. Conversely, zeroth-order (ZO) methods of...
Toward an Integrated Cross-Urban Accident Prevention System: A Multi-Task Spatial-Temporal Learning Framework for Urban Safety Management : Abstract: The development of a cross-city accident prevention system is particularly challenging due to the heterogeneity, inconsistent reporting, and inherently clustered, sparse, cyclical, and noisy...
Buffered AUC maximization for scoring systems via mixed-integer optimization : Abstract: A scoring system is a linear classifier composed of a small number of explanatory variables, each assigned a small integer coefficient. This system is highly interpretable and allows predict...
Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow : Abstract: The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational c...
Poisson Hyperplane Processes with Rectified Linear Units : Abstract: Neural networks have shown state-of-the-art performances in various classification and regression tasks. Rectified linear units (ReLU) are often used as activation functions for the hidden l...
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning : Abstract: We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale t...
Good Allocations from Bad Estimates : Abstract: Conditional average treatment effect (CATE) estimation is the de facto gold standard for targeting a treatment to a heterogeneous population. The method estimates treatment effects up to an ...
Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) offers a promising framework for optimizing large language models in reasoning tasks. However, existing RLVR algorithms focus on differe...
Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks : Abstract: In recent years, large language models (LLMs) have demonstrated significant potential in complex reasoning tasks like mathematical problem-solving. However, existing research predominantly r...
Continual Learning of Achieving Forgetting-free and Positive Knowledge Transfer : Abstract: Existing research on continual learning (CL) of a sequence of tasks focuses mainly on dealing with catastrophic forgetting (CF) to balance the learning plasticity of new tasks and the memory...
From Global to Local: Cluster-Aware Learning for Wi-Fi Fingerprinting Indoor Localisation : Abstract: Wi-Fi fingerprinting remains one of the most practical solutions for indoor positioning, however, its performance is often limited by the size and heterogeneity of fingerprint datasets, stro...
Do Sparse Autoencoders Identify Reasoning Features in Language Models? : Abstract: We investigate whether sparse autoencoders (SAEs) identify genuine reasoning features in large language models (LLMs). Starting from features selected using standard contrastive activation m...
FLRQ: Faster LLM Quantization with Flexible Low-Rank Matrix Sketching : Abstract: Traditional post-training quantization (PTQ) is considered an effective approach to reduce model size and accelerate inference of large-scale language models (LLMs). However, existing low-ra...
Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer : Abstract: Algorithm extraction aims to synthesize executable programs directly from models trained on specific algorithmic tasks, enabling de novo algorithm discovery without relying on human-written ...
Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers : Abstract: Transformers require positional encodings to represent sequence order, yet most prior work focuses on designing new positional encodings rather than examining how positional information is f...
Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem : Abstract: Motivated by the growing interest in representation learning approaches that uncover the latent structure of high-dimensional data, this work proposes new algorithms for reconstruction-based...
Detecting Autism Spectrum Disorder with Deep Eye Movement Features : Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and behavioral patterns. Eye movement data offers a non-invasive diagnostic ...
A Dual Pipeline Machine Learning Framework for Automated Multi Class Sleep Disorder Screening Using Hybrid Resampling and Ensemble Learning : Abstract: Accurate classification of sleep disorders, particularly insomnia and sleep apnea, is important for reducing long term health risks and improving patient quality of life. However, clinical s...
A New Family of Poisson Non-negative Matrix Factorization Methods Using the Shifted Log Link : Abstract: Poisson non-negative matrix factorization (NMF) is a widely used method to find interpretable "parts-based" decompositions of count data. While many variants of Poisson NMF exist, existing m...
GlueNN: gluing patchwise analytic solutions with neural networks : Abstract: In many problems in physics and engineering, one encounters complicated differential equations with strongly scale-dependent terms for which exact analytical or numerical solutions are not a...
Distilling Lightweight Domain Experts from Large ML Models by Identifying Relevant Subspaces : Abstract: Knowledge distillation involves transferring the predictive capabilities of large, high-performing AI models (teachers) to smaller models (students) that can operate in environments with lim...
Prophet as a Reproducible Forecasting Framework: A Methodological Guide for Business and Financial Analytics : Abstract: Reproducibility remains a persistent challenge in forecasting research and practice, particularly in business and financial analytics where forecasts inform high-stakes decisions. Traditiona...
On the Robustness of Age for Learning-Based Wireless Scheduling in Unknown Environments : Abstract: The constrained combinatorial multi-armed bandit model has been widely employed to solve problems in wireless networking and related areas, including the problem of wireless scheduling for t...
Community-Based Model Sharing and Generalisation: Anomaly Detection in IoT Temperature Sensor Networks : Abstract: The rapid deployment of Internet of Things (IoT) devices has led to large-scale sensor networks that monitor environmental and urban phenomena in real time. Communities of Interest (CoIs) pr...
HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors : Abstract: Recent advances in software vulnerability detection have been driven by Language Model (LM)-based approaches. However, these models remain vulnerable to adversarial attacks that exploit lexi...
Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders : Abstract: Dual and cross encoders have long been mainstays of information retrieval (IR), but are being challenged by the emergent capabilities of LLMs. An LLM-based approach we term pointwise generat...
ACR: Adaptive Context Refactoring via Context Refactoring Operators for Multi-Turn Dialogue : Abstract: Large Language Models (LLMs) have shown remarkable performance in multi-turn dialogue. However, in multi-turn dialogue, models still struggle to stay aligned with what has been established e...
PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data Structures Across Nodes : Abstract: Time series are highly valuable and rarely shareable across nodes, making federated learning a promising paradigm to leverage distributed temporal data. However, different sampling standards...
Transformer Is Inherently a Causal Learner : Abstract: We reveal that transformers trained in an autoregressive manner naturally encode time-delayed causal structures in their learned representations. When predicting future values in multivariat...
Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training : Abstract: Recent advancements in single-cell multi-omics, particularly RNA-seq, have provided profound insights into cellular heterogeneity and gene regulation. While pre-trained language model (PLM) ...
A Framework for Personalized Persuasiveness Prediction via Context-Aware User Profiling : Abstract: Estimating the persuasiveness of messages is critical in various applications, from recommender systems to safety assessment of LLMs. While it is imperative to consider the target persuadee'...
Stephanie2: Thinking, Waiting, and Making Decisions Like Humans in Step-by-Step AI Social Chat : Abstract: Instant-messaging human social chat typically progresses through a sequence of short messages. Existing step-by-step AI chatting systems typically split a one-shot generation into multiple m...
Advancing credit mobility through stakeholder-informed AI design and adoption : Abstract: Transferring from a 2-year to a 4-year college is crucial for socioeconomic mobility, yet students often face challenges ensuring their credits are fully recognized, leading to delays in the...
AGDC: Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces : Abstract: Transformer-based autoregressive models excel in data generation but are inherently constrained by their reliance on discretized tokens, which limits their ability to represent continuous va...
Joint Optimization of Neural Autoregressors via Scoring rules : Abstract: Non-parametric distributional regression has achieved significant milestones in recent years. Among these, the Tabular Prior-Data Fitted Network (TabPFN) has demonstrated state-of-the-art pe...
AIBoMGen: Generating an AI Bill of Materials for Secure, Transparent, and Compliant Model Training : Abstract: The rapid adoption of complex AI systems has outpaced the development of tools to ensure their transparency, security, and regulatory compliance. In this paper, the AI Bill of Materials (AIB...
Multimodal In-context Learning for ASR of Low-resource Languages : Abstract: Automatic speech recognition (ASR) still covers only a small fraction of the world's languages, mainly due to supervised data scarcity. In-context learning (ICL) with large language models (...
Visualising Information Flow in Word Embeddings with Diffusion Tensor Imaging : Abstract: Understanding how large language models (LLMs) represent natural language is a central challenge in natural language processing (NLP) research. Many existing methods extract word embeddings ...
mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations : Abstract: Hyper-Connections (HC) generalizes residual connections by introducing dynamic residual matrices that mix information across multiple residual streams, accelerating convergence in deep neura...
The Echo Chamber Multi-Turn LLM Jailbreak : Abstract: The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security chal...
Analysing Differences in Persuasive Language in LLM-Generated Text: Uncovering Stereotypical Gender Patterns : Abstract: Large language models (LLMs) are increasingly used for everyday communication tasks, including drafting interpersonal messages intended to influence and persuade. Prior work has shown that L...
VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit : Abstract: LLM agents operating in open environments face escalating risks from indirect prompt injection, particularly within the tool stream where manipulated metadata and runtime feedback hijack exe...
Variational Autoencoders for P-wave Detection on Strong Motion Earthquake Spectrograms : Abstract: Accurate P-wave detection is critical for earthquake early warning, yet strong-motion records pose challenges due to high noise levels, limited labeled data, and complex waveform characteris...
Adaptive Disentangled Representation Learning for Incomplete Multi-View Multi-Label Classification : Abstract: Multi-view multi-label learning frequently suffers from simultaneous feature absence and incomplete annotations, due to challenges in data acquisition and cost-intensive supervision. To tack...
SAFE: Secure and Accurate Federated Learning for Privacy-Preserving Brain-Computer Interfaces : Abstract: Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) are widely adopted due to their efficiency and portability; however, their decoding algorithms still face multiple challenge...
Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning : Abstract: Accurate drug-target interaction (DTI) prediction is essential for computational drug discovery, yet existing models often rely on single-modality predefined molecular descriptors or sequenc...
EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis : Abstract: Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, a...
SceneFoundry: Generating Interactive Infinite 3D Worlds : Abstract: The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existin...
Decoding Workload and Agreement From EEG During Spoken Dialogue With Conversational AI : Abstract: Passive brain-computer interfaces offer a potential source of implicit feedback for alignment of large language models, but most mental state decoding has been done in controlled tasks. This...
Influence of Parallelism in Vector-Multiplication Units on Correlation Power Analysis : Abstract: The use of neural networks in edge devices is increasing, which introduces new security challenges related to the neural networks' confidentiality. As edge devices often offer physical acces...
Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning : Abstract: This paper presents a comprehensive approach to singularity detection and avoidance in UR10 robotic arm path planning through the integration of fuzzy logic safety systems and reinforcement ...
DexterCap: An Affordable and Automated System for Capturing Dexterous Hand-Object Manipulation : Abstract: Capturing fine-grained hand-object interactions is challenging due to severe self-occlusion from closely spaced fingers and the subtlety of in-hand manipulation motions. Existing optical mot...
Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals : Abstract: Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals ...
Router-Suggest: Dynamic Routing for Multimodal Auto-Completion in Visually-Grounded Dialogs : Abstract: Real-time multimodal auto-completion is essential for digital assistants, chatbots, design tools, and healthcare consultations, where user inputs rely on shared visual context. We introduce ...
LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting : Abstract: We propose a novel framework for decomposing arbitrarily posed humans into animatable multi-layered 3D human avatars, separating the body and garments. Conventional single-layer reconstructi...
CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning : Abstract: Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference...
IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck : Abstract: Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Model (LLM) reasoning have been hindered by a persistent challenge: exploration collapse. The sema...
Continual-learning for Modelling Low-Resource Languages from Large Language Models : Abstract: Modelling a language model for a multi-lingual scenario includes several potential challenges, among which catastrophic forgetting is the major challenge. For example, small language models ...
Gender Bias in LLMs: Preliminary Evidence from Shared Parenting Scenario in Czech Family Law : Abstract: Access to justice remains limited for many people, leading laypersons to increasingly rely on Large Language Models (LLMs) for legal self-help. Laypeople use these tools intuitively, which m...
An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift : Abstract: Preference tuning aligns pretrained language models to human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior w...
Can AI mediation improve democratic deliberation? : Abstract: The strength of democracy lies in the free and equal exchange of diverse viewpoints. Living up to this ideal at scale faces inherent tensions: broad participation, meaningful deliberation, a...
Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency : Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextua...
Auditing Fairness under Model Updates: Fundamental Complexity and Property-Preserving Updates : Abstract: As machine learning models become increasingly embedded in societal infrastructure, auditing them for bias is of growing importance. However, in real-world deployments, auditing is complicat...
Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset : Abstract: On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, inc...
Cedalion Tutorial: A Python-based framework for comprehensive analysis of multimodal fNIRS & DOT from the lab to the everyday world : Abstract: Functional near-infrared spectroscopy (fNIRS) and diffuse optical tomography (DOT) are rapidly evolving toward wearable, multimodal, and data-driven, AI-supported neuroimaging in the everyda...
Can We Predict Before Executing Machine Learning Agents? : Abstract: Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Exe...
Performance of a Deep Learning-Based Segmentation Model for Pancreatic Tumors on Public Endoscopic Ultrasound Datasets : Abstract: Background: Pancreatic cancer is one of the most aggressive cancers, with poor survival rates. Endoscopic ultrasound (EUS) is a key diagnostic modality, but its effectiveness is constrained ...
VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction : Abstract: Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale....
The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning : Abstract: Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning from human or non-Long-CoT LLMs imitation. To understand this, we propose that effective...
AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs : Abstract: Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a p...
MIPO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning : Abstract: Representation learning on electronic health records (EHRs) plays a vital role in downstream medical prediction tasks. Although natural language processing techniques, such as recurrent neur...
Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents : Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal stat...
Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries : Abstract: Many large language models (LLMs) are trained on a massive body of knowledge present on the Internet. Darth Vecdor (DV) was designed to extract this knowledge into a structured, terminology-...
Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models : Abstract: Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models ...
AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms? : Abstract: Reasoning ability has become a central focus in the advancement of Large Reasoning Models (LRMs). Although notable progress has been achieved on several reasoning benchmarks such as MATH500 ...
How to Set the Batch Size for Large-Scale Pre-training? : Abstract: The concept of Critical Batch Size, as pioneered by OpenAI, has long served as a foundational principle for large-scale pre-training. However, with the paradigm shift towards the Warmup-Stab...
Large language models can effectively convince people to believe conspiracies : Abstract: Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power advantages truth over falsehood, or if LLMs c...
MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents : Abstract: We present MineNPC-Task, a user-authored benchmark and evaluation harness for testing memory-aware, mixed-initiative LLM agents in open-world Minecraft. Rather than relying on synthetic prom...
An Evaluation on Large Language Model Outputs: Discourse and Memorization : Abstract: We present an empirical evaluation of various outputs generated by nine of the most widely-available large language models (LLMs). Our analysis is done with off-the-shelf, readily-available ...
ART: Adaptive Reasoning Trees for Explainable Claim Verification : Abstract: Large Language Models (LLMs) are powerful candidates for complex decision-making, leveraging vast encoded knowledge and remarkable zero-shot abilities. However, their adoption in high-stakes...
PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering : Abstract: Answering real-world open-domain multi-hop questions over massive corpora is a critical challenge in Retrieval-Augmented Generation (RAG) systems. Recent research employs reinforcement learn...
MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis : Abstract: Understanding urban environment change is essential for sustainable development. However, current approaches, particularly remote sensing change detection, often rely on rigid, single-modal ...
The Evaluation Gap in Medicine, AI and LLMs: Navigating Elusive Ground Truth & Uncertainty via a Probabilistic Paradigm : Abstract: Benchmarking the relative capabilities of AI systems, including Large Language Models (LLMs) and Vision Models, typically ignores the impact of uncertainty in the underlying ground truth ans...
Explainable AI: Learning from the Learners : Abstract: Artificial intelligence now outperforms humans in several scientific and engineering tasks, yet its internal representations often remain opaque. In this Perspective, we argue that explainab...
Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making : Abstract: One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a ...
WildSci: Advancing Scientific Reasoning from In-the-Wild Literature : Abstract: Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily avai...
Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models : Abstract: Standard safety alignment optimizes Large Language Models (LLMs) for universal helpfulness and honesty, effectively instilling a rigid "Boy Scout" morality. While robust for general-purpose ...
Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection : Abstract: E-commerce platforms and payment solution providers face increasingly sophisticated fraud schemes, ranging from identity theft and account takeovers to complex money laundering operations th...
A Causal Information-Flow Framework for Unbiased Learning-to-Rank : Abstract: In web search and recommendation systems, user clicks are widely used to train ranking models. However, click data is heavily biased, i.e., users tend to click higher-ranked items (position ...
Cumulative Path-Level Semantic Reasoning for Inductive Knowledge Graph Completion : Abstract: Conventional Knowledge Graph Completion (KGC) methods aim to infer missing information in incomplete Knowledge Graphs (KGs) by leveraging existing information, which struggle to perform effe...
GenCtrl -- A Formal Controllability Toolkit for Generative Models : Abstract: As generative models become ubiquitous, there is a critical need for fine-grained control over the generation process. Yet, while controlled generation methods from prompting to fine-tuning ...
HAG: Hierarchical Demographic Tree-based Agent Generation for Topic-Adaptive Simulation : Abstract: High-fidelity agent initialization is crucial for credible Agent-Based Modeling across diverse domains. A robust framework should be Topic-Adaptive, capturing macro-level joint distributions...
CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space : Abstract: Hybrid action space, which combines discrete choices and continuous parameters, is prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing hybrid...
Circular Reasoning: Understanding Self-Reinforcing Loops in Large Reasoning Models : Abstract: Despite the success of test-time scaling, Large Reasoning Models (LRMs) frequently encounter repetitive loops that lead to computational waste and inference failure. In this paper, we identi...
Logic-Parametric Neuro-Symbolic NLI: Controlling Logical Formalisms for Verifiable LLM Reasoning : Abstract: Large language models (LLMs) and theorem provers (TPs) can be effectively combined for verifiable natural language inference (NLI). However, existing approaches rely on a fixed logical forma...
Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding : Abstract: Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to...
PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility : Abstract: Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy a...
DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation : Abstract: Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. Recently,...
From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation : Abstract: Vision-language models are increasingly deployed as computer-use agents (CUAs) that operate desktops and browsers. Top-performing CUAs are framework-based systems that decompose planning and...
StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management : Abstract: Multi-agent systems based on large language models, particularly centralized architectures, have recently shown strong potential for complex and knowledge-intensive tasks. However, central a...
TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents : Abstract: Recent breakthroughs in Large Language Models (LLMs) have positioned them as a promising paradigm for agents, with long-term planning and decision-making emerging as core general-purpose cap...
Open-Vocabulary 3D Instruction Ambiguity Detection : Abstract: In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embo...
EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation : Abstract: Translating legacy C codebases to Rust is increasingly demanded for building safety-critical systems. While various approaches have emerged for this task, they face inherent trade-offs: rule...
SP-Rank: A Dataset for Ranked Preferences with Secondary Information : Abstract: We introduce SP-Rank, the first large-scale, publicly available dataset for benchmarking algorithms that leverage both first-order preferences and second-order predictions in rank...
KP-Agent: Keyword Pruning in Sponsored Search Advertising via LLM-Powered Contextual Bandits : Abstract: Sponsored search advertising (SSA) requires advertisers to constantly adjust keyword strategies. While bid adjustment and keyword generation are well-studied, keyword pruning, refining keywor...
From Events to Trending: A Multi-Stage Hotspots Detection Method Based on Generative Query Indexing : Abstract: LLM-based conversational systems have become a popular gateway for information access, yet most existing chatbots struggle to handle news-related trending queries effectively. To improve use...
Quantifying Document Impact in RAG-LLMs : Abstract: Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this intro...
LLM2IR: simple unsupervised contrastive learning makes long-context LLM great retriever : Abstract: Modern dense information retrieval (IR) models usually rely on costly large-scale pretraining. In this paper, we introduce LLM2IR, an efficient unsupervised contrastive learning framework to...
Engineering the RAG Stack: A Comprehensive Review of the Architecture and Trust Frameworks for Retrieval-Augmented Generation Systems : Abstract: This article provides a comprehensive systematic literature review of academic studies, industrial applications, and real-world deployments from 2018 to 2025, providing a practical guide and...
Cross-Document Topic-Aligned Chunking for Retrieval-Augmented Generation : Abstract: Chunking quality determines RAG system performance. Current methods partition documents individually, but complex queries need information scattered across multiple sources: the knowledge fr...
Retrieval-Augmented Multi-LLM Ensemble for Industrial Part Specification Extraction : Abstract: Industrial part specification extraction from unstructured text remains a persistent challenge in manufacturing, procurement, and maintenance, where manual processing is both time-consuming ...
LiveVectorLake: A Real-Time Versioned Knowledge Base Architecture for Streaming Vector Updates and Temporal Retrieval : Abstract: Modern Retrieval-Augmented Generation (RAG) systems struggle with a fundamental architectural tension: vector indices are optimized for query latency but poorly handle continuous knowledge u...
Bayesian Recovery for Probabilistic Coalition Structures : Abstract: Probabilistic Coalition Structure Generation (PCSG) is NP-hard and can be recast as an $l_0$-type sparse recovery problem by representing coalition structures as sparse coefficient vectors o...
Evolving Cognitive Architectures : Abstract: This article proposes a research and development direction that would lead to the creation of next-generation intelligent technical systems. A distinctive feature of these systems is their a...
Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles : Abstract: Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computat...
On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis : Abstract: We formalise recursive self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system and prove that, as training data become increasingly self-generated...
A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes : Abstract: Agentic AI marks an important transition from single-step generative models to systems capable of reasoning, planning, acting, and adapting over long-lasting tasks. By integrating memory, to...
MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs : Abstract: The pervasive "memory wall" bottleneck is significantly amplified in modern large-scale Mixture-of-Experts (MoE) architectures. MoE's inherent architectural sparsity leads to sparse arithmet...
Bi-Orthogonal Factor Decomposition for Vision Transformers : Abstract: Self-attention is the central computational primitive of Vision Transformers, yet we lack a principled understanding of what information attention mechanisms exchange between tokens. Attenti...
Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models : Abstract: In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These hig...
A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference : Abstract: Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A) where (X_A, X_B) is an arbitrary partition of observed variable X. Existing conditional inference metho...
PRISM: Protocol Refinement through Intelligent Simulation Modeling : Abstract: Automating experimental protocol design and execution remains a fundamental bottleneck in realizing self-driving laboratories. We introduce PRISM (Protocol Refinement through Intelligent ...
STResNet & STYOLO : A New Family of Compact Classification and Object Detection Models for MCUs : Abstract: Recent advancements in lightweight neural networks have significantly improved the efficiency of deploying deep learning models on edge hardware. However, most existing architectures still t...
Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed as agents that invoke external tools through structured function calls. While recent work reports strong tool-calling performance under...
Ensemble of radiomics and ConvNeXt for breast cancer diagnosis : Abstract: Early diagnosis of breast cancer is crucial for improving survival rates. Radiomics and deep learning (DL) have shown significant potential in assisting radiologists with early cancer detect...
Multi-task Cross-modal Learning for Chest X-ray Image Retrieval : Abstract: CLIP and BiomedCLIP are examples of vision-language foundation models and offer strong cross-modal embeddings; however, they are not optimized for fine-grained medical retrieval tasks, such ...
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization : Abstract: The image geolocalization task aims to predict the location where an image was taken anywhere on Earth using visual clues. Existing large vision-language model (LVLM) approaches leverage wor...
Tracing Moral Foundations in Large Language Models : Abstract: Large language models (LLMs) often produce human-like moral judgments, but it is unclear whether this reflects an internal conceptual structure or superficial "moral mimicry." Using Moral ...
Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction : Abstract: Large Language Models (LLMs) demonstrate strong reasoning and self-correction abilities in high-resource languages like English, but their performance remains limited in low-resource languag...
Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning : Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful respon...
STELP: Secure Transpilation and Execution of LLM-Generated Programs : Abstract: Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs ...
Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning : Abstract: Recently, differentiable causal discovery has emerged as a promising approach to improve the accuracy and efficiency of existing methods. However, when applied to high-dimensional data or da...
Prompt-Free SAM-Based Multi-Task Framework for Breast Ultrasound Lesion Segmentation and Classification : Abstract: Accurate tumor segmentation and classification in breast ultrasound (BUS) imaging remain challenging due to low contrast, speckle noise, and diverse lesion morphology. This study presents a ...
Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues : Abstract: Users demand fast, seamless webpage experiences, yet developers often struggle to meet these expectations within tight constraints. Performance optimization, while critical, is a time-consum...
Over-Searching in Search-Augmented Large Language Models : Abstract: Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search, unnecessarily invoking the search tool even...
DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis : Abstract: Accurate and efficient multivariate time series (MTS) analysis is increasingly critical for a wide range of intelligent applications. Within this realm, Transformers have emerged as the pred...
Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts : Abstract: Heterogeneous Graph Neural Networks (HGNNs) have advanced mainly through better encoders, yet their decoding/projection stage still relies on a single shared linear head, assuming it can map ...
Understanding LLM-Driven Test Oracle Generation : Abstract: Automated unit test generation aims to improve software quality while reducing the time and effort required for creating tests manually. However, existing techniques primarily generate regre...
VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck : Abstract: Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal tasks, but remain susceptible to hallucinations, where generated text deviates from the underlying visual co...
Semi-Supervised Facial Expression Recognition based on Dynamic Threshold and Negative Learning : Abstract: Facial expression recognition is a key task in human-computer interaction and affective computing. However, acquiring a large amount of labeled facial expression data is often costly. Theref...
ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging : Abstract: Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success. Yet, equipping domain-specialized models with such reasoning capabilities, refer...
RISE: Rule-Driven SQL Dialect Translation via Query Reduction : Abstract: Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation ...
GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting : Abstract: In the field of 3D dynamic scene reconstruction, how to balance model convergence rate and rendering quality has long been a critical challenge that urgently needs to be addressed, particula...
Naiad: Novel Agentic Intelligent Autonomous System for Inland Water Monitoring : Abstract: Inland water monitoring is vital for safeguarding public health and ecosystems, enabling timely interventions to mitigate risks. Existing methods often address isolated sub-problems such as ...
Mathematical Knowledge Graph-Driven Framework for Equation-Based Predictive and Reliable Additive Manufacturing : Abstract: Additive manufacturing (AM) relies critically on understanding and extrapolating process-property relationships; however, existing data-driven approaches remain limited by fragmented knowled...
Effects of personality steering on cooperative behavior in Large Language Model agents : Abstract: Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that assigning personality traits to LLMs can in...
Improving Enzyme Prediction with Chemical Reaction Equations by Hypergraph-Enhanced Knowledge Graph Embeddings : Abstract: Predicting enzyme-substrate interactions has long been a fundamental problem in biochemistry and metabolic engineering. While existing methods could leverage databases of expert-curated enzy...
The Persona Paradox: Medical Personas as Behavioral Priors in Clinical Language Models : Abstract: Persona conditioning can be viewed as a behavioral prior for large language models (LLMs) and is often assumed to confer expertise and improve safety in a monotonic manner. However, its effe...
Conformity and Social Impact on AI Agents : Abstract: As AI agents increasingly operate in multi-agent environments, understanding their collective behavior becomes critical for predicting the dynamics of artificial societies. This study examin...
On the Effect of Cheating in Chess : Abstract: Cheating in chess, by using advice from powerful software, has become a major problem, reaching the highest levels. As opposed to the large majority of previous work, which concerned de...

Research Sources: 284 | Generated: 1/15/2026