AI RESEARCH PAPERS & ACADEMIC SOURCES
- FeatureSLAM: Feature-enriched 3D gaussian splatting SLAM in real time : Abstract: We present a real-time tracking SLAM system that unifies efficient camera tracking with photorealistic feature-enriched mapping using 3D Gaussian Splatting (3DGS). Our main contribution is i...
- FlyPose: Towards Robust Human Pose Estimation From Aerial Views : Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly deployed in close proximity to humans for applications such as parcel delivery, traffic monitoring, disaster response and infrastructure insp...
- Boosting Latent Diffusion Models via Disentangled Representation Alignment : Abstract: Latent Diffusion Models (LDMs) generate high-quality images by operating in a compressed latent space, typically obtained through image tokenizers such as Variational Autoencoders (VAEs). In...
- GeoSurDepth: Spatial Geometry-Consistent Self-Supervised Depth Estimation for Surround-View Cameras : Abstract: Accurate surround-view depth estimation provides a competitive alternative to laser-based sensors and is essential for 3D scene understanding in autonomous driving. While prior studies have ...
- Kidney Cancer Detection Using 3D-Based Latent Diffusion Models : Abstract: In this work, we present a novel latent diffusion-based pipeline for 3D kidney anomaly detection on contrast-enhanced abdominal CT. The method combines Denoising Diffusion Probabilistic Mode...
- Bidirectional Channel-selective Semantic Interaction for Semi-Supervised Medical Segmentation : Abstract: Semi-supervised medical image segmentation is an effective method for addressing scenarios with limited labeled data. Existing methods mainly rely on frameworks such as mean teacher and dual...
- Phase4DFD: Multi-Domain Phase-Aware Attention for Deepfake Detection : Abstract: Recent deepfake detection methods have increasingly explored frequency domain representations to reveal manipulation artifacts that are difficult to detect in the spatial domain. However, mo...
- Adapting Vision Transformers to Ultra-High Resolution Semantic Segmentation with Relay Tokens : Abstract: Current approaches for segmenting ultra high resolution images either slide a window, thereby discarding global context, or downsample and lose fine detail. We propose a simple yet effective...
- Context-Aware Decoding for Faithful Vision-Language Generation : Abstract: Hallucinations, generating responses inconsistent with the visual input, remain a critical limitation of large vision-language models (LVLMs), especially in open-ended tasks such as image ca...
- WaveRNet: Wavelet-Guided Frequency Learning for Multi-Source Domain-Generalized Retinal Vessel Segmentation : Abstract: Domain-generalized retinal vessel segmentation is critical for automated ophthalmic diagnosis, yet faces significant challenges from domain shift induced by non-uniform illumination and vary...
- Adaptive Conditional Contrast-Agnostic Deformable Image Registration with Uncertainty Estimation : Abstract: Deformable multi-contrast image registration is a challenging yet crucial task due to the complex, non-linear intensity relationships across different imaging contrasts. Conventional registr...
- Deepfake detectors are DUMB: A benchmark to assess adversarial training robustness under transferability constraints : Abstract: Deepfake detection systems deployed in real-world environments are subject to adversaries capable of crafting imperceptible perturbations that degrade model performance. While adversarial tr...
- Adaptive aggregation of Monte Carlo augmented decomposed filters for efficient group-equivariant convolutional neural network : Abstract: Group-equivariant convolutional neural networks (G-CNN) heavily rely on parameter sharing to increase CNN's data efficiency and performance. However, the parameter-sharing strategy greatly i...
- RobustFormer: Noise-Robust Pre-training for images and videos : Abstract: While deep learning-based models like transformers have revolutionized time-series and vision tasks, they remain highly susceptible to noise and often overfit on noisy patterns rather than ...
- Pyramidal Adaptive Cross-Gating for Multimodal Detection : Abstract: Object detection in aerial imagery is a critical task in applications such as UAV reconnaissance. Although existing methods have extensively explored feature interaction between different mo...
- Large Language Models Are Bad Dice Players: LLMs Struggle to Generate Random Numbers from Statistical Distributions : Abstract: As large language models (LLMs) transition from chat interfaces to integral components of stochastic pipelines across domains like educational assessment and synthetic data construction, the...
- Towards Valid Student Simulation with Large Language Models : Abstract: This paper presents a conceptual and methodological framework for large language model (LLM) based student simulation in educational settings. The authors identify a core failure mode, terme...
- The Facade of Truth: Uncovering and Mitigating LLM Susceptibility to Deceptive Evidence : Abstract: To reliably assist human decision-making, LLMs must maintain factual internal beliefs against misleading injections. While current models resist explicit misinformation, we uncover a fundame...
- MemBuilder: Reinforcing LLMs for Long-Term Memory Construction via Attributed Dense Rewards : Abstract: Maintaining consistency in long-term dialogues remains a fundamental challenge for LLMs, as standard retrieval mechanisms often fail to capture the temporal evolution of historical states. W...
- FlashMem: Distilling Intrinsic Latent Memory via Computation Reuse : Abstract: The stateless architecture of Large Language Models inherently lacks the mechanism to preserve dynamic context, compelling agents to redundantly reprocess history to maintain long-horizon au...
- CHisAgent: A Multi-Agent Framework for Event Taxonomy Construction in Ancient Chinese Cultural Systems : Abstract: Despite strong performance on many tasks, large language models (LLMs) show limited ability in historical and cultural reasoning, particularly in non-English contexts such as Chinese history...
- Double: Breaking the Acceleration Limit via Double Retrieval Speculative Parallelism : Abstract: Parallel Speculative Decoding (PSD) accelerates traditional Speculative Decoding (SD) by overlapping draft generation with verification. However, it remains hampered by two fundamental chall...
- Closing the Modality Reasoning Gap for Speech Large Language Models : Abstract: Although speech large language models have achieved notable progress, a substantial modality reasoning gap remains: their reasoning performance on speech inputs is markedly weaker than on te...
- Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring : Abstract: This study addresses critical gaps in Automated Essay Scoring (AES) systems and Large Language Models (LLMs) with regard to their ability to effectively identify and score harmful essays. De...
- Generation-Based and Emotion-Reflected Memory Update: Creating the KEEM Dataset for Better Long-Term Conversation : Abstract: In this work, we introduce the Keep Emotional and Essential Memory (KEEM) dataset, a novel generation-based dataset designed to enhance memory updates in long-term conversational systems. Un...
- Can large language models interpret unstructured chat data on dynamic group decision-making processes? Evidence on joint destination choice : Abstract: Social activities result from complex joint activity-travel decisions between group members. While observing the decision-making process of these activities is difficult via traditional trav...
- Data Augmented Pipeline for Legal Information Extraction and Reasoning : Abstract: In this paper, we propose a pipeline leveraging Large Language Models (LLMs) for data augmentation in Information Extraction tasks within the legal domain. The proposed method is both simple...
- Text Detoxification in isiXhosa and Yorùbá: A Cross-Lingual Machine Learning Approach for Low-Resource African Languages : Abstract: Toxic language is one of the major barriers to safe online participation, yet robust mitigation tools are scarce for African languages. This study addresses this critical gap by investigating...
- GIFT: Games as Informal Training for Generalizable LLMs : Abstract: While Large Language Models (LLMs) have achieved remarkable success in formal learning tasks such as mathematics and code generation, they still struggle with the "practical wisdom" and gene...
- Afri-MCQA: Multimodal Cultural Question Answering for African Languages : Abstract: Africa is home to over one-third of the world's languages, yet remains underrepresented in AI research. We introduce Afri-MCQA, the first Multilingual Cultural Question-Answering benchmark c...
- AutoMonitor-Bench: Evaluating the Reliability of LLM-Based Misbehavior Monitor : Abstract: We introduce AutoMonitor-Bench, the first benchmark designed to systematically evaluate the reliability of LLM-based misbehavior monitors across diverse tasks and failure modes. AutoMonitor-...
- One Script Instead of Hundreds? On Pretraining Romanized Encoder Language Models : Abstract: Exposing latent lexical overlap, script romanization has emerged as an effective strategy for improving cross-lingual transfer (XLT) in multilingual language models (mLMs). Most prior work, ...
- LLMs as Science Journalists: Supporting Early-stage Researchers in Communicating Their Science to the Public : Abstract: The scientific community needs tools that help early-stage researchers effectively communicate their findings and innovations to the public. Although existing general-purpose Large Language ...
- Peek2: A Regex-free implementation of pretokenizers for Byte-level BPE : Abstract: Pretokenization is a crucial, sequential pass in Byte-level BPE tokenizers. Our proposed new implementation, Peek2, serves as a drop-in replacement for cl100k-like pretokenizers used in GPT-...
- Left, Right, or Center? Evaluating LLM Framing in News Classification and Generation : Abstract: Large Language Model (LLM) based summarization and text generation are increasingly used for producing and rewriting text, raising concerns about political framing in journalism where subtle...
- Semantic NLP Pipelines for Interoperable Patient Digital Twins from Unstructured EHRs : Abstract: Digital twins -- virtual replicas of physical entities -- are gaining traction in healthcare for personalized monitoring, predictive modeling, and clinical decision support. However, generat...
- What do the metrics mean? A critical analysis of the use of Automated Evaluation Metrics in Interpreting : Abstract: With the growth of interpreting technologies, from remote interpreting and Computer-Aided Interpreting to automated speech translation and interpreting avatars, there is now a high demand fo...
- FACTUM: Mechanistic Detection of Citation Hallucination in Long-Form RAG : Abstract: Retrieval-Augmented Generation (RAG) models are critically undermined by citation hallucinations, a deceptive failure where a model confidently cites a source that fails to support its claim...
- iReasoner: Trajectory-Aware Intrinsic Reasoning Supervision for Self-Evolving Large Multimodal Models : Abstract: Recent work shows that large multimodal models (LMMs) can self-improve from unlabeled data via self-play and intrinsic feedback. Yet existing self-evolving frameworks mainly reward final out...
- HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search : Abstract: Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures ...
- Pantagruel: Unified Self-Supervised Encoders for French Text and Speech : Abstract: We release Pantagruel models, a new family of self-supervised encoder models for French text and speech. Instead of predicting modality-tailored targets such as textual tokens or speech unit...
- Distilling Feedback into Memory-as-a-Tool : Abstract: We propose a framework that amortizes the cost of inference-time reasoning by converting transient critiques into retrievable guidelines, through a file-based memory system and agent-control...
- Don't Break the Cache: An Evaluation of Prompt Caching for Long-Horizon Agentic Tasks : Abstract: Recent advancements in Large Language Model (LLM) agents have enabled complex multi-turn agentic tasks requiring extensive tool calling, where conversations can span dozens of API calls with...
- Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards : Abstract: Reinforcement learning (RL) has emerged as a critical technique for enhancing LLM-based deep search agents. However, existing approaches primarily rely on binary outcome rewards, which fail ...
- TagRAG: Tag-guided Hierarchical Knowledge Graph Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation enhances language models by retrieving external knowledge to support informed and grounded responses. However, traditional RAG methods rely on fragment-level r...
- ROAP: A Reading-Order and Attention-Prior Pipeline for Optimizing Layout Transformers in Key Information Extraction : Abstract: The efficacy of Multimodal Transformers in visually-rich document understanding (VrDU) is critically constrained by two inherent limitations: the lack of explicit modeling for logical readin...
- MMViR: A Multi-Modal and Multi-Granularity Representation for Long-range Video Understanding : Abstract: Long videos, ranging from minutes to hours, present significant challenges for current Multi-modal Large Language Models (MLLMs) due to their complex events, diverse scenes, and long-range d...
- Enabling Stroke-Level Structural Analysis of Hieroglyphic Scripts without Language-Specific Priors : Abstract: Hieroglyphs, as logographic writing systems, encode rich semantic and cultural information within their internal structural composition. Yet, current advanced Large Language Models (LLMs) an...
- The ICASSP 2026 HumDial Challenge: Benchmarking Human-like Spoken Dialogue Systems in the LLM Era : Abstract: Driven by the rapid advancement of Large Language Models (LLMs), particularly Audio-LLMs and Omni-models, spoken dialogue systems have evolved significantly, progressively narrowing the gap ...
- Continual Pretraining on Encrypted Synthetic Data for Privacy-Preserving LLMs : Abstract: Preserving privacy in sensitive data while pretraining large language models on small, domain-specific corpora presents a significant challenge. In this work, we take an exploratory step tow...
- Expression Syntax Information Bottleneck for Math Word Problems : Abstract: Math Word Problems (MWP) aims to automatically solve mathematical questions given in texts. Previous studies tend to design complex models to capture additional information in the original t...
- Coding the Visual World: From Image to Simulation Using Vision Language Models : Abstract: The ability to construct mental models of the world is a central aspect of understanding. Similarly, visual understanding can be viewed as the ability to construct a representative model of ...
- MOSAIC-GS: Monocular Scene Reconstruction via Advanced Initialization for Complex Dynamic Environments : Abstract: We present MOSAIC-GS, a novel, fully explicit, and computationally efficient approach for high-fidelity dynamic scene reconstruction from monocular videos using Gaussian Splatting. Monocular...
- EdgeLDR: Quaternion Low-Displacement Rank Neural Networks for Edge-Efficient Deep Learning : Abstract: Deploying deep neural networks on edge devices is often limited by the memory traffic and compute cost of dense linear operators. While quaternion neural networks improve parameter efficienc...
- Sketch&Patch++: Efficient Structure-Aware 3D Gaussian Representation : Abstract: We observe that Gaussians exhibit distinct roles and characteristics analogous to traditional artistic techniques -- like how artists first sketch outlines before filling in broader areas wi...
- TAPM-Net: Trajectory-Aware Perturbation Modeling for Infrared Small Target Detection : Abstract: Infrared small target detection (ISTD) remains a long-standing challenge due to weak signal contrast, limited spatial extent, and cluttered backgrounds. Despite performance improvements from...
- Multi-Image Super Resolution Framework for Detection and Analysis of Plant Roots : Abstract: Understanding plant root systems is critical for advancing research in soil-plant interactions, nutrient uptake, and overall plant health. However, accurate imaging of roots in subterranean ...
- Hippocampal Atrophy Patterns Across the Alzheimer's Disease Spectrum: A Voxel-Based Morphometry Analysis : Abstract: Alzheimer's disease (AD) and mild cognitive impairment (MCI) are associated with progressive gray matter loss, particularly in medial temporal structures. In this study, CAT12/SPM12 voxel-ba...
- GaussianSwap: Animatable Video Face Swapping with 3D Gaussian Splatting : Abstract: We introduce GaussianSwap, a novel video face swapping framework that constructs a 3D Gaussian Splatting based face avatar from a target video while transferring identity from a source image...
- SAS-VPReID: A Scale-Adaptive Framework with Shape Priors for Video-based Person Re-Identification at Extreme Far Distances : Abstract: Video-based Person Re-IDentification (VPReID) aims to retrieve the same person from videos captured by non-overlapping cameras. At extreme far distances, VPReID is highly challenging due to ...
- DIFF-MF: A Difference-Driven Channel-Spatial State Space Model for Multi-Modal Image Fusion : Abstract: Multi-modal image fusion aims to integrate complementary information from multiple source images to produce high-quality fused images with enriched content. Although existing approaches base...
- MoGen: A Unified Collaborative Framework for Controllable Multi-Object Image Generation : Abstract: Existing multi-object image generation methods face difficulties in achieving precise alignment between localized image generation regions and their corresponding semantics based on language...
- One Language-Free Foundation Model Is Enough for Universal Vision Anomaly Detection : Abstract: Universal visual anomaly detection (AD) aims to identify anomaly images and segment anomaly regions towards open and dynamic scenarios, following zero- and few-shot paradigms without any dat...
- What's Left Unsaid? Detecting and Correcting Misleading Omissions in Multimodal News Previews : Abstract: Even when factually correct, social-media news previews (image-headline pairs) can induce interpretation drift: by selectively omitting crucial context, they lead readers to form judgments t...
- Towards Generalized Multi-Image Editing for Unified Multimodal Models : Abstract: Unified Multimodal Models (UMMs) integrate multimodal understanding and generation, yet they are limited to maintaining visual consistency and disambiguating visual cues when referencing det...
- Orient Anything V2: Unifying Orientation and Rotation Understanding : Abstract: This work presents Orient Anything V2, an enhanced foundation model for unified understanding of object 3D orientation and rotation from single or paired images. Building upon Orient Anythin...
- Generalizable and Adaptive Continual Learning Framework for AI-generated Image Detection : Abstract: The malicious misuse and widespread dissemination of AI-generated images pose a significant threat to the authenticity of online information. Current detection methods often struggle to gene...
- Learning Geometric Invariance for Gait Recognition : Abstract: The goal of gait recognition is to extract identity-invariant features of an individual under various gait conditions, e.g., cross-view and cross-clothing. Most gait models strive to implici...
- LatentVLA: Efficient Vision-Language Models for Autonomous Driving via Latent Action Prediction : Abstract: End-to-end autonomous driving models trained on large-scale datasets perform well in common scenarios but struggle with rare, long-tail situations due to limited scenario diversity. Recent Vi...
- SGDrive: Scene-to-Goal Hierarchical World Cognition for Autonomous Driving : Abstract: Recent end-to-end autonomous driving approaches have leveraged Vision-Language Models (VLMs) to enhance planning capabilities in complex driving scenarios. However, VLMs are inherently train...
- SketchVL: Policy Optimization via Fine-Grained Credit Assignment for Chart Understanding and More : Abstract: Charts are high-density visual carriers of complex data and a medium for information extraction and analysis. Due to the need for precise and complex visual reasoning, automated chart understa...
- Rotate Your Character: Revisiting Video Diffusion Models for High-Quality 3D Character Generation : Abstract: Generating high-quality 3D characters from single images remains a significant challenge in digital content creation, particularly due to complex body poses and self-occlusion. In this paper...
- TAGRPO: Boosting GRPO on Image-to-Video Generation with Direct Trajectory Alignment : Abstract: Recent studies have demonstrated the efficacy of integrating Group Relative Policy Optimization (GRPO) into flow matching models, particularly for text-to-image and text-to-video generation....
- Transforming User Defined Criteria into Explainable Indicators with an Integrated LLM AHP System : Abstract: Evaluating complex texts across domains requires converting user defined criteria into quantitative, explainable indicators, which is a persistent challenge in search and recommendation syst...
- Studying Illustrations in Manuscripts: An Efficient Deep-Learning Approach : Abstract: The recent Artificial Intelligence (AI) revolution has opened transformative possibilities for the humanities, particularly in unlocking the visual content embedded in historical manuscripts...
- Enhancing Foundation Models in Transaction Understanding with LLM-based Sentence Embeddings : Abstract: The ubiquity of payment networks generates vast transactional data encoding rich consumer and merchant behavioral patterns. Recent foundation models for transaction analysis process tabular ...
- On the use of case estimate and transactional payment data in neural networks for individual loss reserving : Abstract: The use of neural networks trained on individual claims data has become increasingly popular in the actuarial reserving literature. We consider how to best input historical payment data in n...
- Channel Selected Stratified Nested Cross Validation for Clinically Relevant EEG Based Parkinsons Disease Detection : Abstract: The early detection of Parkinson's disease remains a critical challenge in clinical neuroscience, with electroencephalography offering a noninvasive and scalable pathway toward population lev...
- A universal vision transformer for fast calorimeter simulations : Abstract: The high-dimensional complex nature of detectors makes fast calorimeter simulations a prime application for modern generative machine learning. Vision transformers (ViTs) can emulate the Gea...
- Machine learning assisted state prediction of misspecified linear dynamical system via modal reduction : Abstract: Accurate prediction of structural dynamics is imperative for preserving digital twin fidelity throughout operational lifetimes. Parametric models with fixed nominal parameters often omit cri...
- Optimizing Digital Adjudication through Social Network Analysis: An Empirical Study of Credit Card Disputes in Beijing : Abstract: Amid the rapid digitalization of judicial systems, the integration of big data into adjudication remains underexplored, particularly in uncovering the structural logic of legal applications....
- Generalized Canonical Polyadic Tensor Decompositions with General Symmetry : Abstract: Canonical Polyadic (CP) tensor decomposition is a workhorse algorithm for discovering underlying low-dimensional structure in tensor data. This is accomplished in conventional CP decompositi...
- Archetypal cases for questionnaires with nominal multiple choice questions : Abstract: Archetypal analysis serves as an exploratory tool that interprets a collection of observations as convex combinations of pure (extreme) patterns. When these patterns correspond to actual obs...
- Dynamic Inclusion and Bounded Multi-Factor Tilts for Robust Portfolio Construction : Abstract: This paper proposes a portfolio construction framework designed to remain robust under estimation error, non-stationarity, and realistic trading constraints. The methodology combines dynamic...
- A brief note on learning problem with global perspectives : Abstract: This brief note considers the problem of learning with dynamic-optimizing principal-agent setting, in which the agents are allowed to have global perspectives about the learning process, i.e...
- What Functions Does XGBoost Learn? : Abstract: This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understandin...
- Knowledge-Driven Multi-Turn Jailbreaking on Large Language Models : Abstract: Large Language Models (LLMs) face a significant threat from multi-turn jailbreak attacks, where adversaries progressively steer conversations to elicit harmful outputs. However, the practica...
- Autonomous Probe Microscopy with Robust Bag-of-Features Multi-Objective Bayesian Optimization: Pareto-Front Mapping of Nanoscale Structure-Property Trade-Offs : Abstract: Combinatorial materials libraries are an efficient route to generate large families of candidate compositions, but their impact is often limited by the speed and depth of characterization an...
- DNATokenizer: A GPU-First Byte-to-Identifier Tokenizer for High-Throughput DNA Language Models : Abstract: Tokenization sits at the boundary between high-throughput genomic input and GPU compute, posing challenges in both algorithm design and system throughput. Overlapping k-mer tokenization can ...
- Autonomous Discovery of the Ising Model's Critical Parameters with Reinforcement Learning : Abstract: Traditional methods for determining critical parameters are often influenced by human factors. This research introduces a physics-inspired adaptive reinforcement learning framework that enab...
- Quantifying and Inducing Shape Bias in CNNs via Max-Pool Dilation : Abstract: Convolutional Neural Networks (CNNs) are known to exhibit a strong texture bias, favoring local patterns over global shape information--a tendency inherent to their convolutional architectur...
- SceneAlign: Aligning Multimodal Reasoning to Scene Graphs in Complex Visual Scenes : Abstract: Multimodal large language models often struggle with faithful reasoning in complex visual scenes, where intricate entities and relations require precise visual grounding at each step. This r...
- Compressing image encoders via latent distillation : Abstract: Deep learning models for image compression often face practical limitations in hardware-constrained applications. Although these models achieve high-quality reconstructions, they are typical...
- Multilingual Amnesia: On the Transferability of Unlearning in Multilingual LLMs : Abstract: As multilingual large language models become more widely used, ensuring their safety and fairness across diverse linguistic contexts presents unique challenges. While existing research on ma...
- Tracing Stereotypes in Pre-trained Transformers: From Biased Neurons to Fairer Models : Abstract: The advent of transformer-based language models has reshaped how AI systems process and generate text. In software engineering (SE), these models now support diverse activities, accelerating...
- ViTNT-FIQA: Training-Free Face Image Quality Assessment with Vision Transformers : Abstract: Face Image Quality Assessment (FIQA) is essential for reliable face recognition systems. Current approaches primarily exploit only final-layer representations, while training-free methods re...
- Simplify-This: A Comparative Analysis of Prompt-Based and Fine-Tuned LLMs : Abstract: Large language models (LLMs) enable strong text generation, and in general there is a practical tradeoff between fine-tuning and prompt engineering. We introduce Simplify-This, a comparative...
- Sequential Bayesian Optimal Experimental Design in Infinite Dimensions via Policy Gradient Reinforcement Learning : Abstract: Sequential Bayesian optimal experimental design (SBOED) for PDE-governed inverse problems is computationally challenging, especially for infinite-dimensional random field parameters. High-fi...
- Multi-task Modeling for Engineering Applications with Sparse Data : Abstract: Modern engineering and scientific workflows often require simultaneous predictions across related tasks and fidelity levels, where high-fidelity data is scarce and expensive, while low-fidel...
- A Critical Examination of Active Learning Workflows in Materials Science : Abstract: Active learning (AL) plays a critical role in materials science, enabling applications such as the construction of machine-learning interatomic potentials for atomistic simulations and the o...
- DeePM: Regime-Robust Deep Learning for Systematic Macro Portfolio Management : Abstract: We propose DeePM (Deep Portfolio Manager), a structured deep-learning macro portfolio manager trained end-to-end to maximize a robust, risk-adjusted utility. DeePM addresses three fundamenta...
- AWaRe-SAC: Proactive Slice Admission Control under Weather-Induced Capacity Uncertainty : Abstract: As emerging applications demand higher throughput and lower latencies, operators are increasingly deploying millimeter-wave (mmWave) links within x-haul transport networks, spanning fronthau...
- CyberGFM: Graph Foundation Models for Lateral Movement Detection in Enterprise Networks : Abstract: Representing networks as a graph and training a link prediction model using benign connections is an effective method of anomaly-based intrusion detection. Existing works using this techniqu...
- Detecting Stochasticity in Discrete Signals via Nonparametric Excursion Theorem : Abstract: We develop a practical framework for distinguishing diffusive stochastic processes from deterministic signals using only a single discrete time series. Our approach is based on classical exc...
- Manifold limit for the training of shallow graph convolutional neural networks : Abstract: We study the discrete-to-continuum consistency of the training of shallow graph convolutional neural networks (GCNNs) on proximity graphs of sampled point clouds under a manifold assumption....
- Utilising physics-guided deep learning to overcome data scarcity : Abstract: Deep learning (DL) relies heavily on data, and the quality of data influences its performance significantly. However, obtaining high-quality, well-annotated datasets can be challenging or ev...
- Simple Mechanisms for Representing, Indexing and Manipulating Concepts : Abstract: Supervised and unsupervised learning using deep neural networks typically aims to exploit the underlying structure in the training data; this structure is often explained using a latent gene...
- FedScalar: A Communication efficient Federated Learning : Abstract: Federated learning (FL) has gained considerable popularity for distributed machine learning due to its ability to preserve the privacy of participating agents by eliminating the need for dat...
- Precise Asymptotics for Spectral Methods in Mixed Generalized Linear Models : Abstract: In a mixed generalized linear model, the goal is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one. We consider ...
- Data-Driven Approach to Capitation Reform in Rwanda : Abstract: As part of Rwanda's transition toward universal health coverage, the national Community-Based Health Insurance (CBHI) scheme is moving from retrospective fee-for-service reimbursements to pr...
- The Table of Media Bias Elements: A sentence-level taxonomy of media bias types and propaganda techniques : Abstract: Public debates about "left-" or "right-wing" news overlook the fact that bias is usually conveyed by concrete linguistic manoeuvres that transcend any single political spectrum. We therefore...
- Same Claim, Different Judgment: Benchmarking Scenario-Induced Bias in Multilingual Financial Misinformation Detection : Abstract: Large language models (LLMs) have been widely applied across various domains of finance. Since their training data are largely derived from human-authored corpora, LLMs may inherit a range o...
- Glitter: Visualizing Lexical Surprisal for Readability in Administrative Texts : Abstract: This work investigates how measuring information entropy of text can be used to estimate its readability. We propose a visualization framework that can be used to approximate information ent...
- LookAroundNet: Extending Temporal Context with Transformers for Clinically Viable EEG Seizure Detection : Abstract: Automated seizure detection from electroencephalography (EEG) remains difficult due to the large variability of seizure dynamics across patients, recording conditions, and clinical settings....
- Rapid Adaptation of SpO2 Estimation to Wearable Devices via Transfer Learning on Low-Sampling-Rate PPG : Abstract: Blood oxygen saturation (SpO2) is a vital marker for healthcare monitoring. Traditional SpO2 estimation methods often rely on complex clinical calibration, making them unsuitable for low-pow...
- Generalizable Blood Pressure Estimation from Multi-Wavelength PPG Using Curriculum-Adversarial Learning : Abstract: Accurate and generalizable blood pressure (BP) estimation is vital for the early detection and management of cardiovascular diseases. In this study, we enforce subject-level data splitting o...
- Improving User Experience with Personalized Review Ranking and Summarization : Abstract: Online consumer reviews play a crucial role in guiding purchase decisions by offering insights into product quality, usability, and performance. However, the increasing volume of user-genera...
- Simulating Multi-Stakeholder Decision-Making with Generative Agents in Urban Planning : Abstract: Reaching consensus in urban planning is a complex process often hindered by prolonged negotiations, trade-offs, power dynamics, and competing stakeholder interests, resulting in inefficienci...
- Explainable AI needs formalization : Abstract: The field of "explainable artificial intelligence" (XAI) seemingly addresses the desire that decisions of machine learning systems should be human-understandable. However, in its current sta...
- Towards AI-Native Software Engineering (SE 3.0): A Vision and a Challenge Roadmap : Abstract: The rise of AI-assisted software engineering (SE 2.0), powered by Foundation Models (FMs) and FM-powered coding assistants, has shown promise in improving developer productivity. However, it...
- Controlled Automatic Task-Specific Synthetic Data Generation for Hallucination Detection : Abstract: We present a novel approach to automatically generate non-trivial task-specific synthetic datasets for hallucination detection. Our approach features a two-step generation-selection pipeline...
- TIME: Temporally Intelligent Meta-reasoning Engine for Context Triggered Explicit Reasoning : Abstract: Reasoning oriented large language models often expose explicit "thinking" as long, turn-global traces at the start of every response, either always on or toggled externally at inference time...
- Ontology Neural Networks for Topologically Conditioned Constraint Satisfaction : Abstract: Neuro-symbolic reasoning systems face fundamental challenges in maintaining semantic coherence while satisfying physical and logical constraints. Building upon our previous work on Ontology ...
- When the Server Steps In: Calibrated Updates for Fair Federated Learning : Abstract: Federated learning (FL) has emerged as a transformative distributed learning paradigm, enabling multiple clients to collaboratively train a global model under the coordination of a central s...
- GlyRAG: Context-Aware Retrieval-Augmented Framework for Blood Glucose Forecasting : Abstract: Accurate forecasting of blood glucose from CGM is essential for preventing dysglycemic events, thus enabling proactive diabetes management. However, current forecasting models treat blood gl...
- The Kernel Manifold: A Geometric Approach to Gaussian Process Model Selection : Abstract: Gaussian Process (GP) regression is a powerful nonparametric Bayesian framework, but its performance depends critically on the choice of covariance kernel. Selecting an appropriate kernel is...
- Inverting Non-Injective Functions with Twin Neural Network Regression : Abstract: Non-injective functions are not invertible. However, non-injective functions can be restricted to sub-domains on which they are locally injective and surjective and thus invertible if the di...
- Imitation Learning for Combinatorial Optimisation under Uncertainty : Abstract: Imitation learning (IL) provides a data-driven framework for approximating policies for large-scale combinatorial optimisation problems formulated as sequential decision problems (SDPs), whe...
- DynaSTy: A Framework for SpatioTemporal Node Attribute Prediction in Dynamic Graphs : Abstract: Accurate multistep forecasting of node-level attributes on dynamic graphs is critical for applications ranging from financial trust networks to biological networks. Existing spatiotemporal g...
- Interactive Distillation for Cooperative Multi-Agent Reinforcement Learning : Abstract: Knowledge distillation (KD) has the potential to accelerate MARL by employing a centralized teacher for decentralized students but faces key bottlenecks. Specifically, there are (1) challeng...
- Efficient Inference for Noisy LLM-as-a-Judge Evaluation : Abstract: Large language models (LLMs) are increasingly used as automatic evaluators of generative AI outputs, a paradigm often referred to as "LLM-as-a-judge." In practice, LLM judges are imperfect p...
- Prediction of Fault Slip Tendency in CO$_2$ Storage using Data-space Inversion : Abstract: Accurately assessing the potential for fault slip is essential in many subsurface operations. Conventional model-based history matching methods, which entail the generation of posterior geom...
- RingSQL: Generating Synthetic Data with Schema-Independent Templates for Text-to-SQL Reasoning Models : Abstract: Recent advances in text-to-SQL systems have been driven by larger models and improved datasets, yet progress is still limited by the scarcity of high-quality training data. Manual data creat...
- MaxCode: A Max-Reward Reinforcement Learning Framework for Automated Code Optimization : Abstract: Large Language Models (LLMs) demonstrate strong capabilities in general coding tasks but encounter two key challenges when optimizing code: (i) the complexity of writing optimized code (such...
- Hi-ZFO: Hierarchical Zeroth- and First-Order LLM Fine-Tuning via Importance-Guided Tensor Selection : Abstract: Fine-tuning large language models (LLMs) using standard first-order (FO) optimization often drives training toward sharp, poorly generalizing minima. Conversely, zeroth-order (ZO) methods of...
- Toward an Integrated Cross-Urban Accident Prevention System: A Multi-Task Spatial-Temporal Learning Framework for Urban Safety Management : Abstract: The development of a cross-city accident prevention system is particularly challenging due to the heterogeneity, inconsistent reporting, and inherently clustered, sparse, cyclical, and noisy...
- Buffered AUC maximization for scoring systems via mixed-integer optimization : Abstract: A scoring system is a linear classifier composed of a small number of explanatory variables, each assigned a small integer coefficient. This system is highly interpretable and allows predict...
- Learn to Evolve: Self-supervised Neural JKO Operator for Wasserstein Gradient Flow : Abstract: The Jordan-Kinderlehrer-Otto (JKO) scheme provides a stable variational framework for computing Wasserstein gradient flows, but its practical use is often limited by the high computational c...
- Poisson Hyperplane Processes with Rectified Linear Units : Abstract: Neural networks have shown state-of-the-art performances in various classification and regression tasks. Rectified linear units (ReLU) are often used as activation functions for the hidden l...
- PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning : Abstract: We introduce Parallel Coordinated Reasoning (PaCoRe), a training-and-inference framework designed to overcome a central limitation of contemporary language models: their inability to scale t...
- Good Allocations from Bad Estimates : Abstract: Conditional average treatment effect (CATE) estimation is the de facto gold standard for targeting a treatment to a heterogeneous population. The method estimates treatment effects up to an ...
- Orchestrating Tokens and Sequences: Dynamic Hybrid Policy Optimization for RLVR : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) offers a promising framework for optimizing large language models in reasoning tasks. However, existing RLVR algorithms focus on differe...
- Dual-Phase LLM Reasoning: Self-Evolved Mathematical Frameworks : Abstract: In recent years, large language models (LLMs) have demonstrated significant potential in complex reasoning tasks like mathematical problem-solving. However, existing research predominantly r...
- Continual Learning of Achieving Forgetting-free and Positive Knowledge Transfer : Abstract: Existing research on continual learning (CL) of a sequence of tasks focuses mainly on dealing with catastrophic forgetting (CF) to balance the learning plasticity of new tasks and the memory...
- From Global to Local: Cluster-Aware Learning for Wi-Fi Fingerprinting Indoor Localisation : Abstract: Wi-Fi fingerprinting remains one of the most practical solutions for indoor positioning, however, its performance is often limited by the size and heterogeneity of fingerprint datasets, stro...
- Do Sparse Autoencoders Identify Reasoning Features in Language Models? : Abstract: We investigate whether sparse autoencoders (SAEs) identify genuine reasoning features in large language models (LLMs). Starting from features selected using standard contrastive activation m...
- FLRQ: Faster LLM Quantization with Flexible Low-Rank Matrix Sketching : Abstract: Traditional post-training quantization (PTQ) is considered an effective approach to reduce model size and accelerate inference of large-scale language models (LLMs). However, existing low-ra...
- Weights to Code: Extracting Interpretable Algorithms from the Discrete Transformer : Abstract: Algorithm extraction aims to synthesize executable programs directly from models trained on specific algorithmic tasks, enabling de novo algorithm discovery without relying on human-written ...
- Fusion Matters: Length-Aware Analysis of Positional-Encoding Fusion in Transformers : Abstract: Transformers require positional encodings to represent sequence order, yet most prior work focuses on designing new positional encodings rather than examining how positional information is f...
- Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem : Abstract: Motivated by the growing interest in representation learning approaches that uncover the latent structure of high-dimensional data, this work proposes new algorithms for reconstruction-based...
- Detecting Autism Spectrum Disorder with Deep Eye Movement Features : Abstract: Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder characterized by deficits in social communication and behavioral patterns. Eye movement data offers a non-invasive diagnostic ...
- A Dual Pipeline Machine Learning Framework for Automated Multi Class Sleep Disorder Screening Using Hybrid Resampling and Ensemble Learning : Abstract: Accurate classification of sleep disorders, particularly insomnia and sleep apnea, is important for reducing long term health risks and improving patient quality of life. However, clinical s...
- A New Family of Poisson Non-negative Matrix Factorization Methods Using the Shifted Log Link : Abstract: Poisson non-negative matrix factorization (NMF) is a widely used method to find interpretable "parts-based" decompositions of count data. While many variants of Poisson NMF exist, existing m...
- GlueNN: gluing patchwise analytic solutions with neural networks : Abstract: In many problems in physics and engineering, one encounters complicated differential equations with strongly scale-dependent terms for which exact analytical or numerical solutions are not a...
- Distilling Lightweight Domain Experts from Large ML Models by Identifying Relevant Subspaces : Abstract: Knowledge distillation involves transferring the predictive capabilities of large, high-performing AI models (teachers) to smaller models (students) that can operate in environments with lim...
- Prophet as a Reproducible Forecasting Framework: A Methodological Guide for Business and Financial Analytics : Abstract: Reproducibility remains a persistent challenge in forecasting research and practice, particularly in business and financial analytics where forecasts inform high-stakes decisions. Traditiona...
- On the Robustness of Age for Learning-Based Wireless Scheduling in Unknown Environments : Abstract: The constrained combinatorial multi-armed bandit model has been widely employed to solve problems in wireless networking and related areas, including the problem of wireless scheduling for t...
- Community-Based Model Sharing and Generalisation: Anomaly Detection in IoT Temperature Sensor Networks : Abstract: The rapid deployment of Internet of Things (IoT) devices has led to large-scale sensor networks that monitor environmental and urban phenomena in real time. Communities of Interest (CoIs) pr...
- HogVul: Black-box Adversarial Code Generation Framework Against LM-based Vulnerability Detectors : Abstract: Recent advances in software vulnerability detection have been driven by Language Model (LM)-based approaches. However, these models remain vulnerable to adversarial attacks that exploit lexi...
- Autoregressive Ranking: Bridging the Gap Between Dual and Cross Encoders : Abstract: Dual and cross encoders have long been mainstays of information retrieval (IR), but are being challenged by the emergent capabilities of LLMs. An LLM-based approach we term pointwise generat...
- ACR: Adaptive Context Refactoring via Context Refactoring Operators for Multi-Turn Dialogue : Abstract: Large Language Models (LLMs) have shown remarkable performance in multi-turn dialogue. However, in multi-turn dialogue, models still struggle to stay aligned with what has been established e...
- PiXTime: A Model for Federated Time Series Forecasting with Heterogeneous Data Structures Across Nodes : Abstract: Time series are highly valuable and rarely shareable across nodes, making federated learning a promising paradigm to leverage distributed temporal data. However, different sampling standards...
- Transformer Is Inherently a Causal Learner : Abstract: We reveal that transformers trained in an autoregressive manner naturally encode time-delayed causal structures in their learned representations. When predicting future values in multivariat...
- Open World Knowledge Aided Single-Cell Foundation Model with Robust Cross-Modal Cell-Language Pre-training : Abstract: Recent advancements in single-cell multi-omics, particularly RNA-seq, have provided profound insights into cellular heterogeneity and gene regulation. While pre-trained language model (PLM) ...
- A Framework for Personalized Persuasiveness Prediction via Context-Aware User Profiling : Abstract: Estimating the persuasiveness of messages is critical in various applications, from recommender systems to safety assessment of LLMs. While it is imperative to consider the target persuadee'...
- Stephanie2: Thinking, Waiting, and Making Decisions Like Humans in Step-by-Step AI Social Chat : Abstract: Instant-messaging human social chat typically progresses through a sequence of short messages. Existing step-by-step AI chatting systems typically split a one-shot generation into multiple m...
- Advancing credit mobility through stakeholder-informed AI design and adoption : Abstract: Transferring from a 2-year to a 4-year college is crucial for socioeconomic mobility, yet students often face challenges ensuring their credits are fully recognized, leading to delays in the...
- AGDC: Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces : Abstract: Transformer-based autoregressive models excel in data generation but are inherently constrained by their reliance on discretized tokens, which limits their ability to represent continuous va...
- Joint Optimization of Neural Autoregressors via Scoring rules : Abstract: Non-parametric distributional regression has achieved significant milestones in recent years. Among these, the Tabular Prior-Data Fitted Network (TabPFN) has demonstrated state-of-the-art pe...
- AIBoMGen: Generating an AI Bill of Materials for Secure, Transparent, and Compliant Model Training : Abstract: The rapid adoption of complex AI systems has outpaced the development of tools to ensure their transparency, security, and regulatory compliance. In this paper, the AI Bill of Materials (AIB...
- Multimodal In-context Learning for ASR of Low-resource Languages : Abstract: Automatic speech recognition (ASR) still covers only a small fraction of the world's languages, mainly due to supervised data scarcity. In-context learning (ICL) with large language models (...
- Visualising Information Flow in Word Embeddings with Diffusion Tensor Imaging : Abstract: Understanding how large language models (LLMs) represent natural language is a central challenge in natural language processing (NLP) research. Many existing methods extract word embeddings ...
- mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations : Abstract: Hyper-Connections (HC) generalizes residual connections by introducing dynamic residual matrices that mix information across multiple residual streams, accelerating convergence in deep neura...
- The Echo Chamber Multi-Turn LLM Jailbreak : Abstract: The availability of Large Language Models (LLMs) has led to a new generation of powerful chatbots that can be developed at relatively low cost. As companies deploy these tools, security chal...
- Analysing Differences in Persuasive Language in LLM-Generated Text: Uncovering Stereotypical Gender Patterns : Abstract: Large language models (LLMs) are increasingly used for everyday communication tasks, including drafting interpersonal messages intended to influence and persuade. Prior work has shown that L...
- VIGIL: Defending LLM Agents Against Tool Stream Injection via Verify-Before-Commit : Abstract: LLM agents operating in open environments face escalating risks from indirect prompt injection, particularly within the tool stream where manipulated metadata and runtime feedback hijack exe...
- Variational Autoencoders for P-wave Detection on Strong Motion Earthquake Spectrograms : Abstract: Accurate P-wave detection is critical for earthquake early warning, yet strong-motion records pose challenges due to high noise levels, limited labeled data, and complex waveform characteris...
- Adaptive Disentangled Representation Learning for Incomplete Multi-View Multi-Label Classification : Abstract: Multi-view multi-label learning frequently suffers from simultaneous feature absence and incomplete annotations, due to challenges in data acquisition and cost-intensive supervision. To tack...
- SAFE: Secure and Accurate Federated Learning for Privacy-Preserving Brain-Computer Interfaces : Abstract: Electroencephalogram (EEG)-based brain-computer interfaces (BCIs) are widely adopted due to their efficiency and portability; however, their decoding algorithms still face multiple challenge...
- Tensor-DTI: Enhancing Biomolecular Interaction Prediction with Contrastive Embedding Learning : Abstract: Accurate drug-target interaction (DTI) prediction is essential for computational drug discovery, yet existing models often rely on single-modality predefined molecular descriptors or sequenc...
- EnvScaler: Scaling Tool-Interactive Environments for LLM Agent via Programmatic Synthesis : Abstract: Large language models (LLMs) are expected to be trained to act as agents in various real-world environments, but this process relies on rich and varied tool-interaction sandboxes. However, a...
- SceneFoundry: Generating Interactive Infinite 3D Worlds : Abstract: The ability to automatically generate large-scale, interactive, and physically realistic 3D environments is crucial for advancing robotic learning and embodied intelligence. However, existin...
- Decoding Workload and Agreement From EEG During Spoken Dialogue With Conversational AI : Abstract: Passive brain-computer interfaces offer a potential source of implicit feedback for alignment of large language models, but most mental state decoding has been done in controlled tasks. This...
- Influence of Parallelism in Vector-Multiplication Units on Correlation Power Analysis : Abstract: The use of neural networks in edge devices is increasing, which introduces new security challenges related to the neural networks' confidentiality. As edge devices often offer physical acces...
- Intelligent Singularity Avoidance in UR10 Robotic Arm Path Planning Using Hybrid Fuzzy Logic and Reinforcement Learning : Abstract: This paper presents a comprehensive approach to singularity detection and avoidance in UR10 robotic arm path planning through the integration of fuzzy logic safety systems and reinforcement ...
- DexterCap: An Affordable and Automated System for Capturing Dexterous Hand-Object Manipulation : Abstract: Capturing fine-grained hand-object interactions is challenging due to severe self-occlusion from closely spaced fingers and the subtlety of in-hand manipulation motions. Existing optical mot...
- Goal Force: Teaching Video Models To Accomplish Physics-Conditioned Goals : Abstract: Recent advancements in video generation have enabled the development of "world models" capable of simulating potential futures for robotics and planning. However, specifying precise goals ...
- Router-Suggest: Dynamic Routing for Multimodal Auto-Completion in Visually-Grounded Dialogs : Abstract: Real-time multimodal auto-completion is essential for digital assistants, chatbots, design tools, and healthcare consultations, where user inputs rely on shared visual context. We introduce ...
- LayerGS: Decomposition and Inpainting of Layered 3D Human Avatars via 2D Gaussian Splatting : Abstract: We propose a novel framework for decomposing arbitrarily posed humans into animatable multi-layered 3D human avatars, separating the body and garments. Conventional single-layer reconstructi...
- CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning : Abstract: Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference...
- IIB-LPO: Latent Policy Optimization via Iterative Information Bottleneck : Abstract: Recent advances in Reinforcement Learning with Verifiable Rewards (RLVR) for Large Language Model (LLM) reasoning have been hindered by a persistent challenge: exploration collapse. The sema...
- Continual-learning for Modelling Low-Resource Languages from Large Language Models : Abstract: Modelling a language model for a multi-lingual scenario includes several potential challenges, among which catastrophic forgetting is the major challenge. For example, small language models ...
- Gender Bias in LLMs: Preliminary Evidence from Shared Parenting Scenario in Czech Family Law : Abstract: Access to justice remains limited for many people, leading laypersons to increasingly rely on Large Language Models (LLMs) for legal self-help. Laypeople use these tools intuitively, which m...
- An Empirical Study on Preference Tuning Generalization and Diversity Under Domain Shift : Abstract: Preference tuning aligns pretrained language models to human judgments of quality, helpfulness, or safety by optimizing over explicit preference signals rather than likelihood alone. Prior w...
- Can AI mediation improve democratic deliberation? : Abstract: The strength of democracy lies in the free and equal exchange of diverse viewpoints. Living up to this ideal at scale faces inherent tensions: broad participation, meaningful deliberation, a...
- Illusions of Confidence? Diagnosing LLM Truthfulness via Neighborhood Consistency : Abstract: As Large Language Models (LLMs) are increasingly deployed in real-world settings, correctness alone is insufficient. Reliable deployment requires maintaining truthful beliefs under contextua...
- Auditing Fairness under Model Updates: Fundamental Complexity and Property-Preserving Updates : Abstract: As machine learning models become increasingly embedded in societal infrastructure, auditing them for bias is of growing importance. However, in real-world deployments, auditing is complicat...
- Agentic LLMs as Powerful Deanonymizers: Re-identification of Participants in the Anthropic Interviewer Dataset : Abstract: On December 4, 2025, Anthropic released Anthropic Interviewer, an AI tool for running qualitative interviews at scale, along with a public dataset of 1,250 interviews with professionals, inc...
- Cedalion Tutorial: A Python-based framework for comprehensive analysis of multimodal fNIRS & DOT from the lab to the everyday world : Abstract: Functional near-infrared spectroscopy (fNIRS) and diffuse optical tomography (DOT) are rapidly evolving toward wearable, multimodal, and data-driven, AI-supported neuroimaging in the everyda...
- Can We Predict Before Executing Machine Learning Agents? : Abstract: Autonomous machine learning agents have revolutionized scientific discovery, yet they remain constrained by a Generate-Execute-Feedback paradigm. Previous approaches suffer from a severe Exe...
- Performance of a Deep Learning-Based Segmentation Model for Pancreatic Tumors on Public Endoscopic Ultrasound Datasets : Abstract: Background: Pancreatic cancer is one of the most aggressive cancers, with poor survival rates. Endoscopic ultrasound (EUS) is a key diagnostic modality, but its effectiveness is constrained ...
- VideoAR: Autoregressive Video Generation via Next-Frame & Scale Prediction : Abstract: Recent advances in video generation have been dominated by diffusion and flow-matching models, which produce high-quality results but remain computationally intensive and difficult to scale....
- The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning : Abstract: Large language models (LLMs) often fail to learn effective long chain-of-thought (Long CoT) reasoning from human or non-Long-CoT LLMs imitation. To understand this, we propose that effective...
- AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs : Abstract: Large language models (LLMs) exhibit complementary strengths arising from differences in pretraining data, model architectures, and decoding behaviors. Inference-time ensembling provides a p...
- MIPO: Mutual Integration of Patient Journey and Medical Ontology for Healthcare Representation Learning : Abstract: Representation learning on electronic health records (EHRs) plays a vital role in downstream medical prediction tasks. Although natural language processing techniques, such as recurrent neur...
- Symbolic Planning and Multi-Agent Path Finding in Extremely Dense Environments with Unassigned Agents : Abstract: We introduce the Block Rearrangement Problem (BRaP), a challenging component of large warehouse management which involves rearranging storage blocks within dense grids to achieve a goal stat...
- Darth Vecdor: An Open-Source System for Generating Knowledge Graphs Through Large Language Model Queries : Abstract: Many large language models (LLMs) are trained on a massive body of knowledge present on the Internet. Darth Vecdor (DV) was designed to extract this knowledge into a structured, terminology-...
- Adversarial Yet Cooperative: Multi-Perspective Reasoning in Retrieved-Augmented Language Models : Abstract: Recent advances in synergizing large reasoning models (LRMs) with retrieval-augmented generation (RAG) have shown promising results, yet two critical challenges remain: (1) reasoning models ...
- AlgBench: To What Extent Do Large Reasoning Models Understand Algorithms? : Abstract: Reasoning ability has become a central focus in the advancement of Large Reasoning Models (LRMs). Although notable progress has been achieved on several reasoning benchmarks such as MATH500 ...
- How to Set the Batch Size for Large-Scale Pre-training? : Abstract: The concept of Critical Batch Size, as pioneered by OpenAI, has long served as a foundational principle for large-scale pre-training. However, with the paradigm shift towards the Warmup-Stab...
- Large language models can effectively convince people to believe conspiracies : Abstract: Large language models (LLMs) have been shown to be persuasive across a variety of contexts. But it remains unclear whether this persuasive power advantages truth over falsehood, or if LLMs c...
- MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents : Abstract: We present MineNPC-Task, a user-authored benchmark and evaluation harness for testing memory-aware, mixed-initiative LLM agents in open-world Minecraft. Rather than relying on synthetic prom...
- An Evaluation on Large Language Model Outputs: Discourse and Memorization : Abstract: We present an empirical evaluation of various outputs generated by nine of the most widely-available large language models (LLMs). Our analysis is done with off-the-shelf, readily-available ...
- ART: Adaptive Reasoning Trees for Explainable Claim Verification : Abstract: Large Language Models (LLMs) are powerful candidates for complex decision-making, leveraging vast encoded knowledge and remarkable zero-shot abilities. However, their adoption in high-stakes...
- PRISMA: Reinforcement Learning Guided Two-Stage Policy Optimization in Multi-Agent Architecture for Open-Domain Multi-Hop Question Answering : Abstract: Answering real-world open-domain multi-hop questions over massive corpora is a critical challenge in Retrieval-Augmented Generation (RAG) systems. Recent research employs reinforcement learn...
- MMUEChange: A Generalized LLM Agent Framework for Intelligent Multi-Modal Urban Environment Change Analysis : Abstract: Understanding urban environment change is essential for sustainable development. However, current approaches, particularly remote sensing change detection, often rely on rigid, single-modal ...
- The Evaluation Gap in Medicine, AI and LLMs: Navigating Elusive Ground Truth & Uncertainty via a Probabilistic Paradigm : Abstract: Benchmarking the relative capabilities of AI systems, including Large Language Models (LLMs) and Vision Models, typically ignores the impact of uncertainty in the underlying ground truth ans...
- Explainable AI: Learning from the Learners : Abstract: Artificial intelligence now outperforms humans in several scientific and engineering tasks, yet its internal representations often remain opaque. In this Perspective, we argue that explainab...
- Safety Not Found (404): Hidden Risks of LLM-Based Robotics Decision Making : Abstract: One mistake by an AI system in a safety-critical setting can cost lives. As Large Language Models (LLMs) become integral to robotics decision-making, the physical dimension of risk grows; a ...
- WildSci: Advancing Scientific Reasoning from In-the-Wild Literature : Abstract: Recent progress in large language model (LLM) reasoning has focused on domains like mathematics and coding, where abundant high-quality data and objective evaluation metrics are readily avai...
- Crisis-Bench: Benchmarking Strategic Ambiguity and Reputation Management in Large Language Models : Abstract: Standard safety alignment optimizes Large Language Models (LLMs) for universal helpfulness and honesty, effectively instilling a rigid "Boy Scout" morality. While robust for general-purpose ...
- Reinforcement Learning of Large Language Models for Interpretable Credit Card Fraud Detection : Abstract: E-commerce platforms and payment solution providers face increasingly sophisticated fraud schemes, ranging from identity theft and account takeovers to complex money laundering operations th...
- A Causal Information-Flow Framework for Unbiased Learning-to-Rank : Abstract: In web search and recommendation systems, user clicks are widely used to train ranking models. However, click data is heavily biased, i.e., users tend to click higher-ranked items (position ...
- Cumulative Path-Level Semantic Reasoning for Inductive Knowledge Graph Completion : Abstract: Conventional Knowledge Graph Completion (KGC) methods aim to infer missing information in incomplete Knowledge Graphs (KGs) by leveraging existing information, which struggle to perform effe...
- GenCtrl -- A Formal Controllability Toolkit for Generative Models : Abstract: As generative models become ubiquitous, there is a critical need for fine-grained control over the generation process. Yet, while controlled generation methods from prompting to fine-tuning ...
- HAG: Hierarchical Demographic Tree-based Agent Generation for Topic-Adaptive Simulation : Abstract: High-fidelity agent initialization is crucial for credible Agent-Based Modeling across diverse domains. A robust framework should be Topic-Adaptive, capturing macro-level joint distributions...
- CHDP: Cooperative Hybrid Diffusion Policies for Reinforcement Learning in Parameterized Action Space : Abstract: Hybrid action space, which combines discrete choices and continuous parameters, is prevalent in domains such as robot control and game AI. However, efficiently modeling and optimizing hybrid...
- Circular Reasoning: Understanding Self-Reinforcing Loops in Large Reasoning Models : Abstract: Despite the success of test-time scaling, Large Reasoning Models (LRMs) frequently encounter repetitive loops that lead to computational waste and inference failure. In this paper, we identi...
- Logic-Parametric Neuro-Symbolic NLI: Controlling Logical Formalisms for Verifiable LLM Reasoning : Abstract: Large language models (LLMs) and theorem provers (TPs) can be effectively combined for verifiable natural language inference (NLI). However, existing approaches rely on a fixed logical forma...
- Overcoming Joint Intractability with Lossless Hierarchical Speculative Decoding : Abstract: Verification is a key bottleneck in improving inference speed while maintaining distribution fidelity in Speculative Decoding. Recent work has shown that sequence-level verification leads to...
- PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility : Abstract: Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy a...
- DynaDebate: Breaking Homogeneity in Multi-Agent Debate with Dynamic Path Generation : Abstract: Recent years have witnessed the rapid development of Large Language Model-based Multi-Agent Systems (MAS), which excel at collaborative decision-making and complex problem-solving. Recently,...
- From Off-Policy to On-Policy: Enhancing GUI Agents via Bi-level Expert-to-Policy Assimilation : Abstract: Vision-language models are increasingly deployed as computer-use agents (CUAs) that operate desktops and browsers. Top-performing CUAs are framework-based systems that decompose planning and...
- StackPlanner: A Centralized Hierarchical Multi-Agent System with Task-Experience Memory Management : Abstract: Multi-agent systems based on large language models, particularly centralized architectures, have recently shown strong potential for complex and knowledge-intensive tasks. However, central a...
- TowerMind: A Tower Defence Game Learning Environment and Benchmark for LLM as Agents : Abstract: Recent breakthroughs in Large Language Models (LLMs) have positioned them as a promising paradigm for agents, with long-term planning and decision-making emerging as core general-purpose cap...
- Open-Vocabulary 3D Instruction Ambiguity Detection : Abstract: In safety-critical domains, linguistic ambiguity can have severe consequences; a vague command like "Pass me the vial" in a surgical setting could lead to catastrophic errors. Yet, most embo...
- EvoC2Rust: A Skeleton-guided Framework for Project-Level C-to-Rust Translation : Abstract: Translating legacy C codebases to Rust is increasingly demanded for building safety-critical systems. While various approaches have emerged for this task, they face inherent trade-offs: rule...
- SP-Rank: A Dataset for Ranked Preferences with Secondary Information : Abstract: We introduce SP-Rank, the first large-scale, publicly available dataset for benchmarking algorithms that leverage both first-order preferences and second-order predictions in rank...
- KP-Agent: Keyword Pruning in Sponsored Search Advertising via LLM-Powered Contextual Bandits : Abstract: Sponsored search advertising (SSA) requires advertisers to constantly adjust keyword strategies. While bid adjustment and keyword generation are well-studied, keyword pruning, refining keywor...
- From Events to Trending: A Multi-Stage Hotspots Detection Method Based on Generative Query Indexing : Abstract: LLM-based conversational systems have become a popular gateway for information access, yet most existing chatbots struggle to handle news-related trending queries effectively. To improve use...
- Quantifying Document Impact in RAG-LLMs : Abstract: Retrieval Augmented Generation (RAG) enhances Large Language Models (LLMs) by connecting them to external knowledge, improving accuracy and reducing outdated information. However, this intro...
- LLM2IR: simple unsupervised contrastive learning makes long-context LLM great retriever : Abstract: Modern dense information retrieval (IR) models usually rely on costly large-scale pretraining. In this paper, we introduce LLM2IR, an efficient unsupervised contrastive learning framework to...
- Engineering the RAG Stack: A Comprehensive Review of the Architecture and Trust Frameworks for Retrieval-Augmented Generation Systems : Abstract: This article provides a comprehensive systematic literature review of academic studies, industrial applications, and real-world deployments from 2018 to 2025, providing a practical guide and...
- Cross-Document Topic-Aligned Chunking for Retrieval-Augmented Generation : Abstract: Chunking quality determines RAG system performance. Current methods partition documents individually, but complex queries need information scattered across multiple sources: the knowledge fr...
- Retrieval-Augmented Multi-LLM Ensemble for Industrial Part Specification Extraction : Abstract: Industrial part specification extraction from unstructured text remains a persistent challenge in manufacturing, procurement, and maintenance, where manual processing is both time-consuming ...
- LiveVectorLake: A Real-Time Versioned Knowledge Base Architecture for Streaming Vector Updates and Temporal Retrieval : Abstract: Modern Retrieval-Augmented Generation (RAG) systems struggle with a fundamental architectural tension: vector indices are optimized for query latency but poorly handle continuous knowledge u...
- Bayesian Recovery for Probabilistic Coalition Structures : Abstract: Probabilistic Coalition Structure Generation (PCSG) is NP-hard and can be recast as an $l_0$-type sparse recovery problem by representing coalition structures as sparse coefficient vectors o...
- Evolving Cognitive Architectures : Abstract: This article proposes a research and development direction that would lead to the creation of next-generation intelligent technical systems. A distinctive feature of these systems is their a...
- Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles : Abstract: Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computat...
- On the Limits of Self-Improving in LLMs and Why AGI, ASI and the Singularity Are Not Near Without Symbolic Model Synthesis : Abstract: We formalise recursive self-training in Large Language Models (LLMs) and Generative AI as a discrete-time dynamical system and prove that, as training data become increasingly self-generated...
- A Survey of Agentic AI and Cybersecurity: Challenges, Opportunities and Use-case Prototypes : Abstract: Agentic AI marks an important transition from single-step generative models to systems capable of reasoning, planning, acting, and adapting over long-lasting tasks. By integrating memory, to...
- MoEBlaze: Breaking the Memory Wall for Efficient MoE Training on Modern GPUs : Abstract: The pervasive "memory wall" bottleneck is significantly amplified in modern large-scale Mixture-of-Experts (MoE) architectures. MoE's inherent architectural sparsity leads to sparse arithmet...
- Bi-Orthogonal Factor Decomposition for Vision Transformers : Abstract: Self-attention is the central computational primitive of Vision Transformers, yet we lack a principled understanding of what information attention mechanisms exchange between tokens. Attenti...
- Multi-turn Jailbreaking Attack in Multi-Modal Large Language Models : Abstract: In recent years, the security vulnerabilities of Multi-modal Large Language Models (MLLMs) have become a serious concern in Generative Artificial Intelligence (GenAI) research. These hig...
- A Bayesian Generative Modeling Approach for Arbitrary Conditional Inference : Abstract: Modern data analysis increasingly requires flexible conditional inference P(X_B | X_A) where (X_A, X_B) is an arbitrary partition of observed variable X. Existing conditional inference metho...
- PRISM: Protocol Refinement through Intelligent Simulation Modeling : Abstract: Automating experimental protocol design and execution remains a fundamental bottleneck in realizing self-driving laboratories. We introduce PRISM (Protocol Refinement through Intelligent ...
- STResNet & STYOLO : A New Family of Compact Classification and Object Detection Models for MCUs : Abstract: Recent advancements in lightweight neural networks have significantly improved the efficiency of deploying deep learning models on edge hardware. However, most existing architectures still t...
- Lost in Execution: On the Multilingual Robustness of Tool Calling in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed as agents that invoke external tools through structured function calls. While recent work reports strong tool-calling performance under...
- Ensemble of radiomics and ConvNeXt for breast cancer diagnosis : Abstract: Early diagnosis of breast cancer is crucial for improving survival rates. Radiomics and deep learning (DL) have shown significant potential in assisting radiologists with early cancer detect...
- Multi-task Cross-modal Learning for Chest X-ray Image Retrieval : Abstract: CLIP and BiomedCLIP are examples of vision-language foundation models and offer strong cross-modal embeddings; however, they are not optimized for fine-grained medical retrieval tasks, such ...
- Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization : Abstract: The image geolocalization task aims to predict the location where an image was taken anywhere on Earth using visual clues. Existing large vision-language model (LVLM) approaches leverage wor...
- Tracing Moral Foundations in Large Language Models : Abstract: Large language models (LLMs) often produce human-like moral judgments, but it is unclear whether this reflects an internal conceptual structure or superficial "moral mimicry." Using Moral ...
- Do LLMs Need Inherent Reasoning Before Reinforcement Learning? A Study in Korean Self-Correction : Abstract: Large Language Models (LLMs) demonstrate strong reasoning and self-correction abilities in high-resource languages like English, but their performance remains limited in low-resource languag...
- Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning : Abstract: Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful respon...
- STELP: Secure Transpilation and Execution of LLM-Generated Programs : Abstract: Rapid evolution of Large Language Models (LLMs) has achieved major advances in reasoning, planning, and function-calling capabilities. Multi-agentic collaborative frameworks using such LLMs ...
- Efficient Differentiable Causal Discovery via Reliable Super-Structure Learning : Abstract: Recently, differentiable causal discovery has emerged as a promising approach to improve the accuracy and efficiency of existing methods. However, when applied to high-dimensional data or da...
- Prompt-Free SAM-Based Multi-Task Framework for Breast Ultrasound Lesion Segmentation and Classification : Abstract: Accurate tumor segmentation and classification in breast ultrasound (BUS) imaging remain challenging due to low contrast, speckle noise, and diverse lesion morphology. This study presents a ...
- Evaluating the Use of LLMs for Automated DOM-Level Resolution of Web Performance Issues : Abstract: Users demand fast, seamless webpage experiences, yet developers often struggle to meet these expectations within tight constraints. Performance optimization, while critical, is a time-consum...
- Over-Searching in Search-Augmented Large Language Models : Abstract: Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search, unnecessarily invoking the search tool even...
- DeMa: Dual-Path Delay-Aware Mamba for Efficient Multivariate Time Series Analysis : Abstract: Accurate and efficient multivariate time series (MTS) analysis is increasingly critical for a wide range of intelligent applications. Within this realm, Transformers have emerged as the pred...
- Scalable Heterogeneous Graph Learning via Heterogeneous-aware Orthogonal Prototype Experts : Abstract: Heterogeneous Graph Neural Networks (HGNNs) have advanced mainly through better encoders, yet their decoding/projection stage still relies on a single shared linear head, assuming it can map ...
- Understanding LLM-Driven Test Oracle Generation : Abstract: Automated unit test generation aims to improve software quality while reducing the time and effort required for creating tests manually. However, existing techniques primarily generate regre...
- VIB-Probe: Detecting and Mitigating Hallucinations in Vision-Language Models via Variational Information Bottleneck : Abstract: Vision-Language Models (VLMs) have demonstrated remarkable progress in multimodal tasks, but remain susceptible to hallucinations, where generated text deviates from the underlying visual co...
- Semi-Supervised Facial Expression Recognition based on Dynamic Threshold and Negative Learning : Abstract: Facial expression recognition is a key task in human-computer interaction and affective computing. However, acquiring a large amount of labeled facial expression data is often costly. Theref...
- ReasonAny: Incorporating Reasoning Capability to Any Model via Simple and Effective Model Merging : Abstract: Large Reasoning Models (LRMs) with long chain-of-thought reasoning have recently achieved remarkable success. Yet, equipping domain-specialized models with such reasoning capabilities, refer...
- RISE: Rule-Driven SQL Dialect Translation via Query Reduction : Abstract: Translating SQL dialects across different relational database management systems (RDBMSs) is crucial for migrating RDBMS-based applications to the cloud. Traditional SQL dialect translation ...
- GS-DMSR: Dynamic Sensitive Multi-scale Manifold Enhancement for Accelerated High-Quality 3D Gaussian Splatting : Abstract: In the field of 3D dynamic scene reconstruction, how to balance model convergence rate and rendering quality has long been a critical challenge that urgently needs to be addressed, particula...
- Naiad: Novel Agentic Intelligent Autonomous System for Inland Water Monitoring : Abstract: Inland water monitoring is vital for safeguarding public health and ecosystems, enabling timely interventions to mitigate risks. Existing methods often address isolated sub-problems such as ...
- Mathematical Knowledge Graph-Driven Framework for Equation-Based Predictive and Reliable Additive Manufacturing : Abstract: Additive manufacturing (AM) relies critically on understanding and extrapolating process-property relationships; however, existing data-driven approaches remain limited by fragmented knowled...
- Effects of personality steering on cooperative behavior in Large Language Model agents : Abstract: Large language models (LLMs) are increasingly used as autonomous agents in strategic and social interactions. Although recent studies suggest that assigning personality traits to LLMs can in...
- Improving Enzyme Prediction with Chemical Reaction Equations by Hypergraph-Enhanced Knowledge Graph Embeddings : Abstract: Predicting enzyme-substrate interactions has long been a fundamental problem in biochemistry and metabolic engineering. While existing methods could leverage databases of expert-curated enzy...
- The Persona Paradox: Medical Personas as Behavioral Priors in Clinical Language Models : Abstract: Persona conditioning can be viewed as a behavioral prior for large language models (LLMs) and is often assumed to confer expertise and improve safety in a monotonic manner. However, its effe...
- Conformity and Social Impact on AI Agents : Abstract: As AI agents increasingly operate in multi-agent environments, understanding their collective behavior becomes critical for predicting the dynamics of artificial societies. This study examin...
- On the Effect of Cheating in Chess : Abstract: Cheating in chess, by using advice from powerful software, has become a major problem, reaching the highest levels. As opposed to the large majority of previous work, which concerned de...
Research Sources: 284 | Generated: 1/15/2026
