AI Research News Feeds for November 14th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Self-Supervised Training For Low Dose CT Reconstruction : Abstract: Ionizing radiation has been the biggest concern in CT imaging. To reduce the dose level without compromising the image quality, low-dose CT reconstruction has been offered with the availabil...
Generating Physically Stable and Buildable Brick Structures from Text : Abstract: We introduce BrickGPT, the first approach for generating physically stable interconnecting brick assembly models from text prompts. To achieve this, we construct a large-scale, physically st...
STATIC : Surface Temporal Affine for TIme Consistency in Video Monocular Depth Estimation : Abstract: Video monocular depth estimation is essential for applications such as autonomous driving, AR/VR, and robotics. Recent transformer-based single-image monocular depth estimation models perfor...
Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment : Abstract: Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LM...
Latent Knowledge-Guided Video Diffusion for Scientific Phenomena Generation from a Single Initial Frame : Abstract: Video diffusion models have achieved impressive results in natural scene generation, yet they struggle to generalize to scientific phenomena such as fluid simulations and meteorological proc...
Trapped by Their Own Light: Deployable and Stealth Retroreflective Patch Attacks on Traffic Sign Recognition Systems : Abstract: Traffic sign recognition plays a critical role in ensuring safe and efficient transportation of autonomous vehicles but remains vulnerable to adversarial attacks using stickers or laser proje...
Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling : Abstract: Outcome-reward reinforcement learning (RL) is a common and increasingly significant way to refine the step-by-step reasoning of multimodal large language models (MLLMs). In the multiple-choi...
Depth Anything 3: Recovering the Visual Space from Any Views : Abstract: We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal mo...
One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models : Abstract: Diffusion models struggle to scale beyond their training resolutions, as direct high-resolution sampling is slow and costly, while post-hoc image super-resolution (ISR) introduces artifacts ...
From 2D to 3D Without Extra Baggage: Data-Efficient Cancer Detection in Digital Breast Tomosynthesis : Abstract: Digital Breast Tomosynthesis (DBT) enhances finding visibility for breast cancer detection by providing volumetric information that reduces the impact of overlapping tissues; however, limite...
OmniVGGT: Omni-Modality Driven Visual Geometry Grounded : Abstract: General 3D foundation models have started to lead the trend of unifying diverse vision tasks, yet most assume RGB-only inputs and ignore readily available geometric cues (e.g., camera intrin...
Dynamic Avatar-Scene Rendering from Human-centric Context : Abstract: Reconstructing dynamic humans interacting with real-world environments from monocular videos is an important and challenging task. Despite considerable progress in 4D neural rendering, exist...
SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation : Abstract: Vision-Language-Action (VLA) models have advanced in robotic manipulation, yet practical deployment remains hindered by two key limitations: 1) perceptual redundancy, where irrelevant visual...
Learnable Total Variation with Lambda Mapping for Low-Dose CT Denoising : Abstract: Although Total Variation (TV) performs well in noise reduction and edge preservation on images, its dependence on the lambda parameter limits its efficiency and makes it difficult to use eff...
SPOT: Sparsification with Attention Dynamics via Token Relevance in Vision Transformers : Abstract: While Vision Transformers (ViT) have demonstrated remarkable performance across diverse tasks, their computational demands are substantial, scaling quadratically with the number of processed...
Histology-informed tiling of whole tissue sections improves the interpretability and predictability of cancer relapse and genetic alterations : Abstract: Histopathologists establish cancer grade by assessing histological structures, such as glands in prostate cancer. Yet, digital pathology pipelines often rely on grid-based tiling that ignore...
RodEpil: A Video Dataset of Laboratory Rodents for Seizure Detection and Benchmark Evaluation : Abstract: We introduce a curated video dataset of laboratory rodents for automatic detection of convulsive events. The dataset contains short (10 s) top-down and side-view video clips of individual ro...
3DFETUS: Standardizing Fetal Facial Planes in 3D Ultrasound : Abstract: Acquiring standard facial planes during routine fetal ultrasound (US) examinations is often challenging due to fetal movement, variability in orientation, and operator-dependent expertise. T...
LLM-YOLOMS: Large Language Model-based Semantic Interpretation and Fault Diagnosis for Wind Turbine Components : Abstract: The health condition of wind turbine (WT) components is crucial for ensuring stable and reliable operation. However, existing fault detection methods are largely limited to visual recognitio...
GrounDiff: Diffusion-Based Ground Surface Generation from Digital Surface Models : Abstract: Digital Terrain Models (DTMs) represent the bare-earth elevation and are important in numerous geospatial applications. Such data models cannot be directly measured by sensors and are typica...
SAMIRO: Spatial Attention Mutual Information Regularization with a Pre-trained Model as Oracle for Lane Detection : Abstract: Lane detection is an important topic in future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant o...
Fragile by Design: On the Limits of Adversarial Defenses in Personalized Generation : Abstract: Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but also raise significant privacy concerns, particularly the risk of facial ide...
MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation : Abstract: Embodied navigation is a fundamental capability for robotic agents operating. Real-world deployment requires open vocabulary generalization and low training overhead, motivating zero-shot me...
FOUND: Fourier-based von Mises Distribution for Robust Single Domain Generalization in Object Detection : Abstract: Single Domain Generalization (SDG) for object detection aims to train a model on a single source domain that can generalize effectively to unseen target domains. While recent methods like CL...
Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment : Abstract: Recent advancements in weakly-supervised video anomaly detection have achieved remarkable performance by applying the multiple instance learning paradigm based on multimodal foundation model...
CLIP4VI-ReID: Learning Modality-shared Representations via CLIP Semantic Bridge for Visible-Infrared Person Re-identification : Abstract: This paper proposes a novel CLIP-driven modality-shared representation learning network named CLIP4VI-ReID for VI-ReID task, which consists of Text Semantic Generation (TSG), Infrared Featur...
Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts : Abstract: Satellite-based slum segmentation holds significant promise in generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major...
PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning : Abstract: Despite significant progress, Vision-Language Models (VLMs) still struggle with complex visual reasoning, where multi-step dependencies cause early errors to cascade through the reasoning ch...
Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis : Abstract: Facial Emotion Analysis (FEA) extends traditional facial emotion recognition by incorporating explainable, fine-grained reasoning. The task integrates three subtasks: emotion recognition, fa...
TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding : Abstract: Spatio-Temporal Video Grounding (STVG) aims to localize a spatio-temporal tube that corresponds to a given language query in an untrimmed video. This is a challenging task since it involves ...
Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization : Abstract: Recent multimodal deepfake detection methods designed for generalization conjecture that single-stage supervised training struggles to generalize across unseen manipulations and datasets. Ho...
HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction : Abstract: Vehicle-to-Everything (V2X) collaborative perception extends sensing beyond single vehicle limits through transmission. However, as more agents participate, existing frameworks face two key ...
LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures : Abstract: 3D LiDAR scene completion from point clouds is a fundamental component of perception systems in autonomous vehicles. Previous methods have predominantly employed diffusion models for high-fi...
CephRes-MHNet: A Multi-Head Residual Network for Accurate and Robust Cephalometric Landmark Detection : Abstract: Accurate localization of cephalometric landmarks from 2D lateral skull X-rays is vital for orthodontic diagnosis and treatment. Manual annotation is time-consuming and error-prone, whereas a...
Physically Interpretable Multi-Degradation Image Restoration via Deep Unfolding and Explainable Convolution : Abstract: Although image restoration has advanced significantly, most existing methods target only a single type of degradation. In real-world scenarios, images often contain multiple degradations sim...
Decoupling Bias, Aligning Distributions: Synergistic Fairness Optimization for Deepfake Detection : Abstract: Fairness is a core element in the trustworthy deployment of deepfake detection models, especially in the field of digital identity security. Biases in detection models toward different demog...
Split-Layer: Enhancing Implicit Neural Representation by Maximizing the Dimensionality of Feature Space : Abstract: Implicit neural representation (INR) models signals as continuous functions using neural networks, offering efficient and differentiable optimization for inverse problems across diverse disc...
Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction : Abstract: Dense video captioning jointly localizes and captions salient events in untrimmed videos. Recent methods primarily focus on leveraging additional prior knowledge and advanced multi-task arch...
RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo : Abstract: Stereo Depth Estimation in real-world environments poses significant challenges due to dynamic domain shifts, sparse or unreliable supervision, and the high cost of acquiring dense ground-tr...
MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models : Abstract: Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and ins...
SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition : Abstract: Large Language Models (LLMs) hold rich implicit knowledge and powerful transferability. In this paper, we explore the combination of LLMs with the human skeleton to perform action classifica...
GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs : Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities in a wide range of vision-language tasks. However, the large number of visual tokens introduces significant comput...
Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints : Abstract: Reliable co-speech motion generation requires precise motion representation and consistent structural priors across all joints. Existing generative methods typically operate on local joint r...
VLF-MSC: Vision-Language Feature-Based Multimodal Semantic Communication System : Abstract: We propose Vision-Language Feature-based Multimodal Semantic Communication (VLF-MSC), a unified system that transmits a single compact vision-language representation to support both image an...
Perceive, Act and Correct: Confidence Is Not Enough for Hyperspectral Classification : Abstract: Confidence alone is often misleading in hyperspectral image classification, as models tend to mistake high predictive scores for correctness while lacking awareness of uncertainty. This lead...
When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? : Abstract: Can Multimodal Large Language Models (MLLMs) discern confused objects that are visually present but audio-absent? To study this, we introduce a new benchmark, AV-ConfuseBench, which simulate...
Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance : Abstract: The performance of image generation has been significantly improved in recent years. However, the study of image screening is rare and its performance with Multimodal Large Language Models (...
MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples : Abstract: Zero-shot anomaly classification (AC) and segmentation (AS) methods aim to identify and outline defects without using any labeled samples. In this paper, we reveal a key property that is ove...
FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection : Abstract: Visible-infrared object detection has gained sufficient attention due to its detection performance in low light, fog, and rain conditions. However, visible and infrared modalities captured b...
LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning : Abstract: Generating high-fidelity 3D content remains a fundamental challenge due to the complexity of representing arbitrary topologies, such as open surfaces and intricate internal structures, while ...
DGFusion: Dual-guided Fusion for Robust Multi-Modal 3D Object Detection : Abstract: As a critical task in autonomous driving perception systems, 3D object detection is used to identify and track key objects, such as vehicles and pedestrians. However, detecting distant, smal...
AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models : Abstract: Effective human-agent collaboration in physical environments requires understanding not only what to act upon, but also where the actionable elements are and how to interact with them. Exist...
LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers : Abstract: How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation deman...
DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Instance Segmentation : Abstract: Weakly supervised 3D instance segmentation is essential for 3D scene understanding, especially as the growing scale of data and high annotation costs associated with fully supervised approac...
MOBA: A Material-Oriented Backdoor Attack against LiDAR-based 3D Object Detection Systems : Abstract: LiDAR-based 3D object detection is widely used in safety-critical systems. However, these systems remain vulnerable to backdoor attacks that embed hidden malicious behaviors during training....
STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data : Abstract: Scene Text Editing (STE) is the task of modifying text content in an image while preserving its visual style, such as font, color, and background. While recent diffusion-based approaches hav...
Equivariant Sampling for Improving Diffusion Model-based Image Restoration : Abstract: Recent advances in generative models, especially diffusion models, have significantly improved image restoration (IR) performance. However, existing problem-agnostic diffusion model-based im...
Robust Object Detection with Pseudo Labels from VLMs using Per-Object Co-teaching : Abstract: Foundation models, especially vision-language models (VLMs), offer compelling zero-shot object detection for applications like autonomous driving, a domain where manual labelling is prohibit...
TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting offers a strong speed-quality trade-off but struggles to reconstruct semi-transparent surfaces because most methods assume a single depth per pixel, which fails when mu...
Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification : Abstract: Person re-identification (ReID) is a fundamental task in many real-world applications such as pedestrian trajectory tracking. However, advanced deep learning-based ReID models are highly sus...
MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding : Abstract: Despite the rapid progress of Vision-Language Models (VLMs), their capabilities are inadequately assessed by existing benchmarks, which are predominantly English-centric, feature simplistic ...
Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection : Abstract: In this paper, we focus on Single-Domain Generalized Object Detection (Single-DGOD), aiming to transfer a detector trained on one source domain to multiple unknown domains. Existing methods ...
HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models : Abstract: 3D understanding has drawn significant attention recently, leveraging Vision-Language Models (VLMs) to enable multi-modal reasoning between point cloud and text data. Current 3D-VLMs directl...
RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion : Abstract: Semantic Scene Completion (SSC) aims to generate a complete semantic scene from an incomplete input. Existing approaches often employ dense network architectures with a high parameter count,...
SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection : Abstract: Recently, the Segment Anything Model (SAM) has attracted widespread attention, and it is often treated as a vision foundation model for universal segmentation. Some researchers have attempted to d...
Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies : Abstract: Large Vision-Language Models (LVLMs) have achieved impressive performance across a wide range of multimodal tasks. However, they still face critical challenges in modeling long-range depende...
IPCD: Intrinsic Point-Cloud Decomposition : Abstract: Point clouds are widely used in various fields, including augmented reality (AR) and robotics, where relighting and texture editing are crucial for realistic visualization. Achieving these t...
CORONA-Fields: Leveraging Foundation Models for Classification of Solar Wind Phenomena : Abstract: Space weather at Earth, driven by the solar activity, poses growing risks to satellites around our planet as well as to critical ground-based technological infrastructure. Major space weathe...
AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting : Abstract: We present a novel framework for animating humans in 3D scenes using 3D Gaussian Splatting (3DGS), a neural scene representation that has recently achieved state-of-the-art photorealistic re...
Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration : Abstract: Restoring 3D scenes captured under low-light conditions remains a fundamental yet challenging problem. Most existing approaches depend on precomputed camera poses and scene-specific optimi...
PANDA - Patch And Distribution-Aware Augmentation for Long-Tailed Exemplar-Free Continual Learning : Abstract: Exemplar-Free Continual Learning (EFCL) restricts the storage of previous task data and is highly susceptible to catastrophic forgetting. While pre-trained models (PTMs) are increasingly lev...
STORM: Segment, Track, and Object Re-Localization from a Single 3D Model : Abstract: Accurate 6D pose estimation and tracking are fundamental capabilities for physical AI systems such as robots. However, existing approaches typically rely on a manually annotated segmentation...
Density Estimation and Crowd Counting : Abstract: This study enhances a crowd density estimation algorithm originally designed for image-based analysis by adapting it for video-based scenarios. The proposed method integrates a denoising pro...
SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control : Abstract: Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply ea...
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation : Abstract: While thinking-aware generation aims to improve performance on complex tasks, we identify a critical failure mode where existing sequential, autoregressive approaches can paradoxically degra...
FedeCouple: Fine-Grained Balancing of Global-Generalization and Local-Adaptability in Federated Learning : Abstract: In privacy-preserving mobile network transmission scenarios with heterogeneous client data, personalized federated learning methods that decouple feature extractors and classifiers have demo...
Semantic, Orthographic, and Phonological Biases in Humans' Wordle Gameplay : Abstract: We show that human players' gameplay in the game of Wordle is influenced by the semantics, orthography, and phonology of the player's previous guesses. We compare actual human players' guess...
MedMobile: A mobile-sized language model with clinical capabilities : Abstract: Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale impleme...
Lessons in co-creation: the inconvenient truths of inclusive sign language technology development : Abstract: In the era of AI-driven language technologies, the participation of deaf communities in sign language technology development, often framed as co-creation, is increasingly emphasized. We pres...
Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework : Abstract: The increasing complexity and workload of clinical radiology leads to inevitable oversights and mistakes in their use as diagnostic tools, causing delayed treatments and sometimes life-threa...
Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals : Abstract: Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions but their high memory, computation, and deployment demands hinder practical use particularly for ...
Music Flamingo: Scaling Music Understanding in Audio Language Models : Abstract: We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progr...
Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning : Abstract: Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with ...
ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference : Abstract: Weight-only post-training quantization (PTQ) compresses the weights of Large Language Models (LLMs) into low-precision representations to reduce memory footprint and accelerate inference. Ho...
DESS: DeBERTa Enhanced Syntactic-Semantic Aspect Sentiment Triplet Extraction : Abstract: Fine-grained sentiment analysis faces ongoing challenges in Aspect Sentiment Triple Extraction (ASTE), particularly in accurately capturing the relationships between aspects, opinions, and s...
URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding : Abstract: Recent multimodal large language models (MLLMs) still struggle with long document understanding due to two fundamental challenges: information interference from abundant irrelevant content, ...
Computing the Formal and Institutional Boundaries of Contemporary Genre and Literary Fiction : Abstract: Though the concept of genre has been a subject of discussion for millennia, the relatively recent emergence of genre fiction has added a new layer to this ongoing conversation. While more tr...
Convomem Benchmark: Why Your First 150 Conversations Don't Need RAG : Abstract: We introduce a comprehensive benchmark for conversational memory evaluation containing 75,336 question-answer pairs across diverse categories including user facts, assistant recall, abstenti...
Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following : Abstract: Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-...
Exploring State Tracking Capabilities of Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in solving complex tasks, including those requiring a certain level of reasoning. In this paper, we focus on state trac...
Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction : Abstract: Large language models achieve strong performance through training on vast datasets. Can analogical paradigm organization enable lightweight models to match this performance with minimal data...
DELICATE: Diachronic Entity LInking using Classes And Temporal Evidence : Abstract: In spite of the remarkable advancements in the field of Natural Language Processing, the task of Entity Linking (EL) remains challenging in the field of humanities due to complex document ty...
Position: On the Methodological Pitfalls of Evaluating Base LLMs for Reasoning : Abstract: Existing work investigates the reasoning capabilities of large language models (LLMs) to uncover their limitations, human-like biases and underlying processes. Such studies include evaluatio...
TruthfulRAG: Resolving Factual-level Conflicts in Retrieval-Augmented Generation with Knowledge Graphs : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful framework for enhancing the capabilities of Large Language Models (LLMs) by integrating retrieval-based methods with generative...
Knowledge Graphs Generation from Cultural Heritage Texts: Combining LLMs and Ontological Engineering for Scholarly Debates : Abstract: Cultural Heritage texts contain rich knowledge that is difficult to query systematically due to the challenges of converting unstructured discourse into structured Knowledge Graphs (KGs). Th...
Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning : Abstract: To improve Multi-step Mathematical Reasoning (MsMR) of Large Language Models (LLMs), it is crucial to obtain scalable supervision from the corpus by automatically critiquing mistakes in the ...
Local Hybrid Retrieval-Augmented Document QA : Abstract: Organizations handling sensitive documents face a critical dilemma: adopt cloud-based AI systems that offer powerful question-answering capabilities but compromise data privacy, or maintain ...
LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning : Abstract: Joint multilingual instruction tuning is a widely adopted approach to improve the multilingual instruction-following ability and downstream performance of large language models (LLMs), but t...
EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models : Abstract: Large language models (LLMs) with Chain-of-Thought (CoT) prompting achieve strong reasoning but often produce unnecessarily long explanations, increasing cost and sometimes reducing accuracy...
Text2SQL-Flow: A Robust SQL-Aware Data Augmentation Framework for Text-to-SQL : Abstract: The data-centric paradigm has become pivotal in AI, especially for Text-to-SQL, where performance is limited by scarce, simplistic, and low-diversity datasets. To address this, we propose Te...
Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA : Abstract: Recent research has increasingly focused on the reasoning capabilities of Large Language Models (LLMs) in multi-turn interactions, as these scenarios more closely mirror real-world problem-s...
ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks : Abstract: This paper describes Elyadata & LIA's joint submission to the NADI multi-dialectal Arabic Speech Processing 2025. We participated in the Spoken Arabic Dialect Identification (ADI) and multi...
Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts : Abstract: With the growing number of submitted scientific papers, there is an increasing demand for systems that can assist reviewers in evaluating research claims. Experimental results are a core com...
ADI-20: Arabic Dialect Identification dataset and models : Abstract: We present ADI-20, an extension of the previously published ADI-17 Arabic Dialect Identification (ADI) dataset. ADI-20 covers all Arabic-speaking countries' dialects. It comprises 3,556 hour...
GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt : Abstract: Multi-turn instruction following is essential for building intelligent conversational systems that can consistently adhere to instructions across dialogue turns. However, existing approaches...
Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism : Abstract: Sound symbolism is a linguistic concept that refers to non-arbitrary associations between phonetic forms and their meanings. We suggest that this can be a compelling probe into how Multimoda...
ScaleFormer: Span Representation Cumulation for Long-Context Transformer : Abstract: The quadratic complexity of standard self-attention severely limits the application of Transformer-based models to long-context tasks. While efficient Transformer variants exist, they often ...
FinNuE: Exposing the Risks of Using BERTScore for Numerical Semantic Evaluation in Finance : Abstract: BERTScore has become a widely adopted metric for evaluating semantic similarity between natural language sentences. However, we identify a critical limitation: BERTScore exhibits low sensiti...
Language Drift in Multilingual Retrieval-Augmented Generation: Characterization and Decoding-Time Mitigation : Abstract: Multilingual Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to perform knowledge-intensive tasks in multilingual settings by leveraging retrieved documents as exte...
Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG : Abstract: Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in ...
NumPert: Numerical Perturbations to Probe Language Models for Veracity Prediction : Abstract: Large language models show strong performance on knowledge intensive tasks such as fact-checking and question answering, yet they often struggle with numerical reasoning. We present a system...
REAP: Enhancing RAG with Recursive Evaluation and Adaptive Planning for Multi-Hop Question Answering : Abstract: Retrieval-augmented generation (RAG) has been extensively employed to mitigate hallucinations in large language models (LLMs). However, existing methods for multi-hop reasoning tasks often l...
Leveraging Large Language Models for Identifying Knowledge Components : Abstract: Knowledge Components (KCs) are foundational to adaptive learning systems, but their manual identification by domain experts is a significant bottleneck. While Large Language Models (LLMs) of...
MINDS: A Cross-cultural Dialogue Corpus for Social Norm Classification and Adherence Detection : Abstract: Social norms are implicit, culturally grounded expectations that guide interpersonal communication. Unlike factual commonsense, norm reasoning is subjective, context-dependent, and varies ac...
HI-TransPA: Hearing Impairments Translation Personal Assistant : Abstract: To provide a unified and flexible solution for daily communication among hearing-impaired individuals, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, ...
EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models : Abstract: Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code generation, biomedical analysis, and mathematical p...
In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback : Abstract: Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization as it penali...
TermGPT: Multi-Level Contrastive Fine-Tuning for Terminology Adaptation in Legal and Financial Domain : Abstract: Large language models (LLMs) have demonstrated impressive performance in text generation tasks; however, their embedding spaces often suffer from the isotropy problem, resulting in poor disc...
Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM : Abstract: The course forums are increasingly significant and play a vital role in facilitating student discussions and answering their questions related to the course. It provides a platform for student...
Improving Graduate Outcomes by Identifying Skills Gaps and Recommending Courses Based on Career Interests : Abstract: This paper aims to address the challenge of selecting relevant courses for students by proposing the design and development of a course recommendation system. The course recommendation syste...
Khmer Spellchecking: A Holistic Approach : Abstract: Compared to English and other high-resource languages, spellchecking for Khmer remains an unresolved problem due to several challenges. First, there are misalignments between words in the le...
TARG: Training-Free Adaptive Retrieval Gating for Efficient RAG : Abstract: Retrieval-Augmented Generation (RAG) improves factuality but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Ga...
Contextual morphologically-guided tokenization for Latin encoder models : Abstract: Tokenization is a critical component of language model pretraining, yet standard tokenization methods often prioritize information-theoretical goals like high compression and low fertility r...
Order Matters: Rethinking Prompt Construction in In-Context Learning : Abstract: In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work reasonably and intuitively assumes that which examples...
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages : Abstract: Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expa...
Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals : Abstract: Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc...
Provably Scalable Black-Box Variational Inference with Structured Variational Families : Abstract: Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent comput...
Spectral methods for Neural Integral Equations : Abstract: Neural integral equations are deep learning models based on the theory of integral equations, where the model consists of an integral operator and the corresponding equation (of the second k...
HyperEvent: A Strong Baseline for Dynamic Link Prediction via Relative Structural Encoding : Abstract: Learning representations for continuous-time dynamic graphs is critical for dynamic link prediction. While recent methods have become increasingly complex, the field lacks a strong and infor...
DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs : Abstract: Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation ...
Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language Models : Abstract: Continual learning (CL) enables models to adapt to evolving data streams without catastrophic forgetting, a fundamental requirement for real-world AI systems. However, the current methods of...
Distribution Learning Meets Graph Structure Sampling : Abstract: This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an onlin...
Lipschitz-Regularized Critics Lead to Policy Robustness Against Transition Dynamics Uncertainty : Abstract: Uncertainties in transition dynamics pose a critical challenge in reinforcement learning (RL), often resulting in performance degradation of trained policies when deployed on hardware. Many ...
Effector: A Python package for regional explanations : Abstract: Effector is a Python package for interpreting machine learning (ML) models that are trained on tabular data through global and regional feature effects. Global effects, like Partial Dependen...
Reassessing feature-based Android malware detection in a contemporary context : Abstract: We report the findings of a reimplementation of 18 foundational studies in feature-based machine learning for Android malware detection, published during the period 2013-2023. These studies ...
Transfer in Reinforcement Learning via Regret Bounds for Learning Agents : Abstract: We present an approach for the quantification of the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of $\aleph$ agents ope...
Robot Crash Course: Learning Soft and Stylized Falling : Abstract: Despite recent advances in robust locomotion, bipedal robots operating in the real world remain at risk of falling. While most research focuses on preventing such events, we instead concentr...
Global Solutions to Non-Convex Functional Constrained Problems with Hidden Convexity : Abstract: Constrained non-convex optimization is fundamentally challenging, as global solutions are generally intractable and constraint qualifications may not hold. However, in many applications, inc...
Multitask GLocal OBIA-Mamba for Sentinel-2 Landcover Mapping : Abstract: Although Sentinel-2 based land use and land cover (LULC) classification is critical for various environmental monitoring applications, it is a very difficult task due to some key data challe...
Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation : Abstract: Despite advances in generation quality, current text-to-image (T2I) models often lack diversity, generating homogeneous outputs. This work introduces a framework to address the need for robu...
Two Americas of Well-Being: Divergent Rural-Urban Patterns of Life Satisfaction and Happiness from 2.6 B Social Media Posts : Abstract: Using 2.6 billion geolocated social-media posts (2014-2022) and a fine-tuned generative language model, we construct county-level indicators of life satisfaction and happiness for the United...
Edge Machine Learning for Cluster Counting in Next-Generation Drift Chambers : Abstract: Drift chambers have long been central to collider tracking, but future machines like a Higgs factory motivate higher granularity and cluster counting for particle ID, posing new data process...
Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding : Abstract: Optimizing recommender systems for objectives beyond accuracy, such as diversity, novelty, and personalization, is crucial for long-term user satisfaction. To this end, industrial practition...
OpenSR-SRGAN: A Flexible Super-Resolution Framework for Multispectral Earth Observation Data : Abstract: We present OpenSR-SRGAN, an open and modular framework for single-image super-resolution in Earth Observation. The software provides a unified implementation of SRGAN-style models that is ea...
Continuum Dropout for Neural Differential Equations : Abstract: Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their adva...
Physics informed Transformer-VAE for biophysical parameter estimation: PROSAIL model inversion in Sentinel-2 imagery : Abstract: Accurate retrieval of vegetation biophysical variables from satellite imagery is crucial for ecosystem monitoring and agricultural management. In this work, we propose a physics-informed Tra...
Operator Models for Continuous-Time Offline Reinforcement Learning : Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often...
Revisiting Evaluation of Deep Neural Networks for Pedestrian Detection : Abstract: Reliable pedestrian detection represents a crucial step towards automated driving systems. However, the current performance benchmarks exhibit weaknesses. The currently applied metrics for v...
Fault Detection in Solar Thermal Systems using Probabilistic Reconstructions : Abstract: Solar thermal systems (STS) present a promising avenue for low-carbon heat generation, with a well-running system providing heat at minimal cost and carbon emissions. However, STS can exhibi...
Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access : Abstract: Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use cases such as medium access control (MAC), their real-world deployment in Internet of Things (IoT) is hind...
Generalizing to Unseen Disaster Events: A Causal View : Abstract: Due to the rapid growth of social media platforms, these tools have become essential for monitoring information during ongoing disaster events. However, extracting valuable insights requires...
Physics-informed Machine Learning for Static Friction Modeling in Robotic Manipulators Based on Kolmogorov-Arnold Networks : Abstract: Friction modeling plays a crucial role in achieving high-precision motion control in robotic operating systems. Traditional static friction models (such as the Stribeck model) are widely use...
Multi-agent In-context Coordination via Decentralized Memory Retrieval : Abstract: Large transformer models, trained on diverse datasets, have demonstrated impressive few-shot performance on previously unseen tasks without requiring parameter updates. This capability has a...
Global Convergence of Four-Layer Matrix Factorization under Random Initialization : Abstract: Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-...
Beyond empirical models: Discovering new constitutive laws in solids with graph-based equation discovery : Abstract: Constitutive models are fundamental to solid mechanics and materials science, underpinning the quantitative description and prediction of material responses under diverse loading conditions....
Theory and computation for structured variational inference : Abstract: Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have inte...
HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning : Abstract: Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs, limiting their deployment in resource-constrained or real-...
Generalized infinite dimensional Alpha-Procrustes based geometries : Abstract: This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wass...
A Robust Task-Level Control Architecture for Learned Dynamical Systems : Abstract: Dynamical system (DS)-based learning from demonstration (LfD) is a powerful tool for generating motion plans in the operation ('task') space of robotic systems. However, the realization of t...
Symmetry aware Reynolds Averaged Navier Stokes turbulence models with equivariant neural networks : Abstract: Accurate and generalizable Reynolds-averaged Navier-Stokes (RANS) models for turbulent flows rely on effective closures. We introduce tensor-based, symmetry aware closures using equivariant ...
Modelos Empiricos de Pos-Dupla Selecao por LASSO: Discussoes para Estudos do Transporte Aereo : Abstract: This paper presents and discusses forms of estimation by regularized regression and model selection using the LASSO method - Least Absolute Shrinkage and Selection Operator. LASSO is recogni...
Gradient-Guided Exploration of Generative Model's Latent Space for Controlled Iris Image Augmentations : Abstract: Developing reliable iris recognition and presentation attack detection methods requires diverse datasets that capture realistic variations in iris features and a wide spectrum of anomalies. ...
Assessing the Applicability of Natural Language Processing to Traditional Social Science Methodology: A Case Study in Identifying Strategic Signaling Patterns in Presidential Directives : Abstract: Our research investigates how Natural Language Processing (NLP) can be used to extract main topics from a larger corpus of written data, as applied to the case of identifying signaling theme...
A Fourier-Based Global Denoising Model for Smart Artifacts Removing of Microscopy Images : Abstract: Microscopy such as Scanning Tunneling Microscopy (STM), Atomic Force Microscopy (AFM) and Scanning Electron Microscopy (SEM) are essential tools in material imaging at micro- and nanoscale r...
The Data Fusion Labeler (dFL): Challenges and Solutions to Data Harmonization, Labeling, and Provenance in Fusion Energy : Abstract: Fusion energy research increasingly depends on the ability to integrate heterogeneous, multimodal datasets from high-resolution diagnostics, control systems, and multiscale simulations. The ...
Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling : Abstract: Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspir...
Classifying Phonotrauma Severity from Vocal Fold Images with Soft Ordinal Regression : Abstract: Phonotrauma refers to vocal fold tissue damage resulting from exposure to forces during voicing. It occurs on a continuum from mild to severe, and treatment options can vary based on severit...
PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild : Abstract: Non-human primates are our closest living relatives, and analyzing their behavior is central to research in cognition, evolution, and conservation. Computer vision could greatly aid this res...
Lithological Controls on the Permeability of Geologic Faults: Surrogate Modeling and Sensitivity Analysis : Abstract: Fault zones exhibit complex and heterogeneous permeability structures influenced by stratigraphic, compositional, and structural factors, making them critical yet uncertain components in sub...
Analysis of the TAIGA-HiSCORE Data Using the Latent Space of Autoencoders : Abstract: The aim of extensive air shower (EAS) analysis is to reconstruct the physical parameters of the primary particle that initiated the shower. The TAIGA experiment is a hybrid detector system t...
Siegel Neural Networks : Abstract: Riemannian symmetric spaces (RSS) such as hyperbolic spaces and symmetric positive definite (SPD) manifolds have become popular spaces for representation learning. In this paper, we propose ...
Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem : Abstract: The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performin...
Pretrained Joint Predictions for Scalable Batch Bayesian Optimization of Molecular Designs : Abstract: Batched synthesis and testing of molecular designs is the key bottleneck of drug development. There has been great interest in leveraging biomolecular foundation models as surrogates to acce...
Tight Robustness Certification through the Convex Hull of $\ell_0$ Attacks : Abstract: Few-pixel attacks mislead a classifier by modifying a few pixels of an image. Their perturbation space is an $\ell_0$-ball, which is not convex, unlike $\ell_p$-balls for $p\geq1$. However, ...
Semi-Unified Sparse Dictionary Learning with Learnable Top-K LISTA and FISTA Encoders : Abstract: We present a semi-unified sparse dictionary learning framework that bridges the gap between classical sparse models and modern deep architectures. Specifically, the method integrates strict ...
Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations : Abstract: Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch (EM) algo...
Oya: Deep Learning for Accurate Global Precipitation Estimation : Abstract: Accurate precipitation estimation is critical for hydrological applications, especially in the Global South where ground-based observation networks are sparse and forecasting skill is limite...
Maximizing Efficiency of Dataset Compression for Machine Learning Potentials With Information Theory : Abstract: Machine learning interatomic potentials (MLIPs) balance high accuracy and lower costs compared to density functional theory calculations, but their performance often depends on the size and ...
Holonorm : Abstract: Normalization is a key point in transformer training. In Dynamic Tanh (DyT), the author demonstrated that Tanh can be used as an alternative layer normalization (LN) and confirmed the effec...
Weak Relation Enforcement for Kinematic-Informed Long-Term Stock Prediction with Artificial Neural Networks : Abstract: We propose loss function weak enforcement of the velocity relations between time-series points in the Kinematic-Informed artificial Neural Networks (KINN) for long-term stock prediction. Pro...
Panda: Test-Time Adaptation with Negative Data Augmentation : Abstract: Pretrained VLMs exhibit strong zero-shot classification capabilities, but their predictions degrade significantly under common image corruptions. To improve robustness, many test-time adapta...
Intrinsic Dimensionality as a Model-Free Measure of Class Imbalance : Abstract: Imbalance in classification tasks is commonly quantified by the cardinalities of examples across classes. This, however, disregards the presence of redundant examples and inherent difference...
Neuronal Fluctuations: Learning Rates vs Participating Neurons : Abstract: Deep Neural Networks (DNNs) rely on inherent fluctuations in their internal parameters (weights and biases) to effectively navigate the complex optimization landscape and achieve robust perf...
Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting : Abstract: Spatio-temporal graphs are powerful tools for modeling complex dependencies in traffic time series. However, the distributed nature of real-world traffic data across multiple stakeholders po...
Product distribution learning with imperfect advice : Abstract: Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of...
Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective : Abstract: The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated to deep linear neural networks, i.e., the gradient descent trai...
Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience : Abstract: Decentralized cooperative multi-agent multi-armed bandits (DeCMA2B) considers how multiple agents collaborate in a decentralized multi-armed bandit setting. Though this problem has been exte...
EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training : Abstract: Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues,...
PITE: Multi-Prototype Alignment for Individual Treatment Effect Estimation : Abstract: Estimating Individual Treatment Effects (ITE) from observational data is challenging due to confounding bias. Most studies tackle this bias by balancing distributions globally, but ignore in...
OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models : Abstract: Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into everyday tools and intelligent agents, growing concerns have arisen regarding their possible output of u...
Unitho: A Unified Multi-Task Framework for Computational Lithography : Abstract: Reliable, generalizable data foundations are critical for enabling large-scale models in computational lithography. However, essential tasks - mask generation, rule violation detection, and la...
FedCure: Mitigating Participation Bias in Semi-Asynchronous Federated Learning with Non-IID Data : Abstract: While semi-asynchronous federated learning (SAFL) combines the efficiency of synchronous training with the flexibility of asynchronous updates, it inherently suffers from participation bias,...
Out-of-Context Misinformation Detection via Variational Domain-Invariant Learning with Test-Time Training : Abstract: Out-of-context misinformation (OOC) is a low-cost form of misinformation in news reports, which refers to placing authentic images into out-of-context or fabricated image-text pairings. This p...
Beyond MSE: Ordinal Cross-Entropy for Probabilistic Time Series Forecasting : Abstract: Time series forecasting is an important task that involves analyzing temporal dependencies and underlying patterns (such as trends, cyclicality, and seasonality) in historical data to predic...
Towards Leveraging Sequential Structure in Animal Vocalizations : Abstract: Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features acro...
EPO: Diverse and Realistic Protein Ensemble Generation via Energy Preference Optimization : Abstract: Accurate exploration of protein conformational ensembles is essential for uncovering function but remains hard because molecular-dynamics (MD) simulations suffer from high computational cost...
RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting : Abstract: Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches, including transformer and multilayer perceptron-based models, optimize us...
How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders : Abstract: Although recent generative models are remarkably capable of producing instruction-following and realistic outputs, they remain prone to notable physical plausibility failures. Though critica...
Tree-Based Stochastic Optimization for Solving Large-Scale Urban Network Security Games : Abstract: Urban Network Security Games (UNSGs), which model the strategic allocation of limited security resources on city road networks, are critical for urban safety. However, finding a Nash Equilib...
FAQNAS: FLOPs-aware Hybrid Quantum Neural Architecture Search using Genetic Algorithm : Abstract: Hybrid Quantum Neural Networks (HQNNs), which combine parameterized quantum circuits with classical neural layers, are emerging as promising models in the noisy intermediate-scale quantum (N...
From Static Structures to Ensembles: Studying and Harnessing Protein Structure Tokenization : Abstract: Protein structure tokenization converts 3D structures into discrete or vectorized representations, enabling the integration of structural and sequence data. Despite many recent works on stru...
SVD-NO: Learning PDE Solution Operators with SVD Integral Kernels : Abstract: Neural operators have emerged as a promising paradigm for learning solution operators of partial differential equations (PDEs) directly from data. Existing methods, such as those based on ...
GraphSB: Boosting Imbalanced Node Classification on Graphs through Structural Balance : Abstract: Imbalanced node classification is a critical challenge in graph learning, where most existing methods typically utilize Graph Neural Networks (GNNs) to learn node representations. These meth...
Interaction as Interference: A Quantum-Inspired Aggregation Approach : Abstract: Classical approaches often treat interaction as engineered product terms or as emergent patterns in flexible models, offering little control over how synergy or antagonism arises. We take a ...
DemoTuner: Efficient DBMS Knobs Tuning via LLM-Assisted Demonstration Reinforcement Learning : Abstract: The performance of modern DBMSs such as MySQL and PostgreSQL heavily depends on the configuration of performance-critical knobs. Manually tuning these knobs is laborious and inefficient due to...
A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes : Abstract: We address the general task of learning with a set of candidate models that is too large to have a uniform convergence of empirical estimates to true losses. While the common approach to suc...
Towards Robust Multimodal Learning in the Open World : Abstract: The rapid evolution of machine learning has propelled neural networks to unprecedented success across diverse domains. In particular, multimodal learning has emerged as a transformative para...
Rediscovering the Lunar Equation of the Centre with AI Feynman via Embedded Physical Biases : Abstract: This work explores using the physics-inspired AI Feynman symbolic regression algorithm to automatically rediscover a fundamental equation in astronomy -- the Equation of the Centre. Through ...
Autonomous Concept Drift Threshold Determination : Abstract: Existing drift detection methods focus on designing sensitive test statistics. They treat the detection threshold as a fixed hyperparameter, set once to balance false alarms and late detecti...
Towards Multiple Missing Values-resistant Unsupervised Graph Anomaly Detection : Abstract: Unsupervised graph anomaly detection (GAD) has received increasing attention in recent years, which aims to identify data anomalous patterns utilizing only unlabeled node information from gr...
Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling : Abstract: Incremental flow-based denoising models have reshaped generative modelling, but their empirical advantage still lacks a rigorous approximation-theoretic foundation. We show that incremental ...
Explore and Establish Synergistic Effects Between Weight Pruning and Coreset Selection in Neural Network Training : Abstract: Modern deep neural networks rely heavily on massive model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging parad...
Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models : Abstract: Reinforcement learning (RL) finetuning is crucial to aligning large language models (LLMs), but the process is notoriously unstable and exhibits high variance across model checkpoints. In pr...
Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting : Abstract: The growing use of large language models in sensitive domains has exposed a critical weakness: the inability to ensure that private information can be permanently forgotten. Yet these system...
ConSurv: Multimodal Continual Learning for Survival Analysis : Abstract: Survival prediction of cancers is crucial for clinical practice, as it informs mortality risks and influences treatment plans. However, a static model trained on a single dataset fails to ad...
Steering Pretrained Drafters during Speculative Decoding : Abstract: Speculative decoding accelerates language model inference by separating generation into fast drafting and parallel verification. Its main limitation is drafter-verifier misalignment, which l...
ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking : Abstract: Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language mode...
Learning Intersections of Halfspaces under Factorizable Distribution : Abstract: Learning intersections of halfspaces is a central problem in Computational Learning Theory. Even for just two halfspaces, it remains a major open question whether learning is possible in pol...
SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data : Abstract: Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources at a central server to train model partitions. Data heterogeneity acros...
Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning : Abstract: Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function in...
CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting : Abstract: Recent advances in deep forecasting models have achieved remarkable performance, yet most approaches still struggle to provide both accurate predictions and interpretable insights into tempo...
Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO : Abstract: Group Relative Policy Optimization (GRPO) has proven highly useful in the post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and, through reinfor...
NeuroLingua: A Language-Inspired Hierarchical Framework for Multimodal Sleep Stage Classification Using EEG and EOG : Abstract: Automated sleep stage classification from polysomnography remains limited by the lack of expressive temporal hierarchies, challenges in multimodal EEG and EOG fusion, and the limited interpr...
Is nasty noise actually harder than malicious noise? : Abstract: We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise mod...
Data Heterogeneity and Forgotten Labels in Split Federated Learning : Abstract: In Split Federated Learning (SFL), the clients collaboratively train a model with the help of a server by splitting the model into two parts. Part-1 is trained locally at each client and agg...
FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching : Abstract: Radar-based precipitation nowcasting, the task of forecasting short-term precipitation fields from previous radar images, is a critical problem for flood risk management and decision-making....
Generalizing PDE Emulation with Equation-Aware Neural Operators : Abstract: Solving partial differential equations (PDEs) can be prohibitively expensive using traditional numerical methods. Deep learning-based surrogate models typically specialize in a single PDE wi...
Efficient Hyperdimensional Computing with Modular Composite Representations : Abstract: The modular composite representation (MCR) is a computing model that represents information with high-dimensional integer vectors using modular arithmetic. Originally proposed as a generaliz...
ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning : Abstract: Reinforcement learning (RL) has demonstrated significant promise in enhancing the reasoning capabilities of Text2SQL LLMs, especially with advanced algorithms such as GRPO and DAPO. However,...
Boosted GFlowNets: Improving Exploration via Sequential Learning : Abstract: Generative Flow Networks (GFlowNets) are powerful samplers for compositional objects that, by design, sample proportionally to a given non-negative reward. Nonetheless, in practice, they oft...
GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks : Abstract: State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iterativel...
Generalization Can Emerge in Tabular Foundation Models From a Single Table : Abstract: Deep tabular modelling increasingly relies on in-context learning where, during inference, a model receives a set of $(x,y)$ pairs as context and predicts labels for new inputs without weigh...
Parametric Expensive Multi-Objective Optimization via Generative Solution Modeling : Abstract: Many real-world applications require solving families of expensive multi-objective optimization problems (EMOPs) under varying operational conditions. This gives rise to parametric expensive...
Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off : Abstract: The design of Large Language Models (LLMs) has long been hampered by a fundamental conflict within their core attention mechanism: its remarkable expressivity is built upon a computational c...
DynamicRTL: RTL Representation Learning for Dynamic Circuit Behavior : Abstract: There is a growing body of work on using Graph Neural Networks (GNNs) to learn representations of circuits, focusing primarily on their static characteristics. However, these models fail to ...
Group Averaging for Physics Applications: Accuracy Improvements at Zero Training Cost : Abstract: Many machine learning tasks in the natural sciences are precisely equivariant to particular symmetries. Nonetheless, equivariant methods are often not employed, perhaps because training is p...
Filtering Jump Markov Systems with Partially Known Dynamics: A Model-Based Deep Learning Approach : Abstract: This paper presents the Jump Markov Filtering Network (JMFNet), a novel model-based deep learning framework for real-time state estimation in jump Markov systems with unknown noise sta...
Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads : Abstract: Deep mixture-of-experts models have attracted a lot of attention for survival analysis problems, particularly for their ability to cluster similar patients together. In practice, grouping of...
Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests : Abstract: Password security plays a crucial role in cybersecurity, yet traditional password strength meters, which rely on static rules like character-type requirements, often fail. Such methods are e...
Leveraging Large Language Models for Use Case Model Generation from Software Requirements : Abstract: Use case modeling employs user-centered scenarios to outline system requirements. These help to achieve consensus among relevant stakeholders. Because the manual creation of use case models ...
Enhancing PIBT via Multi-Action Operations : Abstract: PIBT is a rule-based Multi-Agent Path Finding (MAPF) solver, widely used as a low-level planner or action sampler in many state-of-the-art approaches. Its primary advantage lies in its excep...
Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support : Abstract: Financial advisors and investors struggle with information overload from financial news, where irrelevant content and noise obscure key market signals and hinder timely investment decisions....
Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance : Abstract: Automated visual inspection in medical-device manufacturing faces unique challenges, including extremely low defect rates, limited annotated data, hardware restrictions on production lines, ...
ManipDreamer3D : Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory : Abstract: Data scarcity continues to be a major challenge in the field of robotic manipulation. Although diffusion models provide a promising solution for generating robotic manipulation videos, exist...
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA : Abstract: Medical imaging plays a pivotal role in modern healthcare, with computed tomography pulmonary angiography (CTPA) being a critical tool for diagnosing pulmonary embolism and other thoracic co...
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data : Abstract: Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However,...
Multi-Turn Interactions for Text-to-SQL with Large Language Models : Abstract: This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods are still i...
Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool : Abstract: While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and Artificial Intelli...
GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets : Abstract: We study GCS-TSP, a new variant of the Traveling Salesman Problem (TSP) defined over a Graph of Convex Sets (GCS) -- a powerful representation for trajectory planning that decomposes the con...
Enhancing Conflict Resolution in Language Models via Abstract Argumentation : Abstract: In recent years, large language models (LLMs) have made significant advancements in developing human-like and engaging dialogue systems. However, in tasks such as consensus-building and pers...
A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning : Abstract: Multi-modal conversation emotion recognition (MCER) aims to recognize and track the speaker's emotional state using text, speech, and visual information in the conversation scene. Analyzing ...
Black-Box On-Policy Distillation of Large Language Models : Abstract: Black-box distillation creates student large language models (LLMs) by learning from a proprietary teacher model's text outputs alone, without access to its internal logits or parameters. In...
Instella: Fully Open Language Models with Stellar Performance : Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or partially open, limitin...
SSR: Socratic Self-Refine for Large Language Model Reasoning : Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, yet existing test-time frameworks often rely on coarse self-verification and self-correction, limiting their ef...
Know Your Limits: Entropy Estimation Modeling for Compression and Generalization : Abstract: Language prediction is constrained by informational entropy intrinsic to language, such that there exists a limit to how accurate any language model can become and equivalently a lower bound...
Towards an Agentic Workflow for Internet Measurement Research : Abstract: Internet measurement research faces an accessibility crisis: complex analyses require custom integration of multiple specialized tools that demands specialized domain expertise. When network...
Mined Prompting and Metadata-Guided Generation for Wound Care Visual Question Answering : Abstract: The rapid expansion of asynchronous remote care has intensified provider workload, creating demand for AI systems that can assist clinicians in managing patient queries more efficiently. The...
Textual understanding boost in the WikiRace : Abstract: The WikiRace game, where players navigate between Wikipedia articles using only hyperlinks, serves as a compelling benchmark for goal-directed search in complex information networks. This pa...
Evaluating Prompting Strategies with MedGemma for Medical Order Extraction : Abstract: The accurate extraction of medical orders from doctor-patient conversations is a critical task for reducing clinical documentation burdens and ensuring patient safety. This paper details our...
Towards Emotionally Intelligent and Responsible Reinforcement Learning : Abstract: Personalized decision systems in healthcare and behavioral support often rely on static rule-based or engagement-maximizing heuristics that overlook users' emotional context and ethical cons...
Impact of Layer Norm on Memorization and Generalization in Transformers : Abstract: Layer Normalization (LayerNorm) is one of the fundamental components in transformers that stabilizes training and improves optimization. In recent times, Pre-LayerNorm transformers have beco...
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space : Abstract: Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typical...
From Euler to Today: Universal Mathematical Fallibility A Large-Scale Computational Analysis of Errors in ArXiv Papers : Abstract: We present the results of a large-scale computational analysis of mathematical papers from the ArXiv repository, demonstrating a comprehensive system that not only detects mathematical error...
Preview, Accept or Discard? A Predictive Low-Motion Interaction Paradigm : Abstract: Repetitive strain injury (RSI) affects roughly one in five computer users and remains largely unresolved despite decades of ergonomic mouse redesign. All such devices share a fundamental lim...
Say It Differently: Linguistic Styles as Jailbreak Vectors : Abstract: Large Language Models (LLMs) are commonly evaluated for robustness against paraphrased or semantically equivalent jailbreak prompts, yet little attention has been paid to linguistic variatio...
LOCA-R: Near-Perfect Performance on the Chinese Physics Olympiad 2025 : Abstract: Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, a...
On the Detectability of Active Gradient Inversion Attacks in Federated Learning : Abstract: One of the key advantages of Federated Learning (FL) is its ability to collaboratively train a Machine Learning (ML) model while keeping clients' data on-site. However, this can create a fal...
Utility of Pancreas Surface Lobularity as a CT Biomarker for Opportunistic Screening of Type 2 Diabetes : Abstract: Type 2 Diabetes Mellitus (T2DM) is a chronic metabolic disease that affects millions of people worldwide. Early detection is crucial because the disease can alter pancreas function through morphological c...
Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs : Abstract: Optimizing the performance of large language models (LLMs) on large-scale AI training and inference systems requires a scalable and expressive mechanism to model distributed workload executi...
Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks : Abstract: While prompt optimization has emerged as a critical technique for enhancing language model performance, existing approaches primarily focus on elicitation-based strategies that search for op...
LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning : Abstract: Large language models (LLMs) have been widely evaluated on macro-scale geographic tasks, such as global factual recall, event summarization, and regional reasoning. Yet, their ability to han...
Reasoning About Intent for Ambiguous Requests : Abstract: Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address thi...
Completion of partial structures using Patterson maps with the CrysFormer machine learning model : Abstract: Protein structure determination has long been one of the primary challenges of structural biology, to which deep machine learning (ML)-based approaches have increasingly been applied. Howeve...
Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration : Abstract: Perturbation-based explanations are widely utilized to enhance the transparency of machine-learning models in practice. However, their reliability is often compromised by the unknown model b...
nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation : Abstract: Recent advances in closed-loop planning benchmarks have significantly improved the evaluation of autonomous vehicles. However, existing benchmarks still rely on rule-based reactive agents su...
Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance : Abstract: Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large lang...
AgentEvolver: Towards Efficient Self-Evolving Agent System : Abstract: Autonomous agents powered by large language models (LLMs) have the potential to significantly enhance human productivity by reasoning, using tools, and executing complex tasks in diverse env...
Enhancing Kernel Power K-means: Scalable and Robust Clustering with Random Fourier Features and Possibilistic Method : Abstract: Kernel power $k$-means (KPKM) leverages a family of means to mitigate local minima issues in kernel $k$-means. However, KPKM faces two key limitations: (1) the computational burden of the fu...
MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns : Abstract: Document parsing is a core task in document intelligence, supporting applications such as information extraction, retrieval-augmented generation, and automated document analysis. However, re...
Simulating Misinformation Propagation in Social Networks using Large Language Models : Abstract: Misinformation on social media thrives on surprise, emotion, and identity-driven reasoning, often amplified through human cognitive biases. To investigate these mechanisms, we model large la...
SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation : Abstract: Geospatial foundation models for Earth observation often fail to perform reliably in environments underrepresented during pretraining. We introduce SHRUG-FM, a framework for reliability-awar...
DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile : Abstract: AI-based dermatology adoption remains limited by biased datasets, variable image quality, and limited validation. We introduce DermAI, a lightweight, smartphone-based application that enable...
BhashaKritika: Building Synthetic Pretraining Data at Scale for Indic Languages : Abstract: In the context of pretraining of Large Language Models (LLMs), synthetic data has emerged as an alternative for generating high-quality pretraining data at scale. This is particularly benefi...
Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision : Abstract: Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing method...
Rethinking Visual Information Processing in Multimodal LLMs : Abstract: Despite the remarkable success of the LLaVA architecture for vision-language tasks, its design inherently struggles to effectively integrate visual features due to the inherent mismatch betw...
Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models : Abstract: Large Vision-Language Models (LVLMs) often suffer from object hallucination, generating text inconsistent with visual inputs, which can critically undermine their reliability. Existing infer...
Torch-Uncertainty: A Deep Learning Framework for Uncertainty Quantification : Abstract: Deep Neural Networks (DNNs) have demonstrated remarkable performance across various domains, including computer vision and natural language processing. However, they often struggle to accura...
RoboBenchMart: Benchmarking Robots in Retail Environment : Abstract: Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To addr...
Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics : Abstract: In recent years, LLMs have been widely integrated into software engineering workflows, supporting tasks like code generation. However, while these models often generate functionally correct ...
MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models : Abstract: Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared to traditional half-duplex models. Ho...
H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification : Abstract: Fine-Grained Visual Classification (FGVC) remains a challenging task due to subtle inter-class differences and large intra-class variations. Existing approaches typically rely on feature-sel...
Workload Schedulers -- Genesis, Algorithms and Differences : Abstract: This paper presents a novel approach to categorization of modern workload schedulers. We provide descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Sy...
Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning : Abstract: Transformers have demonstrated exceptional in-context learning (ICL) capabilities, enabling applications across natural language processing, computer vision, and sequential decision-making. ...
FineSkiing: A Fine-grained Benchmark for Skiing Action Quality Assessment : Abstract: Action Quality Assessment (AQA) aims to evaluate and score sports actions, which has attracted widespread interest in recent years. Existing AQA methods primarily predict scores based on fea...
Robustness and Imperceptibility Analysis of Hybrid Spatial-Frequency Domain Image Watermarking : Abstract: The proliferation of digital media necessitates robust methods for copyright protection and content authentication. This paper presents a comprehensive comparative study of digital image wat...
Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners : Abstract: While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can...
VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction : Abstract: Current end-to-end spoken language models (SLMs) have made notable progress, yet they still encounter considerable response latency. This delay primarily arises from the autoregressive gener...
Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard : Abstract: Recent progress in large language models (LLMs) has enabled understanding of both speech and non-speech audio, but it also exposes new safety risks emerging from complex audio inputs that are inade...
Persona-Aware Alignment Framework for Personalized Dialogue Generation : Abstract: Personalized dialogue generation aims to leverage persona profiles and dialogue history to generate persona-relevant and consistent responses. Mainstream models typically rely on token-level...
Fractional neural attention for efficient multiscale sequence processing : Abstract: Attention mechanisms underpin the computational power of Transformer models, which have achieved remarkable success across diverse domains. Yet understanding and extending the principles und...
VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction : Abstract: Multi-agent trajectory prediction is crucial for autonomous systems operating in dense, interactive environments. Existing methods often fail to jointly capture agents' long-term goals and t...
Improved Offline Reinforcement Learning via Quantum Metric Encoding : Abstract: Reinforcement learning (RL) with limited samples is common in real-world applications. However, offline RL performance under this constraint is often suboptimal. We consider an alternative a...
Utilizing a Geospatial Foundation Model for Coastline Delineation in Small Sandy Islands : Abstract: We present an initial evaluation of NASA and IBM's Prithvi-EO-2.0 geospatial foundation model on shoreline delineation of small sandy islands using satellite images. We curated and labeled a...
GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval : Abstract: Text-to-Image Person Retrieval (TIPR) aims to retrieve person images based on natural language descriptions. Although many TIPR methods have achieved promising results, sometimes textual que...
Right Looks, Wrong Reasons: Compositional Fidelity in Text-to-Image Generation : Abstract: The architectural blueprint of today's leading text-to-image models contains a fundamental flaw: an inability to handle logical composition. This survey investigates this breakdown across th...
MATAI: A Generalist Machine Learning Framework for Property Prediction and Inverse Design of Advanced Alloys : Abstract: The discovery of advanced metallic alloys is hindered by vast composition spaces, competing property objectives, and real-world constraints on manufacturability. Here we introduce MATAI, a g...
On the Military Applications of Large Language Models : Abstract: In this paper, military use cases or applications and implementation thereof are considered for natural language processing and large language models, which have broken into fame with the in...
T2IBias: Uncovering Societal Bias Encoded in the Latent Space of Text-to-Image Generative Models : Abstract: Text-to-image (T2I) generative models are largely used in AI-powered real-world applications and value creation. However, their strategic deployment raises critical concerns for responsible ...
eXIAA: eXplainable Injections for Adversarial Attack : Abstract: Post-hoc explainability methods are a subset of Machine Learning (ML) that aim to provide a reason for why a model behaves in a certain way. In this paper, we show a new black-box model-agno...
Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning : Abstract: Offline-to-online reinforcement learning (O2O-RL) has emerged as a promising paradigm for safe and efficient robotic policy deployment but suffers from two fundamental challenges: limited co...
Multivariate Gaussian Representation Learning for Medical Action Evaluation : Abstract: Fine-grained action evaluation in medical vision faces unique challenges due to the unavailability of comprehensive datasets, stringent precision requirements, and insufficient spatiotempora...
BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference : Abstract: Mixture-of-Experts (MoE) architectures scale language models by activating only a subset of specialized expert networks for each input token, thereby reducing the number of floating-point op...
Moral Change or Noise? On Problems of Aligning AI With Temporally Unstable Human Feedback : Abstract: Alignment methods in moral domains seek to elicit moral preferences of human stakeholders and incorporate them into AI. This presupposes moral preferences as static targets, but such prefere...
Temporal Latent Variable Structural Causal Model for Causal Discovery under External Interferences : Abstract: Inferring causal relationships from observed data is an important task, yet it becomes challenging when the data is subject to various external interferences. Most of these interferences are...
Efficient Automated Diagnosis of Retinopathy of Prematurity by Customize CNN Models : Abstract: This paper encompasses an in-depth examination of Retinopathy of Prematurity (ROP) diagnosis, employing advanced deep learning methodologies. Our focus centers on refining and evaluating CNN...
Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation : Abstract: We propose Anomagic, a zero-shot anomaly generation method that produces semantically coherent anomalies without requiring any exemplar anomalies. By unifying both visual and textual cues th...
fastbmRAG: A Fast Graph-Based RAG Framework for Efficient Processing of Large-Scale Biomedical Literature : Abstract: Large language models (LLMs) are rapidly transforming various domains, including biomedicine and healthcare, and demonstrate remarkable potential from scientific research to new drug discove...
MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging : Abstract: Automated interpretation of medical images demands robust modeling of complex visual-semantic relationships while addressing annotation scarcity, label imbalance, and clinical plausibility c...
The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads : Abstract: The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs),...
Phantom Menace: Exploring and Enhancing the Robustness of VLA Models against Physical Sensor Attacks : Abstract: Vision-Language-Action (VLA) models revolutionize robotic systems by enabling end-to-end perception-to-action pipelines that integrate multiple sensory modalities, such as visual signals pro...
PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like content. This has revolutionized various sectors such as healthcare, softwar...
Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models : Abstract: Contrastive pre-trained vision-language models, such as CLIP, demonstrate strong generalization abilities in zero-shot classification by leveraging embeddings extracted from image and text e...
MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data : Abstract: Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there i...
Owlgorithm: Supporting Self-Regulated Learning in Competitive Programming through LLM-Driven Reflection : Abstract: We present Owlgorithm, an educational platform that supports Self-Regulated Learning (SRL) in competitive programming (CP) through AI-generated reflective questions. Leveraging GPT-4o, Owlgo...
EnvTrace: Simulation-Based Semantic Evaluation of LLM Code via Execution Trace Alignment -- Demonstrated at Synchrotron Beamlines : Abstract: Evaluating large language models (LLMs) for instrument control requires methods that go beyond standard, stateless algorithmic benchmarks, since the behavior of physical systems cannot be fu...
AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics : Abstract: The rapid proliferation of AI-generated content (AIGC) has reshaped the dynamics of digital marketing and online consumer behavior. However, predicting the diffusion trajectory and market im...
Beyond Cosine Similarity Magnitude-Aware CLIP for No-Reference Image Quality Assessment : Abstract: Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the imag...
EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models : Abstract: Scalable and generalizable analysis of brain activity is essential for advancing both clinical diagnostics and cognitive research. Electroencephalography (EEG), a non-invasive modality with ...
AdaptViG: Adaptive Vision GNN with Exponential Decay Gating : Abstract: Vision Graph Neural Networks (ViGs) offer a new direction for advancements in vision architectures. While powerful, ViGs often face substantial computational challenges stemming from their g...
Compensating Distribution Drifts in Class-incremental Learning of Pre-trained Vision Transformers : Abstract: Recent advances have shown that sequential fine-tuning (SeqFT) of pre-trained vision transformers (ViTs), followed by classifier refinement using approximate distributions of class features,...
MDMLP-EIA: Multi-domain Dynamic MLPs with Energy Invariant Attention for Time Series Forecasting : Abstract: Time series forecasting is essential across diverse domains. While MLP-based methods have gained attention for achieving Transformer-comparable performance with fewer parameters and better r...
Harnessing Bounded-Support Evolution Strategies for Policy Refinement : Abstract: Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploratio...
PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors : Abstract: Dataset distillation (DD) promises compact yet faithful synthetic data, but existing approaches often inherit the inductive bias of a single teacher model. As dataset size increases, this bi...
Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation : Abstract: Cardiovascular disease (CVD) is a leading cause of mortality worldwide. Electrocardiograms (ECGs) are the most widely used non-invasive tool for cardiac assessment, yet large, well-annotated...
Scale-Aware Relay and Scale-Adaptive Loss for Tiny Object Detection in Aerial Images : Abstract: Recently, despite the remarkable advancements in object detection, modern detectors still struggle to detect tiny objects in aerial images. One key reason is that tiny objects carry limited ...
A General Anchor-Based Framework for Scalable Fair Clustering : Abstract: Fair clustering is crucial for mitigating bias in unsupervised learning, yet existing algorithms often suffer from quadratic or super-quadratic computational complexity, rendering them impra...
Taught by the Flawed: How Dataset Insecurity Breeds Vulnerable AI Code : Abstract: AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately responsible for validating and reviewing su...
Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning : Abstract: Continual learning methods used to force neural networks to process sequential tasks in isolation, preventing them from leveraging useful inter-task relationships and causing them to repeate...
CertMask: Certifiable Defense Against Adversarial Patches via Theoretically Optimal Mask Coverage : Abstract: Adversarial patch attacks inject localized perturbations into images to mislead deep vision models. These attacks can be physically deployed, posing serious risks to real-world applications....
From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance : Abstract: Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied envir...
Multiple Treatments Causal Effects Estimation with Task Embeddings and Balanced Representation Learning : Abstract: The simultaneous application of multiple treatments is increasingly common in many fields, such as healthcare and marketing. In such scenarios, it is important to estimate the single treatme...
On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks : Abstract: This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparamet...
Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models : Abstract: Vision-Language Models (VLMs) excel at zero-shot inference but often degrade under test-time domain shifts. For this reason, episodic test-time adaptation strategies have recently emerged as...
Constrained Best Arm Identification with Tests for Feasibility : Abstract: Best arm identification (BAI) aims to identify the highest-performance arm among a set of $K$ arms by collecting stochastic samples from each arm. In real-world problems, the best arm needs ...
Predicate-Argument Structure Divergences in Chinese and English Parallel Sentences and their Impact on Language Transfer : Abstract: Cross-lingual Natural Language Processing (NLP) has gained significant traction in recent years, offering practical solutions in low-resource settings by transferring linguistic knowledge fr...
Koopman Invariants as Drivers of Emergent Time-Series Clustering in Joint-Embedding Predictive Architectures : Abstract: Joint-Embedding Predictive Architectures (JEPAs), a powerful class of self-supervised models, exhibit an unexplained ability to cluster time-series data by their underlying dynamical regimes...
Privacy-Preserving Explainable AIoT Application via SHAP Entropy Regularization : Abstract: The widespread integration of Artificial Intelligence of Things (AIoT) in smart home environments has amplified the demand for transparent and interpretable machine learning models. To foste...
Solvaformer: an SE(3)-equivariant graph transformer for small molecule solubility prediction : Abstract: Accurate prediction of small molecule solubility using material-sparing approaches is critical for accelerating synthesis and process optimization, yet experimental measurement is costly and...
Ksurf-Drone: Attention Kalman Filter for Contextual Bandit Optimization in Cloud Resource Allocation : Abstract: Resource orchestration and configuration parameter search are key concerns for container-based infrastructure in cloud data centers. Large configuration search space and cloud uncertainties ...
Brian Intensify: An Adaptive Machine Learning Framework for Auditory EEG Stimulation and Cognitive Enhancement in FXS : Abstract: Neurodevelopmental disorders such as Fragile X Syndrome (FXS) and Autism Spectrum Disorder (ASD) are characterized by disrupted cortical oscillatory activity, particularly in the alpha and g...
History Rhymes: Macro-Contextual Retrieval for Robust Financial Forecasting : Abstract: Financial markets are inherently non-stationary: structural breaks and macroeconomic regime shifts often cause forecasting models to fail when deployed out of distribution (OOD). Conventiona...
How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation : Abstract: Large Language Models (LLMs) excel at evaluating machine translation (MT), but their scale and cost hinder deployment on edge devices and in privacy-sensitive workflows. We ask: how small ca...
Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation : Abstract: Foundation models (FMs) promise to generalize medical imaging, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided),...
TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training : Abstract: Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by...
Soiling detection for Advanced Driver Assistance Systems : Abstract: Soiling detection for automotive cameras is a crucial part of advanced driver assistance systems to make them more robust to external conditions like weather, dust, etc. In this paper, we re...
Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy : Abstract: Generalization to unseen environments is a significant challenge in the field of robotics and control. In this work, we focus on contextual reinforcement learning, where agents act within en...
Social LSTM with Dynamic Occupancy Modeling for Realistic Pedestrian Trajectory Prediction : Abstract: In dynamic and crowded environments, realistic pedestrian trajectory prediction remains a challenging task due to the complex nature of human motion and the mutual influences among individua...
Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard : Abstract: Inspired by infant development, we propose a Reinforcement Learning (RL) framework for autonomous self-exploration in a robotic agent, Baby Sophia, using the BabyBench simulation environment...
PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model : Abstract: Indoor localization in GPS-denied environments is crucial for applications like emergency response and assistive navigation. Vision-based methods such as PALMS enable infrastructure-free loc...
SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning : Abstract: Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-b...
Alignment Debt: The Hidden Work of Making AI Usable : Abstract: Frontier LLMs are optimised around high-resource assumptions about language, knowledge, devices, and connectivity. Whilst widely accessible, they often misfit conditions in the Global South....
Optimistic Reinforcement Learning with Quantile Objectives : Abstract: Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which...
TomoGraphView: 3D Medical Image Classification with Omnidirectional Slice Representations and Graph Neural Networks : Abstract: The growing number of medical tomography examinations has necessitated the development of automated methods capable of extracting comprehensive imaging features to facilitate downstream task...
An explainable Recursive Feature Elimination to detect Advanced Persistent Threats using Random Forest classifier : Abstract: Intrusion Detection Systems (IDS) play a vital role in modern cybersecurity frameworks by providing a primary defense mechanism against sophisticated threat actors. In this paper, we propose...
Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey : Abstract: LLM-based agents can autonomously accomplish complex tasks across various domains. However, to further cultivate capabilities such as adaptive behavior and long-term decision-making, trainin...
HeatGen: A Guided Diffusion Framework for Multiphysics Heat Sink Design Optimization : Abstract: This study presents a generative optimization framework based on a guided denoising diffusion probabilistic model (DDPM) that leverages surrogate gradients to generate heat sink designs mini...
Prostate-VarBench: A Benchmark with Interpretable TabNet Framework for Prostate Cancer Variant Classification : Abstract: Variants of Uncertain Significance (VUS) limit the clinical utility of prostate cancer genomics by delaying diagnosis and therapy when evidence for pathogenicity or benignity is incomplete. ...
General Intelligence-based Fragmentation (GIF): A framework for peak-labeled spectra simulation : Abstract: Despite growing reference libraries and advanced computational tools, progress in the field of metabolomics remains constrained by low rates of annotating measured spectra. The recent develo...
VEDA: 3D Molecular Generation via Variance-Exploding Diffusion with Annealing : Abstract: Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they of...
Mamba-driven multi-perspective structural understanding for molecular ground-state conformation prediction : Abstract: A comprehensive understanding of molecular structures is important for the prediction of molecular ground-state conformation involving property information. Meanwhile, state space model (e.g...
Probability-Biased Attention over Directed Bipartite Graphs for Long-Tail ICD Coding : Abstract: Automated International Classification of Diseases (ICD) coding aims to assign multiple disease codes to clinical documents, constituting a crucial multi-label text classification task in he...
Querying Labeled Time Series Data with Scenario Programs : Abstract: Simulation-based testing has become a crucial complement to road testing for ensuring the safety of cyber physical systems (CPS). As a result, significant research efforts have been directed...
Regular Games -- an Automata-Based General Game Playing Language : Abstract: We propose a new General Game Playing (GGP) system called Regular Games (RG). The main goal of RG is to be both computationally efficient and convenient for game design. The system consists ...
Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback : Abstract: Equitably allocating limited resources in high-stakes domains, such as education, employment, and healthcare, requires balancing short-term utility with long-term impact, while accounting for ...
Rethinking Science in the Age of Artificial Intelligence : Abstract: Artificial intelligence (AI) is reshaping how research is conceived, conducted, and communicated across fields from chemistry to biomedicine. This commentary examines how AI is transforming ...
Strategic Opponent Modeling with Graph Neural Networks, Deep Reinforcement Learning and Probabilistic Topic Modeling : Abstract: This paper provides a comprehensive review of mainly Graph Neural Networks, Deep Reinforcement Learning, and Probabilistic Topic Modeling methods with a focus on their potential incorporatio...
Proceedings of The third international workshop on eXplainable AI for the Arts (XAIxArts) : Abstract: This third international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts...
Non-Monotonic S4F Standpoint Logic : Abstract: Standpoint logics offer unified modal logic-based formalisms for representing multiple heterogeneous viewpoints. At the same time, many non-monotonic reasoning frameworks can be naturally ca...
Preference Elicitation for Step-Wise Explanations in Logic Puzzles : Abstract: Step-wise explanations can explain logic puzzles and other satisfaction problems by showing how to derive decisions step by step. Each step consists of a set of constraints that derive an as...
Using Certifying Constraint Solvers for Generating Step-wise Explanations : Abstract: In the field of Explainable Constraint Solving, it is common to explain to a user why a problem is unsatisfiable. A recently proposed method for this is to compute a sequence of explanation ...
Generalizing Analogical Inference from Boolean to Continuous Domains : Abstract: Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have be...
Explaining Decentralized Multi-Agent Reinforcement Learning Policies : Abstract: Multi-Agent Reinforcement Learning (MARL) has gained significant interest in recent years, enabling sequential decision-making across multiple agents in various domains. However, most existi...
SITA: A Framework for Structure-to-Instance Theorem Autoformalization : Abstract: While large language models (LLMs) have shown progress in mathematical reasoning, they still face challenges in formalizing theorems that arise from instantiating abstract structures in conc...
Massively Parallel Proof-Number Search for Impartial Games and Beyond : Abstract: Proof-Number Search is a best-first search algorithm with many successful applications, especially in game solving. As large-scale computing clusters become increasingly accessible, parallel...
Beyond Verification: Abductive Explanations for Post-AI Assessment of Privacy Leakage : Abstract: Privacy leakage in AI-based decision processes poses significant risks, particularly when sensitive information can be inferred. We propose a formal framework to audit privacy leakage using ...
FactGuard: Event-Centric and Commonsense-Guided Fake News Detection : Abstract: Fake news detection methods based on writing style have achieved remarkable progress. However, as adversaries increasingly imitate the style of authentic news, the effectiveness of such appr...
Fixed-Persona SLMs with Modular Memory: Scalable NPC Dialogue on Consumer Hardware : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limi...
Bidirectional Bounded-Suboptimal Heuristic Search with Consistent Heuristics : Abstract: Recent advancements in bidirectional heuristic search have yielded significant theoretical insights and novel algorithms. While most previous work has concentrated on optimal search methods,...
Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention : Abstract: Large Vision-Language Models (LVLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We propose this primarily stems from spurio...
Temporal Properties of Conditional Independence in Dynamic Bayesian Networks : Abstract: Dynamic Bayesian networks (DBNs) are compact graphical representations used to model probabilistic systems where interdependent random variables and their distributions evolve over time. In ...
Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search : Abstract: Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard...
PepTriX: A Framework for Explainable Peptide Analysis through Protein Language Models : Abstract: Peptide classification tasks, such as predicting toxicity and HIV inhibition, are fundamental to bioinformatics and drug discovery. Traditional approaches rely heavily on handcrafted encodin...
ProgRAG: Hallucination-Resistant Progressive Retrieval and Reasoning over Knowledge Graphs : Abstract: Large Language Models (LLMs) demonstrate strong reasoning capabilities but struggle with hallucinations and limited transparency. Recently, KG-enhanced LLMs that integrate knowledge graphs (...
Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation : Abstract: Recent advances in Neural Combinatorial Optimization (NCO) methods have significantly improved the capability of neural solvers to handle synthetic routing instances. Nonetheless, existing n...
MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion : Abstract: With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in...
Advanced Black-Box Tuning of Large Language Models with Limited API Calls : Abstract: Black-box tuning is an emerging paradigm for adapting large language models (LLMs) to better achieve desired behaviors, particularly when direct access to model parameters is unavailable. Cu...
Two Constraint Compilation Methods for Lifted Planning : Abstract: We study planning in a fragment of PDDL with qualitative state-trajectory constraints, capturing safety requirements, task ordering conditions, and intermediate sub-goals commonly found in r...
DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI Models : Abstract: The performance of Machine Learning (ML) models, particularly those operating within the Interpretable Artificial Intelligence (Interpretable AI) framework, is significantly affected by the ...
RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge ...
Intelligence Foundation Model: A New Perspective to Approach Artificial General Intelligence : Abstract: We propose a new perspective for approaching artificial general intelligence (AGI) through an intelligence foundation model (IFM). Unlike existing foundation models (FMs), which specialize i...
Balancing Centralized Learning and Distributed Self-Organization: A Hybrid Model for Embodied Morphogenesis : Abstract: We investigate how to couple a learnable "brain-like" controller to a "cell-like" Gray-Scott substrate to steer pattern formation with minimal effort. A compact convolutional policy is embe...
Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning : Abstract: Large language models (LLMs) have shown great promise in the medical domain, achieving strong performance on several benchmarks. However, they continue to underperform in real-world medical ...
Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation : Abstract: Radiologists compose diagnostic reports through a structured workflow: they describe visual findings, summarize them into impressions, and carefully refine statements in clinically critical ...
Efficient Thought Space Exploration through Strategic Intervention : Abstract: While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs by exhaustive sampling. Through...
Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning : Abstract: Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps ...
ChEmREF: Evaluating Language Model Readiness for Chemical Emergency Response : Abstract: Emergency responders managing hazardous material (HAZMAT) incidents face critical, time-sensitive decisions, manually navigating extensive chemical guidelines. We investigate whether today's l...
SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models : Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten c...
Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces : Abstract: Hierarchical data pervades diverse machine learning applications, including natural language processing, computer vision, and social network analysis. Hyperbolic space, characterized by its ...
OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive : Abstract: The opioid crisis represents a significant moment in public health that reveals systemic shortcomings across regulatory systems, healthcare practices, corporate governance, and public policy...
Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models : Abstract: Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches ...
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D : Abstract: AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning (ML) R&D itself. Frontier AI systems may ...
Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search : Abstract: Protein evolution through amino acid sequence mutations is a cornerstone of life sciences. While current in-silicon directed evolution algorithms focus on designing search strategies, they o...
EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services : Abstract: Emergency Medical Services (EMS) are critical to patient survival in emergencies, but first responders often face intense cognitive demands in high-stakes situations. AI cognitive assistants...
Quantum Artificial Intelligence (QAI): Foundations, Architectural Elements, and Future Directions : Abstract: Mission critical (MC) applications such as defense operations, energy management, cybersecurity, and aerospace control require reliable, deterministic, and low-latency decision making under ...
Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems : Abstract: Adversarial patches have emerged as a popular privacy-preserving approach for resisting AI-driven surveillance systems. However, their conspicuous appearance makes them difficult to deploy i...
Robust Watermarking on Gradient Boosting Decision Trees : Abstract: Gradient Boosting Decision Trees (GBDTs) are widely used in industry and academia for their high accuracy and efficiency, particularly on structured data. However, watermarking GBDT models r...
SlideBot: A Multi-Agent Framework for Generating Informative, Reliable, Multi-Modal Presentations : Abstract: Large Language Models (LLMs) have shown immense potential in education, automating tasks like quiz generation and content summarization. However, generating effective presentation slides int...
Why Open Small AI Models Matter for Interactive Art : Abstract: This position paper argues for the importance of open small AI models in creative independence for interactive art practices. Deployable locally, these models offer artists vital control ove...
AI Annotation Orchestration: Evaluating LLM verifiers to Improve the Quality of LLM Annotations in Learning Analytics : Abstract: Large Language Models (LLMs) are increasingly used to annotate learning interactions, yet concerns about reliability limit their utility. We test whether verification-oriented orchestration-...
ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias : Abstract: Operationalizing definitions of fairness is difficult in practice, as multiple definitions can be incompatible while each being arguably desirable. Instead, it may be easier to directly desc...
Echoing: Identity Failures when LLM Agents Talk to Each Other : Abstract: As large language model (LLM) based agents interact autonomously with one another, a new class of failures emerges that cannot be predicted from single agent performance: behavioral drifts i...
Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models : Abstract: Instilling reasoning capabilities in large models (LMs) using reasoning training (RT) significantly improves LMs' performances. Thus Audio Reasoning Models (ARMs), i.e., audio LMs that can r...
Cogent argument extensions are weakly admissible but not vice versa : Abstract: In this research note, we show the relationship between two non-admissible argumentation framework semantics: cogent and weakly admissible semantics. We prove that, while cogent extensions a...
Proceedings of the Second International Workshop on Next-Generation Language Models for Knowledge Representation and Reasoning (NeLaMKRR 2025) : Abstract: Reasoning is an essential component of human intelligence in that it plays a fundamental role in our ability to think critically, support responsible decisions, and solve challenging problem...
SynthTools: A Framework for Scaling Synthetic Tools for Agent Development : Abstract: AI agents increasingly rely on external tools to solve complex, long-horizon tasks. Advancing such agents requires reproducible evaluation and large-scale training in controllable, diverse, ...
Variable Neighborhood Search for the Electric Vehicle Routing Problem : Abstract: The Electric Vehicle Routing Problem (EVRP) extends the classical Vehicle Routing Problem (VRP) to reflect the growing use of electric and hybrid vehicles in logistics. Due to the variety of...
An Efficient and Almost Optimal Solver for the Joint Routing-Assignment Problem via Partial JRA and Large-α Optimization : Abstract: The Joint Routing-Assignment (JRA) optimization problem simultaneously determines the assignment of items to placeholders and a Hamiltonian cycle that visits each node pair exactly once, wit...

Research Sources: 427 | Generated: 11/14/2025