AI RESEARCH PAPERS & ACADEMIC SOURCES
- Self-Supervised Training For Low Dose CT Reconstruction : Abstract: Ionizing radiation has been the biggest concern in CT imaging. To reduce the dose level without compromising the image quality, low-dose CT reconstruction has been offered with the availabil...
- Generating Physically Stable and Buildable Brick Structures from Text : Abstract: We introduce BrickGPT, the first approach for generating physically stable interconnecting brick assembly models from text prompts. To achieve this, we construct a large-scale, physically st...
- STATIC : Surface Temporal Affine for TIme Consistency in Video Monocular Depth Estimation : Abstract: Video monocular depth estimation is essential for applications such as autonomous driving, AR/VR, and robotics. Recent transformer-based single-image monocular depth estimation models perfor...
- Mitigating Perception Bias: A Training-Free Approach to Enhance LMM for Image Quality Assessment : Abstract: Despite the impressive performance of large multimodal models (LMMs) in high-level visual tasks, their capacity for image quality assessment (IQA) remains limited. One main reason is that LM...
- Latent Knowledge-Guided Video Diffusion for Scientific Phenomena Generation from a Single Initial Frame : Abstract: Video diffusion models have achieved impressive results in natural scene generation, yet they struggle to generalize to scientific phenomena such as fluid simulations and meteorological proc...
- Trapped by Their Own Light: Deployable and Stealth Retroreflective Patch Attacks on Traffic Sign Recognition Systems : Abstract: Traffic sign recognition plays a critical role in ensuring safe and efficient transportation of autonomous vehicles but remains vulnerable to adversarial attacks using stickers or laser proje...
- Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling : Abstract: Outcome-reward reinforcement learning (RL) is a common and increasingly significant way to refine the step-by-step reasoning of multimodal large language models (MLLMs). In the multiple-choi...
- Depth Anything 3: Recovering the Visual Space from Any Views : Abstract: We present Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from an arbitrary number of visual inputs, with or without known camera poses. In pursuit of minimal mo...
- One Small Step in Latent, One Giant Leap for Pixels: Fast Latent Upscale Adapter for Your Diffusion Models : Abstract: Diffusion models struggle to scale beyond their training resolutions, as direct high-resolution sampling is slow and costly, while post-hoc image super-resolution (ISR) introduces artifacts ...
- From 2D to 3D Without Extra Baggage: Data-Efficient Cancer Detection in Digital Breast Tomosynthesis : Abstract: Digital Breast Tomosynthesis (DBT) enhances finding visibility for breast cancer detection by providing volumetric information that reduces the impact of overlapping tissues; however, limite...
- OmniVGGT: Omni-Modality Driven Visual Geometry Grounded : Abstract: General 3D foundation models have started to lead the trend of unifying diverse vision tasks, yet most assume RGB-only inputs and ignore readily available geometric cues (e.g., camera intrin...
- Dynamic Avatar-Scene Rendering from Human-centric Context : Abstract: Reconstructing dynamic humans interacting with real-world environments from monocular videos is an important and challenging task. Despite considerable progress in 4D neural rendering, exist...
- SemanticVLA: Semantic-Aligned Sparsification and Enhancement for Efficient Robotic Manipulation : Abstract: Vision-Language-Action (VLA) models have advanced in robotic manipulation, yet practical deployment remains hindered by two key limitations: 1) perceptual redundancy, where irrelevant visual...
- Learnable Total Variation with Lambda Mapping for Low-Dose CT Denoising : Abstract: Although Total Variation (TV) performs well in noise reduction and edge preservation on images, its dependence on the lambda parameter limits its efficiency and makes it difficult to use eff...
- SPOT: Sparsification with Attention Dynamics via Token Relevance in Vision Transformers : Abstract: While Vision Transformers (ViT) have demonstrated remarkable performance across diverse tasks, their computational demands are substantial, scaling quadratically with the number of processed...
- Histology-informed tiling of whole tissue sections improves the interpretability and predictability of cancer relapse and genetic alterations : Abstract: Histopathologists establish cancer grade by assessing histological structures, such as glands in prostate cancer. Yet, digital pathology pipelines often rely on grid-based tiling that ignore...
- RodEpil: A Video Dataset of Laboratory Rodents for Seizure Detection and Benchmark Evaluation : Abstract: We introduce a curated video dataset of laboratory rodents for automatic detection of convulsive events. The dataset contains short (10 s) top-down and side-view video clips of individual ro...
- 3DFETUS: Standardizing Fetal Facial Planes in 3D Ultrasound : Abstract: Acquiring standard facial planes during routine fetal ultrasound (US) examinations is often challenging due to fetal movement, variability in orientation, and operator-dependent expertise. T...
- LLM-YOLOMS: Large Language Model-based Semantic Interpretation and Fault Diagnosis for Wind Turbine Components : Abstract: The health condition of wind turbine (WT) components is crucial for ensuring stable and reliable operation. However, existing fault detection methods are largely limited to visual recognitio...
- GrounDiff: Diffusion-Based Ground Surface Generation from Digital Surface Models : Abstract: Digital Terrain Models (DTMs) represent the bare-earth elevation and are important in numerous geospatial applications. Such data models cannot be directly measured by sensors and are typica...
- SAMIRO: Spatial Attention Mutual Information Regularization with a Pre-trained Model as Oracle for Lane Detection : Abstract: Lane detection is an important topic in future mobility solutions. Real-world environmental challenges such as background clutter, varying illumination, and occlusions pose significant o...
- Fragile by Design: On the Limits of Adversarial Defenses in Personalized Generation : Abstract: Personalized AI applications such as DreamBooth enable the generation of customized content from user images, but also raise significant privacy concerns, particularly the risk of facial ide...
- MSGNav: Unleashing the Power of Multi-modal 3D Scene Graph for Zero-Shot Embodied Navigation : Abstract: Embodied navigation is a fundamental capability for robotic agents operating. Real-world deployment requires open vocabulary generalization and low training overhead, motivating zero-shot me...
- FOUND: Fourier-based von Mises Distribution for Robust Single Domain Generalization in Object Detection : Abstract: Single Domain Generalization (SDG) for object detection aims to train a model on a single source domain that can generalize effectively to unseen target domains. While recent methods like CL...
- Learning to Tell Apart: Weakly Supervised Video Anomaly Detection via Disentangled Semantic Alignment : Abstract: Recent advancements in weakly-supervised video anomaly detection have achieved remarkable performance by applying the multiple instance learning paradigm based on multimodal foundation model...
- CLIP4VI-ReID: Learning Modality-shared Representations via CLIP Semantic Bridge for Visible-Infrared Person Re-identification : Abstract: This paper proposes a novel CLIP-driven modality-shared representation learning network named CLIP4VI-ReID for VI-ReID task, which consists of Text Semantic Generation (TSG), Infrared Featur...
- Generalizable Slum Detection from Satellite Imagery with Mixture-of-Experts : Abstract: Satellite-based slum segmentation holds significant promise in generating global estimates of urban poverty. However, the morphological heterogeneity of informal settlements presents a major...
- PROPA: Toward Process-level Optimization in Visual Reasoning via Reinforcement Learning : Abstract: Despite significant progress, Vision-Language Models (VLMs) still struggle with complex visual reasoning, where multi-step dependencies cause early errors to cascade through the reasoning ch...
- Facial-R1: Aligning Reasoning and Recognition for Facial Emotion Analysis : Abstract: Facial Emotion Analysis (FEA) extends traditional facial emotion recognition by incorporating explainable, fine-grained reasoning. The task integrates three subtasks: emotion recognition, fa...
- TubeRMC: Tube-conditioned Reconstruction with Mutual Constraints for Weakly-supervised Spatio-Temporal Video Grounding : Abstract: Spatio-Temporal Video Grounding (STVG) aims to localize a spatio-temporal tube that corresponds to a given language query in an untrimmed video. This is a challenging task since it involves ...
- Next-Frame Feature Prediction for Multimodal Deepfake Detection and Temporal Localization : Abstract: Recent multimodal deepfake detection methods designed for generalization conjecture that single-stage supervised training struggles to generalize across unseen manipulations and datasets. Ho...
- HeatV2X: Scalable Heterogeneous Collaborative Perception via Efficient Alignment and Interaction : Abstract: Vehicle-to-Everything (V2X) collaborative perception extends sensing beyond single vehicle limits through transmission. However, as more agents participate, existing frameworks face two key ...
- LiNeXt: Revisiting LiDAR Completion with Efficient Non-Diffusion Architectures : Abstract: 3D LiDAR scene completion from point clouds is a fundamental component of perception systems in autonomous vehicles. Previous methods have predominantly employed diffusion models for high-fi...
- CephRes-MHNet: A Multi-Head Residual Network for Accurate and Robust Cephalometric Landmark Detection : Abstract: Accurate localization of cephalometric landmarks from 2D lateral skull X-rays is vital for orthodontic diagnosis and treatment. Manual annotation is time-consuming and error-prone, whereas a...
- Physically Interpretable Multi-Degradation Image Restoration via Deep Unfolding and Explainable Convolution : Abstract: Although image restoration has advanced significantly, most existing methods target only a single type of degradation. In real-world scenarios, images often contain multiple degradations sim...
- Decoupling Bias, Aligning Distributions: Synergistic Fairness Optimization for Deepfake Detection : Abstract: Fairness is a core element in the trustworthy deployment of deepfake detection models, especially in the field of digital identity security. Biases in detection models toward different demog...
- Split-Layer: Enhancing Implicit Neural Representation by Maximizing the Dimensionality of Feature Space : Abstract: Implicit neural representation (INR) models signals as continuous functions using neural networks, offering efficient and differentiable optimization for inverse problems across diverse disc...
- Explicit Temporal-Semantic Modeling for Dense Video Captioning via Context-Aware Cross-Modal Interaction : Abstract: Dense video captioning jointly localizes and captions salient events in untrimmed videos. Recent methods primarily focus on leveraging additional prior knowledge and advanced multi-task arch...
- RobIA: Robust Instance-aware Continual Test-time Adaptation for Deep Stereo : Abstract: Stereo Depth Estimation in real-world environments poses significant challenges due to dynamic domain shifts, sparse or unreliable supervision, and the high cost of acquiring dense ground-tr...
- MTAttack: Multi-Target Backdoor Attacks against Large Vision-Language Models : Abstract: Recent advances in Large Visual Language Models (LVLMs) have demonstrated impressive performance across various vision-language tasks by leveraging large-scale image-text pretraining and ins...
- SUGAR: Learning Skeleton Representation with Visual-Motion Knowledge for Action Recognition : Abstract: Large Language Models (LLMs) hold rich implicit knowledge and powerful transferability. In this paper, we explore the combination of LLMs with the human skeleton to perform action classifica...
- GridPrune: From "Where to Look" to "What to Select" in Visual Token Pruning for MLLMs : Abstract: Multimodal large language models (MLLMs) have shown remarkable capabilities in a wide range of vision-language tasks. However, the large number of visual tokens introduces significant comput...
- Mitigating Error Accumulation in Co-Speech Motion Generation via Global Rotation Diffusion and Multi-Level Constraints : Abstract: Reliable co-speech motion generation requires precise motion representation and consistent structural priors across all joints. Existing generative methods typically operate on local joint r...
- VLF-MSC: Vision-Language Feature-Based Multimodal Semantic Communication System : Abstract: We propose Vision-Language Feature-based Multimodal Semantic Communication (VLF-MSC), a unified system that transmits a single compact vision-language representation to support both image an...
- Perceive, Act and Correct: Confidence Is Not Enough for Hyperspectral Classification : Abstract: Confidence alone is often misleading in hyperspectral image classification, as models tend to mistake high predictive scores for correctness while lacking awareness of uncertainty. This lead...
- When Eyes and Ears Disagree: Can MLLMs Discern Audio-Visual Confusion? : Abstract: Can Multimodal Large Language Models (MLLMs) discern confused objects that are visually present but audio-absent? To study this, we introduce a new benchmark, AV-ConfuseBench, which simulate...
- Image Aesthetic Reasoning via HCM-GRPO: Empowering Compact Model for Superior Performance : Abstract: The performance of image generation has been significantly improved in recent years. However, the study of image screening is rare and its performance with Multimodal Large Language Models (...
- MuSc-V2: Zero-Shot Multimodal Industrial Anomaly Classification and Segmentation with Mutual Scoring of Unlabeled Samples : Abstract: Zero-shot anomaly classification (AC) and segmentation (AS) methods aim to identify and outline defects without using any labeled samples. In this paper, we reveal a key property that is ove...
- FreDFT: Frequency Domain Fusion Transformer for Visible-Infrared Object Detection : Abstract: Visible-infrared object detection has gained sufficient attention due to its detection performance in low light, fog, and rain conditions. However, visible and infrared modalities captured b...
- LoG3D: Ultra-High-Resolution 3D Shape Modeling via Local-to-Global Partitioning : Abstract: Generating high-fidelity 3D contents remains a fundamental challenge due to the complexity of representing arbitrary topologies, such as open surfaces and intricate internal structures, while ...
- DGFusion: Dual-guided Fusion for Robust Multi-Modal 3D Object Detection : Abstract: As a critical task in autonomous driving perception systems, 3D object detection is used to identify and track key objects, such as vehicles and pedestrians. However, detecting distant, smal...
- AffordBot: 3D Fine-grained Embodied Reasoning via Multimodal Large Language Models : Abstract: Effective human-agent collaboration in physical environments requires understanding not only what to act upon, but also where the actionable elements are and how to interact with them. Exist...
- LampQ: Towards Accurate Layer-wise Mixed Precision Quantization for Vision Transformers : Abstract: How can we accurately quantize a pre-trained Vision Transformer model? Quantization algorithms compress Vision Transformers (ViTs) into low-bit formats, reducing memory and computation deman...
- DBGroup: Dual-Branch Point Grouping for Weakly Supervised 3D Instance Segmentation : Abstract: Weakly supervised 3D instance segmentation is essential for 3D scene understanding, especially as the growing scale of data and high annotation costs associated with fully supervised approac...
- MOBA: A Material-Oriented Backdoor Attack against LiDAR-based 3D Object Detection Systems : Abstract: LiDAR-based 3D object detection is widely used in safety-critical systems. However, these systems remain vulnerable to backdoor attacks that embed hidden malicious behaviors during training....
- STELLAR: Scene Text Editor for Low-Resource Languages and Real-World Data : Abstract: Scene Text Editing (STE) is the task of modifying text content in an image while preserving its visual style, such as font, color, and background. While recent diffusion-based approaches hav...
- Equivariant Sampling for Improving Diffusion Model-based Image Restoration : Abstract: Recent advances in generative models, especially diffusion models, have significantly improved image restoration (IR) performance. However, existing problem-agnostic diffusion model-based im...
- Robust Object Detection with Pseudo Labels from VLMs using Per-Object Co-teaching : Abstract: Foundation models, especially vision-language models (VLMs), offer compelling zero-shot object detection for applications like autonomous driving, a domain where manual labelling is prohibit...
- TSPE-GS: Probabilistic Depth Extraction for Semi-Transparent Surface Reconstruction via 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting offers a strong speed-quality trade-off but struggles to reconstruct semi-transparent surfaces because most methods assume a single depth per pixel, which fails when mu...
- Debiased Dual-Invariant Defense for Adversarially Robust Person Re-Identification : Abstract: Person re-identification (ReID) is a fundamental task in many real-world applications such as pedestrian trajectory tracking. However, advanced deep learning-based ReID models are highly sus...
- MosaicDoc: A Large-Scale Bilingual Benchmark for Visually Rich Document Understanding : Abstract: Despite the rapid progress of Vision-Language Models (VLMs), their capabilities are inadequately assessed by existing benchmarks, which are predominantly English-centric, feature simplistic ...
- Simulating Distribution Dynamics: Liquid Temporal Feature Evolution for Single-Domain Generalized Object Detection : Abstract: In this paper, we focus on Single-Domain Generalized Object Detection (Single-DGOD), aiming to transfer a detector trained on one source domain to multiple unknown domains. Existing methods ...
- HCC-3D: Hierarchical Compensatory Compression for 98% 3D Token Reduction in Vision-Language Models : Abstract: 3D understanding has drawn significant attention recently, leveraging Vision-Language Models (VLMs) to enable multi-modal reasoning between point cloud and text data. Current 3D-VLMs directl...
- RWKV-PCSSC: Exploring RWKV Model for Point Cloud Semantic Scene Completion : Abstract: Semantic Scene Completion (SSC) aims to generate a complete semantic scene from an incomplete input. Existing approaches often employ dense network architectures with a high parameter count,...
- SAM-DAQ: Segment Anything Model with Depth-guided Adaptive Queries for RGB-D Video Salient Object Detection : Abstract: Recently, the Segment Anything Model (SAM) has attracted widespread attention, and it is often treated as a vision foundation model for universal segmentation. Some researchers have attempted to d...
- Remember Me: Bridging the Long-Range Gap in LVLMs with Three-Step Inference-Only Decay Resilience Strategies : Abstract: Large Vision-Language Models (LVLMs) have achieved impressive performance across a wide range of multimodal tasks. However, they still face critical challenges in modeling long-range depende...
- IPCD: Intrinsic Point-Cloud Decomposition : Abstract: Point clouds are widely used in various fields, including augmented reality (AR) and robotics, where relighting and texture editing are crucial for realistic visualization. Achieving these t...
- CORONA-Fields: Leveraging Foundation Models for Classification of Solar Wind Phenomena : Abstract: Space weather at Earth, driven by the solar activity, poses growing risks to satellites around our planet as well as to critical ground-based technological infrastructure. Major space weathe...
- AHA! Animating Human Avatars in Diverse Scenes with Gaussian Splatting : Abstract: We present a novel framework for animating humans in 3D scenes using 3D Gaussian Splatting (3DGS), a neural scene representation that has recently achieved state-of-the-art photorealistic re...
- Lumos3D: A Single-Forward Framework for Low-Light 3D Scene Restoration : Abstract: Restoring 3D scenes captured under low-light conditions remains a fundamental yet challenging problem. Most existing approaches depend on precomputed camera poses and scene-specific optimi...
- PANDA - Patch And Distribution-Aware Augmentation for Long-Tailed Exemplar-Free Continual Learning : Abstract: Exemplar-Free Continual Learning (EFCL) restricts the storage of previous task data and is highly susceptible to catastrophic forgetting. While pre-trained models (PTMs) are increasingly lev...
- STORM: Segment, Track, and Object Re-Localization from a Single 3D Model : Abstract: Accurate 6D pose estimation and tracking are fundamental capabilities for physical AI systems such as robots. However, existing approaches typically rely on a manually annotated segmentation...
- Density Estimation and Crowd Counting : Abstract: This study enhances a crowd density estimation algorithm originally designed for image-based analysis by adapting it for video-based scenarios. The proposed method integrates a denoising pro...
- SliderEdit: Continuous Image Editing with Fine-Grained Instruction Control : Abstract: Instruction-based image editing models have recently achieved impressive performance, enabling complex edits to an input image from a multi-instruction prompt. However, these models apply ea...
- MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation : Abstract: While thinking-aware generation aims to improve performance on complex tasks, we identify a critical failure mode where existing sequential, autoregressive approaches can paradoxically degra...
- FedeCouple: Fine-Grained Balancing of Global-Generalization and Local-Adaptability in Federated Learning : Abstract: In privacy-preserving mobile network transmission scenarios with heterogeneous client data, personalized federated learning methods that decouple feature extractors and classifiers have demo...
- Semantic, Orthographic, and Phonological Biases in Humans' Wordle Gameplay : Abstract: We show that human players' gameplay in the game of Wordle is influenced by the semantics, orthography, and phonology of the player's previous guesses. We compare actual human players' guess...
- MedMobile: A mobile-sized language model with clinical capabilities : Abstract: Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale impleme...
- Lessons in co-creation: the inconvenient truths of inclusive sign language technology development : Abstract: In the era of AI-driven language technologies, the participation of deaf communities in sign language technology development, often framed as co-creation, is increasingly emphasized. We pres...
- Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework : Abstract: The increasing complexity and workload of clinical radiology leads to inevitable oversights and mistakes in their use as diagnostic tools, causing delayed treatments and sometimes life-threa...
- Towards Blind and Low-Vision Accessibility of Lightweight VLMs and Custom LLM-Evals : Abstract: Large Vision-Language Models (VLMs) excel at understanding and generating video descriptions but their high memory, computation, and deployment demands hinder practical use particularly for ...
- Music Flamingo: Scaling Music Understanding in Audio Language Models : Abstract: We introduce Music Flamingo, a novel large audio-language model designed to advance music (including song) understanding in foundational audio models. While audio-language research has progr...
- Regional Attention-Enhanced Swin Transformer for Clinically Relevant Medical Image Captioning : Abstract: Automated medical image captioning translates complex radiological images into diagnostic narratives that can support reporting workflows. We present a Swin-BART encoder-decoder system with ...
- ParoQuant: Pairwise Rotation Quantization for Efficient Reasoning LLM Inference : Abstract: Weight-only post-training quantization (PTQ) compresses the weights of Large Language Models (LLMs) into low-precision representations to reduce memory footprint and accelerate inference. Ho...
- DESS: DeBERTa Enhanced Syntactic-Semantic Aspect Sentiment Triplet Extraction : Abstract: Fine-grained sentiment analysis faces ongoing challenges in Aspect Sentiment Triple Extraction (ASTE), particularly in accurately capturing the relationships between aspects, opinions, and s...
- URaG: Unified Retrieval and Generation in Multimodal LLMs for Efficient Long Document Understanding : Abstract: Recent multimodal large language models (MLLMs) still struggle with long document understanding due to two fundamental challenges: information interference from abundant irrelevant content, ...
- Computing the Formal and Institutional Boundaries of Contemporary Genre and Literary Fiction : Abstract: Though the concept of genre has been a subject of discussion for millennia, the relatively recent emergence of genre fiction has added a new layer to this ongoing conversation. While more tr...
- Convomem Benchmark: Why Your First 150 Conversations Don't Need RAG : Abstract: We introduce a comprehensive benchmark for conversational memory evaluation containing 75,336 question-answer pairs across diverse categories including user facts, assistant recall, abstenti...
- Rubric-Based Benchmarking and Reinforcement Learning for Advancing LLM Instruction Following : Abstract: Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-...
- Exploring State Tracking Capabilities of Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated impressive capabilities in solving complex tasks, including those requiring a certain level of reasoning. In this paper, we focus on state trac...
- Analogical Structure, Minimal Contextual Cues and Contrastive Distractors: Input Design for Sample-Efficient Linguistic Rule Induction : Abstract: Large language models achieve strong performance through training on vast datasets. Can analogical paradigm organization enable lightweight models to match this performance with minimal data...
- DELICATE: Diachronic Entity LInking using Classes And Temporal Evidence : Abstract: In spite of the remarkable advancements in the field of Natural Language Processing, the task of Entity Linking (EL) remains challenging in the field of humanities due to complex document ty...
- Position: On the Methodological Pitfalls of Evaluating Base LLMs for Reasoning : Abstract: Existing work investigates the reasoning capabilities of large language models (LLMs) to uncover their limitations, human-like biases and underlying processes. Such studies include evaluatio...
- TruthfulRAG: Resolving Factual-level Conflicts in Retrieval-Augmented Generation with Knowledge Graphs : Abstract: Retrieval-Augmented Generation (RAG) has emerged as a powerful framework for enhancing the capabilities of Large Language Models (LLMs) by integrating retrieval-based methods with generative...
- Knowledge Graphs Generation from Cultural Heritage Texts: Combining LLMs and Ontological Engineering for Scholarly Debates : Abstract: Cultural Heritage texts contain rich knowledge that is difficult to query systematically due to the challenges of converting unstructured discourse into structured Knowledge Graphs (KGs). Th...
- Rectify Evaluation Preference: Improving LLMs' Critique on Math Reasoning via Perplexity-aware Reinforcement Learning : Abstract: To improve Multi-step Mathematical Reasoning (MsMR) of Large Language Models (LLMs), it is crucial to obtain scalable supervision from the corpus by automatically critiquing mistakes in the ...
- Local Hybrid Retrieval-Augmented Document QA : Abstract: Organizations handling sensitive documents face a critical dilemma: adopt cloud-based AI systems that offer powerful question-answering capabilities but compromise data privacy, or maintain ...
- LangGPS: Language Separability Guided Data Pre-Selection for Joint Multilingual Instruction Tuning : Abstract: Joint multilingual instruction tuning is a widely adopted approach to improve the multilingual instruction-following ability and downstream performance of large language models (LLMs), but t...
- EffiReason-Bench: A Unified Benchmark for Evaluating and Advancing Efficient Reasoning in Large Language Models : Abstract: Large language models (LLMs) with Chain-of-Thought (CoT) prompting achieve strong reasoning but often produce unnecessarily long explanations, increasing cost and sometimes reducing accuracy...
- Text2SQL-Flow: A Robust SQL-Aware Data Augmentation Framework for Text-to-SQL : Abstract: The data-centric paradigm has become pivotal in AI, especially for Text-to-SQL, where performance is limited by scarce, simplistic, and low-diversity datasets. To address this, we propose Te...
- Beyond the Black Box: Demystifying Multi-Turn LLM Reasoning with VISTA : Abstract: Recent research has increasingly focused on the reasoning capabilities of Large Language Models (LLMs) in multi-turn interactions, as these scenarios more closely mirror real-world problem-s...
- ELYADATA & LIA at NADI 2025: ASR and ADI Subtasks : Abstract: This paper describes Elyadata & LIA's joint submission to the NADI multi-dialectal Arabic Speech Processing 2025. We participated in the Spoken Arabic Dialect Identification (ADI) and multi...
- Format Matters: The Robustness of Multimodal LLMs in Reviewing Evidence from Tables and Charts : Abstract: With the growing number of submitted scientific papers, there is an increasing demand for systems that can assist reviewers in evaluating research claims. Experimental results are a core com...
- ADI-20: Arabic Dialect Identification dataset and models : Abstract: We present ADI-20, an extension of the previously published ADI-17 Arabic Dialect Identification (ADI) dataset. ADI-20 covers all Arabic-speaking countries' dialects. It comprises 3,556 hour...
- GraphIF: Enhancing Multi-Turn Instruction Following for Large Language Models with Relation Graph Prompt : Abstract: Multi-turn instruction following is essential for building intelligent conversational systems that can consistently adhere to instructions across dialogue turns. However, existing approaches...
- Do Language Models Associate Sound with Meaning? A Multimodal Study of Sound Symbolism : Abstract: Sound symbolism is a linguistic concept that refers to non-arbitrary associations between phonetic forms and their meanings. We suggest that this can be a compelling probe into how Multimoda...
- ScaleFormer: Span Representation Cumulation for Long-Context Transformer : Abstract: The quadratic complexity of standard self-attention severely limits the application of Transformer-based models to long-context tasks. While efficient Transformer variants exist, they often ...
- FinNuE: Exposing the Risks of Using BERTScore for Numerical Semantic Evaluation in Finance : Abstract: BERTScore has become a widely adopted metric for evaluating semantic similarity between natural language sentences. However, we identify a critical limitation: BERTScore exhibits low sensiti...
- Language Drift in Multilingual Retrieval-Augmented Generation: Characterization and Decoding-Time Mitigation : Abstract: Multilingual Retrieval-Augmented Generation (RAG) enables large language models (LLMs) to perform knowledge-intensive tasks in multilingual settings by leveraging retrieved documents as exte...
- Modeling Uncertainty Trends for Timely Retrieval in Dynamic RAG : Abstract: Dynamic retrieval-augmented generation (RAG) allows large language models (LLMs) to fetch external knowledge on demand, offering greater adaptability than static RAG. A central challenge in ...
- NumPert: Numerical Perturbations to Probe Language Models for Veracity Prediction : Abstract: Large language models show strong performance on knowledge intensive tasks such as fact-checking and question answering, yet they often struggle with numerical reasoning. We present a system...
- REAP: Enhancing RAG with Recursive Evaluation and Adaptive Planning for Multi-Hop Question Answering : Abstract: Retrieval-augmented generation (RAG) has been extensively employed to mitigate hallucinations in large language models (LLMs). However, existing methods for multi-hop reasoning tasks often l...
- Leveraging Large Language Models for Identifying Knowledge Components : Abstract: Knowledge Components (KCs) are foundational to adaptive learning systems, but their manual identification by domain experts is a significant bottleneck. While Large Language Models (LLMs) of...
- MINDS: A Cross-cultural Dialogue Corpus for Social Norm Classification and Adherence Detection : Abstract: Social norms are implicit, culturally grounded expectations that guide interpersonal communication. Unlike factual commonsense, norm reasoning is subjective, context-dependent, and varies ac...
- HI-TransPA: Hearing Impairments Translation Personal Assistant : Abstract: To provide a unified and flexible solution for daily communication among hearing-impaired individuals, we introduce the Omni-Model paradigm into assistive technology and present HI-TransPA, ...
- EnchTable: Unified Safety Alignment Transfer in Fine-tuned Large Language Models : Abstract: Many machine learning models are fine-tuned from large language models (LLMs) to achieve high performance in specialized domains like code generation, biomedical analysis, and mathematical p...
- In-Token Rationality Optimization: Towards Accurate and Concise LLM Reasoning via Self-Feedback : Abstract: Training Large Language Models (LLMs) for chain-of-thought reasoning presents a significant challenge: supervised fine-tuning on a single "golden" rationale hurts generalization as it penali...
- TermGPT: Multi-Level Contrastive Fine-Tuning for Terminology Adaptation in Legal and Financial Domain : Abstract: Large language models (LLMs) have demonstrated impressive performance in text generation tasks; however, their embedding spaces often suffer from the isotropy problem, resulting in poor disc...
- Answering Students' Questions on Course Forums Using Multiple Chain-of-Thought Reasoning and Finetuning RAG-Enabled LLM : Abstract: Course forums are increasingly significant and play a vital role in facilitating student discussions and answering their questions related to the course. They provide a platform for student...
- Improving Graduate Outcomes by Identifying Skills Gaps and Recommending Courses Based on Career Interests : Abstract: This paper aims to address the challenge of selecting relevant courses for students by proposing the design and development of a course recommendation system. The course recommendation syste...
- Khmer Spellchecking: A Holistic Approach : Abstract: Compared to English and other high-resource languages, spellchecking for Khmer remains an unresolved problem due to several challenges. First, there are misalignments between words in the le...
- TARG: Training-Free Adaptive Retrieval Gating for Efficient RAG : Abstract: Retrieval-Augmented Generation (RAG) improves factuality but retrieving for every query often hurts quality while inflating tokens and latency. We propose Training-free Adaptive Retrieval Ga...
- Contextual morphologically-guided tokenization for Latin encoder models : Abstract: Tokenization is a critical component of language model pretraining, yet standard tokenization methods often prioritize information-theoretical goals like high compression and low fertility r...
- Order Matters: Rethinking Prompt Construction in In-Context Learning : Abstract: In-context learning (ICL) enables large language models to perform new tasks by conditioning on a sequence of examples. Most prior work reasonably and intuitively assumes that which examples...
- Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages : Abstract: Automatic speech recognition (ASR) has advanced in high-resource languages, but most of the world's 7,000+ languages remain unsupported, leaving thousands of long-tail languages behind. Expa...
- Attri-Net: A Globally and Locally Inherently Interpretable Model for Multi-Label Classification Using Class-Specific Counterfactuals : Abstract: Interpretability is crucial for machine learning algorithms in high-stakes medical applications. However, high-performing neural networks typically cannot explain their predictions. Post-hoc...
- Provably Scalable Black-Box Variational Inference with Structured Variational Families : Abstract: Variational families with full-rank covariance approximations are known not to work well in black-box variational inference (BBVI), both empirically and theoretically. In fact, recent comput...
- Spectral methods for Neural Integral Equations : Abstract: Neural integral equations are deep learning models based on the theory of integral equations, where the model consists of an integral operator and the corresponding equation (of the second k...
- HyperEvent: A Strong Baseline for Dynamic Link Prediction via Relative Structural Encoding : Abstract: Learning representations for continuous-time dynamic graphs is critical for dynamic link prediction. While recent methods have become increasingly complex, the field lacks a strong and infor...
- DuoGPT: Training-free Dual Sparsity through Activation-aware Pruning in LLMs : Abstract: Large language models (LLMs) deliver strong performance but are difficult to deploy due to high memory and compute costs. While pruning reduces these demands, most methods ignore activation ...
- Caption, Create, Continue: Continual Learning with Pre-trained Generative Vision-Language Models : Abstract: Continual learning (CL) enables models to adapt to evolving data streams without catastrophic forgetting, a fundamental requirement for real-world AI systems. However, the current methods of...
- Distribution Learning Meets Graph Structure Sampling : Abstract: This work establishes a novel link between the problem of PAC-learning high-dimensional graphical models and the task of (efficient) counting and sampling of graph structures, using an onlin...
- Lipschitz-Regularized Critics Lead to Policy Robustness Against Transition Dynamics Uncertainty : Abstract: Uncertainties in transition dynamics pose a critical challenge in reinforcement learning (RL), often resulting in performance degradation of trained policies when deployed on hardware. Many ...
- Effector: A Python package for regional explanations : Abstract: Effector is a Python package for interpreting machine learning (ML) models that are trained on tabular data through global and regional feature effects. Global effects, like Partial Dependen...
- Reassessing feature-based Android malware detection in a contemporary context : Abstract: We report the findings of a reimplementation of 18 foundational studies in feature-based machine learning for Android malware detection, published during the period 2013-2023. These studies ...
- Transfer in Reinforcement Learning via Regret Bounds for Learning Agents : Abstract: We present an approach for the quantification of the usefulness of transfer in reinforcement learning via regret bounds for a multi-agent setting. Considering a number of $\aleph$ agents ope...
- Robot Crash Course: Learning Soft and Stylized Falling : Abstract: Despite recent advances in robust locomotion, bipedal robots operating in the real world remain at risk of falling. While most research focuses on preventing such events, we instead concentr...
- Global Solutions to Non-Convex Functional Constrained Problems with Hidden Convexity : Abstract: Constrained non-convex optimization is fundamentally challenging, as global solutions are generally intractable and constraint qualifications may not hold. However, in many applications, inc...
- Multitask GLocal OBIA-Mamba for Sentinel-2 Landcover Mapping : Abstract: Although Sentinel-2 based land use and land cover (LULC) classification is critical for various environmental monitoring applications, it is a very difficult task due to some key data challe...
- Benchmarking Diversity in Image Generation via Attribute-Conditional Human Evaluation : Abstract: Despite advances in generation quality, current text-to-image (T2I) models often lack diversity, generating homogeneous outputs. This work introduces a framework to address the need for robu...
- Two Americas of Well-Being: Divergent Rural-Urban Patterns of Life Satisfaction and Happiness from 2.6 B Social Media Posts : Abstract: Using 2.6 billion geolocated social-media posts (2014-2022) and a fine-tuned generative language model, we construct county-level indicators of life satisfaction and happiness for the United...
- Edge Machine Learning for Cluster Counting in Next-Generation Drift Chambers : Abstract: Drift chambers have long been central to collider tracking, but future machines like a Higgs factory motivate higher granularity and cluster counting for particle ID, posing new data process...
- Don't Waste It: Guiding Generative Recommenders with Structured Human Priors via Multi-head Decoding : Abstract: Optimizing recommender systems for objectives beyond accuracy, such as diversity, novelty, and personalization, is crucial for long-term user satisfaction. To this end, industrial practition...
- OpenSR-SRGAN: A Flexible Super-Resolution Framework for Multispectral Earth Observation Data : Abstract: We present OpenSR-SRGAN, an open and modular framework for single-image super-resolution in Earth Observation. The software provides a unified implementation of SRGAN-style models that is ea...
- Continuum Dropout for Neural Differential Equations : Abstract: Neural Differential Equations (NDEs) excel at modeling continuous-time dynamics, effectively handling challenges such as irregular observations, missing values, and noise. Despite their adva...
- Physics informed Transformer-VAE for biophysical parameter estimation: PROSAIL model inversion in Sentinel-2 imagery : Abstract: Accurate retrieval of vegetation biophysical variables from satellite imagery is crucial for ecosystem monitoring and agricultural management. In this work, we propose a physics-informed Tra...
- Operator Models for Continuous-Time Offline Reinforcement Learning : Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often...
- Revisiting Evaluation of Deep Neural Networks for Pedestrian Detection : Abstract: Reliable pedestrian detection represents a crucial step towards automated driving systems. However, the current performance benchmarks exhibit weaknesses. The currently applied metrics for v...
- Fault Detection in Solar Thermal Systems using Probabilistic Reconstructions : Abstract: Solar thermal systems (STS) present a promising avenue for low-carbon heat generation, with a well-running system providing heat at minimal cost and carbon emissions. However, STS can exhibi...
- Causal Model-Based Reinforcement Learning for Sample-Efficient IoT Channel Access : Abstract: Despite the advantages of multi-agent reinforcement learning (MARL) for wireless use cases such as medium access control (MAC), their real-world deployment in Internet of Things (IoT) is hind...
- Generalizing to Unseen Disaster Events: A Causal View : Abstract: Due to the rapid growth of social media platforms, these tools have become essential for monitoring information during ongoing disaster events. However, extracting valuable insights requires...
- Physics-informed Machine Learning for Static Friction Modeling in Robotic Manipulators Based on Kolmogorov-Arnold Networks : Abstract: Friction modeling plays a crucial role in achieving high-precision motion control in robotic operating systems. Traditional static friction models (such as the Stribeck model) are widely use...
- Multi-agent In-context Coordination via Decentralized Memory Retrieval : Abstract: Large transformer models, trained on diverse datasets, have demonstrated impressive few-shot performance on previously unseen tasks without requiring parameter updates. This capability has a...
- Global Convergence of Four-Layer Matrix Factorization under Random Initialization : Abstract: Gradient descent dynamics on the deep matrix factorization problem is extensively studied as a simplified theoretical model for deep neural networks. Although the convergence theory for two-...
- Beyond empirical models: Discovering new constitutive laws in solids with graph-based equation discovery : Abstract: Constitutive models are fundamental to solid mechanics and materials science, underpinning the quantitative description and prediction of material responses under diverse loading conditions....
- Theory and computation for structured variational inference : Abstract: Structured variational inference constitutes a core methodology in modern statistical applications. Unlike mean-field variational inference, the approximate posterior is assumed to have inte...
- HierRouter: Coordinated Routing of Specialized Large Language Models via Reinforcement Learning : Abstract: Large Language Models (LLMs) deliver state-of-the-art performance across many tasks but impose high computational and memory costs, limiting their deployment in resource-constrained or real-...
- Generalized infinite dimensional Alpha-Procrustes based geometries : Abstract: This work extends the recently introduced Alpha-Procrustes family of Riemannian metrics for symmetric positive definite (SPD) matrices by incorporating generalized versions of the Bures-Wass...
- A Robust Task-Level Control Architecture for Learned Dynamical Systems : Abstract: Dynamical system (DS)-based learning from demonstration (LfD) is a powerful tool for generating motion plans in the operation ('task') space of robotic systems. However, the realization of t...
- Symmetry aware Reynolds Averaged Navier Stokes turbulence models with equivariant neural networks : Abstract: Accurate and generalizable Reynolds-averaged Navier-Stokes (RANS) models for turbulent flows rely on effective closures. We introduce tensor-based, symmetry aware closures using equivariant ...
- Modelos Empiricos de Pos-Dupla Selecao por LASSO: Discussoes para Estudos do Transporte Aereo : Abstract: This paper presents and discusses forms of estimation by regularized regression and model selection using the LASSO method - Least Absolute Shrinkage and Selection Operator. LASSO is recogni...
- Gradient-Guided Exploration of Generative Model's Latent Space for Controlled Iris Image Augmentations : Abstract: Developing reliable iris recognition and presentation attack detection methods requires diverse datasets that capture realistic variations in iris features and a wide spectrum of anomalies. ...
- Assessing the Applicability of Natural Language Processing to Traditional Social Science Methodology: A Case Study in Identifying Strategic Signaling Patterns in Presidential Directives : Abstract: Our research investigates how Natural Language Processing (NLP) can be used to extract main topics from a larger corpus of written data, as applied to the case of identifying signaling theme...
- A Fourier-Based Global Denoising Model for Smart Artifacts Removing of Microscopy Images : Abstract: Microscopy such as Scanning Tunneling Microscopy (STM), Atomic Force Microscopy (AFM) and Scanning Electron Microscopy (SEM) are essential tools in material imaging at micro- and nanoscale r...
- The Data Fusion Labeler (dFL): Challenges and Solutions to Data Harmonization, Labeling, and Provenance in Fusion Energy : Abstract: Fusion energy research increasingly depends on the ability to integrate heterogeneous, multimodal datasets from high-resolution diagnostics, control systems, and multiscale simulations. The ...
- Masked Mineral Modeling: Continent-Scale Mineral Prospecting via Geospatial Infilling : Abstract: Minerals play a critical role in the advanced energy technologies necessary for decarbonization, but characterizing mineral deposits hidden underground remains costly and challenging. Inspir...
- Classifying Phonotrauma Severity from Vocal Fold Images with Soft Ordinal Regression : Abstract: Phonotrauma refers to vocal fold tissue damage resulting from exposure to forces during voicing. It occurs on a continuum from mild to severe, and treatment options can vary based on severit...
- PriVi: Towards A General-Purpose Video Model For Primate Behavior In The Wild : Abstract: Non-human primates are our closest living relatives, and analyzing their behavior is central to research in cognition, evolution, and conservation. Computer vision could greatly aid this res...
- Lithological Controls on the Permeability of Geologic Faults: Surrogate Modeling and Sensitivity Analysis : Abstract: Fault zones exhibit complex and heterogeneous permeability structures influenced by stratigraphic, compositional, and structural factors, making them critical yet uncertain components in sub...
- Analysis of the TAIGA-HiSCORE Data Using the Latent Space of Autoencoders : Abstract: The aim of extensive air shower (EAS) analysis is to reconstruct the physical parameters of the primary particle that initiated the shower. The TAIGA experiment is a hybrid detector system t...
- Siegel Neural Networks : Abstract: Riemannian symmetric spaces (RSS) such as hyperbolic spaces and symmetric positive definite (SPD) manifolds have become popular spaces for representation learning. In this paper, we propose ...
- Algorithm Design and Stronger Guarantees for the Improving Multi-Armed Bandits Problem : Abstract: The improving multi-armed bandits problem is a formal model for allocating effort under uncertainty, motivated by scenarios such as investing research effort into new technologies, performin...
- Pretrained Joint Predictions for Scalable Batch Bayesian Optimization of Molecular Designs : Abstract: Batched synthesis and testing of molecular designs is the key bottleneck of drug development. There has been great interest in leveraging biomolecular foundation models as surrogates to acce...
- Tight Robustness Certification through the Convex Hull of $\ell_0$ Attacks : Abstract: Few-pixel attacks mislead a classifier by modifying a few pixels of an image. Their perturbation space is an $\ell_0$-ball, which is not convex, unlike $\ell_p$-balls for $p\geq1$. However, ...
- Semi-Unified Sparse Dictionary Learning with Learnable Top-K LISTA and FISTA Encoders : Abstract: We present a semi-unified sparse dictionary learning framework that bridges the gap between classical sparse models and modern deep architectures. Specifically, the method integrates strict ...
- Belief Net: A Filter-Based Framework for Learning Hidden Markov Models from Observations : Abstract: Hidden Markov Models (HMMs) are fundamental for modeling sequential data, yet learning their parameters from observations remains challenging. Classical methods like the Baum-Welch (EM) algo...
- Oya: Deep Learning for Accurate Global Precipitation Estimation : Abstract: Accurate precipitation estimation is critical for hydrological applications, especially in the Global South where ground-based observation networks are sparse and forecasting skill is limite...
- Maximizing Efficiency of Dataset Compression for Machine Learning Potentials With Information Theory : Abstract: Machine learning interatomic potentials (MLIPs) balance high accuracy and lower costs compared to density functional theory calculations, but their performance often depends on the size and ...
- Holonorm : Abstract: Normalization is a key point in transformer training. In Dynamic Tanh (DyT), the author demonstrated that Tanh can be used as an alternative layer normalization (LN) and confirmed the effec...
- Weak Relation Enforcement for Kinematic-Informed Long-Term Stock Prediction with Artificial Neural Networks : Abstract: We propose loss function weak enforcement of the velocity relations between time-series points in the Kinematic-Informed artificial Neural Networks (KINN) for long-term stock prediction. Pro...
- Panda: Test-Time Adaptation with Negative Data Augmentation : Abstract: Pretrained VLMs exhibit strong zero-shot classification capabilities, but their predictions degrade significantly under common image corruptions. To improve robustness, many test-time adapta...
- Intrinsic Dimensionality as a Model-Free Measure of Class Imbalance : Abstract: Imbalance in classification tasks is commonly quantified by the cardinalities of examples across classes. This, however, disregards the presence of redundant examples and inherent difference...
- Neuronal Fluctuations: Learning Rates vs Participating Neurons : Abstract: Deep Neural Networks (DNNs) rely on inherent fluctuations in their internal parameters (weights and biases) to effectively navigate the complex optimization landscape and achieve robust perf...
- Unlocking Dynamic Inter-Client Spatial Dependencies: A Federated Spatio-Temporal Graph Learning Method for Traffic Flow Forecasting : Abstract: Spatio-temporal graphs are powerful tools for modeling complex dependencies in traffic time series. However, the distributed nature of real-world traffic data across multiple stakeholders po...
- Product distribution learning with imperfect advice : Abstract: Given i.i.d. samples from an unknown distribution $P$, the goal of distribution learning is to recover the parameters of a distribution that is close to $P$. When $P$ belongs to the class of...
- Gradient Flow Equations for Deep Linear Neural Networks: A Survey from a Network Perspective : Abstract: The paper surveys recent progress in understanding the dynamics and loss landscape of the gradient flow equations associated to deep linear neural networks, i.e., the gradient descent trai...
- Robust Decentralized Multi-armed Bandits: From Corruption-Resilience to Byzantine-Resilience : Abstract: Decentralized cooperative multi-agent multi-armed bandits (DeCMA2B) considers how multiple agents collaborate in a decentralized multi-armed bandit setting. Though this problem has been exte...
- EDGC: Entropy-driven Dynamic Gradient Compression for Efficient LLM Training : Abstract: Training large language models (LLMs) poses significant challenges regarding computational resources and memory capacity. Although distributed training techniques help mitigate these issues,...
- PITE: Multi-Prototype Alignment for Individual Treatment Effect Estimation : Abstract: Estimating Individual Treatment Effects (ITE) from observational data is challenging due to confounding bias. Most studies tackle this bias by balancing distributions globally, but ignore in...
- OutSafe-Bench: A Benchmark for Multimodal Offensive Content Detection in Large Language Models : Abstract: Since Multimodal Large Language Models (MLLMs) are increasingly being integrated into everyday tools and intelligent agents, growing concerns have arisen regarding their possible output of u...
- Unitho: A Unified Multi-Task Framework for Computational Lithography : Abstract: Reliable, generalizable data foundations are critical for enabling large-scale models in computational lithography. However, essential tasks - mask generation, rule violation detection, and la...
- FedCure: Mitigating Participation Bias in Semi-Asynchronous Federated Learning with Non-IID Data : Abstract: While semi-asynchronous federated learning (SAFL) combines the efficiency of synchronous training with the flexibility of asynchronous updates, it inherently suffers from participation bias,...
- Out-of-Context Misinformation Detection via Variational Domain-Invariant Learning with Test-Time Training : Abstract: Out-of-context misinformation (OOC) is a low-cost form of misinformation in news reports, which refers to placing authentic images into out-of-context or fabricated image-text pairings. This p...
- Beyond MSE: Ordinal Cross-Entropy for Probabilistic Time Series Forecasting : Abstract: Time series forecasting is an important task that involves analyzing temporal dependencies and underlying patterns (such as trends, cyclicality, and seasonality) in historical data to predic...
- Towards Leveraging Sequential Structure in Animal Vocalizations : Abstract: Animal vocalizations contain sequential structures that carry important communicative information, yet most computational bioacoustics studies average the extracted frame-level features acro...
- EPO: Diverse and Realistic Protein Ensemble Generation via Energy Preference Optimization : Abstract: Accurate exploration of protein conformational ensembles is essential for uncovering function but remains hard because molecular-dynamics (MD) simulations suffer from high computational cost...
- RI-Loss: A Learnable Residual-Informed Loss for Time Series Forecasting : Abstract: Time series forecasting relies on predicting future values from historical data, yet most state-of-the-art approaches, including transformer and multilayer perceptron-based models, optimize us...
- How does My Model Fail? Automatic Identification and Interpretation of Physical Plausibility Failure Modes with Matryoshka Transcoders : Abstract: Although recent generative models are remarkably capable of producing instruction-following and realistic outputs, they remain prone to notable physical plausibility failures. Though critica...
- Tree-Based Stochastic Optimization for Solving Large-Scale Urban Network Security Games : Abstract: Urban Network Security Games (UNSGs), which model the strategic allocation of limited security resources on city road networks, are critical for urban safety. However, finding a Nash Equilib...
- FAQNAS: FLOPs-aware Hybrid Quantum Neural Architecture Search using Genetic Algorithm : Abstract: Hybrid Quantum Neural Networks (HQNNs), which combine parameterized quantum circuits with classical neural layers, are emerging as promising models in the noisy intermediate-scale quantum (N...
- From Static Structures to Ensembles: Studying and Harnessing Protein Structure Tokenization : Abstract: Protein structure tokenization converts 3D structures into discrete or vectorized representations, enabling the integration of structural and sequence data. Despite many recent works on stru...
- SVD-NO: Learning PDE Solution Operators with SVD Integral Kernels : Abstract: Neural operators have emerged as a promising paradigm for learning solution operators of partial differential equations (PDEs) directly from data. Existing methods, such as those based on ...
- GraphSB: Boosting Imbalanced Node Classification on Graphs through Structural Balance : Abstract: Imbalanced node classification is a critical challenge in graph learning, where most existing methods typically utilize Graph Neural Networks (GNNs) to learn node representations. These meth...
- Interaction as Interference: A Quantum-Inspired Aggregation Approach : Abstract: Classical approaches often treat interaction as engineered product terms or as emergent patterns in flexible models, offering little control over how synergy or antagonism arises. We take a ...
- DemoTuner: Efficient DBMS Knobs Tuning via LLM-Assisted Demonstration Reinforcement Learning : Abstract: The performance of modern DBMSs such as MySQL and PostgreSQL heavily depends on the configuration of performance-critical knobs. Manually tuning these knobs is laborious and inefficient due to...
- A Novel Data-Dependent Learning Paradigm for Large Hypothesis Classes : Abstract: We address the general task of learning with a set of candidate models that is too large to have a uniform convergence of empirical estimates to true losses. While the common approach to suc...
- Towards Robust Multimodal Learning in the Open World : Abstract: The rapid evolution of machine learning has propelled neural networks to unprecedented success across diverse domains. In particular, multimodal learning has emerged as a transformative para...
- Rediscovering the Lunar Equation of the Centre with AI Feynman via Embedded Physical Biases : Abstract: This work explores using the physics-inspired AI Feynman symbolic regression algorithm to automatically rediscover a fundamental equation in astronomy -- the Equation of the Centre. Through ...
- Autonomous Concept Drift Threshold Determination : Abstract: Existing drift detection methods focus on designing sensitive test statistics. They treat the detection threshold as a fixed hyperparameter, set once to balance false alarms and late detecti...
- Towards Multiple Missing Values-resistant Unsupervised Graph Anomaly Detection : Abstract: Unsupervised graph anomaly detection (GAD) has received increasing attention in recent years, which aims to identify data anomalous patterns utilizing only unlabeled node information from gr...
- Incremental Generation is Necessary and Sufficient for Universality in Flow-Based Modelling : Abstract: Incremental flow-based denoising models have reshaped generative modelling, but their empirical advantage still lacks a rigorous approximation-theoretic foundation. We show that incremental ...
- Explore and Establish Synergistic Effects Between Weight Pruning and Coreset Selection in Neural Network Training : Abstract: Modern deep neural networks rely heavily on massive model weights and training samples, incurring substantial computational costs. Weight pruning and coreset selection are two emerging parad...
- Uncertainty-Guided Checkpoint Selection for Reinforcement Finetuning of Large Language Models : Abstract: Reinforcement learning (RL) finetuning is crucial to aligning large language models (LLMs), but the process is notoriously unstable and exhibits high variance across model checkpoints. In pr...
- Unlearning Imperative: Securing Trustworthy and Responsible LLMs through Engineered Forgetting : Abstract: The growing use of large language models in sensitive domains has exposed a critical weakness: the inability to ensure that private information can be permanently forgotten. Yet these system...
- ConSurv: Multimodal Continual Learning for Survival Analysis : Abstract: Survival prediction of cancers is crucial for clinical practice, as it informs mortality risks and influences treatment plans. However, a static model trained on a single dataset fails to ad...
- Steering Pretrained Drafters during Speculative Decoding : Abstract: Speculative decoding accelerates language model inference by separating generation into fast drafting and parallel verification. Its main limitation is drafter-verifier misalignment, which l...
- ACT as Human: Multimodal Large Language Model Data Annotation with Critical Thinking : Abstract: Supervised learning relies on high-quality labeled data, but obtaining such data through human annotation is both expensive and time-consuming. Recent work explores using large language mode...
- Learning Intersections of Halfspaces under Factorizable Distribution : Abstract: Learning intersections of halfspaces is a central problem in Computational Learning Theory. Even for just two halfspaces, it remains a major open question whether learning is possible in pol...
- SMoFi: Step-wise Momentum Fusion for Split Federated Learning on Heterogeneous Data : Abstract: Split Federated Learning is a system-efficient federated learning paradigm that leverages the rich computing resources at a central server to train model partitions. Data heterogeneity acros...
- Beyond Monotonicity: Revisiting Factorization Principles in Multi-Agent Q-Learning : Abstract: Value decomposition is a central approach in multi-agent reinforcement learning (MARL), enabling centralized training with decentralized execution by factorizing the global value function in...
- CaReTS: A Multi-Task Framework Unifying Classification and Regression for Time Series Forecasting : Abstract: Recent advances in deep forecasting models have achieved remarkable performance, yet most approaches still struggle to provide both accurate predictions and interpretable insights into tempo...
- Hail to the Thief: Exploring Attacks and Defenses in Decentralised GRPO : Abstract: Group Relative Policy Optimization (GRPO) has demonstrated great utility in post-training of Large Language Models (LLMs). In GRPO, prompts are answered by the model and, through reinfor...
- NeuroLingua: A Language-Inspired Hierarchical Framework for Multimodal Sleep Stage Classification Using EEG and EOG : Abstract: Automated sleep stage classification from polysomnography remains limited by the lack of expressive temporal hierarchies, challenges in multimodal EEG and EOG fusion, and the limited interpr...
- Is nasty noise actually harder than malicious noise? : Abstract: We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise mod...
- Data Heterogeneity and Forgotten Labels in Split Federated Learning : Abstract: In Split Federated Learning (SFL), the clients collaboratively train a model with the help of a server by splitting the model into two parts. Part-1 is trained locally at each client and agg...
- FlowCast: Advancing Precipitation Nowcasting with Conditional Flow Matching : Abstract: Radar-based precipitation nowcasting, the task of forecasting short-term precipitation fields from previous radar images, is a critical problem for flood risk management and decision-making....
- Generalizing PDE Emulation with Equation-Aware Neural Operators : Abstract: Solving partial differential equations (PDEs) can be prohibitively expensive using traditional numerical methods. Deep learning-based surrogate models typically specialize in a single PDE wi...
- Efficient Hyperdimensional Computing with Modular Composite Representations : Abstract: The modular composite representation (MCR) is a computing model that represents information with high-dimensional integer vectors using modular arithmetic. Originally proposed as a generaliz...
- ConstrainedSQL: Training LLMs for Text2SQL via Constrained Reinforcement Learning : Abstract: Reinforcement learning (RL) has demonstrated significant promise in enhancing the reasoning capabilities of Text2SQL LLMs, especially with advanced algorithms such as GRPO and DAPO. However,...
- Boosted GFlowNets: Improving Exploration via Sequential Learning : Abstract: Generative Flow Networks (GFlowNets) are powerful samplers for compositional objects that, by design, sample proportionally to a given non-negative reward. Nonetheless, in practice, they oft...
- GEM+: Scalable State-of-the-Art Private Synthetic Data with Generator Networks : Abstract: State-of-the-art differentially private synthetic tabular data has been defined by adaptive 'select-measure-generate' frameworks, exemplified by methods like AIM. These approaches iterativel...
- Generalization Can Emerge in Tabular Foundation Models From a Single Table : Abstract: Deep tabular modelling increasingly relies on in-context learning where, during inference, a model receives a set of $(x,y)$ pairs as context and predicts labels for new inputs without weigh...
- Parametric Expensive Multi-Objective Optimization via Generative Solution Modeling : Abstract: Many real-world applications require solving families of expensive multi-objective optimization problems (EMOPs) under varying operational conditions. This gives rise to parametric expensive...
- Making Every Head Count: Sparse Attention Without the Speed-Performance Trade-off : Abstract: The design of Large Language Models (LLMs) has long been hampered by a fundamental conflict within their core attention mechanism: its remarkable expressivity is built upon a computational c...
- DynamicRTL: RTL Representation Learning for Dynamic Circuit Behavior : Abstract: There is a growing body of work on using Graph Neural Networks (GNNs) to learn representations of circuits, focusing primarily on their static characteristics. However, these models fail to ...
- Group Averaging for Physics Applications: Accuracy Improvements at Zero Training Cost : Abstract: Many machine learning tasks in the natural sciences are precisely equivariant to particular symmetries. Nonetheless, equivariant methods are often not employed, perhaps because training is p...
- Filtering Jump Markov Systems with Partially Known Dynamics: A Model-Based Deep Learning Approach : Abstract: This paper presents the Jump Markov Filtering Network (JMFNet), a novel model-based deep learning framework for real-time state estimation in jump Markov systems with unknown noise sta...
- Let the Experts Speak: Improving Survival Prediction & Calibration via Mixture-of-Experts Heads : Abstract: Deep mixture-of-experts models have attracted a lot of attention for survival analysis problems, particularly for their ability to cluster similar patients together. In practice, grouping of...
- Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests : Abstract: Password security plays a crucial role in cybersecurity, yet traditional password strength meters, which rely on static rules like character-type requirements, often fail. Such methods are e...
- Leveraging Large Language Models for Use Case Model Generation from Software Requirements : Abstract: Use case modeling employs user-centered scenarios to outline system requirements. These help to achieve consensus among relevant stakeholders. Because the manual creation of use case models ...
- Enhancing PIBT via Multi-Action Operations : Abstract: PIBT is a rule-based Multi-Agent Path Finding (MAPF) solver, widely used as a low-level planner or action sampler in many state-of-the-art approaches. Its primary advantage lies in its excep...
- Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support : Abstract: Financial advisors and investors struggle with information overload from financial news, where irrelevant content and noise obscure key market signals and hinder timely investment decisions....
- Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance : Abstract: Automated visual inspection in medical-device manufacturing faces unique challenges, including extremely low defect rates, limited annotated data, hardware restrictions on production lines, ...
- ManipDreamer3D : Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory : Abstract: Data scarcity continues to be a major challenge in the field of robotic manipulation. Although diffusion models provide a promising solution for generating robotic manipulation videos, exist...
- Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA : Abstract: Medical imaging plays a pivotal role in modern healthcare, with computed tomography pulmonary angiography (CTPA) being a critical tool for diagnosing pulmonary embolism and other thoracic co...
- Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data : Abstract: Leveraging multimodal data to drive breakthroughs in e-commerce applications through Multimodal Foundation Models (MFMs) is gaining increasing attention from the research community. However,...
- Multi-Turn Interactions for Text-to-SQL with Large Language Models : Abstract: This study explores text-to-SQL parsing by leveraging the powerful reasoning capabilities of large language models (LLMs). Despite recent advancements, existing LLM-based methods are still i...
- Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool : Abstract: While extensive research has focused on ChatGPT in recent years, very few studies have systematically quantified and compared linguistic features between human-written and Artificial Intelli...
- GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets : Abstract: We study GCS-TSP, a new variant of the Traveling Salesman Problem (TSP) defined over a Graph of Convex Sets (GCS) -- a powerful representation for trajectory planning that decomposes the con...
- Enhancing Conflict Resolution in Language Models via Abstract Argumentation : Abstract: In recent years, large language models (LLMs) have made significant advancements in developing human-like and engaging dialogue systems. However, in tasks such as consensus-building and pers...
- A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning : Abstract: Multi-modal conversation emotion recognition (MCER) aims to recognize and track the speaker's emotional state using text, speech, and visual information in the conversation scene. Analyzing ...
- Black-Box On-Policy Distillation of Large Language Models : Abstract: Black-box distillation creates student large language models (LLMs) by learning from a proprietary teacher model's text outputs alone, without access to its internal logits or parameters. In...
- Instella: Fully Open Language Models with Stellar Performance : Abstract: Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or partially open, limitin...
- SSR: Socratic Self-Refine for Large Language Model Reasoning : Abstract: Large Language Models (LLMs) have demonstrated remarkable reasoning abilities, yet existing test-time frameworks often rely on coarse self-verification and self-correction, limiting their ef...
- Know Your Limits: Entropy Estimation Modeling for Compression and Generalization : Abstract: Language prediction is constrained by informational entropy intrinsic to language, such that there exists a limit to how accurate any language model can become and equivalently a lower bound...
- Towards an Agentic Workflow for Internet Measurement Research : Abstract: Internet measurement research faces an accessibility crisis: complex analyses require custom integration of multiple specialized tools that demands specialized domain expertise. When network...
- Mined Prompting and Metadata-Guided Generation for Wound Care Visual Question Answering : Abstract: The rapid expansion of asynchronous remote care has intensified provider workload, creating demand for AI systems that can assist clinicians in managing patient queries more efficiently. The...
- Textual understanding boost in the WikiRace : Abstract: The WikiRace game, where players navigate between Wikipedia articles using only hyperlinks, serves as a compelling benchmark for goal-directed search in complex information networks. This pa...
- Evaluating Prompting Strategies with MedGemma for Medical Order Extraction : Abstract: The accurate extraction of medical orders from doctor-patient conversations is a critical task for reducing clinical documentation burdens and ensuring patient safety. This paper details our...
- Towards Emotionally Intelligent and Responsible Reinforcement Learning : Abstract: Personalized decision systems in healthcare and behavioral support often rely on static rule-based or engagement-maximizing heuristics that overlook users' emotional context and ethical cons...
- Impact of Layer Norm on Memorization and Generalization in Transformers : Abstract: Layer Normalization (LayerNorm) is one of the fundamental components in transformers that stabilizes training and improves optimization. In recent times, Pre-LayerNorm transformers have beco...
- A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space : Abstract: Innovative visual stylization is a cornerstone of artistic creation, yet generating novel and consistent visual styles remains a significant challenge. Existing generative approaches typical...
- From Euler to Today: Universal Mathematical Fallibility A Large-Scale Computational Analysis of Errors in ArXiv Papers : Abstract: We present the results of a large-scale computational analysis of mathematical papers from the ArXiv repository, demonstrating a comprehensive system that not only detects mathematical error...
- Preview, Accept or Discard? A Predictive Low-Motion Interaction Paradigm : Abstract: Repetitive strain injury (RSI) affects roughly one in five computer users and remains largely unresolved despite decades of ergonomic mouse redesign. All such devices share a fundamental lim...
- Say It Differently: Linguistic Styles as Jailbreak Vectors : Abstract: Large Language Models (LLMs) are commonly evaluated for robustness against paraphrased or semantically equivalent jailbreak prompts, yet little attention has been paid to linguistic variatio...
- LOCA-R: Near-Perfect Performance on the Chinese Physics Olympiad 2025 : Abstract: Olympiad-level physics problem-solving presents a significant challenge for both humans and artificial intelligence (AI), as it requires a sophisticated integration of precise calculation, a...
- On the Detectability of Active Gradient Inversion Attacks in Federated Learning : Abstract: One of the key advantages of Federated Learning (FL) is its ability to collaboratively train a Machine Learning (ML) model while keeping clients' data on-site. However, this can create a fal...
- Utility of Pancreas Surface Lobularity as a CT Biomarker for Opportunistic Screening of Type 2 Diabetes : Abstract: Type 2 Diabetes Mellitus (T2DM) is a chronic metabolic disease that affects millions of people worldwide. Early detection is crucial as it can alter pancreas function through morphological c...
- Scalable Synthesis of distributed LLM workloads through Symbolic Tensor Graphs : Abstract: Optimizing the performance of large language models (LLMs) on large-scale AI training and inference systems requires a scalable and expressive mechanism to model distributed workload executi...
- Beyond Elicitation: Provision-based Prompt Optimization for Knowledge-Intensive Tasks : Abstract: While prompt optimization has emerged as a critical technique for enhancing language model performance, existing approaches primarily focus on elicitation-based strategies that search for op...
- LocalBench: Benchmarking LLMs on County-Level Local Knowledge and Reasoning : Abstract: Large language models (LLMs) have been widely evaluated on macro-scale geographic tasks, such as global factual recall, event summarization, and regional reasoning. Yet, their ability to han...
- Reasoning About Intent for Ambiguous Requests : Abstract: Large language models often respond to ambiguous requests by implicitly committing to one interpretation. Intent misunderstandings can frustrate users and create safety risks. To address thi...
- Completion of partial structures using Patterson maps with the CrysFormer machine learning model : Abstract: Protein structure determination has long been one of the primary challenges of structural biology, to which deep machine learning (ML)-based approaches have increasingly been applied. Howeve...
- Improving Perturbation-based Explanations by Understanding the Role of Uncertainty Calibration : Abstract: Perturbation-based explanations are widely utilized to enhance the transparency of machine-learning models in practice. However, their reliability is often compromised by the unknown model b...
- nuPlan-R: A Closed-Loop Planning Benchmark for Autonomous Driving via Reactive Multi-Agent Simulation : Abstract: Recent advances in closed-loop planning benchmarks have significantly improved the evaluation of autonomous vehicles. However, existing benchmarks still rely on rule-based reactive agents su...
- Rethinking the Reliability of Multi-agent System: A Perspective from Byzantine Fault Tolerance : Abstract: Ensuring the reliability of agent architectures and effectively identifying problematic agents when failures occur are crucial challenges in multi-agent systems (MAS). Advances in large lang...
- AgentEvolver: Towards Efficient Self-Evolving Agent System : Abstract: Autonomous agents powered by large language models (LLMs) have the potential to significantly enhance human productivity by reasoning, using tools, and executing complex tasks in diverse env...
- Enhancing Kernel Power K-means: Scalable and Robust Clustering with Random Fourier Features and Possibilistic Method : Abstract: Kernel power $k$-means (KPKM) leverages a family of means to mitigate local minima issues in kernel $k$-means. However, KPKM faces two key limitations: (1) the computational burden of the fu...
- MonkeyOCR v1.5 Technical Report: Unlocking Robust Document Parsing for Complex Patterns : Abstract: Document parsing is a core task in document intelligence, supporting applications such as information extraction, retrieval-augmented generation, and automated document analysis. However, re...
- Simulating Misinformation Propagation in Social Networks using Large Language Models : Abstract: Misinformation on social media thrives on surprise, emotion, and identity-driven reasoning, often amplified through human cognitive biases. To investigate these mechanisms, we model large la...
- SHRUG-FM: Reliability-Aware Foundation Models for Earth Observation : Abstract: Geospatial foundation models for Earth observation often fail to perform reliably in environments underrepresented during pretraining. We introduce SHRUG-FM, a framework for reliability-awar...
- DermAI: Clinical dermatology acquisition through quality-driven image collection for AI classification in mobile : Abstract: AI-based dermatology adoption remains limited by biased datasets, variable image quality, and limited validation. We introduce DermAI, a lightweight, smartphone-based application that enable...
- BhashaKritika: Building Synthetic Pretraining Data at Scale for Indic Languages : Abstract: In the context of pretraining of Large Language Models (LLMs), synthetic data has emerged as an alternative for generating high-quality pretraining data at scale. This is particularly benefi...
- Depth-Consistent 3D Gaussian Splatting via Physical Defocus Modeling and Multi-View Geometric Supervision : Abstract: Three-dimensional reconstruction in scenes with extreme depth variations remains challenging due to inconsistent supervisory signals between near-field and far-field regions. Existing method...
- Rethinking Visual Information Processing in Multimodal LLMs : Abstract: Despite the remarkable success of the LLaVA architecture for vision-language tasks, its design inherently struggles to effectively integrate visual features due to the inherent mismatch betw...
- Adaptive Residual-Update Steering for Low-Overhead Hallucination Mitigation in Large Vision Language Models : Abstract: Large Vision-Language Models (LVLMs) often suffer from object hallucination, generating text inconsistent with visual inputs, which can critically undermine their reliability. Existing infer...
- Torch-Uncertainty: A Deep Learning Framework for Uncertainty Quantification : Abstract: Deep Neural Networks (DNNs) have demonstrated remarkable performance across various domains, including computer vision and natural language processing. However, they often struggle to accura...
- RoboBenchMart: Benchmarking Robots in Retail Environment : Abstract: Most existing robotic manipulation benchmarks focus on simplified tabletop scenarios, typically involving a stationary robotic arm interacting with various objects on a flat surface. To addr...
- Quality Assurance of LLM-generated Code: Addressing Non-Functional Quality Characteristics : Abstract: In recent years, LLMs have been widely integrated into software engineering workflows, supporting tasks like code generation. However, while these models often generate functionally correct ...
- MTR-DuplexBench: Towards a Comprehensive Evaluation of Multi-Round Conversations for Full-Duplex Speech Language Models : Abstract: Full-Duplex Speech Language Models (FD-SLMs) enable real-time, overlapping conversational interactions, offering a more dynamic user experience compared to traditional half-duplex models. Ho...
- H3Former: Hypergraph-based Semantic-Aware Aggregation via Hyperbolic Hierarchical Contrastive Loss for Fine-Grained Visual Classification : Abstract: Fine-Grained Visual Classification (FGVC) remains a challenging task due to subtle inter-class differences and large intra-class variations. Existing approaches typically rely on feature-sel...
- Workload Schedulers -- Genesis, Algorithms and Differences : Abstract: This paper presents a novel approach to categorization of modern workload schedulers. We provide descriptions of three classes of schedulers: Operating Systems Process Schedulers, Cluster Sy...
- Heuristic Transformer: Belief Augmented In-Context Reinforcement Learning : Abstract: Transformers have demonstrated exceptional in-context learning (ICL) capabilities, enabling applications across natural language processing, computer vision, and sequential decision-making. ...
- FineSkiing: A Fine-grained Benchmark for Skiing Action Quality Assessment : Abstract: Action Quality Assessment (AQA) aims to evaluate and score sports actions, which has attracted widespread interest in recent years. Existing AQA methods primarily predict scores based on fea...
- Robustness and Imperceptibility Analysis of Hybrid Spatial-Frequency Domain Image Watermarking : Abstract: The proliferation of digital media necessitates robust methods for copyright protection and content authentication. This paper presents a comprehensive comparative study of digital image wat...
- Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners : Abstract: While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can...
- VocalNet-M2: Advancing Low-Latency Spoken Language Modeling via Integrated Multi-Codebook Tokenization and Multi-Token Prediction : Abstract: Current end-to-end spoken language models (SLMs) have made notable progress, yet they still encounter considerable response latency. This delay primarily arises from the autoregressive gener...
- Speech-Audio Compositional Attacks on Multimodal LLMs and Their Mitigation with SALMONN-Guard : Abstract: Recent progress in large language models (LLMs) has enabled understanding of both speech and non-speech audio, but it also exposes new safety risks emerging from complex audio inputs that are inade...
- Persona-Aware Alignment Framework for Personalized Dialogue Generation : Abstract: Personalized dialogue generation aims to leverage persona profiles and dialogue history to generate persona-relevant and consistent responses. Mainstream models typically rely on token-level...
- Fractional neural attention for efficient multiscale sequence processing : Abstract: Attention mechanisms underpin the computational power of Transformer models, which have achieved remarkable success across diverse domains. Yet understanding and extending the principles und...
- VISTA: A Vision and Intent-Aware Social Attention Framework for Multi-Agent Trajectory Prediction : Abstract: Multi-agent trajectory prediction is crucial for autonomous systems operating in dense, interactive environments. Existing methods often fail to jointly capture agents' long-term goals and t...
- Improved Offline Reinforcement Learning via Quantum Metric Encoding : Abstract: Reinforcement learning (RL) with limited samples is common in real-world applications. However, offline RL performance under this constraint is often suboptimal. We consider an alternative a...
- Utilizing a Geospatial Foundation Model for Coastline Delineation in Small Sandy Islands : Abstract: We present an initial evaluation of NASA and IBM's Prithvi-EO-2.0 geospatial foundation model on shoreline delineation of small sandy islands using satellite images. We curated and labeled a...
- GEA: Generation-Enhanced Alignment for Text-to-Image Person Retrieval : Abstract: Text-to-Image Person Retrieval (TIPR) aims to retrieve person images based on natural language descriptions. Although many TIPR methods have achieved promising results, sometimes textual que...
- Right Looks, Wrong Reasons: Compositional Fidelity in Text-to-Image Generation : Abstract: The architectural blueprint of today's leading text-to-image models contains a fundamental flaw: an inability to handle logical composition. This survey investigates this breakdown across th...
- MATAI: A Generalist Machine Learning Framework for Property Prediction and Inverse Design of Advanced Alloys : Abstract: The discovery of advanced metallic alloys is hindered by vast composition spaces, competing property objectives, and real-world constraints on manufacturability. Here we introduce MATAI, a g...
- On the Military Applications of Large Language Models : Abstract: In this paper, military use cases or applications and implementation thereof are considered for natural language processing and large language models, which have broken into fame with the in...
- T2IBias: Uncovering Societal Bias Encoded in the Latent Space of Text-to-Image Generative Models : Abstract: Text-to-image (T2I) generative models are largely used in AI-powered real-world applications and value creation. However, their strategic deployment raises critical concerns for responsible ...
- eXIAA: eXplainable Injections for Adversarial Attack : Abstract: Post-hoc explainability methods are a subset of Machine Learning (ML) that aim to provide a reason for why a model behaves in a certain way. In this paper, we show a new black-box model-agno...
- Opinion: Towards Unified Expressive Policy Optimization for Robust Robot Learning : Abstract: Offline-to-online reinforcement learning (O2O-RL) has emerged as a promising paradigm for safe and efficient robotic policy deployment but suffers from two fundamental challenges: limited co...
- Multivariate Gaussian Representation Learning for Medical Action Evaluation : Abstract: Fine-grained action evaluation in medical vision faces unique challenges due to the unavailability of comprehensive datasets, stringent precision requirements, and insufficient spatiotempora...
- BuddyMoE: Exploiting Expert Redundancy to Accelerate Memory-Constrained Mixture-of-Experts Inference : Abstract: Mixture-of-Experts (MoE) architectures scale language models by activating only a subset of specialized expert networks for each input token, thereby reducing the number of floating-point op...
- Moral Change or Noise? On Problems of Aligning AI With Temporally Unstable Human Feedback : Abstract: Alignment methods in moral domains seek to elicit moral preferences of human stakeholders and incorporate them into AI. This presupposes moral preferences as static targets, but such prefere...
- Temporal Latent Variable Structural Causal Model for Causal Discovery under External Interferences : Abstract: Inferring causal relationships from observed data is an important task, yet it becomes challenging when the data is subject to various external interferences. Most of these interferences are...
- Efficient Automated Diagnosis of Retinopathy of Prematurity by Customize CNN Models : Abstract: This paper encompasses an in-depth examination of Retinopathy of Prematurity (ROP) diagnosis, employing advanced deep learning methodologies. Our focus centers on refining and evaluating CNN...
- Anomagic: Crossmodal Prompt-driven Zero-shot Anomaly Generation : Abstract: We propose Anomagic, a zero-shot anomaly generation method that produces semantically coherent anomalies without requiring any exemplar anomalies. By unifying both visual and textual cues th...
- fastbmRAG: A Fast Graph-Based RAG Framework for Efficient Processing of Large-Scale Biomedical Literature : Abstract: Large language models (LLMs) are rapidly transforming various domains, including biomedicine and healthcare, and demonstrate remarkable potential from scientific research to new drug discove...
- MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging : Abstract: Automated interpretation of medical images demands robust modeling of complex visual-semantic relationships while addressing annotation scarcity, label imbalance, and clinical plausibility c...
- The Role of Advanced Computer Architectures in Accelerating Artificial Intelligence Workloads : Abstract: The remarkable progress in Artificial Intelligence (AI) is foundationally linked to a concurrent revolution in computer architecture. As AI models, particularly Deep Neural Networks (DNNs),...
- Phantom Menace: Exploring and Enhancing the Robustness of VLA Models against Physical Sensor Attacks : Abstract: Vision-Language-Action (VLA) models revolutionize robotic systems by enabling end-to-end perception-to-action pipelines that integrate multiple sensory modalities, such as visual signals pro...
- PustakAI: Curriculum-Aligned and Interactive Textbooks Using Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in understanding and generating human-like content. This has revolutionized various sectors such as healthcare, softwar...
- Difference Vector Equalization for Robust Fine-tuning of Vision-Language Models : Abstract: Contrastive pre-trained vision-language models, such as CLIP, demonstrate strong generalization abilities in zero-shot classification by leveraging embeddings extracted from image and text e...
- MultiTab: A Scalable Foundation for Multitask Learning on Tabular Data : Abstract: Tabular data is the most abundant data type in the world, powering systems in finance, healthcare, e-commerce, and beyond. As tabular datasets grow and span multiple related targets, there i...
- Owlgorithm: Supporting Self-Regulated Learning in Competitive Programming through LLM-Driven Reflection : Abstract: We present Owlgorithm, an educational platform that supports Self-Regulated Learning (SRL) in competitive programming (CP) through AI-generated reflective questions. Leveraging GPT-4o, Owlgo...
- EnvTrace: Simulation-Based Semantic Evaluation of LLM Code via Execution Trace Alignment -- Demonstrated at Synchrotron Beamlines : Abstract: Evaluating large language models (LLMs) for instrument control requires methods that go beyond standard, stateless algorithmic benchmarks, since the behavior of physical systems cannot be fu...
- AI-Integrated Decision Support System for Real-Time Market Growth Forecasting and Multi-Source Content Diffusion Analytics : Abstract: The rapid proliferation of AI-generated content (AIGC) has reshaped the dynamics of digital marketing and online consumer behavior. However, predicting the diffusion trajectory and market im...
- Beyond Cosine Similarity Magnitude-Aware CLIP for No-Reference Image Quality Assessment : Abstract: Recent efforts have repurposed the Contrastive Language-Image Pre-training (CLIP) model for No-Reference Image Quality Assessment (NR-IQA) by measuring the cosine similarity between the imag...
- EEGAgent: A Unified Framework for Automated EEG Analysis Using Large Language Models : Abstract: Scalable and generalizable analysis of brain activity is essential for advancing both clinical diagnostics and cognitive research. Electroencephalography (EEG), a non-invasive modality with ...
- AdaptViG: Adaptive Vision GNN with Exponential Decay Gating : Abstract: Vision Graph Neural Networks (ViGs) offer a new direction for advancements in vision architectures. While powerful, ViGs often face substantial computational challenges stemming from their g...
- Compensating Distribution Drifts in Class-incremental Learning of Pre-trained Vision Transformers : Abstract: Recent advances have shown that sequential fine-tuning (SeqFT) of pre-trained vision transformers (ViTs), followed by classifier refinement using approximate distributions of class features,...
- MDMLP-EIA: Multi-domain Dynamic MLPs with Energy Invariant Attention for Time Series Forecasting : Abstract: Time series forecasting is essential across diverse domains. While MLP-based methods have gained attention for achieving Transformer-comparable performance with fewer parameters and better r...
- Harnessing Bounded-Support Evolution Strategies for Policy Refinement : Abstract: Improving competent robot policies with on-policy RL is often hampered by noisy, low-signal gradients. We revisit Evolution Strategies (ES) as a policy-gradient proxy and localize exploratio...
- PRISM: Diversifying Dataset Distillation by Decoupling Architectural Priors : Abstract: Dataset distillation (DD) promises compact yet faithful synthetic data, but existing approaches often inherit the inductive bias of a single teacher model. As dataset size increases, this bi...
- Simulator and Experience Enhanced Diffusion Model for Comprehensive ECG Generation : Abstract: Cardiovascular disease (CVD) is a leading cause of mortality worldwide. Electrocardiograms (ECGs) are the most widely used non-invasive tool for cardiac assessment, yet large, well-annotated...
- Scale-Aware Relay and Scale-Adaptive Loss for Tiny Object Detection in Aerial Images : Abstract: Recently, despite the remarkable advancements in object detection, modern detectors still struggle to detect tiny objects in aerial images. One key reason is that tiny objects carry limited ...
- A General Anchor-Based Framework for Scalable Fair Clustering : Abstract: Fair clustering is crucial for mitigating bias in unsupervised learning, yet existing algorithms often suffer from quadratic or super-quadratic computational complexity, rendering them impra...
- Taught by the Flawed: How Dataset Insecurity Breeds Vulnerable AI Code : Abstract: AI programming assistants have demonstrated a tendency to generate code containing basic security vulnerabilities. While developers are ultimately responsible for validating and reviewing su...
- Expandable and Differentiable Dual Memories with Orthogonal Regularization for Exemplar-free Continual Learning : Abstract: Continual learning methods used to force neural networks to process sequential tasks in isolation, preventing them from leveraging useful inter-task relationships and causing them to repeate...
- CertMask: Certifiable Defense Against Adversarial Patches via Theoretically Optimal Mask Coverage : Abstract: Adversarial patch attacks inject localized perturbations into images to mislead deep vision models. These attacks can be physically deployed, posing serious risks to real-world applications....
- From Street to Orbit: Training-Free Cross-View Retrieval via Location Semantics and LLM Guidance : Abstract: Cross-view image retrieval, particularly street-to-satellite matching, is a critical task for applications such as autonomous navigation, urban planning, and localization in GPS-denied envir...
- Multiple Treatments Causal Effects Estimation with Task Embeddings and Balanced Representation Learning : Abstract: The simultaneous application of multiple treatments is increasingly common in many fields, such as healthcare and marketing. In such scenarios, it is important to estimate the single treatme...
- On the Convergence of Overparameterized Problems: Inherent Properties of the Compositional Structure of Neural Networks : Abstract: This paper investigates how the compositional structure of neural networks shapes their optimization landscape and training dynamics. We analyze the gradient flow associated with overparamet...
- Test-Time Spectrum-Aware Latent Steering for Zero-Shot Generalization in Vision-Language Models : Abstract: Vision-Language Models (VLMs) excel at zero-shot inference but often degrade under test-time domain shifts. For this reason, episodic test-time adaptation strategies have recently emerged as...
- Constrained Best Arm Identification with Tests for Feasibility : Abstract: Best arm identification (BAI) aims to identify the highest-performance arm among a set of $K$ arms by collecting stochastic samples from each arm. In real-world problems, the best arm needs ...
- Predicate-Argument Structure Divergences in Chinese and English Parallel Sentences and their Impact on Language Transfer : Abstract: Cross-lingual Natural Language Processing (NLP) has gained significant traction in recent years, offering practical solutions in low-resource settings by transferring linguistic knowledge fr...
- Koopman Invariants as Drivers of Emergent Time-Series Clustering in Joint-Embedding Predictive Architectures : Abstract: Joint-Embedding Predictive Architectures (JEPAs), a powerful class of self-supervised models, exhibit an unexplained ability to cluster time-series data by their underlying dynamical regimes...
- Privacy-Preserving Explainable AIoT Application via SHAP Entropy Regularization : Abstract: The widespread integration of Artificial Intelligence of Things (AIoT) in smart home environments has amplified the demand for transparent and interpretable machine learning models. To foste...
- Solvaformer: an SE(3)-equivariant graph transformer for small molecule solubility prediction : Abstract: Accurate prediction of small molecule solubility using material-sparing approaches is critical for accelerating synthesis and process optimization, yet experimental measurement is costly and...
- Ksurf-Drone: Attention Kalman Filter for Contextual Bandit Optimization in Cloud Resource Allocation : Abstract: Resource orchestration and configuration parameter search are key concerns for container-based infrastructure in cloud data centers. Large configuration search space and cloud uncertainties ...
- Brian Intensify: An Adaptive Machine Learning Framework for Auditory EEG Stimulation and Cognitive Enhancement in FXS : Abstract: Neurodevelopmental disorders such as Fragile X Syndrome (FXS) and Autism Spectrum Disorder (ASD) are characterized by disrupted cortical oscillatory activity, particularly in the alpha and g...
- History Rhymes: Macro-Contextual Retrieval for Robust Financial Forecasting : Abstract: Financial markets are inherently non-stationary: structural breaks and macroeconomic regime shifts often cause forecasting models to fail when deployed out of distribution (OOD). Conventiona...
- How Small Can You Go? Compact Language Models for On-Device Critical Error Detection in Machine Translation : Abstract: Large Language Models (LLMs) excel at evaluating machine translation (MT), but their scale and cost hinder deployment on edge devices and in privacy-sensitive workflows. We ask: how small ca...
- Feature Quality and Adaptability of Medical Foundation Models: A Comparative Evaluation for Radiographic Classification and Segmentation : Abstract: Foundation models (FMs) promise to generalize medical imaging, but their effectiveness varies. It remains unclear how pre-training domain (medical vs. general), paradigm (e.g., text-guided),...
- TawPipe: Topology-Aware Weight Pipeline Parallelism for Accelerating Long-Context Large Models Training : Abstract: Training large language models (LLMs) is fundamentally constrained by limited device memory and costly inter-device communication. Although pipeline parallelism alleviates memory pressure by...
- Soiling detection for Advanced Driver Assistance Systems : Abstract: Soiling detection for automotive cameras is a crucial part of advanced driver assistance systems to make them more robust to external conditions like weather, dust, etc. In this paper, we re...
- Out-of-Distribution Generalization with a SPARC: Racing 100 Unseen Vehicles with a Single Policy : Abstract: Generalization to unseen environments is a significant challenge in the field of robotics and control. In this work, we focus on contextual reinforcement learning, where agents act within en...
- Social LSTM with Dynamic Occupancy Modeling for Realistic Pedestrian Trajectory Prediction : Abstract: In dynamic and crowded environments, realistic pedestrian trajectory prediction remains a challenging task due to the complex nature of human motion and the mutual influences among individua...
- Baby Sophia: A Developmental Approach to Self-Exploration through Self-Touch and Hand Regard : Abstract: Inspired by infant development, we propose a Reinforcement Learning (RL) framework for autonomous self-exploration in a robotic agent, Baby Sophia, using the BabyBench simulation environment...
- PALMS+: Modular Image-Based Floor Plan Localization Leveraging Depth Foundation Model : Abstract: Indoor localization in GPS-denied environments is crucial for applications like emergency response and assistive navigation. Vision-based methods such as PALMS enable infrastructure-free loc...
- SEBA: Sample-Efficient Black-Box Attacks on Visual Reinforcement Learning : Abstract: Visual reinforcement learning has achieved remarkable progress in visual control and robotics, but its vulnerability to adversarial perturbations remains underexplored. Most existing black-b...
- Alignment Debt: The Hidden Work of Making AI Usable : Abstract: Frontier LLMs are optimised around high-resource assumptions about language, knowledge, devices, and connectivity. Whilst widely accessible, they often misfit conditions in the Global South....
- Optimistic Reinforcement Learning with Quantile Objectives : Abstract: Reinforcement Learning (RL) has achieved tremendous success in recent years. However, the classical foundations of RL do not account for the risk sensitivity of the objective function, which...
- TomoGraphView: 3D Medical Image Classification with Omnidirectional Slice Representations and Graph Neural Networks : Abstract: The growing number of medical tomography examinations has necessitated the development of automated methods capable of extracting comprehensive imaging features to facilitate downstream task...
- An explainable Recursive Feature Elimination to detect Advanced Persistent Threats using Random Forest classifier : Abstract: Intrusion Detection Systems (IDS) play a vital role in modern cybersecurity frameworks by providing a primary defense mechanism against sophisticated threat actors. In this paper, we propose...
- Scaling Environments for LLM Agents in the Era of Learning from Interaction: A Survey : Abstract: LLM-based agents can autonomously accomplish complex tasks across various domains. However, to further cultivate capabilities such as adaptive behavior and long-term decision-making, trainin...
- HeatGen: A Guided Diffusion Framework for Multiphysics Heat Sink Design Optimization : Abstract: This study presents a generative optimization framework based on a guided denoising diffusion probabilistic model (DDPM) that leverages surrogate gradients to generate heat sink designs mini...
- Prostate-VarBench: A Benchmark with Interpretable TabNet Framework for Prostate Cancer Variant Classification : Abstract: Variants of Uncertain Significance (VUS) limit the clinical utility of prostate cancer genomics by delaying diagnosis and therapy when evidence for pathogenicity or benignity is incomplete. ...
- General Intelligence-based Fragmentation (GIF): A framework for peak-labeled spectra simulation : Abstract: Despite growing reference libraries and advanced computational tools, progress in the field of metabolomics remains constrained by low rates of annotating measured spectra. The recent develo...
- VEDA: 3D Molecular Generation via Variance-Exploding Diffusion with Annealing : Abstract: Diffusion models show promise for 3D molecular generation, but face a fundamental trade-off between sampling efficiency and conformational accuracy. While flow-based models are fast, they of...
- Mamba-driven multi-perspective structural understanding for molecular ground-state conformation prediction : Abstract: A comprehensive understanding of molecular structures is important for the prediction of molecular ground-state conformation involving property information. Meanwhile, state space model (e.g...
- Probability-Biased Attention over Directed Bipartite Graphs for Long-Tail ICD Coding : Abstract: Automated International Classification of Diseases (ICD) coding aims to assign multiple disease codes to clinical documents, constituting a crucial multi-label text classification task in he...
- Querying Labeled Time Series Data with Scenario Programs : Abstract: Simulation-based testing has become a crucial complement to road testing for ensuring the safety of cyber physical systems (CPS). As a result, significant research efforts have been directed...
- Regular Games -- an Automata-Based General Game Playing Language : Abstract: We propose a new General Game Playing (GGP) system called Regular Games (RG). The main goal of RG is to be both computationally efficient and convenient for game design. The system consists ...
- Bi-Level Contextual Bandits for Individualized Resource Allocation under Delayed Feedback : Abstract: Equitably allocating limited resources in high-stakes domains (such as education, employment, and healthcare) requires balancing short-term utility with long-term impact, while accounting for ...
- Rethinking Science in the Age of Artificial Intelligence : Abstract: Artificial intelligence (AI) is reshaping how research is conceived, conducted, and communicated across fields from chemistry to biomedicine. This commentary examines how AI is transforming ...
- Strategic Opponent Modeling with Graph Neural Networks, Deep Reinforcement Learning and Probabilistic Topic Modeling : Abstract: This paper provides a comprehensive review of mainly Graph Neural Networks, Deep Reinforcement Learning, and Probabilistic Topic Modeling methods with a focus on their potential incorporatio...
- Proceedings of The third international workshop on eXplainable AI for the Arts (XAIxArts) : Abstract: This third international workshop on explainable AI for the Arts (XAIxArts) brought together a community of researchers in HCI, Interaction Design, AI, explainable AI (XAI), and digital arts...
- Non-Monotonic S4F Standpoint Logic : Abstract: Standpoint logics offer unified modal logic-based formalisms for representing multiple heterogeneous viewpoints. At the same time, many non-monotonic reasoning frameworks can be naturally ca...
- Preference Elicitation for Step-Wise Explanations in Logic Puzzles : Abstract: Step-wise explanations can explain logic puzzles and other satisfaction problems by showing how to derive decisions step by step. Each step consists of a set of constraints that derive an as...
- Using Certifying Constraint Solvers for Generating Step-wise Explanations : Abstract: In the field of Explainable Constraint Solving, it is common to explain to a user why a problem is unsatisfiable. A recently proposed method for this is to compute a sequence of explanation ...
- Generalizing Analogical Inference from Boolean to Continuous Domains : Abstract: Analogical reasoning is a powerful inductive mechanism, widely used in human cognition and increasingly applied in artificial intelligence. Formal frameworks for analogical inference have be...
- Explaining Decentralized Multi-Agent Reinforcement Learning Policies : Abstract: Multi-Agent Reinforcement Learning (MARL) has gained significant interest in recent years, enabling sequential decision-making across multiple agents in various domains. However, most existi...
- SITA: A Framework for Structure-to-Instance Theorem Autoformalization : Abstract: While large language models (LLMs) have shown progress in mathematical reasoning, they still face challenges in formalizing theorems that arise from instantiating abstract structures in conc...
- Massively Parallel Proof-Number Search for Impartial Games and Beyond : Abstract: Proof-Number Search is a best-first search algorithm with many successful applications, especially in game solving. As large-scale computing clusters become increasingly accessible, parallel...
- Beyond Verification: Abductive Explanations for Post-AI Assessment of Privacy Leakage : Abstract: Privacy leakage in AI-based decision processes poses significant risks, particularly when sensitive information can be inferred. We propose a formal framework to audit privacy leakage using ...
- FactGuard: Event-Centric and Commonsense-Guided Fake News Detection : Abstract: Fake news detection methods based on writing style have achieved remarkable progress. However, as adversaries increasingly imitate the style of authentic news, the effectiveness of such appr...
- Fixed-Persona SLMs with Modular Memory: Scalable NPC Dialogue on Consumer Hardware : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in generating human-like text, yet their applicability to dialogue systems in computer games remains limited. This limi...
- Bidirectional Bounded-Suboptimal Heuristic Search with Consistent Heuristics : Abstract: Recent advancements in bidirectional heuristic search have yielded significant theoretical insights and novel algorithms. While most previous work has concentrated on optimal search methods,...
- Causal-HalBench: Uncovering LVLMs Object Hallucinations Through Causal Intervention : Abstract: Large Vision-Language Models (LVLMs) often suffer from object hallucination, making erroneous judgments about the presence of objects in images. We propose this primarily stems from spurio...
- Temporal Properties of Conditional Independence in Dynamic Bayesian Networks : Abstract: Dynamic Bayesian networks (DBNs) are compact graphical representations used to model probabilistic systems where interdependent random variables and their distributions evolve over time. In ...
- Beyond Single-Step Updates: Reinforcement Learning of Heuristics with Limited-Horizon Search : Abstract: Many sequential decision-making problems can be formulated as shortest-path problems, where the objective is to reach a goal state from a given starting state. Heuristic search is a standard...
- PepTriX: A Framework for Explainable Peptide Analysis through Protein Language Models : Abstract: Peptide classification tasks, such as predicting toxicity and HIV inhibition, are fundamental to bioinformatics and drug discovery. Traditional approaches rely heavily on handcrafted encodin...
- ProgRAG: Hallucination-Resistant Progressive Retrieval and Reasoning over Knowledge Graphs : Abstract: Large Language Models (LLMs) demonstrate strong reasoning capabilities but struggle with hallucinations and limited transparency. Recently, KG-enhanced LLMs that integrate knowledge graphs (...
- Bridging Synthetic and Real Routing Problems via LLM-Guided Instance Generation and Progressive Adaptation : Abstract: Recent advances in Neural Combinatorial Optimization (NCO) methods have significantly improved the capability of neural solvers to handle synthetic routing instances. Nonetheless, existing n...
- MTP: Exploring Multimodal Urban Traffic Profiling with Modality Augmentation and Spectrum Fusion : Abstract: With rapid urbanization in the modern era, traffic signals from various sensors have been playing a significant role in monitoring the states of cities, which provides a strong foundation in...
- Advanced Black-Box Tuning of Large Language Models with Limited API Calls : Abstract: Black-box tuning is an emerging paradigm for adapting large language models (LLMs) to better achieve desired behaviors, particularly when direct access to model parameters is unavailable. Cu...
- Two Constraint Compilation Methods for Lifted Planning : Abstract: We study planning in a fragment of PDDL with qualitative state-trajectory constraints, capturing safety requirements, task ordering conditions, and intermediate sub-goals commonly found in r...
- DenoGrad: Deep Gradient Denoising Framework for Enhancing the Performance of Interpretable AI Models : Abstract: The performance of Machine Learning (ML) models, particularly those operating within the Interpretable Artificial Intelligence (Interpretable AI) framework, is significantly affected by the ...
- RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge ...
- Intelligence Foundation Model: A New Perspective to Approach Artificial General Intelligence : Abstract: We propose a new perspective for approaching artificial general intelligence (AGI) through an intelligence foundation model (IFM). Unlike existing foundation models (FMs), which specialize i...
- Balancing Centralized Learning and Distributed Self-Organization: A Hybrid Model for Embodied Morphogenesis : Abstract: We investigate how to couple a learnable "brain-like" controller to a "cell-like" Gray-Scott substrate to steer pattern formation with minimal effort. A compact convolutional policy is embe...
- Enhancing the Medical Context-Awareness Ability of LLMs via Multifaceted Self-Refinement Learning : Abstract: Large language models (LLMs) have shown great promise in the medical domain, achieving strong performance on several benchmarks. However, they continue to underperform in real-world medical ...
- Radiology Workflow-Guided Hierarchical Reinforcement Fine-Tuning for Medical Report Generation : Abstract: Radiologists compose diagnostic reports through a structured workflow: they describe visual findings, summarize them into impressions, and carefully refine statements in clinically critical ...
- Efficient Thought Space Exploration through Strategic Intervention : Abstract: While large language models (LLMs) demonstrate emerging reasoning capabilities, current inference-time expansion methods incur prohibitive computational costs by exhaustive sampling. Through...
- Beyond ReAct: A Planner-Centric Framework for Complex Tool-Augmented LLM Reasoning : Abstract: Existing tool-augmented large language models (LLMs) encounter significant challenges when processing complex queries. Current frameworks such as ReAct are prone to local optimization traps ...
- ChEmREF: Evaluating Language Model Readiness for Chemical Emergency Response : Abstract: Emergency responders managing hazardous material (HAZMAT) incidents face critical, time-sensitive decisions, manually navigating extensive chemical guidelines. We investigate whether today's l...
- SPAN: Benchmarking and Improving Cross-Calendar Temporal Reasoning of Large Language Models : Abstract: We introduce SPAN, a cross-calendar temporal reasoning benchmark, which requires LLMs to perform intra-calendar temporal reasoning and inter-calendar temporal conversion. SPAN features ten c...
- Adaptive Hyperbolic Kernels: Modulated Embedding in de Branges-Rovnyak Spaces : Abstract: Hierarchical data pervades diverse machine learning applications, including natural language processing, computer vision, and social network analysis. Hyperbolic space, characterized by its ...
- OIDA-QA: A Multimodal Benchmark for Analyzing the Opioid Industry Documents Archive : Abstract: The opioid crisis represents a significant moment in public health that reveals systemic shortcomings across regulatory systems, healthcare practices, corporate governance, and public policy...
- Learning to Pose Problems: Reasoning-Driven and Solver-Adaptive Data Synthesis for Large Reasoning Models : Abstract: Data synthesis for training large reasoning models offers a scalable alternative to limited, human-curated datasets, enabling the creation of high-quality data. However, existing approaches ...
- CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D : Abstract: AI systems are increasingly able to autonomously conduct realistic software engineering tasks, and may soon be deployed to automate machine learning (ML) R&D itself. Frontier AI systems may ...
- Boosting In-Silicon Directed Evolution with Fine-Tuned Protein Language Model and Tree Search : Abstract: Protein evolution through amino acid sequence mutations is a cornerstone of life sciences. While current in-silicon directed evolution algorithms focus on designing search strategies, they o...
- EgoEMS: A High-Fidelity Multimodal Egocentric Dataset for Cognitive Assistance in Emergency Medical Services : Abstract: Emergency Medical Services (EMS) are critical to patient survival in emergencies, but first responders often face intense cognitive demands in high-stakes situations. AI cognitive assistants...
- Quantum Artificial Intelligence (QAI): Foundations, Architectural Elements, and Future Directions : Abstract: Mission critical (MC) applications such as defense operations, energy management, cybersecurity, and aerospace control require reliable, deterministic, and low-latency decision making under ...
- Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems : Abstract: Adversarial patches have emerged as a popular privacy-preserving approach for resisting AI-driven surveillance systems. However, their conspicuous appearance makes them difficult to deploy i...
- Robust Watermarking on Gradient Boosting Decision Trees : Abstract: Gradient Boosting Decision Trees (GBDTs) are widely used in industry and academia for their high accuracy and efficiency, particularly on structured data. However, watermarking GBDT models r...
- SlideBot: A Multi-Agent Framework for Generating Informative, Reliable, Multi-Modal Presentations : Abstract: Large Language Models (LLMs) have shown immense potential in education, automating tasks like quiz generation and content summarization. However, generating effective presentation slides int...
- Why Open Small AI Models Matter for Interactive Art : Abstract: This position paper argues for the importance of open small AI models in creative independence for interactive art practices. Deployable locally, these models offer artists vital control ove...
- AI Annotation Orchestration: Evaluating LLM verifiers to Improve the Quality of LLM Annotations in Learning Analytics : Abstract: Large Language Models (LLMs) are increasingly used to annotate learning interactions, yet concerns about reliability limit their utility. We test whether verification-oriented orchestration-...
- ProbLog4Fairness: A Neurosymbolic Approach to Modeling and Mitigating Bias : Abstract: Operationalizing definitions of fairness is difficult in practice, as multiple definitions can be incompatible while each being arguably desirable. Instead, it may be easier to directly desc...
- Echoing: Identity Failures when LLM Agents Talk to Each Other : Abstract: As large language model (LLM) based agents interact autonomously with one another, a new class of failures emerges that cannot be predicted from single agent performance: behavioral drifts i...
- Rebellion: Noise-Robust Reasoning Training for Audio Reasoning Models : Abstract: Instilling reasoning capabilities in large models (LMs) using reasoning training (RT) significantly improves LMs' performances. Thus Audio Reasoning Models (ARMs), i.e., audio LMs that can r...
- Cogent argument extensions are weakly admissible but not vice versa : Abstract: In this research note, we show the relationship between two non-admissible argumentation framework semantics: cogent and weakly admissible semantics. We prove that, while cogent extensions a...
- Proceedings of the Second International Workshop on Next-Generation Language Models for Knowledge Representation and Reasoning (NeLaMKRR 2025) : Abstract: Reasoning is an essential component of human intelligence in that it plays a fundamental role in our ability to think critically, support responsible decisions, and solve challenging problem...
- SynthTools: A Framework for Scaling Synthetic Tools for Agent Development : Abstract: AI agents increasingly rely on external tools to solve complex, long-horizon tasks. Advancing such agents requires reproducible evaluation and large-scale training in controllable, diverse, ...
- Variable Neighborhood Search for the Electric Vehicle Routing Problem : Abstract: The Electric Vehicle Routing Problem (EVRP) extends the classical Vehicle Routing Problem (VRP) to reflect the growing use of electric and hybrid vehicles in logistics. Due to the variety of...
- An Efficient and Almost Optimal Solver for the Joint Routing-Assignment Problem via Partial JRA and Large-α Optimization : Abstract: The Joint Routing-Assignment (JRA) optimization problem simultaneously determines the assignment of items to placeholders and a Hamiltonian cycle that visits each node pair exactly once, wit...
Research Sources: 427 | Generated: 11/14/2025
