AI RESEARCH PAPERS & ACADEMIC SOURCES
- Sound Source Localization for Spatial Mapping of Surgical Actions in Dynamic Scenes : Abstract: Purpose: Surgical scene understanding is key to advancing computer-aided and intelligent surgical systems. Current approaches predominantly rely on visual data or end-to-end learning, which ...
- NVSim: Novel View Synthesis Simulator for Large Scale Indoor Navigation : Abstract: We present NVSim, a framework that automatically constructs large-scale, navigable indoor simulators from only common image sequences, overcoming the cost and scalability limitations of trad...
- GroundLoc: Efficient Large-Scale Outdoor LiDAR-Only Localization : Abstract: In this letter, we introduce GroundLoc, a LiDAR-only localization pipeline designed to localize a mobile robot in large-scale outdoor environments using prior maps. GroundLoc employs a Bird'...
- CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects : Abstract: Customized text-to-video generation aims to generate high-quality videos guided by text prompts and subject references. Current approaches for personalizing text-to-video generation suffer f...
- UMCFuse: A Unified Multiple Complex Scenes Infrared and Visible Image Fusion Framework : Abstract: Infrared and visible image fusion has emerged as a prominent research area in computer vision. However, little attention has been paid to the fusion task in complex scenes, leading to sub-op...
- Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond : Abstract: General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments...
- RETTA: Retrieval-Enhanced Test-Time Adaptation for Zero-Shot Video Captioning : Abstract: Despite the significant progress of fully-supervised video captioning, zero-shot methods remain much less explored. In this paper, we propose a novel zero-shot video captioning framework nam...
- RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text : Abstract: In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works t...
- Unsupervised Monocular Depth Estimation Based on Hierarchical Feature-Guided Diffusion : Abstract: Unsupervised monocular depth estimation has received widespread attention because of its capability to train without ground truth. In real-world scenarios, the images may be blurry or noisy ...
- MTFL: Multi-Timescale Feature Learning for Weakly-Supervised Anomaly Detection in Surveillance Videos : Abstract: Detection of anomaly events is relevant for public safety and requires a combination of fine-grained motion information and contextual events at variable time-scales. To this end, we propose...
- Topology-Preserving Image Segmentation with Spatial-Aware Persistent Feature Matching : Abstract: Topological correctness is critical for segmentation of tubular structures, which pervade biomedical images. Existing topological segmentation loss functions are primarily based on the pe...
- Federated Learning with Partially Labeled Data: A Conditional Distillation Approach : Abstract: In medical imaging, developing generalized segmentation models that can handle multiple organs and lesions is crucial. However, the scarcity of fully annotated datasets and strict privacy re...
- Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy : Abstract: We introduce Long-VITA, a simple yet effective large multi-modal model for long-context visual-language understanding tasks. It is adept at concurrently processing and analyzing modalities o...
- Faces of Fairness: Examining Bias in Facial Expression Recognition Datasets and Models : Abstract: Building AI systems, including Facial Expression Recognition (FER), involves two critical aspects: data and model design. Both components significantly influence bias and fairness in FER tas...
- Frequency-Aware Vision Transformers for High-Fidelity Super-Resolution of Earth System Models : Abstract: Super-resolution (SR) is crucial for enhancing the spatial fidelity of Earth System Model (ESM) outputs, allowing fine-scale structures vital to climate science to be recovered from coarse s...
- Polygonal network disorder and the turning distance : Abstract: The turning distance is a well-studied metric for measuring the similarity between two polygons. This metric is constructed by taking an $L^p$ distance between step functions which track eac...
- DynCIM: Dynamic Curriculum for Imbalanced Multimodal Learning : Abstract: Multimodal learning integrates complementary information from diverse modalities to enhance the decision-making process. However, the potential of multimodal collaboration remains under-expl...
- Superpowering Open-Vocabulary Object Detectors for X-ray Vision : Abstract: Open-vocabulary object detection (OvOD) is set to revolutionize security screening by enabling systems to recognize any item in X-ray scans. However, developing effective OvOD models for X-r...
- LiDAR Remote Sensing Meets Weak Supervision: Concepts, Methods, and Perspectives : Abstract: Light detection and ranging (LiDAR) remote sensing encompasses two major directions: data interpretation and parameter inversion. However, both directions rely heavily on costly and labor-in...
- FaceCloak: Learning to Protect Face Templates : Abstract: Generative models can reconstruct face images from encoded representations (templates) bearing remarkable likeness to the original face, raising security and privacy concerns. We present \te...
- Does CLIP perceive art the same way we do? : Abstract: CLIP has emerged as a powerful multimodal model capable of connecting images and text through joint embeddings, but to what extent does it 'see' the same way humans do - especially when inte...
- DArFace: Deformation Aware Robustness for Low Quality Face Recognition : Abstract: Facial recognition systems have achieved remarkable success by leveraging deep neural networks, advanced loss functions, and large-scale datasets. However, their performance often deteriorat...
- Long-RVOS: A Comprehensive Benchmark for Long-term Referring Video Object Segmentation : Abstract: Referring video object segmentation (RVOS) aims to identify, track and segment the objects in a video based on language descriptions, which has received great attention in recent years. Howe...
- CPathAgent: An Agent-based Foundation Model for Interpretable High-Resolution Pathology Image Analysis Mimicking Pathologists' Diagnostic Logic : Abstract: Recent advances in computational pathology have led to the emergence of numerous foundation models. These models typically rely on general-purpose encoders with multi-instance learning for w...
- MoPFormer: Motion-Primitive Transformer for Wearable-Sensor Activity Recognition : Abstract: Human Activity Recognition (HAR) with wearable sensors is challenged by limited interpretability, which significantly impacts cross-dataset generalization. To address this challenge, we prop...
- GS4: Generalizable Sparse Splatting Semantic SLAM : Abstract: Traditional SLAM algorithms excel at camera tracking, but typically produce incomplete and low-resolution maps that are not tightly integrated with semantics prediction. Recent work integrat...
- AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning : Abstract: Controllable captioning is essential for precise multimodal alignment and instruction following, yet existing models often lack fine-grained control and reliable evaluation protocols. To add...
- Multispectral State-Space Feature Fusion: Bridging Shared and Cross-Parametric Interactions for Object Detection : Abstract: Modern multispectral feature fusion for object detection faces two critical limitations: (1) Excessive preference for local complementary features over cross-modal shared semantics adversely...
- Acoustic Neural 3D Reconstruction Under Pose Drift : Abstract: We consider the problem of optimizing neural implicit surfaces for 3D reconstruction using acoustic images collected with drifting sensor poses. The accuracy of current state-of-the-art 3D a...
- GaussianFusion: Gaussian-Based Multi-Sensor Fusion for End-to-End Autonomous Driving : Abstract: Multi-sensor fusion is crucial for improving the performance and robustness of end-to-end autonomous driving systems. Existing methods predominantly adopt either attention-based flatten fusi...
- Radar and Event Camera Fusion for Agile Robot Ego-Motion Estimation : Abstract: Achieving reliable ego motion estimation for agile robots, e.g., aerobatic aircraft, remains challenging because most robot sensors fail to respond timely and clearly to highly dynamic robot...
- Adaptive Keyframe Selection for Scalable 3D Scene Reconstruction in Dynamic Environments : Abstract: In this paper, we propose an adaptive keyframe selection method for improved 3D scene reconstruction in dynamic environments. The proposed method integrates two complementary modules: an err...
- Listening without Looking: Modality Bias in Audio-Visual Captioning : Abstract: Audio-visual captioning aims to generate holistic scene descriptions by jointly modeling sound and vision. While recent methods have improved performance through sophisticated modality fusio...
- ZTRS: Zero-Imitation End-to-end Autonomous Driving with Trajectory Scoring : Abstract: End-to-end autonomous driving maps raw sensor inputs directly into ego-vehicle trajectories to avoid cascading errors from perception modules and to leverage rich semantic cues. Existing fra...
- Evaluating Long-Term Memory for Long-Context Question Answering : Abstract: In order for large language models to achieve true conversational continuity and benefit from experiential learning, they need memory. While research has focused on the development of comple...
- BitSkip: An Empirical Analysis of Quantization and Early Exit Composition : Abstract: The pursuit of efficient Large Language Models (LLMs) has led to increasingly complex techniques like extreme quantization and dynamic routing. While individual benefits of these methods are...
- Beyond Understanding: Evaluating the Pragmatic Gap in LLMs' Cultural Processing of Figurative Language : Abstract: We present a comprehensive evaluation of the ability of large language models (LLMs) to process culturally grounded language, specifically to understand and pragmatically use figurative expr...
- How Pragmatics Shape Articulation: A Computational Case Study in STEM ASL Discourse : Abstract: Most state-of-the-art sign language models are trained on interpreter or isolated vocabulary data, which overlooks the variability that characterizes natural dialogue. However, human communi...
- Temporal Blindness in Multi-Turn LLM Agents: Misaligned Tool Use vs. Human Time Perception : Abstract: Large language model agents are increasingly used in multi-turn conversational settings to interact with and execute tasks in dynamic environments. However, a key limitation is their tempora...
- Language Models for Longitudinal Clinical Prediction : Abstract: We explore a lightweight framework that adapts frozen large language models to analyze longitudinal clinical data. The approach integrates patient history and context within the language mod...
- AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages : Abstract: Text embeddings are an essential building component of several NLP tasks such as retrieval-augmented generation which is crucial for preventing hallucinations in LLMs. Despite the recent rel...
- Leveraging LLMs for Early Alzheimer's Prediction : Abstract: We present a connectome-informed LLM framework that encodes dynamic fMRI connectivity as temporal sequences, applies robust normalization, and maps these data into a representation suitable ...
- M-Eval: A Heterogeneity-Based Framework for Multi-evidence Validation in Medical RAG Systems : Abstract: Retrieval-augmented Generation (RAG) has demonstrated potential in enhancing medical question-answering systems through the integration of large language models (LLMs) with external medical ...
- PICOs-RAG: PICO-supported Query Rewriting for Retrieval-Augmented Generation in Evidence-Based Medicine : Abstract: Evidence-based medicine (EBM) research has always been of paramount importance. It is important to find appropriate medical theoretical support for the needs from physicians or patients to r...
- META-RAG: Meta-Analysis-Inspired Evidence-Re-Ranking Method for Retrieval-Augmented Generation in Evidence-Based Medicine : Abstract: Evidence-based medicine (EBM) holds a crucial role in clinical application. Given suitable medical articles, doctors effectively reduce the incidence of misdiagnoses. Researchers find it eff...
- TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents : Abstract: The task of information extraction (IE) is to extract structured knowledge from text. However, it is often not straightforward to utilize IE output due to the mismatch between the IE ontolog...
- Success and Cost Elicit Convention Formation for Efficient Communication : Abstract: Humans leverage shared conversational context to become increasingly successful and efficient at communicating over time. One manifestation of this is the formation of ad hoc linguistic conv...
- Pie: A Programmable Serving System for Emerging LLM Applications : Abstract: Emerging large language model (LLM) applications involve diverse reasoning strategies and agentic workflows, straining the capabilities of existing serving systems built on a monolithic toke...
- Challenging Multilingual LLMs: A New Taxonomy and Benchmark for Unraveling Hallucination in Translation : Abstract: Large Language Models (LLMs) have advanced machine translation but remain vulnerable to hallucinations. Unfortunately, existing MT benchmarks are not capable of exposing failures in multilin...
- Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures : Abstract: To date, there exist almost no culturally-specific evaluation benchmarks for large language models (LLMs) that cover a large number of languages and cultures. In this paper, we present Globa...
- RegSpeech12: A Regional Corpus of Bengali Spontaneous Speech Across Dialects : Abstract: The Bengali language, spoken extensively across South Asia and among diasporic communities, exhibits considerable dialectal diversity shaped by geography, culture, and history. Phonological ...
- Squrve: A Unified and Modular Framework for Complex Real-World Text-to-SQL Tasks : Abstract: Text-to-SQL technology has evolved rapidly, with diverse academic methods achieving impressive results. However, deploying these techniques in real-world systems remains challenging due to l...
- Reinforcement Learning for Long-Horizon Multi-Turn Search Agents : Abstract: Large Language Model (LLM) agents can leverage multiple turns and tools to solve complex tasks, with prompt-based approaches achieving strong performance. This work demonstrates that Reinfor...
- Exploring the Influence of Relevant Knowledge for Natural Language Generation Interpretability : Abstract: This paper explores the influence of external knowledge integration in Natural Language Generation (NLG), focusing on a commonsense generation task. We extend the CommonGen dataset by creati...
- HACK: Hallucinations Along Certainty and Knowledge Axes : Abstract: Hallucinations in LLMs present a critical barrier to their reliable usage. Existing research usually categorizes hallucination by their external properties rather than by the LLMs' underlyin...
- Towards Transparent Reasoning: What Drives Faithfulness in Large Language Models? : Abstract: Large Language Models (LLMs) often produce explanations that do not faithfully reflect the factors driving their predictions. In healthcare settings, such unfaithfulness is especially proble...
- Abjad AI at NADI 2025: CATT-Whisper: Multimodal Diacritic Restoration Using Text and Speech Representations : Abstract: In this work, we tackle the Diacritic Restoration (DR) task for Arabic dialectal sentences using a multimodal approach that combines both textual and speech information. We propose a model t...
- Evaluating LLMs on Generating Age-Appropriate Child-Like Conversations : Abstract: Large Language Models (LLMs), predominantly trained on adult conversational data, face significant challenges when generating authentic, child-like dialogue for specialized applications. We ...
- Can LLMs Translate Human Instructions into a Reinforcement Learning Agent's Internal Emergent Symbolic Representation? : Abstract: Emergent symbolic representations are critical for enabling developmental learning agents to plan and generalize across tasks. In this work, we investigate whether large language models (LLM...
- MERGE: Minimal Expression-Replacement GEneralization Test for Natural Language Inference : Abstract: In recent years, many generalization benchmarks have shown language models' lack of robustness in natural language inference (NLI). However, manually creating new benchmarks is costly, while...
- Lookahead Tree-Based Rollouts for Enhanced Trajectory-Level Exploration in Reinforcement Learning with Verifiable Rewards : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR), particularly with algorithms like Group Relative Policy Optimization (GRPO), has proven highly effective in enhancing the reasoning cap...
- Text Simplification with Sentence Embeddings : Abstract: Sentence embeddings can be decoded to give approximations of the original texts used to create them. We explore this effect in the context of text simplification, demonstrating that reconstr...
- Comprehensive and Efficient Distillation for Lightweight Sentiment Analysis Models : Abstract: Recent efforts leverage knowledge distillation techniques to develop lightweight and practical sentiment analysis models. These methods are grounded in human-written instructions and large-s...
- SynthWorlds: Controlled Parallel Worlds for Disentangling Reasoning and Knowledge in Language Models : Abstract: Evaluating the reasoning ability of language models (LMs) is complicated by their extensive parametric world knowledge, where benchmark performance often reflects factual recall rather than ...
- LuxIT: A Luxembourgish Instruction Tuning Dataset from Monolingual Seed Data : Abstract: The effectiveness of instruction-tuned Large Language Models (LLMs) is often limited in low-resource linguistic settings due to a lack of high-quality training data. We introduce LuxIT, a no...
- SPARTA: Evaluating Reasoning Segmentation Robustness through Black-Box Adversarial Paraphrasing in Text Autoencoder Latent Space : Abstract: Multimodal large language models (MLLMs) have shown impressive capabilities in vision-language tasks such as reasoning segmentation, where models generate segmentation masks based on textual...
- Talk2Ref: A Dataset for Reference Prediction from Scientific Talks : Abstract: Scientific talks are a growing medium for disseminating research, and automatically identifying relevant literature that grounds or enriches a talk would be highly valuable for researchers a...
- CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration? : Abstract: Accurate confidence calibration in Large Language Models (LLMs) is critical for safe use in high-stakes domains, where clear verbalized confidence enhances user trust. Traditional methods th...
- Levée d'ambiguïtés par grammaires locales (Disambiguation with Local Grammars) : Abstract: Many words are ambiguous in terms of their part of speech (POS). However, when a word appears in a text, this ambiguity is generally much reduced. Disambiguating POS involves using context t...
- Dark & Stormy: Modeling Humor in the Worst Sentences Ever Written : Abstract: Textual humor is enormously diverse and computational studies need to account for this range, including intentionally bad humor. In this paper, we curate and analyze a novel corpus of senten...
- Open Korean Historical Corpus: A Millennia-Scale Diachronic Collection of Public Domain Texts : Abstract: The history of the Korean language is characterized by a discrepancy between its spoken and written forms and a pivotal shift from Chinese characters to the Hangul alphabet. However, this li...
- BEST-RQ-Based Self-Supervised Learning for Whisper Domain Adaptation : Abstract: Automatic Speech Recognition (ASR) systems, despite large multilingual training, struggle in out-of-domain and low-resource scenarios where labeled data is scarce. We propose BEARD (BEST-RQ ...
- ReplicationBench: Can AI Agents Replicate Astrophysics Research Papers? : Abstract: Frontier AI agents show increasing promise as scientific research assistants, and may eventually be useful for extended, open-ended research workflows. However, in order to use agents for no...
- ReForm: Reflective Autoformalization with Prospective Bounded Sequence Optimization : Abstract: Autoformalization, which translates natural language mathematics into machine-verifiable formal statements, is critical for using formal mathematical reasoning to solve math problems stated ...
- Diffusion LLM with Native Variable Generation Lengths: Let [EOS] Lead the Way : Abstract: Diffusion-based large language models (dLLMs) have exhibited substantial potential for parallel text generation, which may enable more efficient generation compared to autoregressive models....
- Long-Context Modeling with Dynamic Hierarchical Sparse Attention for On-Device LLMs : Abstract: The quadratic cost of attention hinders the scalability of long-context LLMs, especially in resource-constrained settings. Existing static sparse methods such as sliding windows or global to...
- Relative Scaling Laws for LLMs : Abstract: Scaling laws describe how language models improve with additional data, parameters, and compute. While widely used, they are typically measured on aggregate test sets. Aggregate evaluations ...
- "Mm, Wat?" Detecting Other-initiated Repair Requests in Dialogue : Abstract: Maintaining mutual understanding is a key component in human-human conversation to avoid conversation breakdowns, in which repair, particularly Other-Initiated Repair (OIR, when one speaker ...
- OpenReward: Learning to Reward Long-form Agentic Tasks via Reinforcement Learning : Abstract: Reward models (RMs) have become essential for aligning large language models (LLMs), serving as scalable proxies for human evaluation in both training and inference. However, existing RMs st...
- Quantifying the Effects of Word Length, Frequency, and Predictability on Dyslexia : Abstract: We ask where, and under what conditions, dyslexic reading costs arise in a large-scale naturalistic reading dataset. Using eye-tracking aligned to word-level features (word length, frequency...
- Optimizing Retrieval for RAG via Reinforced Contrastive Learning : Abstract: As retrieval-augmented generation (RAG) becomes increasingly widespread, the role of information retrieval (IR) is shifting from retrieving information for human users to retrieving contextu...
- Evolving Diagnostic Agents in a Virtual Clinical Environment : Abstract: In this paper, we present a framework for training large language models (LLMs) as diagnostic agents with reinforcement learning, enabling them to manage multi-turn diagnostic processes, ada...
- MQM Re-Annotation: A Technique for Collaborative Evaluation of Machine Translation : Abstract: Human evaluation of machine translation is in an arms race with translation model quality: as our models get better, our evaluation methods need to be improved to ensure that quality gains a...
- SPICE: Self-Play In Corpus Environments Improves Reasoning : Abstract: Self-improving systems require environmental interaction for continuous adaptation. We introduce SPICE (Self-Play In Corpus Environments), a reinforcement learning framework where a single m...
- AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis : Abstract: Training large language model agents on tasks at the frontier of their capabilities is key to unlocking advanced reasoning. We introduce a data synthesis approach inspired by the educational...
- WebLeaper: Empowering Efficiency and Efficacy in WebAgent via Enabling Info-Rich Seeking : Abstract: Large Language Model (LLM)-based agents have emerged as a transformative approach for open-ended problem solving, with information seeking (IS) being a core capability that enables autonomou...
- MetricX-25 and GemSpanEval: Google Translate Submissions to the WMT25 Evaluation Shared Task : Abstract: In this paper, we present our submissions to the unified WMT25 Translation Evaluation Shared Task. For the Quality Score Prediction subtask, we create a new generation of MetricX with improv...
- RoboOmni: Proactive Robot Manipulation in Omni-modal Context : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have driven rapid progress in Vision-Language-Action (VLA) models for robotic manipulation. Although effective in many scenarios, ...
- emg2speech: synthesizing speech from electromyography using self-supervised speech models : Abstract: We present a neuromuscular speech interface that translates electromyographic (EMG) signals collected from orofacial muscles during speech articulation directly into audio. We show that self...
- Automatically Benchmarking LLM Code Agents through Agent-Driven Annotation and Evaluation : Abstract: Recent advances in code agents have enabled automated software development at the project level, supported by large language models (LLMs) and widely adopted tools. However, existing benchma...
- Latent Sketchpad: Sketching Visual Thoughts to Elicit Multimodal Reasoning in MLLMs : Abstract: While Multimodal Large Language Models (MLLMs) excel at visual understanding, they often struggle in complex scenarios that require visual planning and imagination. Inspired by how humans us...
- STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence : Abstract: Despite rapid progress in Multi-modal Large Language Models and Large Audio-Language Models, existing audio benchmarks largely test semantics that can be recovered from text captions, maskin...
- Evaluation of Geographical Distortions in Language Models : Abstract: Language models now constitute essential tools for improving efficiency for many professional tasks such as writing, coding, or learning. For this reason, it is imperative to identify inhere...
- Zero-Shot Tokenizer Transfer : Abstract: Language models (LMs) are bound to their tokenizer, which maps raw text to a sequence of vocabulary items (tokens). This restricts their flexibility: for example, LMs trained primarily on En...
- Discourse Features Enhance Detection of Document-Level Machine-Generated Content : Abstract: The availability of high-quality APIs for Large Language Models (LLMs) has facilitated the widespread creation of Machine-Generated Content (MGC), posing challenges such as academic plagiari...
- Face the Facts! Evaluating RAG-based Fact-checking Pipelines in Realistic Settings : Abstract: Natural Language Processing and Generation systems have recently shown the potential to complement and streamline the costly and time-consuming job of professional fact-checkers. In this wor...
- NeedleInATable: Exploring Long-Context Capability of Large Language Models towards Long-Structured Tables : Abstract: Processing structured tabular data, particularly large and lengthy tables, constitutes a fundamental yet challenging task for large language models (LLMs). However, existing long-context ben...
- The Hawthorne Effect in Reasoning Models: Evaluating and Steering Test Awareness : Abstract: Reasoning-focused LLMs sometimes alter their behavior when they detect that they are being evaluated, which can lead them to optimize for test-passing performance or to comply more readily w...
- Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector : Abstract: LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by potential biases in judgment. Existing efforts to mi...
- AdaRewriter: Unleashing the Power of Prompting-based Conversational Query Reformulation via Test-Time Adaptation : Abstract: Prompting-based conversational query reformulation has emerged as a powerful approach for conversational search, refining ambiguous user queries into standalone search queries. Best-of-N ref...
- SANSKRITI: A Comprehensive Benchmark for Evaluating Language Models' Knowledge of Indian Culture : Abstract: Language Models (LMs) are indispensable tools shaping modern workflows, but their global effectiveness depends on understanding local socio-cultural contexts. To address this, we introduce S...
- RareFlow: Physics-Aware Flow-Matching for Cross-Sensor Super-Resolution of Rare-Earth Features : Abstract: Super-resolution (SR) for remote sensing imagery often fails under out-of-distribution (OOD) conditions, such as rare geomorphic features captured by diverse sensors, producing visually plau...
- TRELLISWorld: Training-Free World Generation from Object Generators : Abstract: Text-driven 3D scene generation holds promise for a wide range of applications, from virtual prototyping to AR/VR and simulation. However, existing methods are often constrained to single-ob...
- Improving Visual Discriminability of CLIP for Training-Free Open-Vocabulary Semantic Segmentation : Abstract: Extending CLIP models to semantic segmentation remains challenging due to the misalignment between their image-level pre-training objectives and the pixel-level visual understanding required...
- TurboPortrait3D: Single-step diffusion-based fast portrait novel-view synthesis : Abstract: We introduce TurboPortrait3D: a method for low-latency novel-view synthesis of human portraits. Our approach builds on the observation that existing image-to-3D models for portrait generatio...
- PlanarGS: High-Fidelity Indoor 3D Gaussian Splatting Guided by Vision-Language Planar Priors : Abstract: Three-dimensional Gaussian Splatting (3DGS) has recently emerged as an efficient representation for novel-view synthesis, achieving impressive visual quality. However, in scenes dominated by...
- Adaptive Training of INRs via Pruning and Densification : Abstract: Encoding input coordinates with sinusoidal functions into multilayer perceptrons (MLPs) has proven effective for implicit neural representations (INRs) of low-dimensional signals, enabling t...
- Reasoning Visual Language Model for Chest X-Ray Analysis : Abstract: Vision-language models (VLMs) have shown strong promise for medical image analysis, but most remain opaque, offering predictions without the transparent, stepwise reasoning clinicians rely o...
- Efficient Cost-and-Quality Controllable Arbitrary-scale Super-resolution with Fourier Constraints : Abstract: Cost-and-Quality (CQ) controllability in arbitrary-scale super-resolution is crucial. Existing methods predict Fourier components one by one using a recurrent neural network. However, this a...
- TeleEgo: Benchmarking Egocentric AI Assistants in the Wild : Abstract: Egocentric AI assistants in real-world settings must process multi-modal inputs (video, audio, text), respond in real time, and retain evolving long-term memory. However, existing benchmarks...
- AdvBlur: Adversarial Blur for Robust Diabetic Retinopathy Classification and Cross-Domain Generalization : Abstract: Diabetic retinopathy (DR) is a leading cause of vision loss worldwide, yet early and accurate detection can significantly improve treatment outcomes. While numerous Deep learning (DL) models...
- Towards the Automatic Segmentation, Modeling and Meshing of the Aortic Vessel Tree from Multicenter Acquisitions: An Overview of the SEG.A. 2023 Segmentation of the Aorta Challenge : Abstract: The automated analysis of the aortic vessel tree (AVT) from computed tomography angiography (CTA) holds immense clinical potential, but its development has been impeded by a lack of shared, ...
- AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts : Abstract: Despite rapid advancements in text-to-image (T2I) models, their safety mechanisms are vulnerable to adversarial prompts, which maliciously generate unsafe images. Current red-teaming methods...
- Enhancing CLIP Robustness via Cross-Modality Alignment : Abstract: Vision-language models (VLMs) such as CLIP demonstrate strong generalization in zero-shot classification but remain highly vulnerable to adversarial perturbations. Existing methods primarily...
- Beyond Objects: Contextual Synthetic Data Generation for Fine-Grained Classification : Abstract: Text-to-image (T2I) models are increasingly used for synthetic dataset generation, but generating effective synthetic training data for classification remains challenging. Fine-tuning a T2I ...
- OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation : Abstract: Recent advancements in diffusion-based text synthesis have demonstrated significant performance in inserting and editing text within images via inpainting. However, despite the potential of ...
- UHKD: A Unified Framework for Heterogeneous Knowledge Distillation via Frequency-Domain Representations : Abstract: Knowledge distillation (KD) is an effective model compression technique that transfers knowledge from a high-performance teacher to a lightweight student, reducing cost while maintaining acc...
- DogMo: A Large-Scale Multi-View RGB-D Dataset for 4D Canine Motion Recovery : Abstract: We present DogMo, a large-scale multi-view RGB-D video dataset capturing diverse canine movements for the task of motion recovery from images. DogMo comprises 1.2k motion sequences collected...
- ETC: training-free diffusion models acceleration with Error-aware Trend Consistency : Abstract: Diffusion models have achieved remarkable generative quality but remain bottlenecked by costly iterative sampling. Recent training-free methods accelerate diffusion process by reusing model ...
- MSRANetV2: An Explainable Deep Learning Architecture for Multi-class Classification of Colorectal Histopathological Images : Abstract: Colorectal cancer (CRC) is a leading worldwide cause of cancer-related mortality, and prompt, precise detection is of paramount importance for improving patient outcomes. Conventiona...
- Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2 : Abstract: Recent studies reveal the vulnerability of the image segmentation foundation model SAM to adversarial examples. Its successor, SAM2, has attracted significant attention due to its strong gen...
- CLFSeg: A Fuzzy-Logic based Solution for Boundary Clarity and Uncertainty Reduction in Medical Image Segmentation : Abstract: Accurate polyp and cardiac segmentation for early detection and treatment is essential for the diagnosis and treatment planning of cancer-like diseases. Traditional convolutional neural netw...
- MC-SJD : Maximal Coupling Speculative Jacobi Decoding for Autoregressive Visual Generation Acceleration : Abstract: While autoregressive (AR) modeling has recently emerged as a new paradigm in visual generation, its practical adoption is severely constrained by the slow inference speed of per-token genera...
- Beyond Inference Intervention: Identity-Decoupled Diffusion for Face Anonymization : Abstract: Face anonymization aims to conceal identity information while preserving non-identity attributes. Mainstream diffusion models rely on inference-time interventions such as negative guidance o...
- SCOPE: Saliency-Coverage Oriented Token Pruning for Efficient Multimodel LLMs : Abstract: Multimodal Large Language Models (MLLMs) typically process a large number of visual tokens, leading to considerable computational overhead, even though many of these tokens are redundant. Ex...
- Benchmarking Microsaccade Recognition with Event Cameras: A Novel Dataset and Evaluation : Abstract: Microsaccades are small, involuntary eye movements vital for visual perception and neural processing. Traditional microsaccade studies typically use eye trackers or frame-based analysis, whi...
- Delving into Cascaded Instability: A Lipschitz Continuity View on Image Restoration and Object Detection Synergy : Abstract: To improve detection robustness in adverse conditions (e.g., haze and low light), image restoration is commonly applied as a pre-processing step to enhance image quality for the detector. Ho...
- DeshadowMamba: Deshadowing as 1D Sequential Similarity : Abstract: Recent deep models for image shadow removal often rely on attention-based architectures to capture long-range dependencies. However, their fixed attention patterns tend to mix illumination c...
- Adaptive Knowledge Transferring with Switching Dual-Student Framework for Semi-Supervised Medical Image Segmentation : Abstract: Teacher-student frameworks have emerged as a leading approach in semi-supervised medical image segmentation, demonstrating strong performance across various tasks. However, the learning effe...
- Decoupling What to Count and Where to See for Referring Expression Counting : Abstract: Referring Expression Counting (REC) extends class-level object counting to the fine-grained subclass-level, aiming to enumerate objects matching a textual expression that specifies both the ...
- Stroke Lesion Segmentation in Clinical Workflows: A Modular, Lightweight, and Deployment-Ready Tool : Abstract: Deep learning frameworks such as nnU-Net achieve state-of-the-art performance in brain lesion segmentation but remain difficult to deploy clinically due to heavy dependencies and monolithic ...
- A Luminance-Aware Multi-Scale Network for Polarization Image Fusion with a Multi-Scene Dataset : Abstract: Polarization image fusion combines S0 and DOLP images to reveal surface roughness and material properties through complementary texture features, which has important applications in camoufla...
- When are radiology reports useful for training medical image classifiers? : Abstract: Medical images used to train machine learning models are often accompanied by radiology reports containing rich expert annotations. However, relying on these reports as inputs for clinical p...
- Unsupervised Detection of Post-Stroke Brain Abnormalities : Abstract: Post-stroke MRI not only delineates focal lesions but also reveals secondary structural changes, such as atrophy and ventricular enlargement. These abnormalities, increasingly recognised as ...
- GenTrack: A New Generation of Multi-Object Tracking : Abstract: This paper introduces a novel multi-object tracking (MOT) method, dubbed GenTrack, whose main contributions include: a hybrid tracking approach employing both stochastic and deterministic ma...
- A Hybrid Approach for Visual Multi-Object Tracking : Abstract: This paper proposes a visual multi-object tracking method that jointly employs stochastic and deterministic mechanisms to ensure identifier consistency for unknown and time-varying target nu...
- 50 Years of Water Body Monitoring: The Case of Qaraaoun Reservoir, Lebanon : Abstract: The sustainable management of the Qaraaoun Reservoir, the largest surface water body in Lebanon located in the Bekaa Plain, depends on reliable monitoring of its storage volume despite frequ...
- XAI Evaluation Framework for Semantic Segmentation : Abstract: Ensuring transparency and trust in artificial intelligence (AI) models is essential, particularly as they are increasingly applied in safety-critical and high-stakes domains. Explainable AI ...
- Deeply-Conditioned Image Compression via Self-Generated Priors : Abstract: Learned image compression (LIC) has shown great promise for achieving high rate-distortion performance. However, current LIC methods are often limited in their capability to model the comple...
- A Critical Study towards the Detection of Parkinsons Disease using ML Technologies : Abstract: The proposed solution is a deep learning technique that can classify three types of tea leaf diseases, two of which are caused by pests and one by pathogens (infec...
- Kineo: Calibration-Free Metric Motion Capture From Sparse RGB Cameras : Abstract: Markerless multiview motion capture is often constrained by the need for precise camera calibration, limiting accessibility for non-experts and in-the-wild captures. Existing calibration-fre...
- Decoupled MeanFlow: Turning Flow Models into Flow Maps for Accelerated Sampling : Abstract: Denoising generative models, such as diffusion and flow-based models, produce high-quality samples but require many denoising steps due to discretization error. Flow maps, which estimate the...
- Fast and accurate neural reflectance transformation imaging through knowledge distillation : Abstract: Reflectance Transformation Imaging (RTI) is very popular for its ability to visually analyze surfaces by enhancing surface details through interactive relighting, starting from only a few te...
- OSWorld-MCP: Benchmarking MCP Tool Invocation In Computer-Use Agents : Abstract: With advances in decision-making and reasoning capabilities, multimodal agents show strong potential in computer application scenarios. Past evaluations have mainly assessed GUI interaction ...
- Physics-Inspired Gaussian Kolmogorov-Arnold Networks for X-ray Scatter Correction in Cone-Beam CT : Abstract: Cone-beam CT (CBCT) employs a flat-panel detector to achieve three-dimensional imaging with high spatial resolution. However, CBCT is susceptible to scatter during data acquisition, which in...
- A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries : Abstract: The rapid advancement of generative AI has enabled the creation of highly realistic forged facial images, posing significant threats to AI security, digital media integrity, and public trust...
- Eye-Tracking, Mouse Tracking, Stimulus Tracking, and Decision-Making Datasets in Digital Pathology : Abstract: Interpretation of giga-pixel whole-slide images (WSIs) is an important but difficult task for pathologists. Their diagnostic accuracy is estimated to average around 70%. Adding a second path...
- Group Relative Attention Guidance for Image Editing : Abstract: Recently, image editing based on Diffusion-in-Transformer models has undergone rapid development. However, existing editing methods often lack effective control over the degree of editing, l...
- SAGE: Structure-Aware Generative Video Transitions between Diverse Clips : Abstract: Video transitions aim to synthesize intermediate frames between two clips, but naive approaches such as linear blending introduce artifacts that limit professional use or break temporal cohe...
- MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection : Abstract: Infrastructure-based perception plays a crucial role in intelligent transportation systems, offering global situational awareness and enabling cooperative autonomy. However, existing camera-...
- Routing Matters in MoE: Scaling Diffusion Transformers with Explicit Routing Guidance : Abstract: Mixture-of-Experts (MoE) has emerged as a powerful paradigm for scaling model capacity while preserving computational efficiency. Despite its notable success in large language models (LLMs),...
- Uniform Discrete Diffusion with Metric Path for Video Generation : Abstract: Continuous-space video generation has advanced rapidly, while discrete approaches lag behind due to error accumulation and long-context inconsistency. In this work, we revisit discrete gener...
- Inter-turbine Modelling of Wind-Farm Power using Multi-task Learning : Abstract: Because of the global need to increase power production from renewable energy resources, developments in the online monitoring of the associated infrastructure are of interest to reduce opera...
- Federated Structured Sparse PCA for Anomaly Detection in IoT Networks : Abstract: Although federated learning has gained prominence as a privacy-preserving framework tailored for distributed Internet of Things (IoT) environments, current federated principal component anal...
- Data Fusion of Deep Learned Molecular Embeddings for Property Prediction : Abstract: Data-driven approaches such as deep learning can result in predictive models for material properties with exceptional accuracy and efficiency. However, in many applications, data is sparse, ...
- Clustering-Based Low-Rank Matrix Approximation for Medical Image Compression : Abstract: Medical images are inherently high-resolution and contain locally varying structures crucial for diagnosis. Efficient compression must preserve diagnostic fidelity while minimizing redundanc...
- Turbocharging Gaussian Process Inference with Approximate Sketch-and-Project : Abstract: Gaussian processes (GPs) play an essential role in biostatistics, scientific machine learning, and Bayesian optimization for their ability to provide probabilistic predictions and model unce...
- JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model : Abstract: Large language models (LLMs) have revolutionized natural language processing and are increasingly applied to other sequential data types, including genetic sequences. However, adapting LLMs ...
- CT-OT Flow: Estimating Continuous-Time Dynamics from Discrete Temporal Snapshots : Abstract: In many real-world settings (e.g., single-cell RNA sequencing, mobility sensing, and environmental monitoring), data are observed only as temporally aggregated snapshots collected over finite...
- Causal Spatio-Temporal Prediction: An Effective and Efficient Multi-Modal Approach : Abstract: Spatio-temporal prediction plays a crucial role in intelligent transportation, weather forecasting, and urban planning. While integrating multi-modal data has shown potential for enhancing p...
- Why Diffusion Models Don't Memorize: The Role of Implicit Dynamical Regularization in Training : Abstract: Diffusion models have achieved remarkable success across a wide range of generative tasks. A key challenge is understanding the mechanisms that prevent their memorization of training data an...
- URB - Urban Routing Benchmark for RL-equipped Connected Autonomous Vehicles : Abstract: Connected Autonomous Vehicles (CAVs) promise to reduce congestion in future urban networks, potentially by optimizing their routing decisions. Unlike for human drivers, these decisions can b...
- Structured Reinforcement Learning for Combinatorial Decision-Making : Abstract: Reinforcement learning (RL) is increasingly applied to real-world problems involving complex and structured decisions, such as routing, scheduling, and assortment planning. These settings ch...
- DeepRTE: Pre-trained Attention-based Neural Network for Radiative Transfer : Abstract: In this paper, we propose a novel neural network approach, termed DeepRTE, to address the steady-state Radiative Transfer Equation (RTE). The RTE is a differential-integral equation that gov...
- Practical Bayes-Optimal Membership Inference Attacks : Abstract: We develop practical and theoretically grounded membership inference attacks (MIAs) against both independent and identically distributed (i.i.d.) data and graph-structured data. Building on ...
- Advancing Compositional Awareness in CLIP with Efficient Fine-Tuning : Abstract: Vision-language models like CLIP have demonstrated remarkable zero-shot capabilities in classification and retrieval. However, these models often struggle with compositional reasoning - the ...
- Two-Stage Learning of Stabilizing Neural Controllers via Zubov Sampling and Iterative Domain Expansion : Abstract: Learning-based neural network (NN) control policies have shown impressive empirical performance. However, obtaining stability guarantees and estimates of the region of attraction of these le...
- RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases : Abstract: Recent advances have demonstrated the effectiveness of graph-based learning on relational databases (RDBs) for predictive tasks. Such approaches require transforming RDBs into graphs, a proc...
- Trade-offs in Data Memorization via Strong Data Processing Inequalities : Abstract: Recent research demonstrated that training large language models involves memorization of a significant fraction of training data. Such memorization can lead to privacy violations when train...
- GeoClip: Geometry-Aware Clipping for Differentially Private SGD : Abstract: Differentially private stochastic gradient descent (DP-SGD) is the most widely used method for training machine learning models with provable privacy guarantees. A key challenge in DP-SGD is...
- CausalPFN: Amortized Causal Effect Estimation via In-Context Learning : Abstract: Causal effect estimation from observational data is fundamental across various applications. However, selecting an appropriate estimator from dozens of specialized methods demands substantia...
- Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning : Abstract: Machine Unlearning (MU) aims to update Machine Learning (ML) models following requests to remove training samples and their influences on a trained model efficiently without retraining the o...
- Riemannian-Geometric Fingerprints of Generative Models : Abstract: Recent breakthroughs and rapid integration of generative models (GMs) have sparked interest in the problem of model attribution and their fingerprints. For instance, service providers need r...
- PPFL-RDSN: Privacy-Preserving Federated Learning-based Residual Dense Spatial Networks for Encrypted Lossy Image Reconstruction : Abstract: Reconstructing high-quality images from low-resolution inputs using Residual Dense Spatial Networks (RDSNs) is crucial yet challenging. It is even more challenging in centralized training wh...
- Sample Complexity Bounds for Linear Constrained MDPs with a Generative Model : Abstract: We consider infinite-horizon $\gamma$-discounted (linear) constrained Markov decision processes (CMDPs) where the objective is to find a policy that maximizes the expected cumulative reward ...
- High-Energy Concentration for Federated Learning in Frequency Domain : Abstract: Federated Learning (FL) presents significant potential for collaborative optimization without data sharing. Since synthetic data is sent to the server, leveraging the popular concept of data...
- Assessing the robustness of heterogeneous treatment effects in survival analysis under informative censoring : Abstract: Dropout is common in clinical studies, with up to half of patients leaving early due to side effects or other reasons. When dropout is informative (i.e., dependent on survival time), it intr...
- MARS-M: When Variance Reduction Meets Matrices : Abstract: Matrix-based preconditioned optimizers, such as Muon, have recently been shown to be more efficient than scalar-based optimizers for training large-scale neural networks, including large lan...
- Tractable Shapley Values and Interactions via Tensor Networks : Abstract: We show how to replace the O(2^n) coalition enumeration over n features behind Shapley values and Shapley-style interaction indices with a few-evaluation scheme on a tensor-network (TN) surr...
- SeeDNorm: Self-Rescaled Dynamic Normalization : Abstract: Normalization layers are an essential component of neural networks. In transformers, the predominantly used RMSNorm constrains vectors to a unit hypersphere, followed by dimension-wis...
- Towards Personalized Treatment Plan: Geometrical Model-Agnostic Approach to Counterfactual Explanations : Abstract: In our article, we describe a method for generating counterfactual explanations in high-dimensional spaces using four steps that involve fitting our dataset to a model, finding the decision ...
- RL-AUX: Reinforcement Learning for Auxiliary Task Generation : Abstract: Auxiliary Learning (AL) is a special case of Multi-task Learning (MTL) in which a network trains on auxiliary tasks to improve performance on its main task. This technique is used to improve...
- SGFusion: Stochastic Geographic Gradient Fusion in Federated Learning : Abstract: This paper proposes Stochastic Geographic Gradient Fusion (SGFusion), a novel training algorithm to leverage the geographic information of mobile users in Federated Learning (FL). SGFusion m...
- Minimax Optimal Transfer Learning for Kernel-based Nonparametric Regression : Abstract: In recent years, transfer learning has garnered significant attention in the machine learning community. Its ability to leverage knowledge from related studies to improve generalization perf...
- Says Who? Effective Zero-Shot Annotation of Focalization : Abstract: Focalization describes the way in which access to narrative information is restricted or controlled based on the knowledge available to the narrator. It is encoded via a wide ra...
- Global Optimization of Gaussian Process Acquisition Functions Using a Piecewise-Linear Kernel Approximation : Abstract: Bayesian optimization relies on iteratively constructing and optimizing an acquisition function. The latter turns out to be a challenging, non-convex optimization problem itself. Despite the...
- Forecasting Outside the Box: Application-Driven Optimal Pointwise Forecasts for Stochastic Optimization : Abstract: We study a class of two-stage stochastic programs, namely, those with fixed recourse matrix and fixed costs, and linear second stage. We show that, under mild assumptions, the problem can be...
- Program Evaluation with Remotely Sensed Outcomes : Abstract: Economists often estimate treatment effects in experiments using remotely sensed variables (RSVs), e.g., satellite images or mobile phone activity, in place of directly measured economic out...
- Unveiling Concept Attribution in Diffusion Models : Abstract: Diffusion models have shown remarkable abilities in generating realistic and high-quality images from text prompts. However, a trained model remains largely black-box; little do we know abou...
- MDP3: A Training-free Approach for List-wise Frame Selection in Video-LLMs : Abstract: Video large language models (Video-LLMs) have made significant progress in understanding videos. However, processing multiple frames leads to lengthy visual token sequences, presenting chall...
- Detecting Neurocognitive Disorders through Analyses of Topic Evolution and Cross-modal Consistency in Visual-Stimulated Narratives : Abstract: Early detection of neurocognitive disorders (NCDs) is crucial for timely intervention and disease management. Given that language impairments manifest early in NCD progression, visual-stimul...
- CAUSAL3D: A Comprehensive Benchmark for Causal Learning from Visual Data : Abstract: True intelligence hinges on the ability to uncover and leverage hidden causal relations. Despite significant progress in AI and computer vision (CV), there remains a lack of benchmarks for a...
- Hybrid Deep Learning Model to Estimate Cognitive Effort from fNIRS Signals : Abstract: This study estimates cognitive effort based on functional near-infrared spectroscopy data and performance scores using a hybrid DeepNet model. The estimation of cognitive effort enables educ...
- Global urban visual perception varies across demographics and personalities : Abstract: Understanding people's preferences is crucial for urban planning, yet current approaches often combine responses from multi-cultural populations, obscuring demographic differences and riskin...
- Attention-based clustering : Abstract: Transformers have emerged as a powerful neural network architecture capable of tackling a wide range of learning tasks. In this work, we provide a theoretical analysis of their ability to au...
- Securing Transfer-Learned Networks with Reverse Homomorphic Encryption : Abstract: The growing body of literature on training-data reconstruction attacks raises significant concerns about deploying neural network classifiers trained on sensitive data. However, differential...
- Acoustic and Machine Learning Methods for Speech-Based Suicide Risk Assessment: A Systematic Review : Abstract: Suicide remains a public health challenge, necessitating improved detection methods to facilitate timely intervention and treatment. This systematic review evaluates the role of Artificial I...
- Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation : Abstract: Recent progress in Sign Language Translation (SLT) has focussed primarily on improving the representational capacity of large language models to incorporate Sign Language features. This work...
- Linear regression with overparameterized linear neural networks: Tight upper and lower bounds for implicit $\ell^1$-regularization : Abstract: Modern machine learning models are often trained in a setting where the number of parameters exceeds the number of training samples. To understand the implicit bias of gradient descent in su...
- Doubly-Robust Estimation of Counterfactual Policy Mean Embeddings : Abstract: Estimating the distribution of outcomes under counterfactual policies is critical for decision-making in domains such as recommendation, advertising, and healthcare. We propose and analyze a...
- Offline RL by Reward-Weighted Fine-Tuning for Conversation Optimization : Abstract: Offline reinforcement learning (RL) is a variant of RL where the policy is learned from a previously collected dataset of trajectories and rewards. In our work, we propose a practical approa...
- Seeding neural network quantum states with tensor network states : Abstract: We find an efficient approach to approximately convert matrix product states (MPSs) into restricted Boltzmann machine wave functions consisting of a multinomial hidden unit through a canonic...
- Taxonomy and Trends in Reinforcement Learning for Robotics and Control Systems: A Structured Review : Abstract: Reinforcement learning (RL) has become a foundational approach for enabling intelligent robotic behavior in dynamic and uncertain environments. This work presents an in-depth review of RL pr...
- Inferring Group Intent as a Cooperative Game. An NLP-based Framework for Trajectory Analysis using Graph Transformer Neural Network : Abstract: This paper studies group target trajectory intent as the outcome of a cooperative game where the complex-spatio trajectories are modeled using an NLP-based generative model. In our framework...
- Breaking the Benchmark: Revealing LLM Bias via Minimal Contextual Augmentation : Abstract: Large Language Models have been shown to demonstrate stereotypical biases in their representations and behavior due to the discriminative nature of the data that they have been trained on. D...
- Understanding Fairness and Prediction Error through Subspace Decomposition and Influence Analysis : Abstract: Machine learning models have achieved widespread success but often inherit and amplify historical biases, resulting in unfair outcomes. Traditional fairness methods typically impose constrai...
- Score-based constrained generative modeling via Langevin diffusions with boundary conditions : Abstract: Score-based generative models based on stochastic differential equations (SDEs) achieve impressive performance in sampling from unknown distributions, but often fail to satisfy underlying co...
- Auto-Adaptive PINNs with Applications to Phase Transitions : Abstract: We propose an adaptive sampling method for the training of Physics Informed Neural Networks (PINNs) which allows for sampling based on an arbitrary problem-specific heuristic which may depen...
- Kernelized Sparse Fine-Tuning with Bi-level Parameter Competition for Vision Models : Abstract: Parameter-efficient fine-tuning (PEFT) aims to adapt pre-trained vision models to downstream tasks. Among PEFT paradigms, sparse tuning achieves remarkable performance by adjusting only the ...
- Language-Conditioned Representations and Mixture-of-Experts Policy for Robust Multi-Task Robotic Manipulation : Abstract: Perceptual ambiguity and task conflict limit multitask robotic manipulation via imitation learning. We propose a framework combining a Language-Conditioned Visual Representation (LCVR) modul...
- Copula-Stein Discrepancy: A Generator-Based Stein Operator for Archimedean Dependence : Abstract: Kernel Stein discrepancies (KSDs) have become a principal tool for goodness-of-fit testing, but standard KSDs are often insensitive to higher-order dependency structures, such as tail depend...
- Deep Learning-Enhanced Calibration of the Heston Model: A Unified Framework : Abstract: The Heston stochastic volatility model is a widely used tool in financial mathematics for pricing European options. However, its calibration remains computationally intensive and sensitive t...
- Enhancing Pre-trained Representation Classifiability can Boost its Interpretability : Abstract: The visual representation of a pre-trained model prioritizes the classifiability on downstream tasks, while the widespread applications for pre-trained visual models have posed new requireme...
- Self-Concordant Perturbations for Linear Bandits : Abstract: We study the adversarial linear bandits problem and present a unified algorithmic framework that bridges Follow-the-Regularized-Leader (FTRL) and Follow-the-Perturbed-Leader (FTPL) methods, ...
- Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames : Abstract: Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for th...
- Beyond Neural Incompatibility: Easing Cross-Scale Knowledge Transfer in Large Language Models through Latent Semantic Alignment : Abstract: Large Language Models (LLMs) encode vast amounts of knowledge in their massive parameters, and this knowledge can be located, traced, and analyzed. Despite advances in neural interpretability, it ...
- What Can Be Recovered Under Sparse Adversarial Corruption? Assumption-Free Theory for Linear Measurements : Abstract: Let $\mathbf{A} \in \mathbb{R}^{m \times n}$ be an arbitrary, known matrix and $\mathbf{e}$ a $q$-sparse adversarial vector. Given $\mathbf{y} = \mathbf{A} x^* + \mathbf{e}$ and $q$, we seek...
- A comparison between joint and dual UKF implementations for state estimation and leak localization in water distribution networks : Abstract: The sustainability of modern cities highly depends on efficient water distribution management, including effective pressure control and leak detection and localization. Accurate information ...
- Forecasting precipitation in the Arctic using probabilistic machine learning informed by causal climate drivers : Abstract: Understanding and forecasting precipitation events in Arctic maritime environments, such as Bear Island and Ny-Ålesund, is crucial for assessing climate risk and developing early war...
- From Memorization to Reasoning in the Spectrum of Loss Curvature : Abstract: We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a...
- UtilGen: Utility-Centric Generative Data Augmentation with Dual-Level Task Adaptation : Abstract: Data augmentation using generative models has emerged as a powerful paradigm for enhancing performance in computer vision tasks. However, most existing augmentation approaches primarily focu...
- HergNet: a Fast Neural Surrogate Model for Sound Field Predictions via Superposition of Plane Waves : Abstract: We present a novel neural network architecture for the efficient prediction of sound fields in two and three dimensions. The network is designed to automatically satisfy the Helmholtz equati...
- Towards actionable hypotension prediction- predicting catecholamine therapy initiation in the intensive care unit : Abstract: Hypotension in critically ill ICU patients is common and life-threatening. Escalation to catecholamine therapy marks a key management step, with both undertreatment and overtreatment posing ...
- Problem-Parameter-Free Decentralized Bilevel Optimization : Abstract: Decentralized bilevel optimization has garnered significant attention due to its critical role in solving large-scale machine learning problems. However, existing methods often rely on prior...
- Attack on a PUF-based Secure Binary Neural Network : Abstract: Binarized Neural Networks (BNNs) deployed on memristive crossbar arrays provide energy-efficient solutions for edge computing but are susceptible to physical attacks due to memristor nonvola...
- Nearest Neighbor Matching as Least Squares Density Ratio Estimation and Riesz Regression : Abstract: This study proves that Nearest Neighbor (NN) matching can be interpreted as an instance of Riesz regression for automatic debiased machine learning. Lin et al. (2023) show that NN matching ...
- ARIMA_PLUS: Large-scale, Accurate, Automatic and Interpretable In-Database Time Series Forecasting and Anomaly Detection in Google BigQuery : Abstract: Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising and energy. Two unique challenges stand out: (1) eff...
- Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations : Abstract: The theory of training deep networks has become a central question of modern machine learning and has inspired many practical advancements. In particular, the gradient descent (GD) optimizat...
- Unsupervised Machine-Learning Pipeline for Data-Driven Defect Detection and Characterisation: Application to Displacement Cascades : Abstract: Neutron irradiation produces, within a few picoseconds, displacement cascades that are sequences of atomic collisions generating point and extended defects which subsequently affects the lon...
- Dual-Mind World Models: A General Framework for Learning in Dynamic Wireless Networks : Abstract: Despite the popularity of reinforcement learning (RL) in wireless networks, existing approaches that rely on model-free RL (MFRL) and model-based RL (MBRL) are data inefficient and short-sig...
- Enforcing boundary conditions for physics-informed neural operators : Abstract: Machine-learning based methods like physics-informed neural networks and physics-informed neural operators are becoming increasingly adept at solving even complex systems of partial differen...
- Comparison of generalised additive models and neural networks in applications: A systematic review : Abstract: Neural networks have become a popular tool in predictive modelling, more commonly associated with machine learning and artificial intelligence than with statistics. Generalised Additive Mode...
- Statistical physics of deep learning: Optimal learning of a multi-layer perceptron near interpolation : Abstract: For three decades statistical physics has been providing a framework to analyse neural networks. A long-standing question remained on its capacity to tackle deep learning models capturing ri...
- Coreset for Robust Geometric Median: Eliminating Size Dependency on Outliers : Abstract: We study the robust geometric median problem in Euclidean space $\mathbb{R}^d$, with a focus on coreset construction. A coreset is a compact summary of a dataset $P$ of size $n$ that approxim...
- A Single-Loop First-Order Algorithm for Linearly Constrained Bilevel Optimization : Abstract: We study bilevel optimization problems where the lower-level problems are strongly convex and have coupled linear constraints. To overcome the potential non-smoothness of the hyper-objective...
- Generative View Stitching : Abstract: Autoregressive video diffusion models are capable of long rollouts that are stable and consistent with history, but they are unable to guide the current generation with conditioning from the...
- Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits : Abstract: Probabilistic circuits (PCs) are a promising avenue for probabilistic modeling, as they permit a wide range of exact and efficient inference routines. Recent "deep-learning-style" implemen...
- Online (Non-)Convex Learning via Tempered Optimism : Abstract: Optimistic Online Learning aims to exploit experts conveying reliable information to predict the future. However, such implicit optimism may be challenged when it comes to practical crafting...
- Datasheets for Machine Learning Sensors : Abstract: Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making across diverse applicat...
- FedMAP: Personalised Federated Learning for Real Large-Scale Healthcare Systems : Abstract: Federated learning (FL) promises to enable collaborative machine learning across healthcare sites whilst preserving data privacy. Practical deployment remains limited by statistical heteroge...
- TIDMAD: Time Series Dataset for Discovering Dark Matter with AI Denoising : Abstract: Dark matter makes up approximately 85% of total matter in our universe, yet it has never been directly observed in any laboratory on Earth. The origin of dark matter is one of the most impor...
- DeltaPhi: Physical States Residual Learning for Neural Operators in Data-Limited PDE Solving : Abstract: The limited availability of high-quality training data poses a major obstacle in data-driven PDE solving, where expensive data collection and resolution constraints severely impact the abili...
- Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling : Abstract: Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems. While deep learning has shown state-of-the-art AD performanc...
- RWKV-edge: Deeply Compressed RWKV for Resource-Constrained Devices : Abstract: To deploy LLMs on resource-constrained platforms such as mobile robots and smartphones, non-transformer LLMs have achieved major breakthroughs. Recently, a novel RNN-based LLM family, Repenta...
- Geometry matters: insights from Ollivier Ricci Curvature and Ricci Flow into representational alignment through Ollivier-Ricci Curvature and Ricci Flow : Abstract: Representational similarity analysis (RSA) is widely used to analyze the alignment between humans and neural networks; however, conclusions based on this approach can be misleading without c...
- Physics-Informed Latent Neural Operator for Real-time Predictions of time-dependent parametric PDEs : Abstract: Deep operator network (DeepONet) has shown significant promise as a surrogate model for systems governed by partial differential equations (PDEs), enabling accurate mappings between infinite-...
- Selecting Critical Scenarios of DER Adoption in Distribution Grids Using Bayesian Optimization : Abstract: We develop a new methodology to select scenarios of DER adoption most critical for distribution grids. Anticipating risks of future voltage and line flow violations due to additional PV adop...
- Learning Provably Improves the Convergence of Gradient Descent : Abstract: Learn to Optimize (L2O) trains deep neural network-based solvers for optimization, achieving success in accelerating convex problems and improving non-convex solutions. However, L2O lacks ri...
- GST-UNet: A Neural Framework for Spatiotemporal Causal Inference with Time-Varying Confounding : Abstract: Estimating causal effects from spatiotemporal observational data is essential in public health, environmental science, and policy evaluation, where randomized experiments are often infeasibl...
- Learning to Coordinate with Experts : Abstract: When deployed in the real world, AI agents will inevitably face challenges that exceed their individual capabilities. Leveraging assistance from experts, whether humans or highly capable AI ...
- $\beta$-DQN: Improving Deep Q-Learning By Evolving the Behavior : Abstract: While many sophisticated exploration methods have been proposed, their lack of generality and high computational cost often lead researchers to favor simpler methods like $\epsilon$-greedy. ...
- A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning : Abstract: Multi-source transfer learning provides an effective solution to data scarcity in real-world supervised learning scenarios by leveraging multiple source tasks. In this field, existing works ...
- ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources : Abstract: Multimodal deep learning systems are deployed in dynamic scenarios due to the robustness afforded by multiple sensing modalities. Nevertheless, they struggle with varying compute resource av...
- FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching : Abstract: We introduce FragFM, a novel hierarchical framework via fragment-level discrete flow matching for efficient molecular graph generation. FragFM generates molecules at the fragment level, leve...
- Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm : Abstract: In this paper we propose and investigate a new class of Generalized Exponentiated Gradient (GEG) algorithms using Mirror Descent (MD) updates, and applying the Bregman divergence with a two...
- Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality : Abstract: Benchmarks are essential for unified evaluation and reproducibility. The rapid rise of Artificial Intelligence for Software Engineering (AI4SE) has produced numerous benchmarks for tasks suc...
- Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms : Abstract: This paper introduces a broad class of Mirror Descent (MD) and Generalized Exponentiated Gradient (GEG) algorithms derived from trace-form entropies defined via deformed logarithms. Leveragi...
- Multimodal 3D Genome Pre-training : Abstract: Deep learning techniques have driven significant progress in various analytical tasks within 3D genomics in computational biology. However, a holistic understanding of 3D genomics knowledge ...
- Offline Learning and Forgetting for Reasoning with Large Language Models : Abstract: Leveraging inference-time search in large language models has proven effective in further enhancing a trained model's capability to solve complex mathematical and reasoning problems. However...
- BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text : Abstract: Large language models (LLMs) hold great promise for medical applications and are evolving rapidly, with new models being released at an accelerated pace. However, benchmarking on large-scale...
- The Logical Expressiveness of Temporal GNNs via Two-Dimensional Product Logics : Abstract: In recent years, the expressive power of various neural architectures, including graph neural networks (GNNs), transformers, and recurrent neural networks, has been characterised using t...
- A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation : Abstract: Aiming to generalize the well-trained gaze estimation model to new target domains, Cross-domain Gaze Estimation (CDGE) is developed for real-world application scenarios. Existing CDGE method...
- Do Language Models Use Their Depth Efficiently? : Abstract: Modern LLMs are increasingly deep, and depth correlates with performance, albeit with diminishing returns. However, do these models use their depth efficiently? Do they compose more features...
- STree: Speculative Tree Decoding for Hybrid State-Space Models : Abstract: Speculative decoding is a technique to leverage hardware concurrency in order to enable multiple steps of token generation in a single forward pass, thus improving the efficiency of large-sc...
- MixAT: Combining Continuous and Discrete Adversarial Training for LLMs : Abstract: Despite recent efforts in Large Language Model (LLM) safety and alignment, current adversarial attacks on frontier LLMs can still consistently force harmful generations. Although adversarial...
- OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions : Abstract: In this paper, we introduce Online Multimodal Conversational Response Generation (OMCRG), a novel task designed to produce synchronized verbal and non-verbal listener feedback online, based ...
- FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design : Abstract: Designing analog circuits from performance specifications is a complex, multi-stage process encompassing topology selection, parameter inference, and layout feasibility. We introduce FALCON,...
- PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings : Abstract: Visual persuasion, which uses visual elements to influence cognition and behaviors, is crucial in fields such as advertising and political communication. With recent advancements in artifici...
- REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving : Abstract: While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models continues to be a significant barrier to widespread accessibility and rapid innovatio...
- Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies : Abstract: The art and science of Quranic recitation (Tajweed), a discipline governed by meticulous phonetic, rhythmic, and theological principles, confronts substantial educational challenges in today...
- NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models : Abstract: Characterizing the cellular properties of neurons is fundamental to understanding their function in the brain. In this quest, the generation of bio-realistic models is central towards integr...
- Evaluating AI-Powered Learning Assistants in Engineering Higher Education: Student Engagement, Ethical Challenges, and Policy Implications : Abstract: As generative AI becomes increasingly integrated into higher education, understanding how students engage with these technologies is essential for responsible adoption. This study evaluates ...
- BNMusic: Blending Environmental Noises into Personalized Music : Abstract: When people are disturbed by environmental noise, acoustic masking is a conventional audio-engineering technique for reducing the annoyance, which seeks to cover up the noises with other ...
- LittleBit: Ultra Low-Bit Quantization via Latent Factorization : Abstract: Deploying large language models (LLMs) often faces challenges from substantial memory and computational costs. Quantization offers a solution, yet performance degradation in the sub-1-bit re...
- Thermometry of simulated Bose–Einstein condensates using machine learning : Abstract: Precise determination of thermodynamic parameters in ultracold Bose gases remains challenging due to the destructive nature of conventional measurement techniques and inherent experimental u...
- GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning : Abstract: Medical visual question answering aims to support clinical decision-making by enabling models to answer natural language questions based on medical images. While recent advances in multi-mod...
- FoGE: Fock Space inspired encoding for graph prompting : Abstract: Recent results show that modern Large Language Models (LLM) are indeed capable of understanding and answering questions about structured data such as graphs. This new paradigm can lead to so...
- Adversarially-Aware Architecture Design for Robust Medical AI Systems : Abstract: Adversarial attacks pose a severe risk to AI systems used in healthcare, capable of misleading models into dangerous misclassifications that can delay treatments or cause misdiagnoses. These...
- DiNo and RanBu: Lightweight Predictions from Shallow Random Forests : Abstract: Random Forest ensembles are a strong baseline for tabular prediction tasks, but their reliance on hundreds of deep trees often results in high inference latency and memory demands, limiting ...
- A machine learning framework integrating seed traits and plasma parameters for predicting germination uplift in crops : Abstract: Cold plasma (CP) is an eco-friendly method to enhance seed germination, yet outcomes remain difficult to predict due to complex seed–plasma–environment interactions. This study introduces ...
- Quantum Machine Learning for Image Classification: A Hybrid Model of Residual Network with Quantum Support Vector Machine : Abstract: Recently, there has been growing attention on combining quantum machine learning (QML) with classical deep learning approaches, as computational techniques are key to improving the performan...
- AI-Driven Carbon Monitoring: Transformer-Based Reconstruction of Atmospheric CO2 in Canadian Poultry Regions : Abstract: Accurate mapping of column-averaged CO2 (XCO2) over agricultural landscapes is essential for guiding emission mitigation strategies. We present a Spatiotemporal Vision Transformer with Wavel...
- DBLoss: Decomposition-based Loss Function for Time Series Forecasting : Abstract: Time series forecasting holds significant value in various domains such as economics, traffic, energy, and AIOps, as accurate predictions facilitate informed decision-making. However, the ex...
- Informed Initialization for Bayesian Optimization and Active Learning : Abstract: Bayesian Optimization is a widely used method for optimizing expensive black-box functions, relying on probabilistic surrogate models such as Gaussian Processes. The quality of the surrogate...
- MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection : Abstract: Sarcasm is a specific type of irony which involves discerning what is said from what is meant. Detecting sarcasm depends not only on the literal content of an utterance but also on non-verba...
- Relaxed Sequence Sampling for Diverse Protein Design : Abstract: Protein design using structure prediction models such as AlphaFold2 has shown remarkable success, but existing approaches like relaxed sequence optimization (RSO) rely on single-path gradien...
- Revealing the Potential of Learnable Perturbation Ensemble Forecast Model for Tropical Cyclone Prediction : Abstract: Tropical cyclones (TCs) are highly destructive and inherently uncertain weather systems. Ensemble forecasting helps quantify these uncertainties, yet traditional systems are constrained by h...
- Learning Interpretable Features in Audio Latent Spaces via Sparse Autoencoders : Abstract: While sparse autoencoders (SAEs) successfully extract interpretable features from language models, applying them to audio generation faces unique challenges: audio's dense nature requires co...
- How do simple rotations affect the implicit bias of Adam? : Abstract: Adaptive gradient methods such as Adam and Adagrad are widely used in machine learning, yet their effect on the generalization of learned models, relative to methods like gradient descent ...
- A Physics-informed Multi-resolution Neural Operator : Abstract: The predictive accuracy of operator learning frameworks depends on the quality and quantity of available training data (input-output function pairs), often requiring substantial amounts of h...
- Combining SHAP and Causal Analysis for Interpretable Fault Detection in Industrial Processes : Abstract: Industrial processes generate complex data that challenge fault detection systems, often yielding opaque or underwhelming results despite advanced machine learning techniques. This study tac...
- ScaLoRA: Optimally Scaled Low-Rank Adaptation for Efficient High-Rank Fine-Tuning : Abstract: As large language models (LLMs) continue to scale in size, the computational overhead has become a major bottleneck for task-specific fine-tuning. While low-rank adaptation (LoRA) effectivel...
- GIFT: Group-relative Implicit Fine Tuning Integrates GRPO with DPO and UNA : Abstract: I propose Group-relative Implicit Fine Tuning (GIFT), a novel reinforcement learning framework for aligning LLMs. Instead of directly maximizing cumulativ...
- Artificial Intelligence Based Predictive Maintenance for Electric Buses : Abstract: Predictive maintenance (PdM) is crucial for optimizing efficiency and minimizing downtime of electric buses. While these vehicles provide environmental benefits, they pose challenges for PdM...
- Geometry-Inspired Unified Framework for Discounted and Average Reward MDPs : Abstract: The theoretical analysis of Markov Decision Processes (MDPs) is commonly split into two cases - the average-reward case and the discounted-reward case - which, while sharing similarities, ar...
- Improving the Straight-Through Estimator with Zeroth-Order Information : Abstract: We study the problem of training neural networks with quantized parameters. Learning low-precision quantized parameters by enabling computation of gradients via the Straight-Through Estimato...
- Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments : Abstract: Federated Learning (FL) allows for the training of Machine Learning models in a collaborative manner without the need to share sensitive data. However, it remains vulnerable to Gradient Leak...
- A data-free neural operator enabling fast inference of 2D and 3D Navier-Stokes equations : Abstract: Ensemble simulations of high-dimensional flow models (e.g., Navier-Stokes-type PDEs) are computationally prohibitive for real-time applications. Neural operators enable fast inference but ar...
- A Pragmatic Way to Measure Chain-of-Thought Monitorability : Abstract: While Chain-of-Thought (CoT) monitoring offers a unique opportunity for AI safety, this opportunity could be lost through shifts in training practices or model architecture. To help preserve...
- Synergistic Neural Forecasting of Air Pollution with Stochastic Sampling : Abstract: Air pollution remains a leading global health and environmental risk, particularly in regions vulnerable to episodic air pollution spikes due to wildfires, urban haze and dust storms. Accura...
- Optimal Arm Elimination Algorithms for Combinatorial Bandits : Abstract: Combinatorial bandits extend the classical bandit framework to settings where the learner selects multiple arms in each round, motivated by applications such as online recommendation and ass...
- Predicting Barge Tow Size on Inland Waterways Using Vessel Trajectory Derived Features: Proof of Concept : Abstract: Accurate, real-time estimation of barge quantity on inland waterways remains a critical challenge due to the non-self-propelled nature of barges and the limitations of existing monitoring sy...
- Efficient Global-Local Fusion Sampling for Physics-Informed Neural Networks : Abstract: The accuracy of Physics-Informed Neural Networks (PINNs) critically depends on the placement of collocation points, as the PDE loss is approximated through sampling over the solution domain....
- GraphNet: A Large-Scale Computational Graph Dataset for Tensor Compiler Research : Abstract: We introduce GraphNet, a dataset of 2.7K real-world deep learning computational graphs with rich metadata, spanning six major task categories across multiple deep learning frameworks. To eva...
- Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection : Abstract: This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes the coexisting limitations of conventional projection-based methods: their reliance on a f...
- Mitigating Negative Transfer via Reducing Environmental Disagreement : Abstract: Unsupervised Domain Adaptation (UDA) focuses on transferring knowledge from a labeled source domain to an unlabeled target domain, addressing the challenge of domain shift. Significan...
- Low-N Protein Activity Optimization with FolDE : Abstract: Proteins are traditionally optimized through the costly construction and measurement of many mutants. Active Learning-assisted Directed Evolution (ALDE) alleviates that cost by predicting th...
- Information-Theoretic Discrete Diffusion : Abstract: We present an information-theoretic framework for discrete diffusion models that yields principled estimators of log-likelihood using score-matching losses. Inspired by the I-MMSE identity f...
- Graph-Guided Concept Selection for Efficient Retrieval-Augmented Generation : Abstract: Graph-based RAG constructs a knowledge graph (KG) from text chunks to enhance retrieval in Large Language Model (LLM)-based question answering. It is especially beneficial in domains such as...
- Causal Convolutional Neural Networks as Finite Impulse Response Filters : Abstract: This study investigates the behavior of Causal Convolutional Neural Networks (CNNs) with quasi-linear activation functions when applied to time-series data characterized by multimodal freque...
- Fixed Point Neural Acceleration and Inverse Surrogate Model for Battery Parameter Identification : Abstract: The rapid expansion of electric vehicles has intensified the need for accurate and efficient diagnosis of lithium-ion batteries. Parameter identification of electrochemical battery models is...
- Identifiable learning of dissipative dynamics : Abstract: Complex dissipative systems appear across science and engineering, from polymers and active matter to learning algorithms. These systems operate far from equilibrium, where energy dissipatio...
- EddyFormer: Accelerated Neural Simulations of Three-Dimensional Turbulence at Scale : Abstract: Computationally resolving turbulence remains a central challenge in fluid dynamics due to its multi-scale interactions. Fully resolving large-scale turbulence through direct numerical simula...
- V-SAT: Video Subtitle Annotation Tool : Abstract: The surge of audiovisual content on streaming platforms and social media has heightened the demand for accurate and accessible subtitles. However, existing subtitle generation methods primar...
- SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning : Abstract: Federated Learning has seen increased deployment in real-world scenarios recently, as it enables the distributed training of machine learning models without explicit data sharing between ...
- Unlocking Out-of-Distribution Generalization in Dynamics through Physics-Guided Augmentation : Abstract: In dynamical system modeling, traditional numerical methods are limited by high computational costs, while modern data-driven approaches struggle with data scarcity and distribution shifts. ...
- PRIVET: Privacy Metric Based on Extreme Value Theory : Abstract: Deep generative models are often trained on sensitive data, such as genetic sequences, health data, or more broadly, any copyrighted, licensed or protected content. This raises critical conc...
- Sparse Optimistic Information Directed Sampling : Abstract: Many high-dimensional online decision-making problems can be modeled as stochastic sparse linear bandits. Most existing algorithms are designed to achieve optimal worst-case regret in either...
- Temporal Knowledge Graph Hyperedge Forecasting: Exploring Entity-to-Category Link Prediction : Abstract: Temporal Knowledge Graphs have emerged as a powerful way of not only modeling static relationships between entities but also the dynamics of how relations evolve over time. As these informat...
- SALS: Sparse Attention in Latent Space for KV cache Compression : Abstract: Large Language Models capable of handling extended contexts are in high demand, yet their inference remains challenging due to substantial Key-Value cache size and high memory bandwidth requ...
- EDC: Equation Discovery for Classification : Abstract: Equation Discovery techniques have shown considerable success in regression tasks, where they are used to discover concise and interpretable models (Symbolic Regression). In this pa...
- What do vision-language models see in the context? Investigating multimodal in-context learning : Abstract: In-context learning (ICL) enables Large Language Models (LLMs) to learn tasks from demonstration examples without parameter updates. Although it has been extensively studied in LLMs, its eff...
- Filtering instances and rejecting predictions to obtain reliable models in healthcare : Abstract: Machine Learning (ML) models are widely used in high-stakes domains such as healthcare, where the reliability of predictions is critical. However, these models often fail to account for unce...
- A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport : Abstract: Synthetic data offers a promising solution to the privacy and accessibility challenges of using smart card data in public transport research. Despite rapid progress in generative modeling, t...
- APEX: Approximate-but-exhaustive search for ultra-large combinatorial synthesis libraries : Abstract: Make-on-demand combinatorial synthesis libraries (CSLs) like Enamine REAL have significantly enabled drug discovery efforts. However, their large size presents a challenge for virtual screen...
- Fill in the Blanks: Accelerating Q-Learning with a Handful of Demonstrations in Sparse Reward Settings : Abstract: Reinforcement learning (RL) in sparse-reward environments remains a significant challenge due to the lack of informative feedback. We propose a simple yet effective method that uses a small ...
- Methodology for Comparing Machine Learning Algorithms for Survival Analysis : Abstract: This study presents a comparative methodological analysis of six machine learning models for survival analysis (MLSA). Using data from nearly 45,000 colorectal cancer patients in the Hospita...
- MIMIC-Sepsis: A Curated Benchmark for Modeling and Learning from Sepsis Trajectories in the ICU : Abstract: Sepsis is a leading cause of mortality in intensive care units (ICUs), yet existing research often relies on outdated datasets, non-reproducible preprocessing pipelines, and limited coverage...
- Physics-Informed Extreme Learning Machine (PIELM): Opportunities and Challenges : Abstract: We are very delighted to see the fast development of physics-informed extreme learning machine (PIELM) in recent years for higher computation efficiency and accuracy in physics-informed mach...
- A Novel XAI-Enhanced Quantum Adversarial Networks for Velocity Dispersion Modeling in MaNGA Galaxies : Abstract: Current quantum machine learning approaches often face challenges balancing predictive accuracy, robustness, and interpretability. To address this, we propose a novel quantum adversarial fra...
- Semi-supervised and unsupervised learning for health indicator extraction from guided waves in aerospace composite structures : Abstract: Health indicators (HIs) are central to diagnosing and prognosing the condition of aerospace composite structures, enabling efficient maintenance and operational safety. However, extracting r...
- Symbolic Snapshot Ensembles : Abstract: Inductive logic programming (ILP) is a form of logical machine learning. Most ILP algorithms learn a single hypothesis from a single training run. Ensemble methods train an ILP algorithm mul...
- Pearl: A Foundation Model for Placing Every Atom in the Right Location : Abstract: Accurately predicting the three-dimensional structures of protein-ligand complexes remains a fundamental challenge in computational drug discovery that limits the pace and success of therape...
- Eigenfunction Extraction for Ordered Representation Learning : Abstract: Recent advances in representation learning reveal that widely used objectives, such as contrastive and non-contrastive, implicitly perform spectral decomposition of a contextual kernel, indu...
- Energy Efficient Exact and Approximate Systolic Array Architecture for Matrix Multiplication : Abstract: Deep Neural Networks (DNNs) require highly efficient matrix multiplication engines for complex computations. This paper presents a systolic array architecture incorporating novel exact and a...
- JiuTian Chuanliu: A Large Spatiotemporal Model for General-purpose Dynamic Urban Sensing : Abstract: As a window for urban sensing, human mobility contains rich spatiotemporal information that reflects both residents' behavior preferences and the functions of urban areas. The analysis of hu...
- Beyond Normality: Reliable A/B Testing with Non-Gaussian Data : Abstract: A/B testing has become the cornerstone of decision-making in online markets, guiding how platforms launch new features, optimize pricing strategies, and improve user experience. In practice,...
- VIKING: Deep variational inference with stochastic projections : Abstract: Variational mean field approximations tend to struggle with contemporary overparametrized deep neural networks. Where a Bayesian treatment is usually associated with high-quality predictions...
- In Search of the Unknown Unknowns: A Multi-Metric Distance Ensemble for Out of Distribution Anomaly Detection in Astronomical Surveys : Abstract: Distance-based methods involve the computation of distance values between features and are a well-established paradigm in machine learning. In anomaly detection, anomalies are identified by ...
- Bayesian neural networks with interpretable priors from Mercer kernels : Abstract: Quantifying the uncertainty in the output of a neural network is essential for deployment in scientific or engineering applications where decisions must be made under limited or noisy data. ...
- Re-envisioning Euclid Galaxy Morphology: Identifying and Interpreting Features with Sparse Autoencoders : Abstract: Sparse Autoencoders (SAEs) can efficiently identify candidate monosemantic features from pretrained neural networks for galaxy morphology. We demonstrate this on Euclid Q1 images using both ...
- Testing-driven Variable Selection in Bayesian Modal Regression : Abstract: We propose a Bayesian variable selection method in the framework of modal regression for heavy-tailed responses. An efficient expectation-maximization algorithm is employed to expedite param...
- Zero-Shot Cross-Lingual Transfer using Prefix-Based Adaptation : Abstract: With the release of new large language models (LLMs) like Llama and Mistral, zero-shot cross-lingual transfer has become increasingly feasible due to their multilingual pretraining and stron...
- All in one timestep: Enhancing Sparsity and Energy efficiency in Multi-level Spiking Neural Networks : Abstract: Spiking Neural Networks (SNNs) are one of the most promising bio-inspired neural network models and have drawn increasing attention in recent years. The event-driven communication mechanism...
- Causal Ordering for Structure Learning From Time Series : Abstract: Predicting causal structure from time series data is crucial for understanding complex phenomena in physiology, brain connectivity, climate dynamics, and socio-economic behaviour. Causal dis...
- The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets : Abstract: We study the parameter complexity of robust memorization for $\mathrm{ReLU}$ networks: the number of parameters required to interpolate any given dataset with $\epsilon$-separation between d...
- InteractComp: Evaluating Search Agents With Ambiguous Queries : Abstract: Language agents have demonstrated remarkable potential in web search and information retrieval. However, these search agents assume user queries are complete and unambiguous, an assumption t...
- Multi-Agent Scenario Generation in Roundabouts with a Transformer-enhanced Conditional Variational Autoencoder : Abstract: With the increasing integration of intelligent driving functions into serial-produced vehicles, ensuring their functionality and robustness poses greater challenges. Compared to traditional ...
- Learning to Drive Safely with Hybrid Options : Abstract: Of the many deep reinforcement learning approaches for autonomous driving, only a few make use of the options (or skills) framework. That is surprising, as this framework is naturally suit...
- Dissecting Role Cognition in Medical LLMs via Neuronal Ablation : Abstract: Large language models (LLMs) have gained significant traction in medical decision support systems, particularly in the context of medical question answering and role-playing simulations. A...
- Fast algorithms enabling optimization and deep learning for photoacoustic tomography in a circular detection geometry : Abstract: The inverse source problem arising in photoacoustic tomography and in several other coupled-physics modalities is frequently solved by iterative algorithms. Such algorithms are based on the ...
- Repurposing Synthetic Data for Fine-grained Search Agent Supervision : Abstract: LLM-based search agents are increasingly trained on entity-centric synthetic data to solve complex, knowledge-intensive tasks. However, prevailing training methods like Group Relative Policy...
- ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking : Abstract: Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents to further enhance problem-solving capability. However, conventional para...
- AgentFold: Long-Horizon Web Agents with Proactive Context Management : Abstract: LLM-based web agents show immense promise for information seeking, yet their effectiveness on long-horizon tasks is hindered by a fundamental trade-off in context management. Prevailing ReAc...
- Greedy Sampling Is Provably Efficient for RLHF : Abstract: Reinforcement Learning from Human Feedback (RLHF) has emerged as a key technique for post-training large language models. Despite its empirical success, the theoretical understanding of RLHF...
- Tongyi DeepResearch Technical Report : Abstract: We present Tongyi DeepResearch, an agentic large language model, which is specifically designed for long-horizon, deep information-seeking research tasks. To incentivize autonomous deep rese...
- Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents : Abstract: Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we arg...
- ComboBench: Can LLMs Manipulate Physical Devices to Play Virtual Reality Games? : Abstract: Virtual Reality (VR) games require players to translate high-level semantic actions into precise device manipulations using controllers and head-mounted displays (HMDs). While humans intuiti...
- Does Object Binding Naturally Emerge in Large Pretrained Vision Transformers? : Abstract: Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual featu...
- Mining Large Independent Sets on Massive Graphs : Abstract: The Maximum Independent Set problem is fundamental for extracting conflict-free structure from large graphs, with applications in scheduling, recommendation, and network analysis. However, e...
- A Survey on Large Language Model-Based Game Agents : Abstract: Game environments provide rich, controllable settings that simulate many aspects of real-world complexity. As such, game agents offer a valuable testbed for exploring capabilities relevant ...
- 3D-Prover: Diversity Driven Theorem Proving With Determinantal Point Processes : Abstract: A key challenge in automated formal reasoning is the intractable search space, which grows exponentially with the depth of the proof. This branching is caused by the large number of candidat...
- TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models : Abstract: Large language models (LLMs) have demonstrated their effectiveness in multivariate time series classification (MTSC). Effective adaptation of LLMs for MTSC necessitates informative data repr...
- Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning : Abstract: Humans leverage rich internal models of the world to reason about the future, imagine counterfactuals, and adapt flexibly to new situations. In Reinforcement Learning (RL), world models aim ...
- Partner Modelling Emerges in Recurrent Agents (But Only When It Matters) : Abstract: Humans are remarkably adept at collaboration, able to infer the strengths and weaknesses of new partners in order to work successfully towards shared goals. To build AI systems with this cap...
- VIRAL: Vision-grounded Integration for Reward design And Learning : Abstract: The alignment between humans and machines is a critical challenge in artificial intelligence today. Reinforcement learning, which aims to maximize a reward function, is particularly vulnerab...
- The Confidence Paradox: Can LLM Know When It's Wrong : Abstract: Document Visual Question Answering (DocVQA) models often produce overconfident or ethically misaligned responses, especially under uncertainty. Existing models like LayoutLMv3, UDOP, and DON...
- Memory Mosaics at scale : Abstract: Memory Mosaics [Zhang et al., 2025], networks of associative memories, have demonstrated appealing compositional and in-context learning capabilities on medium-scale networks (GPT-2 scale) a...
- Freeze and Conquer: Reusable Ansatz for Solving the Traveling Salesman Problem : Abstract: In this paper we present a variational algorithm for the Traveling Salesman Problem (TSP) that combines (i) a compact encoding of permutations, which reduces the qubit requirement too, (ii) ...
- Querying Inconsistent Prioritized Data with ORBITS: Algorithms, Implementation, and Experiments : Abstract: We investigate practical algorithms for inconsistency-tolerant query answering over prioritized knowledge bases, which consist of a logical theory, a set of facts, and a priority relation be...
- Retrieval-Augmented Generation-based Relation Extraction : Abstract: Information Extraction (IE) is a transformative process that converts unstructured text data into a structured format by employing entity and relation extraction (RE) methodologies. The iden...
- Navigation with VLM framework: Towards Going to Any Language : Abstract: Navigating towards fully open language goals and exploring open scenes in an intelligent way have always raised significant challenges. Recently, Vision Language Models (VLMs) have demonstra...
- GRS: Generating Robotic Simulation Tasks from Real-World Images : Abstract: We introduce GRS (Generating Robotic Simulation tasks), a system addressing real-to-sim for robotic simulations. GRS creates digital twin simulations from single RGB-D observations with solv...
- One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models : Abstract: For large language models (LLMs), sparse autoencoders (SAEs) have been shown to decompose intermediate representations that often are not interpretable directly into sparse sums of interpret...
- Learned, Lagged, LLM-splained: LLM Responses to End User Security Questions : Abstract: Answering end user security questions is challenging. While large language models (LLMs) like GPT, LLAMA, and Gemini are far from error-free, they have shown promise in answering a variety o...
- Provable Scaling Laws for the Test-Time Compute of Large Language Models : Abstract: We propose two simple, principled and practical algorithms that enjoy provable scaling laws for the test-time compute of large language models (LLMs). The first one is a two-stage knockout-s...
- Neural USD: An object-centric framework for iterative editing and control : Abstract: Amazing progress has been made in controllable generative modeling, especially over the last few years. However, some challenges remain. One of them is precise and iterative object editing. ...
- SafeVision: Efficient Image Guardrail with Robust Policy Adherence and Explainability : Abstract: With the rapid proliferation of digital media, the need for efficient and transparent safeguards against unsafe content is more critical than ever. Traditional image guardrail models, constr...
- An efficient probabilistic hardware architecture for diffusion-like models : Abstract: The proliferation of probabilistic AI has promoted proposals for specialized stochastic computers. Despite promising efficiency gains, these proposals have failed to gain traction because th...
- Diffusion Adaptive Text Embedding for Text-to-Image Diffusion Models : Abstract: Text-to-image diffusion models rely on text embeddings from a pre-trained text encoder, but these embeddings remain fixed across all diffusion timesteps, limiting their adaptability to the g...
- HyperGraphX: Graph Transductive Learning with Hyperdimensional Computing and Message Passing : Abstract: We present a novel algorithm, \hdgc, that marries graph convolution with binding and bundling operations in hyperdimensional computing for transductive graph learning. For prediction accurac...
- STNet: Spectral Transformation Network for Solving Operator Eigenvalue Problem : Abstract: Operator eigenvalue problems play a critical role in various scientific fields and engineering applications, yet numerical methods are hindered by the curse of dimensionality. Recent deep le...
- Mars-Bench: A Benchmark for Evaluating Foundation Models for Mars Science Tasks : Abstract: Foundation models have enabled rapid progress across many specialized domains by leveraging large-scale pre-training on unlabeled data, demonstrating strong generalization to a variety of do...
- Training-Free Safe Text Embedding Guidance for Text-to-Image Diffusion Models : Abstract: Text-to-image models have recently made significant advances in generating realistic and semantically coherent images, driven by advanced diffusion models and large-scale web-crawled dataset...
- Lifecycle-Aware code generation: Leveraging Software Engineering Phases in LLMs : Abstract: Recent progress in large language models (LLMs) has advanced automatic code generation, yet most approaches rely on direct, single-step translation from problem descriptions to code, disrega...
- Teaching LLMs to Abstain via Fine-Grained Semantic Confidence Reward : Abstract: Mitigating hallucinations in Large Language Models (LLMs) is critical for their reliable deployment. Existing methods typically fine-tune LLMs to abstain from answering questions beyond thei...
- SpecKD: Speculative Decoding for Effective Knowledge Distillation of LLMs : Abstract: Knowledge Distillation (KD) has become a cornerstone technique for compressing Large Language Models (LLMs) into smaller, more efficient student models. However, conventional KD approaches t...
- NeuroPathNet: Dynamic Path Trajectory Learning for Brain Functional Connectivity Analysis : Abstract: Understanding the evolution of brain functional networks over time is of great significance for the analysis of cognitive mechanisms and the diagnosis of neurological diseases. Existing meth...
- Spatio-temporal Multivariate Time Series Forecast with Chosen Variables : Abstract: Spatio-Temporal Multivariate time series Forecast (STMF) uses the time series of $n$ spatially distributed variables in a period of recent past to forecast their values in a period of near f...
- Improved Accuracy of Robot Localization Using 3-D LiDAR in a Hippocampus-Inspired Model : Abstract: Boundary Vector Cells (BVCs) are a class of neurons in the brains of vertebrates that encode environmental boundaries at specific distances and allocentric directions, playing a central role...
- ResNet: Enabling Deep Convolutional Neural Networks through Residual Learning : Abstract: Convolutional Neural Networks (CNNs) have revolutionized computer vision, but training very deep networks has been challenging due to the vanishing gradient problem. This paper explores Resid...
- Geometric Algorithms for Neural Combinatorial Optimization with Constraints : Abstract: Self-Supervised Learning (SSL) for Combinatorial Optimization (CO) is an emerging paradigm for solving combinatorial problems using neural networks. In this paper, we address a central chall...
- Causal-Aware Generative Adversarial Networks with Reinforcement Learning : Abstract: The utility of tabular data for tasks ranging from model training to large-scale data analysis is often constrained by privacy concerns or regulatory hurdles. While existing data generation ...
- Learning from History: A Retrieval-Augmented Framework for Spatiotemporal Prediction : Abstract: Accurate and long-term spatiotemporal prediction for complex physical systems remains a fundamental challenge in scientific computing. While deep learning models, as powerful parametric appr...
- SynAD: Enhancing Real-World End-to-End Autonomous Driving Models through Synthetic Data Integration : Abstract: Recent advancements in deep learning and the availability of high-quality real-world driving datasets have propelled end-to-end autonomous driving. Despite this progress, relying solely on r...
- PULSE: Privileged Knowledge Transfer from Electrodermal Activity to Low-Cost Sensors for Stress Monitoring : Abstract: Electrodermal activity (EDA), the primary signal for stress detection, requires costly hardware often unavailable in real-world wearables. In this paper, we propose PULSE, a framework that u...
- FALQON: Accelerating LoRA Fine-tuning with Low-Bit Floating-Point Arithmetic : Abstract: Low-bit floating-point (FP) formats, such as FP8, provide significant acceleration and memory savings in model training thanks to native hardware support on modern GPUs and NPUs. However, we...
- Covert Surveillance in Smart Devices: A SCOUR Framework Analysis of Youth Privacy Implications : Abstract: This paper investigates how smart devices covertly capture private conversations and discusses in more in-depth the implications of this for youth privacy. Using a structured review guided b...
- Learning Parameterized Skills from Demonstrations : Abstract: We present DEPS, an end-to-end algorithm for discovering parameterized skills from expert demonstrations. Our method learns parameterized skill policies jointly with a meta-policy that selec...
- Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation : Abstract: We present MGAudio, a novel flow-based framework for open-domain video-to-audio generation, which introduces model-guided dual-role alignment as a central design principle. Unlike prior appr...
- Taming the Tail: NoI Topology Synthesis for Mixed DL Workloads on Chiplet-Based Accelerators : Abstract: Heterogeneous chiplet-based systems improve scaling by disaggregating CPUs/GPUs and emerging technologies (HBM/DRAM). However, this on-package disaggregation introduces a latency in Network-on...
- LagMemo: Language 3D Gaussian Splatting Memory for Multi-modal Open-vocabulary Multi-goal Visual Navigation : Abstract: Navigating to a designated goal using visual information is a fundamental capability for intelligent robots. Most classical visual navigation methods are restricted to single-goal, single-mo...
- Compositional Image Synthesis with Inference-Time Scaling : Abstract: Despite their impressive realism, modern text-to-image models still struggle with compositionality, often failing to render accurate object counts, attributes, and spatial relations. To addr...
- VC4VG: Optimizing Video Captions for Text-to-Video Generation : Abstract: Recent advances in text-to-video (T2V) generation highlight the critical role of high-quality video-text pairs in training models capable of producing coherent and instruction-aligned videos...
- Beyond Line-Level Filtering for the Pretraining Corpora of LLMs : Abstract: While traditional line-level filtering techniques, such as line-level deduplication and trailing-punctuation filters, are commonly used, these basic methods can sometimes discard valuable co...
- Ko-MuSR: A Multistep Soft Reasoning Benchmark for LLMs Capable of Understanding Korean : Abstract: We present Ko-MuSR, the first benchmark to comprehensively evaluate multistep, soft reasoning in long Korean narratives while minimizing data contamination. Built following MuSR, Ko-MuSR fea...
- Enhancing Vision-Language Models for Autonomous Driving through Task-Specific Prompting and Spatial Reasoning : Abstract: This technical report presents our solution for the RoboSense Challenge at IROS 2025, which evaluates Vision-Language Models (VLMs) on autonomous driving scene understanding across perceptio...
- Self-supervised Synthetic Pretraining for Inference of Stellar Mass Embedded in Dense Gas : Abstract: Stellar mass is a fundamental quantity that determines the properties and evolution of stars. However, estimating stellar masses in star-forming regions is challenging because young stars ar...
- SymMaP: Improving Computational Efficiency in Linear Solvers through Symbolic Preconditioning : Abstract: Matrix preconditioning is a critical technique to accelerate the solution of linear systems, where performance heavily depends on the selection of preconditioning parameters. Traditional par...
- MuSaG: A Multimodal German Sarcasm Dataset with Full-Modal Annotations : Abstract: Sarcasm is a complex form of figurative language in which the intended meaning contradicts the literal one. Its prevalence in social media and popular culture poses persistent challenges for...
- Closing Gaps: An Imputation Analysis of ICU Vital Signs : Abstract: As more Intensive Care Unit (ICU) data becomes available, the interest in developing clinical prediction models to improve healthcare protocols increases. However, the lack of data quality s...
- PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling : Abstract: Reward models (RMs) are central to reinforcement learning from human feedback (RLHF), providing the critical supervision signals that align large language models (LLMs) with human preference...
- MAGNET: A Multi-Graph Attentional Network for Code Clone Detection : Abstract: Code clone detection is a fundamental task in software engineering that underpins refactoring, debugging, plagiarism detection, and vulnerability analysis. Existing methods often rely on sin...
- Enabling Near-realtime Remote Sensing via Satellite-Ground Collaboration of Large Vision-Language Models : Abstract: Large vision-language models (LVLMs) have recently demonstrated great potential in remote sensing (RS) tasks (e.g., disaster monitoring) conducted by low Earth orbit (LEO) satellites. Howeve...
- Trajectory Design for UAV-Based Low-Altitude Wireless Networks in Unknown Environments: A Digital Twin-Assisted TD3 Approach : Abstract: Unmanned aerial vehicles (UAVs) are emerging as key enablers for low-altitude wireless network (LAWN), particularly when terrestrial networks are unavailable. In such scenarios, the environm...
- DynaRend: Learning 3D Dynamics via Masked Future Rendering for Robotic Manipulation : Abstract: Learning generalizable robotic manipulation policies remains a key challenge due to the scarcity of diverse real-world training data. While recent approaches have attempted to mitigate this ...
- Survey and Tutorial of Reinforcement Learning Methods in Process Systems Engineering : Abstract: Sequential decision making under uncertainty is central to many Process Systems Engineering (PSE) challenges, where traditional methods often face limitations related to controlling and opti...
- Training-free Source Attribution of AI-generated Images via Resynthesis : Abstract: Synthetic image source attribution is a challenging task, especially in data scarcity conditions requiring few-shot or zero-shot classification capabilities. We present a new training-free o...
- ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model : Abstract: The limited capacity for fine-grained visual perception presents a critical bottleneck for Vision-Language Models (VLMs) in real-world applications. Addressing this is challenging due to the...
- Transformers can do Bayesian Clustering : Abstract: Bayesian clustering accounts for uncertainty but is computationally demanding at scale. Furthermore, real-world datasets often contain missing values, and simple imputation ignores the assoc...
- Critique-RL: Training Language Models for Critiquing through Two-Stage Reinforcement Learning : Abstract: Training critiquing language models to assess and provide feedback on model outputs is a promising way to improve LLMs for complex reasoning tasks. However, existing approaches typically rel...
- Few-Shot Remote Sensing Image Scene Classification with CLIP and Prompt Learning : Abstract: Remote sensing applications increasingly rely on deep learning for scene classification. However, their performance is often constrained by the scarcity of labeled data and the high cost of ...
- Beyond MCQ: An Open-Ended Arabic Cultural QA Benchmark with Dialect Variants : Abstract: Large Language Models (LLMs) are increasingly used to answer everyday questions, yet their performance on culturally grounded and dialectal content remains uneven across languages. We propos...
- LongWeave: A Long-Form Generation Benchmark Bridging Real-World Relevance and Verifiability : Abstract: Generating long, informative, and factual outputs remains a major challenge for Large Language Models (LLMs). Existing benchmarks for long-form generation typically assess real-world queries...
- Perception Learning: A Formal Separation of Sensory Representation Learning from Decision Learning : Abstract: We introduce Perception Learning (PeL), a paradigm that optimizes an agent's sensory interface $f_\phi:\mathcal{X}\to\mathcal{Z}$ using task-agnostic signals, decoupled from downstream decis...
- Metadata-Driven Retrieval-Augmented Generation for Financial Question Answering : Abstract: Retrieval-Augmented Generation (RAG) struggles on long, structured financial filings where relevant evidence is sparse and cross-referenced. This paper presents a systematic investigation of...
- MiniOneRec: An Open-Source Framework for Scaling Generative Recommendation : Abstract: The recent success of large language models (LLMs) has renewed interest in whether recommender systems can achieve similar scaling benefits. Conventional recommenders, dominated by massive e...
- Can LLMs Write Faithfully? An Agent-Based Evaluation of LLM-generated Islamic Content : Abstract: Large language models are increasingly used for Islamic guidance, but risk misquoting texts, misapplying jurisprudence, or producing culturally inconsistent responses. We pilot an evaluation...
- Rethinking Visual Intelligence: Insights from Video Pretraining : Abstract: Large language models (LLMs) have demonstrated that large-scale pretraining enables systems to adapt rapidly to new problems with little supervision in the language domain. This success, how...
- Charting the European LLM Benchmarking Landscape: A New Taxonomy and a Set of Best Practices : Abstract: While new benchmarks for large language models (LLMs) are being developed continuously to catch up with the growing capabilities of new models and AI in general, using and evaluating LLMs in...
- Iterative Critique-Refine Framework for Enhancing LLM Personalization : Abstract: Personalized text generation requires models not only to produce coherent text but also to align with a target user's style, tone, and topical focus. Existing retrieval-augmented approaches ...
- Mitigating Hallucination in Large Language Models (LLMs): An Application-Oriented Survey on RAG, Reasoning, and Agentic Systems : Abstract: Hallucination remains one of the key obstacles to the reliable deployment of large language models (LLMs), particularly in real-world applications. Among various mitigation strategies, Retri...
- Sample-efficient and Scalable Exploration in Continuous-Time RL : Abstract: Reinforcement learning algorithms are typically designed for discrete-time dynamics, even though the underlying real-world control systems are often continuous in time. In this paper, we stu...
- A word association network methodology for evaluating implicit biases in LLMs compared to humans : Abstract: As large language models (LLMs) become increasingly integrated into our lives, their inherent social biases remain a pressing concern. Detecting and evaluating these biases can be challengin...
- Diffusion Models for Wireless Transceivers: From Pilot-Efficient Channel Estimation to AI-Native 6G Receivers : Abstract: With the development of artificial intelligence (AI) techniques, applying AI-based techniques to improve wireless transceivers has become an emerging research topic. Within this context, AI...
- Online neural fusion of distortionless differential beamformers for robust speech enhancement : Abstract: Fixed beamforming is widely used in practice since it does not depend on the estimation of noise statistics and provides relatively stable performance. However, a single beamformer cannot ad...
- Design and Optimization of Cloud Native Homomorphic Encryption Workflows for Privacy-Preserving ML Inference : Abstract: As machine learning (ML) models become increasingly deployed through cloud infrastructures, the confidentiality of user data during inference poses a significant security challenge. Homomorp...
- Local Performance vs. Out-of-Distribution Generalization: An Empirical Analysis of Personalized Federated Learning in Heterogeneous Data Environments : Abstract: In the context of Federated Learning with heterogeneous data environments, local models tend to converge to their own local model optima during local training steps, deviating from the overa...
- Audio Signal Processing Using Time Domain Mel-Frequency Wavelet Coefficient : Abstract: Extracting features from the speech is the most critical process in speech signal processing. Mel Frequency Cepstral Coefficients (MFCC) are the most widely used features in the majority of ...
- Quantum-Resistant Networks Using Post-Quantum Cryptography : Abstract: Quantum networks rely on both quantum and classical channels for coordinated operation. Current architectures employ entanglement distribution and key exchange over quantum channels but ofte...
- LoRA-DA: Data-Aware Initialization for Low-Rank Adaptation via Asymptotic Analysis : Abstract: With the widespread adoption of LLMs, LoRA has become a dominant method for PEFT, and its initialization methods have attracted increasing attention. However, existing methods have notable l...
- DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment : Abstract: Training time-series forecast models requires aligning the conditional distribution of model forecasts with that of the label sequence. The standard direct forecast (DF) approach resorts to ...
- From Observability Data to Diagnosis: An Evolving Multi-agent System for Incident Management in Cloud Systems : Abstract: Incident management (IM) is central to the reliability of large-scale cloud systems. Yet manual IM, where on-call engineers examine metrics, logs, and traces, is labor-intensive and error-pro...
- BMGQ: A Bottom-up Method for Generating Complex Multi-hop Reasoning Questions from Semi-structured Data : Abstract: Building training-ready multi-hop question answering (QA) datasets that truly stress a model's retrieval and reasoning abilities remains highly challenging. While there have been a ...
- BLM₁: A Boundless Large Model for Cross-Space, Cross-Task, and Cross-Embodiment Learning : Abstract: Multimodal large language models (MLLMs) have advanced vision-language reasoning and are increasingly deployed in embodied agents. However, significant limitations remain: MLLMs generalize p...
- UniPlanner: A Unified Motion Planning Framework for Autonomous Vehicle Decision-Making Systems via Multi-Dataset Integration : Abstract: Motion planning is a critical component of autonomous vehicle decision-making systems, directly determining trajectory safety and driving efficiency. While deep learning approaches have adva...
- MGA: Memory-Driven GUI Agent for Observation-Centric Interaction : Abstract: The rapid progress of Large Language Models (LLMs) and their multimodal extensions (MLLMs) has enabled agentic systems capable of perceiving and acting across diverse environments. A challen...
- MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools : Abstract: Large Language Models (LLMs) increasingly rely on external tools to perform complex, realistic tasks, yet their ability to utilize the rapidly expanding Model Context Protocol (MCP) ecosy...
- Investigating Intra-Abstraction Policies For Non-exact Abstraction Algorithms : Abstract: One weakness of Monte Carlo Tree Search (MCTS) is its sample efficiency which can be addressed by building and using state and/or action abstractions in parallel to the tree search such that...
- Verifying Large Language Models' Reasoning Paths via Correlation Matrix Rank : Abstract: Despite the strong reasoning ability of large language models (LLMs), they are prone to errors and hallucinations. As a result, how to check their outputs effectively and efficiently has bec...
- Retrieval and Argumentation Enhanced Multi-Agent LLMs for Judgmental Forecasting : Abstract: Judgmental forecasting is the task of making predictions about future events based on human judgment. This task can be seen as a form of claim verification, where the claim corresponds to a ...
- Generative Large Language Models (gLLMs) in Content Analysis: A Practical Guide for Communication Research : Abstract: Generative Large Language Models (gLLMs), such as ChatGPT, are increasingly being used in communication research for content analysis. Studies show that gLLMs can outperform both crowd worke...
- VDSAgents: A PCS-Guided Multi-Agent System for Veridical Data Science Automation : Abstract: Large language models (LLMs) are becoming increasingly integrated into data science workflows for automated system design. However, these LLM-driven data science systems rely solely on the interna...
- A Unified Geometric Space Bridging AI Models and the Human Brain : Abstract: For decades, neuroscientists and computer scientists have pursued a shared ambition: to understand intelligence and build it. Modern artificial neural networks now rival humans in language, ...
- An N-of-1 Artificial Intelligence Ecosystem for Precision Medicine : Abstract: Artificial intelligence in medicine is built to serve the average patient. By minimizing error across large datasets, most systems deliver strong aggregate accuracy yet falter at the margins...
- Policy Cards: Machine-Readable Runtime Governance for Autonomous AI Agents : Abstract: Policy Cards are introduced as a machine-readable, deployment-layer standard for expressing operational, regulatory, and ethical constraints for AI agents. The Policy Card sits with the agen...
- Improving LLM Reasoning via Dependency-Aware Query Decomposition and Logic-Parallel Content Expansion : Abstract: The integration of Large Language Models (LLMs) into real-time Web applications, such as AI-powered search and conversational agents, presents a fundamental Web infrastructure challenge: rec...
- APTBench: Benchmarking Agentic Potential of Base LLMs During Pre-Training : Abstract: With the rapid development of LLM-based agents, there is a growing trend to incorporate agent-specific data into the pre-training stage of LLMs, aiming to better align LLMs with real-world a...
- OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows : Abstract: Computer-using agents powered by Vision-Language Models (VLMs) have demonstrated human-like capabilities in operating digital environments like mobile platforms. While these agents hold grea...
- Human-Level Reasoning: A Comparative Study of Large Language Models on Logical and Abstract Reasoning : Abstract: Evaluating reasoning ability in Large Language Models (LLMs) is important for advancing artificial intelligence, as it transcends mere linguistic task performance. It involves understanding ...
- Law in Silico: Simulating Legal Society with LLM-Based Agents : Abstract: Since real-world legal experiments are often costly or infeasible, simulating legal societies with Artificial Intelligence (AI) systems provides an effective alternative for verifying and de...
- Affordance Representation and Recognition for Autonomous Agents : Abstract: The autonomy of software agents is fundamentally dependent on their ability to construct an actionable internal world model from the structured data that defines their digital environment, s...
- Adaptive Surrogate Gradients for Sequential Reinforcement Learning in Spiking Neural Networks : Abstract: Neuromorphic computing systems are set to revolutionize energy-constrained robotics by achieving orders-of-magnitude efficiency gains, while enabling native temporal processing. Spiking Neur...
- From Cross-Task Examples to In-Task Prompts: A Graph-Based Pseudo-Labeling Framework for In-context Learning : Abstract: The capability of in-context learning (ICL) enables large language models (LLMs) to perform novel tasks without parameter updates by conditioning on a few input-output examples. However, col...
- Generative AI for Healthcare: Fundamentals, Challenges, and Perspectives : Abstract: Generative Artificial Intelligence (GenAI) is taking the world by storm. It promises transformative opportunities for advancing and disrupting existing practices, including healthcare. From ...
- FunReason-MT Technical Report: Overcoming the Complexity Barrier in Multi-Turn Function Calling : Abstract: Function calling (FC) empowers large language models (LLMs) and autonomous agents to interface with external tools, a critical capability for solving complex, real-world problems. As this ab...
- Advancing site-specific disease and pest management in precision agriculture: From reasoning-driven foundation models to adaptive, feedback-based learning : Abstract: Site-specific disease management (SSDM) in crops has advanced rapidly through machine and deep learning (ML and DL) for real-time computer vision. Research evolved from handcrafted feature e...
- OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs : Abstract: Agentic tool use has gained traction with the rise of agentic tool calling, yet most existing work overlooks the complexity of multi-turn tool interactions. We introduce OrchDAG, a synthetic...
- Bridging Tool Dependencies and Domain Knowledge: A Graph-Based Framework for In-Context Planning : Abstract: We present a framework for uncovering and exploiting dependencies among tools and documents to enhance exemplar artifact generation. Our method begins by constructing a tool knowledge graph ...
- Feedback Lunch: Deep Feedback Codes for Wiretap Channels : Abstract: We consider reversely-degraded wiretap channels, for which the secrecy capacity is zero if there is no channel feedback. This work focuses on a seeded modular code design for the Gaussian wi...
- An Enhanced Dual Transformer Contrastive Network for Multimodal Sentiment Analysis : Abstract: Multimodal Sentiment Analysis (MSA) seeks to understand human emotions by jointly analyzing data from multiple modalities, typically text and images, offering a richer and more accurate interp...
- Short Ticketing Detection Framework Analysis Report : Abstract: This report presents a comprehensive analysis of an unsupervised multi-expert machine learning framework for detecting short ticketing fraud in railway systems. The study introduces an A/B/C...
- Genotype-Phenotype Integration through Machine Learning and Personalized Gene Regulatory Networks for Cancer Metastasis Prediction : Abstract: Metastasis is the leading cause of cancer-related mortality, yet most predictive models rely on shallow architectures and neglect patient-specific regulatory mechanisms. Here, we integrate c...
- Speeding Up MACE: Low-Precision Tricks for Equivarient Force Fields : Abstract: Machine-learning force fields can deliver accurate molecular dynamics (MD) at high computational cost. For SO(3)-equivariant models such as MACE, there is little systematic evidence on wheth...
- From Detection to Discovery: A Closed-Loop Approach for Simultaneous and Continuous Medical Knowledge Expansion and Depression Detection on Social Media : Abstract: Social media user-generated content (UGC) provides real-time, self-reported indicators of mental health conditions such as depression, offering a valuable source for predictive analytics. Wh...
- AI-Driven Development of a Publishing Imprint: Xynapse Traces : Abstract: Xynapse Traces is an experimental publishing imprint created via a fusion of human and algorithmic methods using a configuration-driven architecture and a multi-model AI integration framewor...
- Chain of Execution Supervision Promotes General Reasoning in Large Language Models : Abstract: Building robust and general reasoning ability is a central goal in the development of large language models (LLMs). Recent efforts increasingly turn to code as a rich training source, given ...
- NUM2EVENT: Interpretable Event Reasoning from Numerical time-series : Abstract: Large language models (LLMs) have recently demonstrated impressive multimodal reasoning capabilities, yet their understanding of purely numerical time-series signals remains limited. Existin...
- Beyond Pairwise: Empowering LLM Alignment With Ranked Choice Modeling : Abstract: Alignment of large language models (LLMs) has predominantly relied on pairwise preference optimization, where annotators select the better of two responses to a prompt. While simple, this ap...
- LLMComp: A Language Modeling Paradigm for Error-Bounded Scientific Data Compression : Abstract: The rapid growth of high-resolution scientific simulations and observation systems is generating massive spatiotemporal datasets, making efficient, error-bounded compression increasingly imp...
- Noise is All You Need: Solving Linear Inverse Problems by Noise Combination Sampling with Diffusion Models : Abstract: Pretrained diffusion models have demonstrated strong capabilities in zero-shot inverse problem solving by incorporating observation information into the generation process of the diffusion m...
- Monotone and Separable Set Functions: Characterizations and Neural Models : Abstract: Motivated by applications for set containment problems, we consider the following fundamental problem: can we design set-to-vector functions so that the natural partial order on sets is pres...
- Help the machine to help you: an evaluation in the wild of egocentric data cleaning via skeptical learning : Abstract: Any digital personal assistant, whether used to support task performance, answer questions, or manage work and daily life, including fitness schedules, requires high-quality annotations to f...
- Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation : Abstract: Flight delay prediction has become a key focus in air traffic management, as delays highlight inefficiencies that impact overall network performance. This paper presents a lightweight large ...
- Combining Textual and Structural Information for Premise Selection in Lean : Abstract: Premise selection is a key bottleneck for scaling theorem proving in large formal libraries. Yet existing language-based methods often treat premises in isolation, ignoring the web of depend...
- Bridging Function Approximation and Device Physics via Negative Differential Resistance Networks : Abstract: Achieving fully analog neural computation requires hardware that can natively implement both linear and nonlinear operations with high efficiency. While analogue matrix-vector multiplication...
- Integrating Genomics into Multimodal EHR Foundation Models : Abstract: This paper introduces an innovative Electronic Health Record (EHR) foundation model that integrates Polygenic Risk Scores (PRS) as a foundational data modality, moving beyond traditional EHR...
- Structure-Aware Fusion with Progressive Injection for Multimodal Molecular Representation Learning : Abstract: Multimodal molecular models often suffer from 3D conformer unreliability and modality collapse, limiting their robustness and generalization. We propose MuMo, a structured multimodal fusion ...
- Spatially Aware Linear Transformer (SAL-T) for Particle Jet Tagging : Abstract: Transformers are very effective in capturing both global and local correlations within high-energy particle collisions, but they present deployment challenges in high-data-throughput environ...
- VisCoder2: Building Multi-Language Visualization Coding Agents : Abstract: Large language models (LLMs) have recently enabled coding agents capable of generating, executing, and revising visualization code. However, existing models often fail in practical workflows...
- SAND: A Self-supervised and Adaptive NAS-Driven Framework for Hardware Trojan Detection : Abstract: The globalized semiconductor supply chain has made Hardware Trojans (HT) a significant security threat to embedded systems, necessitating the design of efficient and adaptable detection mech...
- RoGBot: Relationship-Oblivious Graph-based Neural Network with Contextual Knowledge for Bot Detection : Abstract: Detecting automated accounts (bots) among genuine users on platforms like Twitter remains a challenging task due to the evolving behaviors and adaptive strategies of such accounts. While rec...
- Efficient Low Rank Attention for Long-Context Inference in Large Language Models : Abstract: As the length of input text grows, the key-value (KV) cache in LLMs imposes prohibitive GPU memory costs and limits long-context inference on resource constrained devices. Existing approache...
- Beyond Hidden-Layer Manipulation: Semantically-Aware Logit Interventions for Debiasing LLMs : Abstract: We proposed Static and Dynamic, two zero-shot logits-layer debiasing methods. Dynamic reduces bias by up to 70% with minimal fluency loss. Logits intervention outperforms hidden-layer appr...
- The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models : Abstract: Although large language models (LLMs) have achieved revolutionary breakthroughs in many fields, their large model size and high computational cost pose significant challenges for practical d...
- Error Adjustment Based on Spatiotemporal Correlation Fusion for Traffic Forecasting : Abstract: Deep neural networks (DNNs) play a significant role in an increasing body of research on traffic forecasting due to their effectively capturing spatiotemporal patterns embedded in traffic da...
- Aligning Diffusion Language Models via Unpaired Preference Optimization : Abstract: Diffusion language models (dLLMs) are an emerging alternative to autoregressive (AR) generators, but aligning them to human preferences is challenging because sequence log-likelihoods are in...
- Quanvolutional Neural Networks for Pneumonia Detection: An Efficient Quantum-Assisted Feature Extraction Paradigm : Abstract: Pneumonia poses a significant global health challenge, demanding accurate and timely diagnosis. While deep learning, particularly Convolutional Neural Networks (CNNs), has shown promise in m...
- Agentsway: Software Development Methodology for AI Agents-based Teams : Abstract: The emergence of Agentic AI is fundamentally transforming how software is designed, developed, and maintained. Traditional software development methodologies such as Agile, Kanban, ShapeUp, ...
- Transformers from Compressed Representations : Abstract: Compressed file formats are the cornerstone of efficient data storage and transmission, yet their potential for representation learning remains largely underexplored. We introduce TEMPEST (...
- Optimize Any Topology: A Foundation Model for Shape- and Resolution-Free Structural Topology Optimization : Abstract: Structural topology optimization (TO) is central to engineering design but remains computationally intensive due to complex physics and hard constraints. Existing deep-learning methods are l...
- Traffic flow forecasting, STL decomposition, Hybrid model, LSTM, ARIMA, XGBoost, Intelligent transportation systems : Abstract: Accurate traffic flow forecasting is essential for intelligent transportation systems and urban traffic management. However, single model approaches often fail to capture the complex, nonlin...
- What Work is AI Actually Doing? Uncovering the Drivers of Generative AI Adoption : Abstract: Purpose: The rapid integration of artificial intelligence (AI) systems like ChatGPT, Claude AI, etc., has a deep impact on how work is done. Predicting how AI will reshape work requires unde...
- Sparsity and Superposition in Mixture of Experts : Abstract: Mixture of Experts (MoE) models have become central to scaling large language models, yet their mechanistic differences from dense networks remain poorly understood. Previous work has explor...
- MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers : Abstract: The Model Context Protocol (MCP) has emerged as a standardized interface enabling seamless integration between Large Language Models (LLMs) and external data sources and tools. While MCP sig...
- RefleXGen: The unexamined code is not worth using : Abstract: Security in code generation remains a pivotal challenge when applying large language models (LLMs). This paper introduces RefleXGen, an innovative method that significantly enhances code sec...
- QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents : Abstract: Modern coding agents integrated into IDEs combine powerful tools and system-level actions, exposing a high-stakes attack surface. Existing Indirect Prompt Injection (IPI) studies focus mainl...
- Beyond Prompt Engineering: Neuro-Symbolic-Causal Architecture for Robust Multi-Objective AI Agents : Abstract: Large language models show promise as autonomous decision-making agents, yet their deployment in high-stakes domains remains fraught with risk. Without architectural safeguards, LLM agents e...
- Parallel BiLSTM-Transformer networks for forecasting chaotic dynamics : Abstract: The nonlinear nature of chaotic systems results in extreme sensitivity to initial conditions and highly intricate dynamical behaviors, posing fundamental challenges for accurately predicting...
- On the Societal Impact of Machine Learning : Abstract: This PhD thesis investigates the societal impact of machine learning (ML). ML increasingly informs consequential decisions and recommendations, significantly affecting many aspects of our li...
- Debiasing Reward Models by Representation Learning with Guarantees : Abstract: Recent alignment techniques, such as reinforcement learning from human feedback, have been widely adopted to align large language models with human preferences by learning and leveraging rew...
- Explaining Robustness to Catastrophic Forgetting Through Incremental Concept Formation : Abstract: Catastrophic forgetting remains a central challenge in continual learning, where models are required to integrate new knowledge over time without losing what they have previously learned. In...
- TDFlow: Agentic Workflows for Test Driven Software Engineering : Abstract: We introduce TDFlow, a novel test-driven agentic workflow that frames repository-scale software engineering as a test-resolution task, specifically designed to solve human-written tests. Giv...
- Explainable Detection of AI-Generated Images with Artifact Localization Using Faster-Than-Lies and Vision-Language Models for Edge Devices : Abstract: The increasing realism of AI-generated imagery poses challenges for verifying visual authenticity. We present an explainable image authenticity detection system that combines a lightweight c...
- CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting : Abstract: Humans can effortlessly count diverse objects by perceiving visual repetition and structural relationships rather than relying on class identity. However, most existing counting models fail ...
- A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ cameras : Abstract: The proliferation of floating anthropogenic debris in rivers has emerged as a pressing environmental concern, exerting a detrimental influence on biodiversity, water quality, and human activ...
- CRADLE Bench: A Clinician-Annotated Benchmark for Multi-Faceted Mental Health Crisis and Safety Risk Detection : Abstract: Detecting mental health crisis situations such as suicide ideation, rape, domestic violence, child abuse, and sexual harassment is a critical yet underexplored challenge for language models....
- A Neural Model for Contextual Biasing Score Learning and Filtering : Abstract: Contextual biasing improves automatic speech recognition (ASR) by integrating external knowledge, such as user-specific phrases or entities, during decoding. In this work, we use an attentio...
- Can LLMs Narrate Tabular Data? An Evaluation Framework for Natural Language Representations of Text-to-SQL System Outputs : Abstract: In modern industry systems like multi-turn chat agents, Text-to-SQL technology bridges natural language (NL) questions and database (DB) querying. The conversion of tabular DB results into N...
- A PDE-Informed Latent Diffusion Model for 2-m Temperature Downscaling : Abstract: This work presents a physics-conditioned latent diffusion model tailored for dynamical downscaling of atmospheric data, with a focus on reconstructing high-resolution 2-m temperature fields....
- OraPlan-SQL: A Planning-Centric Framework for Complex Bilingual NL2SQL Reasoning : Abstract: We present OraPlan-SQL, our system for the Archer NL2SQL Evaluation Challenge 2025, a bilingual benchmark requiring complex reasoning such as arithmetic, commonsense, and hypothetical infere...
- PRO: Enabling Precise and Robust Text Watermark for Open-Source LLMs : Abstract: Text watermarking for large language models (LLMs) enables model owners to verify text origin and protect intellectual property. While watermarking methods for closed-source LLMs are relativ...
- Evaluating the effectiveness of LLM-based interoperability : Abstract: Background: Systems of systems are becoming increasingly dynamic and heterogeneous, and this adds pressure on the long-standing challenge of interoperability. Besides its technical aspect, i...
- RS-ORT: A Reduced-Space Branch-and-Bound Algorithm for Optimal Regression Trees : Abstract: Mixed-integer programming (MIP) has emerged as a powerful framework for learning optimal decision trees. Yet, existing MIP approaches for regression tasks are either limited to purely binary...
- Group Interventions on Deep Networks for Causal Discovery in Subsystems : Abstract: Causal discovery uncovers complex relationships between variables, enhancing predictions, decision-making, and insights into real-world systems, especially in nonlinear multivariate time ser...
- DynaStride: Dynamic Stride Windowing with MMCoT for Instructional Multi-Scene Captioning : Abstract: Scene-level captioning in instructional videos can enhance learning by requiring an understanding of both visual cues and temporal structure. By aligning visual cues with textual guidance, t...
- Key and Value Weights Are Probably All You Need: On the Necessity of the Query, Key, Value weight Triplet in Decoder-Only Transformers : Abstract: The Query, Key, Value weight triplet is a building block of current attention mechanisms in state-of-the-art LLMs. We theoretically investigate whether this triplet can be reduced, proving u...
- Agent-based Automated Claim Matching with Instruction-following LLMs : Abstract: We present a novel agent-based approach for the automated claim matching task with instruction-following LLMs. We propose a two-step pipeline that first generates prompts with LLMs, to then ...
- MFiSP: A Multimodal Fire Spread Prediction Framework : Abstract: The 2019-2020 Black Summer bushfires in Australia devastated 19 million hectares, destroyed 3,000 homes, and lasted seven months, demonstrating the escalating scale and urgency of wildfire t...
- Scalable GPU-Based Integrity Verification for Large Machine Learning Models : Abstract: We present a security framework that strengthens distributed machine learning by standardizing integrity protections across CPU and GPU platforms and significantly reducing verification over...
- Modeling Biological Multifunctionality with Echo State Networks : Abstract: In this work, a three-dimensional multicomponent reaction-diffusion model has been developed, combining excitable-system dynamics with diffusion processes and sharing conceptual features wit...
- Auto prompting without training labels: An LLM cascade for product quality assessment in e-commerce catalogs : Abstract: We introduce a novel, training free cascade for auto-prompting Large Language Models (LLMs) to assess product quality in e-commerce. Our system requires no training labels or model fine-tuni...
- ChessQA: Evaluating Large Language Models for Chess Understanding : Abstract: Chess provides an ideal testbed for evaluating the reasoning, modeling, and abstraction capabilities of large language models (LLMs), as it has well-defined structure and objective ground tr...
- Uncovering the Potential Risks in Unlearning: Danger of English-only Unlearning in Multilingual LLMs : Abstract: There have been a couple of studies showing that attempting to erase multilingual knowledge using only English data is insufficient for multilingual LLMs. However, their analyses remain high...
- Game-TARS: Pretrained Foundation Models for Scalable Generalist Multimodal Game Agents : Abstract: We present Game-TARS, a generalist game agent trained with a unified, scalable action space anchored to human-aligned native keyboard-mouse inputs. Unlike API- or GUI-based approaches, this ...
- AI and the Decentering of Disciplinary Creativity : Abstract: This paper examines the role of artificial intelligence in scientific problem-solving, with a focus on its implications for disciplinary creativity. Drawing on recent work in the philosophy ...
- Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability : Abstract: Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. ME-POMDPs represent a finite set of POMDPs that share the same state, action, and observation spa...
- Test-Time Tuned Language Models Enable End-to-end De Novo Molecular Structure Generation from MS/MS Spectra : Abstract: Tandem Mass Spectrometry enables the identification of unknown compounds in crucial fields such as metabolomics, natural product discovery and environmental analysis. However, current method...
- Evaluating In Silico Creativity: An Expert Review of AI Chess Compositions : Abstract: The rapid advancement of Generative AI has raised significant questions regarding its ability to produce creative and novel outputs. Our recent work investigates this question within the dom...
- Why Foundation Models in Pathology Are Failing : Abstract: In non-medical domains, foundation models (FMs) have revolutionized computer vision and language processing through large-scale self-supervised and multimodal learning. Consequently, their r...
- ReCAP: Recursive Context-Aware Reasoning and Planning for Large Language Model Agents : Abstract: Long-horizon tasks requiring multi-step reasoning and dynamic re-planning remain challenging for large language models (LLMs). Sequential prompting methods are prone to context drift, loss o...
- Decentralized Multi-Agent Goal Assignment for Path Planning using Large Language Models : Abstract: Coordinating multiple autonomous agents in shared environments under decentralized conditions is a long-standing challenge in robotics and artificial intelligence. This work addresses the pr...
- From Benchmarks to Business Impact: Deploying IBM Generalist Agent in Enterprise Production : Abstract: Agents are rapidly advancing in automating digital work, but enterprises face a harder challenge: moving beyond prototypes to deployed systems that deliver measurable business value. This pa...
- Generating Creative Chess Puzzles : Abstract: While Generative AI rapidly advances in various domains, generating truly creative, aesthetic, and counter-intuitive outputs remains a challenge. This paper presents an approach to tackle th...
- Hybrid Modeling, Sim-to-Real Reinforcement Learning, and Large Language Model Driven Control for Digital Twins : Abstract: This work investigates the use of digital twins for dynamical system modeling and control, integrating physics-based, data-driven, and hybrid approaches with both traditional and AI-driven c...
- Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges : Abstract: Agentic AI systems powered by large language models (LLMs) and endowed with planning, tool use, memory, and autonomy, are emerging as powerful, flexible platforms for automation. Their abili...
- Latent Chain-of-Thought for Visual Reasoning : Abstract: Chain-of-thought (CoT) reasoning is critical for improving the interpretability and reliability of Large Vision-Language Models (LVLMs). However, existing training algorithms such as SFT, PP...
- Decentralized Causal Discovery using Judo Calculus : Abstract: We describe a theory and implementation of an intuitionistic decentralized framework for causal discovery using judo calculus, which is formally defined as j-stable causal inference using j-...
- The Sign Estimator: LLM Alignment in the Face of Choice Heterogeneity : Abstract: Traditional LLM alignment methods are vulnerable to heterogeneity in human preferences. Fitting a naïve probabilistic model to pairwise comparison data (say over prompt-completion pairs) y...
- Learning Individual Movement Shifts After Urban Disruptions with Social Infrastructure Reliance : Abstract: Shifts in individual movement patterns following disruptive events can reveal changing demands for community resources. However, predicting such shifts before disruptive events remains chall...
- Discovering Heuristics with Large Language Models (LLMs) for Mixed-Integer Programs: Single-Machine Scheduling : Abstract: Our study contributes to the scheduling and combinatorial optimization literature with new heuristics discovered by leveraging the power of Large Language Models (LLMs). We focus on the sing...
- OneCast: Structured Decomposition and Modular Generation for Cross-Domain Time Series Forecasting : Abstract: Cross-domain time series forecasting is a valuable task in various web applications. Despite its rapid advancement, achieving effective generalization across heterogeneous time series data r...
- LLMLogAnalyzer: A Clustering-Based Log Analysis Chatbot using Large Language Models : Abstract: System logs are a cornerstone of cybersecurity, supporting proactive breach prevention and post-incident investigations. However, analyzing vast amounts of diverse log data remains significa...
- Modeling Electric Vehicle Car-Following Behavior: Classical vs Machine Learning Approach : Abstract: The increasing adoption of electric vehicles (EVs) necessitates an understanding of their driving behavior to enhance traffic safety and develop smart driving systems. This study compares cl...
- HistoLens: An Interactive XAI Toolkit for Verifying and Mitigating Flaws in Vision-Language Models for Histopathology : Abstract: For doctors to truly trust artificial intelligence, it can't be a black box. They need to understand its reasoning, almost as if they were consulting a colleague. We created HistoLens to be...
Research Sources: 555 | Generated: 10/30/2025
