AI RESEARCH PAPERS & ACADEMIC SOURCES
- $\mathbf{S^2LM}$: Towards Semantic Steganography via Large Language Models : Abstract: Although steganography has made significant advancements in recent years, it still struggles to embed semantically rich, sentence-level information into carriers. However, in the era of AIGC...
- Canonical Space Representation for 4D Panoptic Segmentation of Articulated Objects : Abstract: Articulated object perception presents significant challenges in computer vision, particularly because most existing methods ignore temporal dynamics despite the inherently dynamic nature of...
- Dense Motion Captioning : Abstract: Recent advances in 3D human motion and language integration have primarily focused on text-to-motion generation, leaving the task of motion understanding relatively unexplored. We introduce ...
- PreResQ-R1: Towards Fine-Grained Rank-and-Score Reinforcement Learning for Visual Quality Assessment via Preference-Response Disentangled Policy Optimization : Abstract: Visual Quality Assessment (QA) seeks to predict human perceptual judgments of visual fidelity. While recent multimodal large language models (MLLMs) show promise in reasoning about image and...
- PALM: A Dataset and Baseline for Learning Multi-subject Hand Prior : Abstract: The ability to grasp objects, signal with gestures, and share emotion through touch all stem from the unique capabilities of human hands. Yet creating high-quality personalized hand avatars ...
- Sharing the Learned Knowledge-base to Estimate Convolutional Filter Parameters for Continual Image Restoration : Abstract: Continual learning is an emerging topic in the field of deep learning, where a model is expected to learn continuously for new upcoming tasks without forgetting previous experiences. This fi...
- Shared Latent Representation for Joint Text-to-Audio-Visual Synthesis : Abstract: We propose a text-to-talking-face synthesis framework leveraging latent speech representations from HierSpeech++. A Text-to-Vec module generates Wav2Vec2 embeddings from text, which jointly ...
- The Potential of Copernicus Satellites for Disaster Response: Retrieving Building Damage from Sentinel-1 and Sentinel-2 : Abstract: Natural disasters demand rapid damage assessment to guide humanitarian response. Here, we investigate whether medium-resolution Earth observation images from the Copernicus program can suppo...
- Photo Dating by Facial Age Aggregation : Abstract: We introduce a novel method for Photo Dating which estimates the year a photograph was taken by leveraging information from the faces of people present in the image. To facilitate this resea...
- EventFlow: Real-Time Neuromorphic Event-Driven Classification of Two-Phase Boiling Flow Regimes : Abstract: Flow boiling is an efficient heat transfer mechanism capable of dissipating high heat loads with minimal temperature variation, making it an ideal thermal management method. However, sudden ...
- Semantic-Guided Natural Language and Visual Fusion for Cross-Modal Interaction Based on Tiny Object Detection : Abstract: This paper introduces a cutting-edge approach to cross-modal interaction for tiny object detection by combining semantic-guided natural language processing with advanced visual recognition b...
- GroupKAN: Rethinking Nonlinearity with Grouped Spline-based KAN Modeling for Efficient Medical Image Segmentation : Abstract: Medical image segmentation requires models that are accurate, lightweight, and interpretable. Convolutional architectures lack adaptive nonlinearity and transparent decision-making, whereas ...
- Visual Spatial Tuning : Abstract: Capturing spatial relationships from visual inputs is a cornerstone of human-like general intelligence. Several previous studies have tried to enhance the spatial awareness of Vision-Languag...
- LG-NuSegHop: A Local-to-Global Self-Supervised Pipeline For Nuclei Instance Segmentation : Abstract: Nuclei segmentation is the cornerstone task in histology image reading, shedding light on the underlying molecular patterns and leading to disease or cancer diagnosis. Yet, it is a laborious...
- UHDRes: Ultra-High-Definition Image Restoration via Dual-Domain Decoupled Spectral Modulation : Abstract: Ultra-high-definition (UHD) images often suffer from severe degradations such as blur, haze, rain, or low-light conditions, which pose significant challenges for image restoration due to the...
- DAFM: Dynamic Adaptive Fusion for Multi-Model Collaboration in Composed Image Retrieval : Abstract: Composed Image Retrieval (CIR) is a cross-modal task that aims to retrieve target images from large-scale databases using a reference image and a modification text. Most existing methods rel...
- Quantifying the Risk of Transferred Black Box Attacks : Abstract: Neural networks have become pervasive across various applications, including security-related products. However, their widespread adoption has heightened concerns regarding vulnerability to ...
- PySlyde: A Lightweight, Open-Source Toolkit for Pathology Preprocessing : Abstract: The integration of artificial intelligence (AI) into pathology is advancing precision medicine by improving diagnosis, treatment planning, and patient outcomes. Digitised whole-slide images ...
- Neural Image Abstraction Using Long Smoothing B-Splines : Abstract: We integrate smoothing B-splines into a standard differentiable vector graphics (DiffVG) pipeline through linear mapping, and show how this can be used to generate smooth and arbitrarily lon...
- EveryDayVLA: A Vision-Language-Action Model for Affordable Robotic Manipulation : Abstract: While Vision-Language-Action (VLA) models map visual inputs and language instructions directly to robot actions, they often rely on costly hardware and struggle in novel or cluttered scenes....
- Thera: Aliasing-Free Arbitrary-Scale Super-Resolution with Neural Heat Fields : Abstract: Recent approaches to arbitrary-scale single image super-resolution (ASR) use neural fields to represent continuous signals that can be sampled at arbitrary resolutions. However, point-wise q...
- FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models : Abstract: Foundation models have exhibited unprecedented capabilities in tackling many domains and tasks. Models such as CLIP are currently widely used to bridge cross-modal representations, and text-...
- On Scaling Up 3D Gaussian Splatting Training : Abstract: 3D Gaussian Splatting (3DGS) is increasingly popular for 3D reconstruction due to its superior visual quality and rendering speed. However, 3DGS training currently occurs on a single GPU, li...
- Dark Transformer: A Video Transformer for Action Recognition in the Dark : Abstract: Recognizing human actions in adverse lighting conditions presents significant challenges in computer vision, with wide-ranging applications in visual surveillance and nighttime driving. Exis...
- SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition : Abstract: Recent studies show that the visual place recognition (VPR) method using pre-trained visual foundation models can achieve promising performance. In our previous work, we propose a novel meth...
- MOSAIC: Generating Consistent, Privacy-Preserving Scenes from Multiple Depth Views in Multi-Room Environments : Abstract: We introduce a diffusion-based approach for generating privacy-preserving digital twins of multi-room indoor environments from depth images only. Central to our approach is a novel Multi-vie...
- Consistency Trajectory Matching for One-Step Generative Super-Resolution : Abstract: Current diffusion-based super-resolution (SR) approaches achieve commendable performance at the cost of high inference overhead. Therefore, distillation techniques are utilized to accelerate...
- ControlGS: Consistent Structural Compression Control for Deployment-Aware Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) is a highly deployable real-time method for novel view synthesis. In practice, it requires a universal, consistent control mechanism that adjusts the trade-off b...
- Dual Teacher-Student Learning for Semi-supervised Medical Image Segmentation : Abstract: Semi-supervised learning reduces the costly manual annotation burden in medical image segmentation. A popular approach is the mean teacher (MT) strategy, which applies consistency regulariza...
- Towards Understanding the Mechanisms of Classifier-Free Guidance : Abstract: Classifier-free guidance (CFG) is a core technique powering state-of-the-art image generation systems, yet its underlying mechanisms remain poorly understood. In this work, we begin by analy...
- Diffusion Denoised Hyperspectral Gaussian Splatting : Abstract: Hyperspectral imaging (HSI) has been widely used in agricultural applications for non-destructive estimation of plant nutrient composition and precise determination of nutritional elements o...
- USIGAN: Unbalanced Self-Information Feature Transport for Weakly Paired Image IHC Virtual Staining : Abstract: Immunohistochemical (IHC) virtual staining is a task that generates virtual IHC images from H&E images while maintaining pathological semantic consistency with adjacent slices. This task ai...
- GeoSVR: Taming Sparse Voxels for Geometrically Accurate Surface Reconstruction : Abstract: Reconstructing accurate surfaces with radiance fields has achieved remarkable progress in recent years. However, prevailing approaches, primarily based on Gaussian Splatting, are increasingl...
- Generative Autoregressive Transformers for Model-Agnostic Federated MRI Reconstruction : Abstract: While learning-based models hold great promise for MRI reconstruction, single-site models trained on limited local datasets often show poor generalization. This has motivated collaborative t...
- Asymptotically Unbiased Synthetic Control Methods by Moment Matching : Abstract: Synthetic Control Methods (SCMs) have become a fundamental tool for comparative case studies. The core idea behind SCMs is to estimate treatment effects by predicting counterfactual outcomes...
- In-and-Out: Algorithmic Diffusion for Sampling Convex Bodies : Abstract: We present a new random walk for uniformly sampling high-dimensional convex bodies. It achieves state-of-the-art runtime complexity with stronger guarantees on the output than previously kno...
- A Low Rank Neural Representation of Entropy Solutions : Abstract: We construct a new representation of entropy solutions to nonlinear scalar conservation laws with a smooth convex flux function in a single spatial dimension. The representation is a general...
- Comparative Study on Noise-Augmented Training and its Effect on Adversarial Robustness in ASR Systems : Abstract: In this study, we investigate whether noise-augmented training can concurrently improve adversarial robustness in automatic speech recognition (ASR) systems. We conduct a comparative analysi...
- Identifying Drift, Diffusion, and Causal Structure from Temporal Snapshots : Abstract: Stochastic differential equations (SDEs) are a fundamental tool for modelling dynamic processes, including gene regulatory networks (GRNs), contaminant transport, financial markets, and imag...
- Prediction-Powered Adaptive Shrinkage Estimation : Abstract: Prediction-Powered Inference (PPI) is a powerful framework for enhancing statistical estimates by combining limited gold-standard data with machine learning (ML) predictions. While prior wor...
- LLM-Based Emulation of the Radio Resource Control Layer: Towards AI-Native RAN Protocols : Abstract: Integrating Large AI Models (LAMs) into 6G mobile networks is a key enabler of the AI-Native Air Interface (AI-AI), where protocol intelligence must scale beyond handcrafted logic. This pape...
- Generalizable, real-time neural decoding with hybrid state-space models : Abstract: Real-time decoding of neural activity is central to neuroscience and neurotechnology applications, from closed-loop experiments to brain-computer interfaces, where models are subject to stri...
- Performative Validity of Recourse Explanations : Abstract: When applicants get rejected by an algorithmic decision system, recourse explanations provide actionable suggestions for how to change their input features to get a positive evaluation. A cr...
- Parsimonious Gaussian mixture models with piecewise-constant eigenvalue profiles : Abstract: Gaussian mixture models (GMMs) are ubiquitous in statistical learning, particularly for unsupervised problems. While full GMMs suffer from the overparameterization of their covariance matric...
- It's Hard to Be Normal: The Impact of Noise on Structure-agnostic Estimation : Abstract: Structure-agnostic causal inference studies how well one can estimate a treatment effect given black-box machine learning estimates of nuisance functions (like the impact of confounders on t...
- SARC: Sentiment-Augmented Deep Role Clustering for Fake News Detection : Abstract: Fake news detection has been a long-standing research focus in social networks. Recent studies suggest that incorporating sentiment information from both news content and user comments can e...
- Cross-Lingual SynthDocs: A Large-Scale Synthetic Corpus for Any to Arabic OCR and Document Understanding : Abstract: Cross-Lingual SynthDocs is a large-scale synthetic corpus designed to address the scarcity of Arabic resources for Optical Character Recognition (OCR) and Document Understanding (DU). The da...
- GEMMA-SQL: A Novel Text-to-SQL Model Based on Large Language Models : Abstract: Text-to-SQL systems enable users to interact with structured databases using natural language, eliminating the need for specialized programming knowledge. In this work, we introduce GEMMA-SQ...
- Surprisal reveals diversity gaps in image captioning and different scorers change the story : Abstract: We quantify linguistic diversity in image captioning with surprisal variance - the spread of token-level negative log-probabilities within a caption set. On the MSCOCO test set, we compare f...
- Explore Data Left Behind in Reinforcement Learning for Reasoning Language Models : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as an effective approach for improving the reasoning abilities of large language models (LLMs). The Group Relative Policy Op...
- SDS KoPub VDR: A Benchmark Dataset for Visual Document Retrieval in Korean Public Documents : Abstract: Existing benchmarks for visual document retrieval (VDR) largely overlook non-English languages and the structural complexity of official publications. To address this critical gap, we introd...
- AgentExpt: Automating AI Experiment Design with LLM-based Resource Retrieval Agent : Abstract: Large language model agents are becoming increasingly capable at web-centric tasks such as information retrieval and complex reasoning. These emerging capabilities have given rise to surge rese...
- Diagnosing and Mitigating Semantic Inconsistencies in Wikidata's Classification Hierarchy : Abstract: Wikidata is currently the largest open knowledge graph on the web, encompassing over 120 million entities. It integrates data from various domain-specific databases and imports a substantial...
- LoPT: Lossless Parallel Tokenization Acceleration for Long Context Inference of Large Language Model : Abstract: Long context inference scenarios have become increasingly important for large language models, yet they introduce significant computational latency. While prior research has optimized long-s...
- Acquiring Common Chinese Emotional Events Using Large Language Model : Abstract: Knowledge about emotional events is an important kind of knowledge which has been applied to improve the effectiveness of different applications. However, emotional events cannot be easily a...
- Order-Level Attention Similarity Across Language Models: A Latent Commonality : Abstract: In this paper, we explore an important yet previously neglected question: Do context aggregation patterns across Language Models (LMs) share commonalities? While some works have investigated...
- Reasoning-Guided Claim Normalization for Noisy Multilingual Social Media Posts : Abstract: We address claim normalization for multilingual misinformation detection - transforming noisy social media posts into clear, verifiable statements across 20 languages. The key contribution d...
- On Text Simplification Metrics and General-Purpose LLMs for Accessible Health Information, and A Potential Architectural Advantage of The Instruction-Tuned LLM class : Abstract: The increasing health-seeking behavior and digital consumption of biomedical information by the general public necessitate scalable solutions for automatically adapting complex scientific an...
- A Toolbox for Improving Evolutionary Prompt Search : Abstract: Evolutionary prompt optimization has demonstrated effectiveness in refining prompts for LLMs. However, existing approaches lack robust operators and efficient evaluation mechanisms. In this ...
- ManufactuBERT: Efficient Continual Pretraining for Manufacturing : Abstract: While large general-purpose Transformer-based encoders excel at general language understanding, their performance diminishes in specialized domains like manufacturing due to a lack of exposu...
- Mind the Gap... or Not? How Translation Errors and Evaluation Details Skew Multilingual Results : Abstract: Most current large language models (LLMs) support a wide variety of languages in addition to English, including high-resource languages (e.g. German, Chinese, French), as well as low-resourc...
- Effectiveness of Chain-of-Thought in Distilling Reasoning Capability from Large Language Models : Abstract: Chain-of-Thought (CoT) prompting is a widely used method to improve the reasoning capability of Large Language Models (LLMs). More recently, CoT has been leveraged in Knowledge Distillation ...
- Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese : Abstract: Ancient people translated classical Chinese into Japanese by annotating around each character. We abstract this process as sequence tagging tasks and fit them into modern language technologi...
- Reflective Personalization Optimization: A Post-hoc Rewriting Framework for Black-Box Large Language Models : Abstract: The personalization of black-box large language models (LLMs) is a critical yet challenging task. Existing approaches predominantly rely on context injection, where user history is embedded ...
- Listening Between the Lines: Decoding Podcast Narratives with Language Modeling : Abstract: Podcasts have become a central arena for shaping public opinion, making them a vital source for understanding contemporary discourse. Their typically unscripted, multi-themed, and conversati...
- Evaluating Subword Tokenization Techniques for Bengali: A Benchmark Study with BengaliBPE : Abstract: Tokenization is an important first step in Natural Language Processing (NLP) pipelines because it decides how models learn and represent linguistic information. However, current subword toke...
- Large Language Models for Explainable Threat Intelligence : Abstract: As cyber threats continue to grow in complexity, traditional security mechanisms struggle to keep up. Large language models (LLMs) offer significant potential in cybersecurity due to their a...
- Minority-Aware Satisfaction Estimation in Dialogue Systems via Preference-Adaptive Reinforcement Learning : Abstract: User satisfaction in dialogue systems is inherently subjective. When the same response strategy is applied across users, minority users may assign different satisfaction ratings than majorit...
- MIMIC-SR-ICD11: A Dataset for Narrative-Based Diagnosis : Abstract: Disease diagnosis is a central pillar of modern healthcare, enabling early detection and timely intervention for acute conditions while guiding lifestyle adjustments and medication regimens ...
- Automatización de Informes Geotécnicos para Macizos Rocosos con IA : Abstract: Geotechnical reports are crucial for assessing the stability of rock formations and ensuring safety in modern engineering. Traditionally, these reports are prepared manually based on field o...
- Quantifying the Climate Risk of Generative AI: Region-Aware Carbon Accounting with G-TRACE and the AI Sustainability Pyramid : Abstract: Generative Artificial Intelligence (GenAI) represents a rapidly expanding digital infrastructure whose energy demand and associated CO2 emissions are emerging as a new category of climate ri...
- Association via Entropy Reduction : Abstract: Prior to recent successes using neural networks, term frequency-inverse document frequency (tf-idf) was clearly regarded as the best choice for identifying documents related to a query. We p...
- Towards Mitigating Hallucinations in Large Vision-Language Models by Refining Textual Embeddings : Abstract: In this work, we identify an inherent bias in prevailing LVLM architectures toward the language modality, largely resulting from the common practice of simply appending visual embeddings to ...
- Wikipedia-based Datasets in Russian Information Retrieval Benchmark RusBEIR : Abstract: In this paper, we present a novel series of Russian information retrieval datasets constructed from the "Did you know..." section of Russian Wikipedia. Our datasets support a range of retrie...
- ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations : Abstract: As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between pers...
- To Word Senses and Beyond: Inducing Concepts with Contextualized Language Models : Abstract: Polysemy and synonymy are two crucial interrelated facets of lexical ambiguity. While both phenomena are widely documented in lexical resources and have been studied extensively in NLP, lead...
- Extracting narrative signals from public discourse: a network-based approach : Abstract: Narratives are key interpretative devices by which humans make sense of political reality. As the significance of narratives for understanding current societal issues such as polarization an...
- NaturalReasoning: Reasoning in the Wild with 2.8M Challenging Questions : Abstract: Scaling reasoning capabilities beyond traditional domains such as math and coding is hindered by the lack of diverse and high-quality questions. To overcome this limitation, we introduce a s...
- Mind the Blind Spots: A Focus-Level Evaluation Framework for LLM Reviews : Abstract: Peer review underpins scientific progress, but it is increasingly strained by reviewer shortages and growing workloads. Large Language Models (LLMs) can automatically draft reviews now, but ...
- Exploring Multimodal Perception in Large Language Models Through Perceptual Strength Ratings : Abstract: This study investigated whether multimodal large language models can achieve human-like sensory grounding by examining their ability to capture perceptual strength ratings across sensory mod...
- MorphTok: Morphologically Grounded Tokenization for Indian Languages : Abstract: Tokenization is a crucial step in NLP, especially with the rise of large language models (LLMs), impacting downstream performance, computational cost, and efficiency. Existing LLMs rely on t...
- CSPLADE: Learned Sparse Retrieval with Causal Language Models : Abstract: In recent years, dense retrieval has been the focus of information retrieval (IR) research. While effective, dense retrieval produces uninterpretable dense vectors, and suffers from the draw...
- Towards Explainable Fake Image Detection with Multi-Modal Large Language Models : Abstract: Progress in image generation raises significant public security concerns. We argue that fake image detection should not operate as a "black box". Instead, an ideal approach must ensure both ...
- TRACE: Textual Relevance Augmentation and Contextual Encoding for Multimodal Hate Detection : Abstract: Social media memes are a challenging domain for hate detection because they intertwine visual and textual cues into culturally nuanced messages. To tackle these challenges, we introduce TRAC...
- Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering : Abstract: Document Visual Question Answering (DocVQA) faces dual challenges in processing lengthy multimodal documents (text, images, tables) and performing cross-modal reasoning. Current document ret...
- DARN: Dynamic Adaptive Regularization Networks for Efficient and Robust Foundation Model Adaptation : Abstract: Foundation models (FMs) offer powerful representations for geospatial analysis, but adapting them effectively remains challenging. Standard adaptation methods, whether full fine-tuning or ef...
- Global 3D Reconstruction of Clouds & Tropical Cyclones : Abstract: Accurate forecasting of tropical cyclones (TCs) remains challenging due to limited satellite observations probing TC structure and difficulties in resolving cloud properties involved in TC i...
- EETnet: a CNN for Gaze Detection and Tracking for Smart-Eyewear : Abstract: Event-based cameras are becoming a popular solution for efficient, low-power eye tracking. Due to the sparse and asynchronous nature of event data, they require less processing power and off...
- 3D Gaussian Point Encoders : Abstract: In this work, we introduce the 3D Gaussian Point Encoder, an explicit per-point embedding built on mixtures of learned 3D Gaussians. This explicit geometric representation for 3D recognition...
- Geometry Denoising with Preferred Normal Vectors : Abstract: We introduce a new paradigm for geometry denoising using prior knowledge about the surface normal vector. This prior knowledge comes in the form of a set of preferred normal vectors, which w...
- Self-Supervised Implicit Attention Priors for Point Cloud Reconstruction : Abstract: Recovering high-quality surfaces from irregular point clouds is ill-posed unless strong geometric priors are available. We introduce an implicit self-prior approach that distills a shape-spec...
- Clinical-ComBAT: a diffusion-weighted MRI harmonization method for clinical applications : Abstract: Diffusion-weighted magnetic resonance imaging (DW-MRI) derived scalar maps are effective for assessing neurodegenerative diseases and microstructural properties of white matter in large numb...
- Validating Vision Transformers for Otoscopy: Performance and Data-Leakage Effects : Abstract: This study evaluates the efficacy of vision transformer models, specifically Swin transformers, in enhancing the diagnostic accuracy of ear diseases compared to traditional convolutional neu...
- Learning to Restore Multi-Degraded Images via Ingredient Decoupling and Task-Aware Path Adaptation : Abstract: Image restoration (IR) aims to recover clean images from degraded observations. Despite remarkable progress, most existing methods focus on a single degradation type, whereas real-world imag...
- CLM: Removing the GPU Memory Barrier for 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) is an increasingly popular novel view synthesis approach due to its fast rendering time and high-quality output. However, scaling 3DGS to large (or intricate) s...
- Challenges in 3D Data Synthesis for Training Neural Networks on Topological Features : Abstract: Topological Data Analysis (TDA) involves techniques of analyzing the underlying structure and connectivity of data. However, traditional methods like persistent homology can be computational...
- GSE: Evaluating Sticker Visual Semantic Similarity via a General Sticker Encoder : Abstract: Stickers have become a popular form of visual communication, yet understanding their semantic relationships remains challenging due to their highly diverse and symbolic content. In this work...
- Pressure2Motion: Hierarchical Motion Synthesis from Ground Pressure with Text Guidance : Abstract: We present Pressure2Motion, a novel motion capture algorithm that synthesizes human motion from a ground pressure sequence and text prompt. It eliminates the need for specialized lighting se...
- Medical Referring Image Segmentation via Next-Token Mask Prediction : Abstract: Medical Referring Image Segmentation (MRIS) involves segmenting target regions in medical images based on natural language descriptions. While achieving promising results, recent approaches ...
- Role-SynthCLIP: A Role Play Driven Diverse Synthetic Data Approach : Abstract: The effectiveness of Contrastive Language-Image Pre-training (CLIP) models critically depends on the semantic diversity and quality of their training data. However, while existing synthetic ...
- SurgiATM: A Physics-Guided Plug-and-Play Model for Deep Learning-Based Smoke Removal in Laparoscopic Surgery : Abstract: During laparoscopic surgery, smoke generated by tissue cauterization can significantly degrade the visual quality of endoscopic frames, increasing the risk of surgical errors and hindering b...
- A Dual-stage Prompt-driven Privacy-preserving Paradigm for Person Re-Identification : Abstract: With growing concerns over data privacy, researchers have started using virtual data as an alternative to sensitive real-world images for training person re-identification (Re-ID) models. Ho...
- Real-World Adverse Weather Image Restoration via Dual-Level Reinforcement Learning with High-Quality Cold Start : Abstract: Adverse weather severely impairs real-world visual perception, while existing vision models trained on synthetic data with fixed parameters struggle to generalize to complex degradations. To...
- SnowyLane: Robust Lane Detection on Snow-covered Rural Roads Using Infrastructural Elements : Abstract: Lane detection for autonomous driving in snow-covered environments remains a major challenge due to the frequent absence or occlusion of lane markings. In this paper, we present a novel, rob...
- Splatography: Sparse multi-view dynamic Gaussian Splatting for filmmaking challenges : Abstract: Deformable Gaussian Splatting (GS) accomplishes photorealistic dynamic 3-D reconstruction from dense multi-view video (MVV) by learning to deform a canonical GS representation. However, in f...
- MUSE: Multi-Scale Dense Self-Distillation for Nucleus Detection and Classification : Abstract: Nucleus detection and classification (NDC) in histopathology analysis is a fundamental task that underpins a wide range of high-level pathology applications. However, existing methods heavil...
- Walk the Lines 2: Contour Tracking for Detailed Segmentation : Abstract: This paper presents Walk the Lines 2 (WtL2), a unique contour tracking algorithm specifically adapted for detailed segmentation of infrared (IR) ships and various objects in RGB. This exten...
- FreeControl: Efficient, Training-Free Structural Control via One-Step Attention Extraction : Abstract: Controlling the spatial and semantic structure of diffusion-generated images remains a challenge. Existing methods like ControlNet rely on handcrafted condition maps and retraining, limiting...
- ADPretrain: Advancing Industrial Anomaly Detection via Anomaly Representation Pretraining : Abstract: The current mainstream and state-of-the-art anomaly detection (AD) methods are substantially established on pretrained feature networks yielded by ImageNet pretraining. However, regardless o...
- Automatic segmentation of colorectal liver metastases for ultrasound-based navigated resection : Abstract: Introduction: Accurate intraoperative delineation of colorectal liver metastases (CRLM) is crucial for achieving negative resection margins but remains challenging using intraoperative ultra...
- Cross-domain EEG-based Emotion Recognition with Contrastive Learning : Abstract: Electroencephalogram (EEG)-based emotion recognition is vital for affective computing but faces challenges in feature utilization and cross-domain generalization. This work introduces Emotio...
- Embedding-Space Data Augmentation to Prevent Membership Inference Attacks in Clinical Time Series Forecasting : Abstract: Balancing strong privacy guarantees with high predictive performance is critical for time series forecasting (TSF) tasks involving Electronic Health Records (EHR). In this study, we explore ...
- Attention and Compression is all you need for Controllably Efficient Language Models : Abstract: The quadratic cost of attention in transformers motivated the development of efficient approaches: namely sparse and sliding window attention, convolutions and linear attention. Although the...
- Turning Adversaries into Allies: Reversing Typographic Attacks for Multimodal E-Commerce Product Retrieval : Abstract: Multimodal product retrieval systems in e-commerce platforms rely on effectively combining visual and textual signals to improve search relevance and user experience. However, vision-languag...
- Learning Dynamics from Input-Output Data with Hamiltonian Gaussian Processes : Abstract: Embedding non-restrictive prior knowledge, such as energy conservation laws, in learning-based approaches is a key motive to construct physically consistent models from limited data, relevan...
- SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning : Abstract: Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees for ensuring state and action constraints, whose satisfaction is a fund...
- Diffusion-Based Electromagnetic Inverse Design of Scattering Structured Media : Abstract: We present a conditional diffusion model for electromagnetic inverse design that generates structured media geometries directly from target differential scattering cross-section profiles, by...
- Adversarially Robust Multitask Adaptive Control : Abstract: We study adversarially robust multitask adaptive linear quadratic control; a setting where multiple systems collaboratively learn control policies under model uncertainty and adversarial cor...
- Parameter-Efficient Conditioning for Material Generalization in Graph-Based Simulators : Abstract: Graph network-based simulators (GNS) have demonstrated strong potential for learning particle-based physics (such as fluids, deformable solids, and granular flows) while generalizing to unse...
- Synapse: Adaptive Arbitration of Complementary Expertise in Time Series Foundational Models : Abstract: Pre-trained Time Series Foundational Models (TSFMs) represent a significant advance, capable of forecasting diverse time series with complex characteristics, including varied seasonalities, ...
- SiamMM: A Mixture Model Perspective on Deep Unsupervised Learning : Abstract: Recent studies have demonstrated the effectiveness of clustering-based approaches for self-supervised and unsupervised learning. However, the application of clustering is often heuristic, an...
- Precipitation nowcasting of satellite data using physically conditioned neural networks : Abstract: Accurate short-term precipitation forecasts predominantly rely on dense weather-radar networks, limiting operational value in places most exposed to climate extremes. We present TUPANN (Tran...
- SoilX: Calibration-Free Comprehensive Soil Sensing Through Contrastive Cross-Component Learning : Abstract: Precision agriculture demands continuous and accurate monitoring of soil moisture (M) and key macronutrients, including nitrogen (N), phosphorus (P), and potassium (K), to optimize yields an...
- Generalization in Representation Models via Random Matrix Theory: Application to Recurrent Networks : Abstract: We first study the generalization error of models that use a fixed feature representation (frozen intermediate layers) followed by a trainable readout layer. This setting encompasses a range...
- Evaluating LLMs' Reasoning Over Ordered Procedural Steps : Abstract: Reasoning over procedural sequences, where the order of steps directly impacts outcomes, is a critical capability for large language models (LLMs). In this work, we study the task of reconst...
- Communication-Constrained Private Decentralized Online Personalized Mean Estimation : Abstract: We consider the problem of communication-constrained collaborative personalized mean estimation under a privacy constraint in an environment of several agents continuously receiving data acc...
- Machine Learning-Driven Analysis of kSZ Maps to Predict CMB Optical Depth $\tau$ : Abstract: Upcoming measurements of the kinetic Sunyaev-Zel'dovich (kSZ) effect, which results from Cosmic Microwave Background (CMB) photons scattering off moving electrons, offer a powerful probe of ...
- Blind Strong Gravitational Lensing Inversion: Joint Inference of Source and Lens Mass with Score-Based Models : Abstract: Score-based models can serve as expressive, data-driven priors for scientific inverse problems. In strong gravitational lensing, they enable posterior inference of a background galaxy from i...
- Trained on Tokens, Calibrated on Concepts: The Emergence of Semantic Calibration in LLMs : Abstract: Large Language Models (LLMs) often lack meaningful confidence estimates for their outputs. While base LLMs are known to exhibit next-token calibration, it remains unclear whether they can as...
- Prototype Selection Using Topological Data Analysis : Abstract: Recently, there has been an explosion in statistical learning literature to represent data using topological principles to capture structure and relationships. We propose a topological data ...
- Predicting Cognitive Assessment Scores in Older Adults with Cognitive Impairment Using Wearable Sensors : Abstract: Background and Objectives: This paper focuses on using AI to assess the cognitive function of older adults with mild cognitive impairment or mild dementia using physiological data provided b...
- Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning : Abstract: In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Tradi...
- Iterative Layer-wise Distillation for Efficient Compression of Large Language Models : Abstract: This work investigates distillation methods for large language models (LLMs) with the goal of developing compact models that preserve high performance. Several existing approaches are review...
- Early Alzheimer's Disease Detection from Retinal OCT Images: A UK Biobank Study : Abstract: Alterations in retinal layer thickness, measurable using Optical Coherence Tomography (OCT), have been associated with neurodegenerative diseases such as Alzheimer's disease (AD). While prev...
- Follow-Me in Micro-Mobility with End-to-End Imitation Learning : Abstract: Autonomous micro-mobility platforms face challenges from the perspective of the typical deployment environment: large indoor spaces or urban areas that are potentially crowded and highly dyn...
- A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights : Abstract: Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd's $k$-means, without requiring a predefined cluster count. It starts with each...
- Another BRIXEL in the Wall: Towards Cheaper Dense Features : Abstract: Vision foundation models achieve strong performance on both global and locally dense downstream tasks. Pretrained on large images, the recent DINOv3 model family is able to produce very fine...
- A differentiable model of supply-chain shocks : Abstract: Modelling how shocks propagate in supply chains is an increasingly important challenge in economics. Its relevance has been highlighted in recent years by events such as Covid-19 and the Rus...
- Context-aware Learned Mesh-based Simulation via Trajectory-Level Meta-Learning : Abstract: Simulating object deformations is a critical challenge across many scientific domains, including robotics, manufacturing, and structural mechanics. Learned Graph Network Simulators (GNSs) of...
- TwinVLA: Data-Efficient Bimanual Manipulation with Twin Single-Arm Vision-Language-Action Models : Abstract: Vision-language-action models (VLAs) trained on large-scale robotic datasets have demonstrated strong performance on manipulation tasks, including bimanual tasks. However, because most publi...
- What's on Your Plate? Inferring Chinese Cuisine Intake from Wearable IMUs : Abstract: Accurate food intake detection is vital for dietary monitoring and chronic disease prevention. Traditional self-report methods are prone to recall bias, while camera-based approaches raise c...
- Language Generation and Identification From Partial Enumeration: Tight Density Bounds and Topological Characterizations : Abstract: The success of large language models (LLMs) has motivated formal theories of language generation and learning. We study the framework of language generation in the limit, where an adv...
- Building Specialized Software-Assistant ChatBot with Graph-Based Retrieval-Augmented Generation : Abstract: Digital Adoption Platforms (DAPs) have become essential tools for helping employees navigate complex enterprise software such as CRM, ERP, or HRMS systems. Companies like LemonLearning have ...
- QUESTER: Query Specification for Generative Retrieval : Abstract: Generative Retrieval (GR) differs from the traditional index-then-retrieve pipeline by storing relevance in model parameters and directly generating document identifiers. However, GR often s...
- Steering Language Models with Weight Arithmetic : Abstract: Providing high-quality feedback to Large Language Models (LLMs) on a diverse training distribution can be difficult and expensive, and providing feedback only on a narrow distribution can re...
- How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need? : Abstract: Recent advances in 3D point cloud transformers have led to state-of-the-art results in tasks such as semantic segmentation and reconstruction. However, these models typically rely on dense t...
- A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher? : Abstract: Transformer-based language models of code have achieved state-of-the-art performance across a wide range of software analytics tasks, but their practical deployment remains limited due to hi...
- RNN(p) for Power Consumption Forecasting : Abstract: An elementary Recurrent Neural Network that operates on p time lags, called an RNN(p), is the natural generalisation of a linear autoregressive model ARX(p). It is a powerful forecasting too...
- A Multi-Stage Automated Online Network Data Stream Analytics Framework for IIoT Systems : Abstract: Industry 5.0 aims at maximizing the collaboration between humans and machines. Machines are capable of automating repetitive jobs, while humans handle creative tasks. As a critical component...
- Non-stationary Delayed Online Convex Optimization: From Full-information to Bandit Setting : Abstract: Although online convex optimization (OCO) under arbitrary delays has received increasing attention recently, previous studies focus on stationary environments with the goal of minimizing sta...
- A Closer Look at Deep Learning Methods on Tabular Datasets : Abstract: Tabular data is prevalent across diverse domains in machine learning. With the rapid progress of deep tabular prediction methods, especially pretrained (foundation) models, there is a growin...
- ExGra-Med: Extended Context Graph Alignment for Medical Vision-Language Models : Abstract: State-of-the-art medical multi-modal LLMs (med-MLLMs), such as LLaVA-Med and BioMedGPT, primarily depend on scaling model size and data volume, with training driven largely by autoregressive...
- P1-KAN: an effective Kolmogorov-Arnold network with application to hydraulic valley optimization : Abstract: A new Kolmogorov-Arnold network (KAN) is proposed to approximate potentially irregular functions in high dimensions. We provide error bounds for this approximation, assuming that the Kolmogo...
- LoKO: Low-Rank Kalman Optimizer for Online Fine-Tuning of Large Models : Abstract: Training large models with millions or even billions of parameters from scratch incurs substantial computational costs. Parameter Efficient Fine-Tuning (PEFT) methods, particularly Low-Rank ...
- ProFL: Performative Robust Optimal Federated Learning : Abstract: Performative prediction is a framework that captures distribution shifts that occur during the training of machine learning models due to their deployment. As the trained model is used, data...
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem : Abstract: Motivated by engineering applications such as resource allocation in networks and inventory systems, we consider average-reward Reinforcement Learning with unbounded state space and reward f...
- Multiplayer Federated Learning: Reaching Equilibrium with Less Communication : Abstract: Traditional Federated Learning (FL) approaches assume collaborative clients with aligned objectives working towards a shared global model. However, in many real-world scenarios, clients act ...
- Generating Computational Cognitive Models using Large Language Models : Abstract: Computational cognitive models, which formalize theories of cognition, enable researchers to quantify cognitive processes and arbitrate between competing theories by fitting models to behavi...
- Rethinking Approximate Gaussian Inference in Classification : Abstract: In classification tasks, softmax functions are ubiquitously used as output activations to produce predictive probabilities. Such outputs only capture aleatoric uncertainty. To capture episte...
- Two-stage hybrid models for enhancing forecasting accuracy on heterogeneous time series : Abstract: A time series forecasting model--which is typically built on a single time series--is known as a local time series model (tsLM). In contrast, a forecasting model trained on multiple time ser...
- Neural Attention: A Novel Mechanism for Enhanced Expressive Power in Transformer Models : Abstract: Transformer models typically calculate attention matrices using dot products, which have limitations when capturing nonlinear relationships between embedding vectors. We propose Neural Atten...
- Greedy Algorithm for Structured Bandits: A Sharp Characterization of Asymptotic Success / Failure : Abstract: We study the greedy (exploitation-only) algorithm in bandit problems with a known reward structure. We allow arbitrary finite reward structures, while prior work focused on a few specific on...
- Better Neural Network Expressivity: Subdividing the Simplex : Abstract: This work studies the expressivity of ReLU neural networks with a focus on their depth. A sequence of previous works showed that $\lceil \log_2(n+1) \rceil$ hidden layers are sufficient to c...
- When Are Concepts Erased From Diffusion Models? : Abstract: In concept erasure, a model is modified to selectively prevent it from generating a target concept. Despite the rapid development of new methods, it remains unclear how thoroughly these appr...
- Efficient and Unbiased Sampling from Boltzmann Distributions via Variance-Tuned Diffusion Models : Abstract: Score-based diffusion models (SBDMs) are powerful amortized samplers for Boltzmann distributions; however, imperfect score estimates bias downstream Monte Carlo estimates. Classical importan...
- FedFACT: A Provable Framework for Controllable Group-Fairness Calibration in Federated Learning : Abstract: With the emerging application of Federated Learning (FL) in decision-making scenarios, it is imperative to regulate model fairness to prevent disparities across sensitive groups (e.g., femal...
- Inference-Time Hyper-Scaling with KV Cache Compression : Abstract: Inference-time scaling trades efficiency for increased reasoning accuracy by generating longer or more parallel sequences. However, in Transformer LLMs, generation cost is bottlenecked by th...
- Neural at ArchEHR-QA 2025: Agentic Prompt Optimization for Evidence-Grounded Clinical Question Answering : Abstract: Automated question answering (QA) over electronic health records (EHRs) can bridge critical information gaps for clinicians and patients, yet it demands both precise evidence retrieval and f...
- Differentially Private Bilevel Optimization: Efficient Algorithms with Near-Optimal Rates : Abstract: Bilevel optimization, in which one optimization problem is nested inside another, underlies many machine learning applications with a hierarchical structure -- such as meta-learning and hype...
- Optimism Without Regularization: Constant Regret in Zero-Sum Games : Abstract: This paper studies the optimistic variant of Fictitious Play for learning in two-player zero-sum games. While it is known that Optimistic FTRL -- a regularized algorithm with a bounded steps...
- Diverse Mini-Batch Selection in Reinforcement Learning for Efficient Chemical Exploration in de novo Drug Design : Abstract: In many real-world applications, evaluating the quality of instances is costly and time-consuming, e.g., human feedback and physics simulations, in contrast to proposing new instances. In pa...
- Grounded Test-Time Adaptation for LLM Agents : Abstract: Large language model (LLM)-based agents struggle to generalize to novel and complex environments, such as unseen websites or new sets of functions, due to a fundamental mismatch between thei...
- SigmaDock: Untwisting Molecular Docking With Fragment-Based SE(3) Diffusion : Abstract: Determining the binding pose of a ligand to a protein, known as molecular docking, is a fundamental task in drug discovery. Generative approaches promise faster, improved, and more diverse p...
- Quantum Boltzmann Machines for Sample-Efficient Reinforcement Learning : Abstract: We introduce theoretically grounded Continuous Semi-Quantum Boltzmann Machines (CSQBMs) that support continuous-action reinforcement learning. By combining exponential-family priors over vi...
- FoodRL: A Reinforcement Learning Ensembling Framework For In-Kind Food Donation Forecasting : Abstract: Food banks are crucial for alleviating food insecurity, but their effectiveness hinges on accurately forecasting highly volatile in-kind donations to ensure equitable and efficient resource ...
- Self-Interest and Systemic Benefits: Emergence of Collective Rationality in Mixed Autonomy Traffic Through Deep Reinforcement Learning : Abstract: Autonomous vehicles (AVs) are expected to be commercially available in the near future, leading to mixed autonomy traffic consisting of both AVs and human-driven vehicles (HVs). Although num...
- Multi-Agent Craftax: Benchmarking Open-Ended Multi-Agent Reinforcement Learning at the Hyperscale : Abstract: Progress in multi-agent reinforcement learning (MARL) requires challenging benchmarks that assess the limits of current methods. However, existing benchmarks often target narrow short-horizo...
- Efficient Swap Multicalibration of Elicitable Properties : Abstract: Multicalibration [HJKRR18] is an algorithmic fairness perspective that demands that the predictions of a predictor are correct conditional on themselves and membership in a collection of pot...
- Machine Learning Algorithms in Statistical Modelling Bridging Theory and Application : Abstract: It involves the completely novel ways of integrating ML algorithms with traditional statistical modelling that has changed the way we analyze data, do predictive analytics or make decisions ...
- Leak@$k$: Unlearning Does Not Make LLMs Forget Under Probabilistic Decoding : Abstract: Unlearning in large language models (LLMs) is critical for regulatory compliance and for building ethical generative AI systems that avoid producing private, toxic, illegal, or copyrighted c...
- Structural Properties, Cycloid Trajectories and Non-Asymptotic Guarantees of EM Algorithm for Mixed Linear Regression : Abstract: This work investigates the structural properties, cycloid trajectories, and non-asymptotic convergence guarantees of the Expectation-Maximization (EM) algorithm for two-component Mixed Linea...
- Risk Prediction of Cardiovascular Disease for Diabetic Patients with Machine Learning and Deep Learning Techniques : Abstract: Accurate prediction of cardiovascular disease (CVD) risk is crucial for healthcare institutions. This study addresses the growing prevalence of diabetes and its strong link to heart disease ...
- Less Is More: Generating Time Series with LLaMA-Style Autoregression in Simple Factorized Latent Spaces : Abstract: Generative models for multivariate time series are essential for data augmentation, simulation, and privacy preservation, yet current state-of-the-art diffusion-based approaches are slow and...
- Scaling Up ROC-Optimizing Support Vector Machines : Abstract: The ROC-SVM, originally proposed by Rakotomamonjy, directly maximizes the area under the ROC curve (AUC) and has become an attractive alternative to the conventional binary classification un...
- Unlocking the Black Box: A Five-Dimensional Framework for Evaluating Explainable AI in Credit Risk : Abstract: The financial industry faces a significant challenge modeling and risk portfolios: balancing the predictability of advanced machine learning models, neural network models, and explainability...
- Deep Progressive Training: scaling up depth capacity of zero/one-layer models : Abstract: Model depth is a double-edged sword in deep learning: deeper models achieve higher accuracy but require higher computational cost. To efficiently train models at scale, an effective strategy...
- Peptide2Mol: A Diffusion Model for Generating Small Molecules as Peptide Mimics for Targeted Protein Binding : Abstract: Structure-based drug design has seen significant advancements with the integration of artificial intelligence (AI), particularly in the generation of hit and lead compounds. However, most AI...
- Carbon Price Forecasting with Structural Breaks: A Comparative Study of Deep Learning Models : Abstract: Accurately forecasting carbon prices is essential for informed energy market decision-making, guiding sustainable energy planning, and supporting effective decarbonization strategies. Howeve...
- Usando LLMs para Programar Jogos de Tabuleiro e Variações : Abstract: Creating programs to represent board games can be a time-consuming task. Large Language Models (LLMs) arise as appealing tools to expedite this process, given their capacity to efficiently g...
- QuAnTS: Question Answering on Time Series : Abstract: Text offers intuitive access to information. This can, in particular, complement the density of numerical time series, thereby allowing improved interactions with time series models to enhan...
- Consecutive Preferential Bayesian Optimization : Abstract: Preferential Bayesian optimization allows optimization of objectives that are either expensive or difficult to measure directly, by relying on a minimal number of comparative evaluations don...
- Multimodal Deep Learning for Prediction of Progression-Free Survival in Patients with Neuroendocrine Tumors Undergoing 177Lu-based Peptide Receptor Radionuclide Therapy : Abstract: Peptide receptor radionuclide therapy (PRRT) is an established treatment for metastatic neuroendocrine tumors (NETs), yet long-term disease control occurs only in a subset of patients. Predi...
- Associative Poisoning to Generative Machine Learning : Abstract: The widespread adoption of generative models such as Stable Diffusion and ChatGPT has made them increasingly attractive targets for malicious exploitation, particularly through data poisonin...
- Linear Gradient Prediction with Control Variates : Abstract: We propose a new way of training neural networks, with the goal of reducing training cost. Our method uses approximate predicted gradients instead of the full gradients that require an expen...
- ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy : Abstract: Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with...
- The Causal Round Trip: Generating Authentic Counterfactuals by Eliminating Information Loss : Abstract: Judea Pearl's vision of Structural Causal Models (SCMs) as engines for counterfactual reasoning hinges on faithful abduction: the precise inference of latent exogenous noise. For decades, op...
- Rethinking Metrics and Diffusion Architecture for 3D Point Cloud Generation : Abstract: As 3D point clouds become a cornerstone of modern technology, the need for sophisticated generative models and reliable evaluation metrics has grown exponentially. In this work, we first exp...
- What Are the Facts? Automated Extraction of Court-Established Facts from Criminal-Court Opinions : Abstract: Criminal justice administrative data contain only a limited amount of information about the committed offense. However, there is an unused source of extensive information in continental Euro...
- Perceptually Aligning Representations of Music via Noise-Augmented Autoencoders : Abstract: We argue that training autoencoders to reconstruct inputs from noised versions of their encodings, when combined with perceptual losses, yields encodings that are structured according to a p...
- A multimodal multiplex of the mental lexicon for multilingual individuals : Abstract: Historically, bilingualism was often perceived as an additional cognitive load that could hinder linguistic and intellectual development. However, over the last three decades, this view has ...
- AI Literacy for Community Colleges: Instructors' Perspectives on Scenario-Based and Interactive Approaches to Teaching AI : Abstract: This research category full paper investigates how community college instructors evaluate interactive, no-code AI literacy resources designed for non-STEM learners. As artificial intelligenc...
- TeaRAG: A Token-Efficient Agentic Retrieval-Augmented Generation Framework : Abstract: Retrieval-Augmented Generation (RAG) utilizes external knowledge to augment Large Language Models' (LLMs) reliability. For flexibility, agentic RAG employs autonomous, multi-round retrieval ...
- AI Assisted AR Assembly: Object Recognition and Computer Vision for Augmented Reality Assisted Assembly : Abstract: We present an AI-assisted Augmented Reality assembly workflow that uses deep learning-based object recognition to identify different assembly components and display step-by-step instructions...
- Sample Complexity of Distributionally Robust Off-Dynamics Reinforcement Learning with Online Interaction : Abstract: Off-dynamics reinforcement learning (RL), where training and deployment transition dynamics are different, can be formulated as learning in a robust Markov decision process (RMDP) where unce...
- Robust Neural Audio Fingerprinting using Music Foundation Models : Abstract: The proliferation of distorted, compressed, and manipulated music on modern media platforms like TikTok motivates the development of more robust audio fingerprinting techniques to identify t...
- Multi-modal Loop Closure Detection with Foundation Models in Severely Unstructured Environments : Abstract: Robust loop closure detection is a critical component of Simultaneous Localization and Mapping (SLAM) algorithms in GNSS-denied environments, such as in the context of planetary exploration....
- ProDER: A Continual Learning Approach for Fault Prediction in Evolving Smart Grids : Abstract: As smart grids evolve to meet growing energy demands and modern operational challenges, the ability to accurately predict faults becomes increasingly critical. However, existing AI-based fau...
- "I Like That You Have to Poke Around": Instructors on How Experiential Approaches to AI Literacy Spark Inquiry and Critical Thinking : Abstract: As artificial intelligence (AI) increasingly shapes decision-making across domains, there is a growing need to support AI literacy among learners beyond computer science. However, many curre...
- APP: Accelerated Path Patching with Task-Specific Pruning : Abstract: Circuit discovery is a key step in many mechanistic interpretability pipelines. Current methods, such as Path Patching, are computationally expensive and have limited in-depth circuit analys...
- Self-adaptive weighting and sampling for physics-informed neural networks : Abstract: Physics-informed deep learning has emerged as a promising framework for solving partial differential equations (PDEs). Nevertheless, training these models on complex problems remains challen...
- SWE-Compass: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models : Abstract: Evaluating large language models (LLMs) for software engineering has been limited by narrow task coverage, language bias, and insufficient alignment with real-world developer workflows. Exis...
- AI Literacy Assessment Revisited: A Task-Oriented Approach Aligned with Real-world Occupations : Abstract: As artificial intelligence (AI) systems become ubiquitous in professional contexts, there is an urgent need to equip workers, often with backgrounds outside of STEM, with the skills to use t...
- On Flow Matching KL Divergence : Abstract: We derive a deterministic, non-asymptotic upper bound on the Kullback-Leibler (KL) divergence of the flow-matching distribution approximation. In particular, if the $L_2$ flow-matching loss ...
- DGTN: Graph-Enhanced Transformer with Diffusive Attention Gating Mechanism for Enzyme DDG Prediction : Abstract: Predicting the effect of amino acid mutations on enzyme thermodynamic stability (DDG) is fundamental to protein engineering and drug design. While recent deep learning approaches have shown ...
- TimeSearch-R: Adaptive Temporal Search for Long-Form Video Understanding via Self-Verification Reinforcement Learning : Abstract: Temporal search aims to identify a minimal set of relevant frames from tens of thousands based on a given query, serving as a foundation for accurate long-form video understanding. Existing ...
- Joint Verification and Refinement of Language Models for Safety-Constrained Planning : Abstract: Large language models possess impressive capabilities in generating programs (e.g., Python) from natural language descriptions to execute robotic tasks. However, these generated programs oft...
- Retrieval Augmented Diffusion Model for Structure-informed Antibody Design and Optimization : Abstract: Antibodies are essential proteins responsible for immune responses in organisms, capable of specifically recognizing antigen molecules of pathogens. Recent advances in generative models have...
- AI Through the Human Lens: Investigating Cognitive Theories in Machine Psychology : Abstract: We investigate whether Large Language Models (LLMs) exhibit human-like cognitive patterns under four established frameworks from psychology: Thematic Apperception Test (TAT), Framing Bias, M...
- Introducing LongCat-Flash-Thinking: A Technical Report : Abstract: We present LongCat-Flash-Thinking, an efficient 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model. Its advanced capabilities are cultivated through a meticulously cr...
- The Less Intelligent the Elements, the More Intelligent the Whole. Or, Possibly Not? : Abstract: The agent-based modelling community has a debate on how "intelligent" artificial agents should be, and in what ways their local intelligence relates to the emergence of a collective intell...
- Learning for Interval Prediction of Electricity Demand: A Cluster-based Bootstrapping Approach : Abstract: Accurate predictions of electricity demands are necessary for managing operations in a small aggregation load setting like a Microgrid. Due to low aggregation, the electricity demands can be...
- Characterizing the Training Dynamics of Private Fine-tuning with Langevin diffusion : Abstract: We show that differentially private full fine-tuning (DP-FFT) can distort pre-trained backbone features based on both theoretical and empirical results. We identify the cause of the distorti...
- Tactical Decision Making for Autonomous Trucks by Deep Reinforcement Learning with Total Cost of Operation Based Reward : Abstract: We develop a deep reinforcement learning framework for tactical decision making in an autonomous truck, specifically for Adaptive Cruise Control (ACC) and lane change maneuvers in a highway ...
- FunOTTA: On-the-Fly Adaptation on Cross-Domain Fundus Image via Stable Test-time Training : Abstract: Fundus images are essential for the early screening and detection of eye diseases. While deep learning models using fundus images have significantly advanced the diagnosis of multiple eye di...
- Affordance-based Robot Manipulation with Flow Matching : Abstract: We present a framework for assistive robot manipulation, which focuses on two fundamental challenges: first, efficiently adapting large-scale models to downstream scene affordance understand...
- CTPD: Cross-Modal Temporal Pattern Discovery for Enhanced Multimodal Electronic Health Records Analysis : Abstract: Integrating multimodal Electronic Health Records (EHR) data, such as numerical time series and free-text clinical reports, has great potential in predicting clinical outcomes. However, prior...
- TOBUGraph: Knowledge Graph-Based Retrieval for Enhanced LLM Performance Beyond RAG : Abstract: Retrieval-Augmented Generation (RAG) is one of the leading and most widely used techniques for enhancing LLM retrieval capabilities, but it still faces significant limitations in commercial ...
- Cognitive Edge Computing: A Comprehensive Survey on Optimizing Large Models and AI Agents for Pervasive Deployment : Abstract: This article surveys Cognitive Edge Computing as a practical and methodical pathway for deploying reasoning-capable Large Language Models (LLMs) and autonomous AI agents on resource-constrai...
- MMDocIR: Benchmarking Multimodal Retrieval for Long Documents : Abstract: Multimodal document retrieval aims to identify and retrieve various forms of multimodal content, such as figures, tables, charts, and layout information from extensive documents. Despite its...
- iTool: Reinforced Fine-Tuning with Dynamic Deficiency Calibration for Advanced Tool Use : Abstract: Augmenting large language models (LLMs) with external tools is a promising approach to enhance their capabilities, especially for complex tasks. Synthesizing tool-use data through real-world...
- Activation-Informed Merging of Large Language Models : Abstract: Model merging, a method that combines the parameters and embeddings of multiple fine-tuned large language models (LLMs), offers a promising approach to enhance model performance across vario...
- Analyzing limits for in-context learning : Abstract: Our paper challenges claims from prior research that transformer-based models, when learning in context, implicitly implement standard learning algorithms. We present empirical evidence inco...
- InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback : Abstract: Existing benchmarks do not test Large Multimodal Models (LMMs) on their interactive intelligence with human users, which is vital for developing general-purpose AI assistants. We design Inte...
- Large language models as uncertainty-calibrated optimizers for experimental discovery : Abstract: Scientific discovery increasingly depends on efficient experimental optimization to navigate vast design spaces under time and resource constraints. Traditional approaches often require exte...
- Pogobot: an Open-Source, Low-Cost Robot for Swarm Robotics and Programmable Active Matter : Abstract: This paper describes the Pogobot, an open-source platform specifically designed for research at the interface of swarm robotics and active matter. Pogobot features vibration-based or wheel-b...
- Taskmaster Deconstructed: A Quantitative Look at Tension, Volatility, and Viewer Ratings : Abstract: Taskmaster is a British television show that combines comedic performance with a formal scoring system. Despite the appearance of structured competition, it remains unclear whether scoring d...
- Optimizing Anytime Reasoning via Budget Relative Policy Optimization : Abstract: Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a ...
- RCCDA: Adaptive Model Updates in the Presence of Concept Drift under a Constrained Resource Budget : Abstract: Machine learning (ML) algorithms deployed in real-world environments are often faced with the challenge of adapting models to concept drift, where the task data distributions are shifting ov...
- Learning of Population Dynamics: Inverse Optimization Meets JKO Scheme : Abstract: Learning population dynamics involves recovering the underlying process that governs particle evolution, given evolutionary snapshots of samples at discrete time points. Recent methods frame...
- Conformal Prediction Adaptive to Unknown Subpopulation Shifts : Abstract: Conformal prediction is widely used to equip black-box machine learning models with uncertainty quantification, offering formal coverage guarantees under exchangeable data. However, these gu...
- Know What You Don't Know: Uncertainty Calibration of Process Reward Models : Abstract: Process reward models (PRMs) play a central role in guiding inference-time scaling algorithms for large language models (LLMs). However, we observe that even state-of-the-art PRMs can be poo...
- Progressive Inference-Time Annealing of Diffusion Models for Sampling from Boltzmann Densities : Abstract: Sampling efficiently from a target unnormalized probability density remains a core challenge, with relevance across countless high-impact scientific applications. A promising approach toward...
- Less Greedy Equivalence Search : Abstract: Greedy Equivalence Search (GES) is a classic score-based algorithm for causal discovery from observational data. In the sample limit, it recovers the Markov equivalence class of graphs that ...
- Conformal Information Pursuit for Interactively Guiding Large Language Models : Abstract: A significant use case of instruction-finetuned Large Language Models (LLMs) is to solve question-answering tasks interactively. In this setting, an LLM agent is tasked with making a predict...
- Graph Learning : Abstract: Graph learning has rapidly evolved into a critical subfield of machine learning and artificial intelligence (AI). Its development began with early graph-theoretic methods, gaining significan...
- NMIXX: Domain-Adapted Neural Embeddings for Cross-Lingual eXploration of Finance : Abstract: General-purpose sentence embedding models often struggle to capture specialized financial semantics, especially in low-resource languages like Korean, due to domain-specific jargon, temporal...
- Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models : Abstract: Advances in 3D generative AI have enabled the creation of physical objects from text prompts, but challenges remain in creating objects involving multiple component types. We present a pipel...
- AWEMixer: Adaptive Wavelet-Enhanced Mixer Network for Long-Term Time Series Forecasting : Abstract: Forecasting long-term time series in IoT environments remains a significant challenge due to the non-stationary and multi-scale characteristics of sensor signals. Furthermore, error accumula...
- Regularized GLISp for sensor-guided human-in-the-loop optimization : Abstract: Human-in-the-loop calibration is often addressed via preference-based optimization, where algorithms learn from pairwise comparisons rather than explicit cost evaluations. While effective, m...
- When Data Falls Short: Grokking Below the Critical Threshold : Abstract: In this paper, we investigate the phenomenon of grokking, where models exhibit delayed generalization following overfitting on training data. We focus on data-scarce regimes where the number...
- FuseFlow: A Fusion-Centric Compilation Framework for Sparse Deep Learning on Streaming Dataflow : Abstract: As deep learning models scale, sparse computation and specialized dataflow hardware have emerged as powerful solutions to address efficiency. We propose FuseFlow, a compiler that converts sp...
- SLOFetch: Compressed-Hierarchical Instruction Prefetching for Cloud Microservices : Abstract: Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and en...
- Conditional Neural ODE for Longitudinal Parkinson's Disease Progression Forecasting : Abstract: Parkinson's disease (PD) shows heterogeneous, evolving brain-morphometry patterns. Modeling these longitudinal trajectories enables mechanistic insight, treatment development, and individual...
- DuetServe: Harmonizing Prefill and Decode for LLM Serving via Adaptive GPU Multiplexing : Abstract: Modern LLM serving systems must sustain high throughput while meeting strict latency SLOs across two distinct inference phases: compute-intensive prefill and memory-bound decode phases. Exis...
- Simplex-FEM Networks (SiFEN): Learning A Triangulated Function Approximator : Abstract: We introduce Simplex-FEM Networks (SiFEN), a learned piecewise-polynomial predictor that represents f: R^d -> R^k as a globally C^r finite-element field on a learned simplicial mesh in an op...
- Autoencoding Dynamics: Topological Limitations and Capabilities : Abstract: Given a "data manifold" $M\subset \mathbb{R}^n$ and "latent space" $\mathbb{R}^\ell$, an autoencoder is a pair of continuous maps consisting of an "encoder" $E\colon \mathbb{R}^n\to \mathbb{...
- Sharp Minima Can Generalize: A Loss Landscape Perspective On Data : Abstract: The volume hypothesis suggests deep learning is effective because it is likely to find flat minima due to their large volumes, and flat minima generalize well. This picture does not explain ...
- Persistent reachability homology in machine learning applications : Abstract: We explore the recently introduced persistent reachability homology (PRH) of digraph data, i.e. data in the form of directed graphs. In particular, we study the effectiveness of PRH in netwo...
- SPECTRA: Spectral Target-Aware Graph Augmentation for Imbalanced Molecular Property Regression : Abstract: In molecular property prediction, the most valuable compounds (e.g., high potency) often occupy sparse regions of the target space. Standard Graph Neural Networks (GNNs) commonly optimize fo...
- Sublinear iterations can suffice even for DDPMs : Abstract: SDE-based methods such as denoising diffusion probabilistic models (DDPMs) have shown remarkable success in real-world sample generation tasks. Prior analyses of DDPMs have been focused on t...
- Investigating U.S. Consumer Demand for Food Products with Innovative Transportation Certificates Based on Stated Preferences and Machine Learning Approaches : Abstract: This paper utilizes a machine learning model to estimate the consumer's behavior for food products with innovative transportation certificates in the U.S. Building on previous research that ...
- A hybrid solution approach for the Integrated Healthcare Timetabling Competition 2024 : Abstract: We report about the algorithm, implementation and results submitted to the Integrated Healthcare Timetabling Competition 2024 by Team Twente, which scored third in the competition. Our appro...
- Epistemic Reject Option Prediction : Abstract: In high-stakes applications, predictive models must not only produce accurate predictions but also quantify and communicate their uncertainty. Reject-option prediction addresses this by allo...
- DMA: Online RAG Alignment with Human Feedback : Abstract: Retrieval-augmented generation (RAG) systems often rely on static retrieval, limiting adaptation to evolving intent and content drift. We introduce Dynamic Memory Alignment (DMA), an online ...
- Real-Time Reasoning Agents in Evolving Environments : Abstract: Agents in the real world must make not only logical but also timely judgments. This requires continuous awareness of the dynamic environment: hazards emerge, opportunities arise, and other a...
- ORCHID: Orchestrated Retrieval-Augmented Classification with Human-in-the-Loop Intelligent Decision-Making for High-Risk Property : Abstract: High-Risk Property (HRP) classification is critical at U.S. Department of Energy (DOE) sites, where inventories include sensitive and often dual-use equipment. Compliance must track evolving...
- Autonomous generation of different courses of action in mechanized combat operations : Abstract: In this paper, we propose a methodology designed to support decision-making during the execution phase of military ground combat operations, with a focus on one's actions. This methodology g...
- Cleaning Maintenance Logs with LLM Agents for Improved Predictive Maintenance : Abstract: Economic constraints, limited availability of datasets for reproducibility and shortages of specialized expertise have long been recognized as key challenges to the adoption and advancement ...
- Reasoning Is All You Need for Urban Planning AI : Abstract: AI has proven highly successful at urban planning analysis -- learning patterns from data to predict future conditions. The next frontier is AI-assisted decision-making: agents that recommen...
- Efficient Deployment of CNN Models on Multiple In-Memory Computing Units : Abstract: In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration by mitigating data movement bottlenecks and leveraging the inherent parallelism of memory-based computatio...
- AI-Powered Citation Auditing: A Zero-Assumption Protocol for Systematic Reference Verification in Academic Research : Abstract: Academic citation integrity faces persistent challenges, with research indicating 20% of citations contain errors and manual verification requiring months of expert time. This paper presents...
- Stateful KV Cache Management for LLMs: Balancing Space, Time, Accuracy, and Positional Fidelity : Abstract: The Key-Value (KV) cache is integral to efficient autoregressive inference in large language models (LLMs), yet its unbounded growth in stateful multi-turn scenarios presents major challenge...
- Adaptive Testing for LLM Evaluation: A Psychometric Alternative to Static Benchmarks : Abstract: Large language model evaluation requires thousands of benchmark items, making evaluations expensive and slow. Existing methods compute average accuracy across fixed item sets, treating all i...
- A Penny for Your Thoughts: Decoding Speech from Inexpensive Brain Signals : Abstract: We explore whether neural networks can decode brain activity into speech by mapping EEG recordings to audio representations. Using EEG data recorded as subjects listened to natural speech, w...
- Reasoning Up the Instruction Ladder for Controllable Language Models : Abstract: As large language model (LLM) based systems take on high-stakes roles in real-world decision-making, they must reconcile competing instructions from multiple sources (e.g., model developers,...
- EncouRAGe: Evaluating RAG Local, Fast, and Reliable : Abstract: We introduce EncouRAGe, a comprehensive Python framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using Large Language Models (LL...
- Simulating Misinformation Vulnerabilities With Agent Personas : Abstract: Disinformation campaigns can distort public perception and destabilize institutions. Understanding how different populations respond to information is crucial for designing effective interve...
- multiMentalRoBERTa: A Fine-tuned Multiclass Classifier for Mental Health Disorder : Abstract: The early detection of mental health disorders from social media text is critical for enabling timely support, risk assessment, and referral to appropriate resources. This work introduces mu...
- Separate the Wheat from the Chaff: Winnowing Down Divergent Views in Retrieval Augmented Generation : Abstract: Retrieval-augmented generation (RAG) enhances large language models (LLMs) by integrating external knowledge sources to address their limitations in accessing up-to-date or specialized infor...
- Measuring what Matters: Construct Validity in Large Language Model Benchmarks : Abstract: Evaluating large language models (LLMs) is crucial for both assessing their capabilities and identifying safety or robustness issues prior to deployment. Reliably measuring abstract and comp...
- POLIS-Bench: Towards Multi-Dimensional Evaluation of LLMs for Bilingual Policy Tasks in Governmental Scenarios : Abstract: We introduce POLIS-Bench, the first rigorous, systematic evaluation suite designed for LLMs operating in governmental bilingual policy scenarios. Compared to existing benchmarks, POLIS-Bench...
- Prioritize Economy or Climate Action? Investigating ChatGPT Response Differences Based on Inferred Political Orientation : Abstract: Large Language Models (LLMs) distinguish themselves by quickly delivering information and providing personalized responses through natural language prompts. However, they also infer user dem...
- Jailbreaking in the Haystack : Abstract: Recent advances in long-context language models (LMs) have enabled million-token inputs, expanding their capabilities across complex tasks like computer-use agents. Yet, the safety implicati...
- SWAP: Towards Copyright Auditing of Soft Prompts via Sequential Watermarking : Abstract: Large-scale vision-language models, especially CLIP, have demonstrated remarkable performance across diverse downstream tasks. Soft prompts, as carefully crafted modules that efficiently ada...
- First is Not Really Better Than Last: Evaluating Layer Choice and Aggregation Strategies in Language Model Data Influence Estimation : Abstract: Identifying how training samples influence/impact Large Language Model (LLM) decision-making is essential for effectively interpreting model decisions and auditing large-scale datasets. Curr...
- P-MIA: A Profiled-Based Membership Inference Attack on Cognitive Diagnosis Models : Abstract: Cognitive diagnosis models (CDMs) are pivotal for creating fine-grained learner profiles in modern intelligent education platforms. However, these models are trained on sensitive student dat...
- Ada-FCN: Adaptive Frequency-Coupled Network for fMRI-Based Brain Disorder Classification : Abstract: Resting-state fMRI has become a valuable tool for classifying brain disorders and constructing brain functional connectivity networks by tracking BOLD signals across brain regions. However...
- Learning to reason about rare diseases through retrieval-augmented agents : Abstract: Rare diseases represent the long tail of medical imaging, where AI models often fail due to the scarcity of representative training data. In clinical workflows, radiologists frequently consu...
- Temporal convolutional and fusional transformer model with Bi-LSTM encoder-decoder for multi-time-window remaining useful life prediction : Abstract: Health prediction is crucial for ensuring reliability, minimizing downtime, and optimizing maintenance in industrial systems. Remaining Useful Life (RUL) prediction is a key component of thi...
- IndicVisionBench: Benchmarking Cultural and Multilingual Understanding in VLMs : Abstract: Vision-language models (VLMs) have demonstrated impressive generalization across multimodal tasks, yet most evaluation benchmarks remain Western-centric, leaving open questions about their p...
- Trustworthiness Calibration Framework for Phishing Email Detection Using Large Language Models : Abstract: Phishing emails continue to pose a persistent challenge to online communication, exploiting human trust and evading automated filters through realistic language and adaptive tactics. While l...
- Knowledge-based anomaly detection for identifying network-induced shape artifacts : Abstract: Synthetic data provides a promising approach to address data scarcity for training machine learning models; however, adoption without proper quality assessments may introduce artifacts, dist...
- CPO: Condition Preference Optimization for Controllable Image Generation : Abstract: To enhance controllability in text-to-image generation, ControlNet introduces image-based control signals, while ControlNet++ improves pixel-level cycle consistency between generated images ...
- ScheduleStream: Temporal Planning with Samplers for GPU-Accelerated Multi-Arm Task and Motion Planning & Scheduling : Abstract: Bimanual and humanoid robots are appealing because of their human-like ability to leverage multiple arms to efficiently complete tasks. However, controlling multiple arms at once is computat...
- Causal Structure and Representation Learning with Biomedical Applications : Abstract: Massive data collection holds the promise of a better understanding of complex phenomena and, ultimately, better decisions. Representation learning has become a key driver of deep learning a...
- MDM: Manhattan Distance Mapping of DNN Weights for Parasitic-Resistance-Resilient Memristive Crossbars : Abstract: Manhattan Distance Mapping (MDM) is a post-training deep neural network (DNN) weight mapping technique for memristive bit-sliced compute-in-memory (CIM) crossbars that reduces parasitic resi...
- Data Efficiency and Transfer Robustness in Biomedical Image Segmentation: A Study of Redundancy and Forgetting with Cellpose : Abstract: Generalist biomedical image segmentation models such as Cellpose are increasingly applied across diverse imaging modalities and cell types. However, two critical challenges remain underexplo...
- PuzzleMoE: Efficient Compression of Large Mixture-of-Experts Models via Sparse Expert Merging and Bit-packed inference : Abstract: Mixture-of-Experts (MoE) models have shown strong potential in scaling language models efficiently by activating only a small subset of experts per input. However, their widespread deploymen...
- An Active Learning Pipeline for Biomedical Image Instance Segmentation with Minimal Human Intervention : Abstract: Biomedical image segmentation is critical for precise structure delineation and downstream analysis. Traditional methods often struggle with noisy data, while deep learning models such as U-...
- Unified Multimodal Diffusion Forcing for Forceful Manipulation : Abstract: Given a dataset of expert trajectories, standard imitation learning approaches typically learn a direct mapping from observations (e.g., RGB images) to actions. However, such methods often o...
- A Standardized Benchmark for Multilabel Antimicrobial Peptide Classification : Abstract: Antimicrobial peptides have emerged as promising molecules to combat antimicrobial resistance. However, fragmented datasets, inconsistent annotations, and the lack of standardized benchmarks...
- Isaac Lab: A GPU-Accelerated Simulation Framework for Multi-Modal Robot Learning : Abstract: We present Isaac Lab, the natural successor to Isaac Gym, which extends the paradigm of GPU-native robotics simulation into the era of large-scale multi-modal learning. Isaac Lab combines hi...
- Prompt-Based Safety Guidance Is Ineffective for Unlearned Text-to-Image Diffusion Models : Abstract: Recent advances in text-to-image generative models have raised concerns about their potential to produce harmful content when provided with malicious input text prompts. To address this issu...
- Software Defined Vehicle Code Generation: A Few-Shot Prompting Approach : Abstract: The emergence of Software-Defined Vehicles (SDVs) marks a paradigm shift in the automotive industry, where software now plays a pivotal role in defining vehicle functionality, enabling rapid...
- Minimal and Mechanistic Conditions for Behavioral Self-Awareness in LLMs : Abstract: Recent studies have revealed that LLMs can exhibit behavioral self-awareness: the ability to accurately describe or predict their own learned behaviors without explicit supervision. This cap...
- Beta Distribution Learning for Reliable Roadway Crash Risk Assessment : Abstract: Roadway traffic accidents represent a global health crisis, responsible for over a million deaths annually and costing many countries up to 3% of their GDP. Traditional traffic safety studie...
- You Need Reasoning to Learn Reasoning: The Limitations of Label-Free RL in Weak Base Models : Abstract: Recent advances in large language models have demonstrated the promise of unsupervised reinforcement learning (RL) methods for enhancing reasoning capabilities without external supervision. ...
- A Dual Perspective on Decision-Focused Learning: Scalable Training via Dual-Guided Surrogates : Abstract: Many real-world decisions are made under uncertainty by solving optimization problems using predicted quantities. This predict-then-optimize paradigm has motivated decision-focused learning,...
- MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages : Abstract: We present MERaLiON-SER, a robust speech emotion recognition model designed for English and Southeast Asian languages. The model is trained using a hybrid objective combining weighted cate...
- BudgetMem: Learning Selective Memory Policies for Cost-Efficient Long-Context Processing in Language Models : Abstract: Large Language Models (LLMs) face significant computational and memory constraints when processing long contexts, despite growing demand for applications requiring reasoning over extensive d...
- Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG : Abstract: Retrieval systems are essential to contemporary AI pipelines, although most confuse two separate processes: finding relevant information and giving enough context for reasoning. We introduce...
- A benchmark multimodal oro-dental dataset for large vision-language models : Abstract: The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, w...
- DeepForgeSeal: Latent Space-Driven Semi-Fragile Watermarking for Deepfake Detection Using Multi-Agent Adversarial Reinforcement Learning : Abstract: Rapid advances in generative AI have led to increasingly realistic deepfakes, posing growing challenges for law enforcement and public trust. Existing passive deepfake detectors struggle to ...
- Too Good to be Bad: On the Failure of LLMs to Role-Play Villains : Abstract: Large Language Models (LLMs) are increasingly tasked with creative generation, including the simulation of fictional characters. However, their ability to portray non-prosocial, antagonistic...
- Pattern-Aware Diffusion Synthesis of fMRI/dMRI with Tissue and Microstructural Refinement : Abstract: Magnetic resonance imaging (MRI), especially functional MRI (fMRI) and diffusion MRI (dMRI), is essential for studying neurodegenerative diseases. However, missing modalities pose a major ba...
- Learning Fourier shapes to probe the geometric world of deep neural networks : Abstract: While both shape and texture are fundamental to visual recognition, research on deep neural networks (DNNs) has predominantly focused on the latter, leaving their geometric understanding poo...
- Enhancing Public Speaking Skills in Engineering Students Through AI : Abstract: This research-to-practice full paper was inspired by the persistent challenge in effective communication among engineering students. Public speaking is a necessary skill for future engineers...
- BiPETE: A Bi-Positional Embedding Transformer Encoder for Risk Assessment of Alcohol and Substance Use Disorder with Electronic Health Records : Abstract: Transformer-based deep learning models have shown promise for disease risk prediction using electronic health records (EHRs), but modeling temporal dependencies remains a key challenge due to...
- Query Generation Pipeline with Enhanced Answerability Assessment for Financial Information Retrieval : Abstract: As financial applications of large language models (LLMs) gain attention, accurate Information Retrieval (IR) remains crucial for reliable AI services. However, existing benchmarks fail to c...
- Multi-agent Coordination via Flow Matching : Abstract: This work presents MAC-Flow, a simple yet expressive framework for multi-agent coordination. We argue that requirements of effective coordination are twofold: (i) a rich representation of th...
- Pluralistic Behavior Suite: Stress-Testing Multi-Turn Adherence to Custom Behavioral Policies : Abstract: Large language models (LLMs) are typically aligned to a universal set of safety and usage principles intended for broad public acceptability. Yet, real-world applications of LLMs often take ...
- 8bit-GPT: Exploring Human-AI Interaction on Obsolete Macintosh Operating Systems : Abstract: The proliferation of assistive chatbots offering efficient, personalized communication has driven widespread over-reliance on them for decision-making, information-seeking and everyday tasks...
- OvA-LP: A Simple and Efficient Framework for Federated Learning on Non-IID Data : Abstract: Federated fine-tuning (FFT) adapts foundation models to decentralized data but remains fragile under heterogeneous client distributions due to local drift, i.e., client-level update divergen...
- Dynamic Residual Encoding with Slide-Level Contrastive Learning for End-to-End Whole Slide Image Representation : Abstract: Whole Slide Image (WSI) representation is critical for cancer subtyping, cancer recognition and mutation prediction. Training an end-to-end WSI representation model poses significant challeng...
- PECL: A Heterogeneous Parallel Multi-Domain Network for Radar-Based Human Activity Recognition : Abstract: Radar systems are increasingly favored for medical applications because they provide non-intrusive monitoring with high privacy and robustness to lighting conditions. However, existing resea...
- UA-Code-Bench: A Competitive Programming Benchmark for Evaluating LLM Code Generation in Ukrainian : Abstract: Evaluating the real capabilities of large language models in low-resource languages still represents a challenge, as many existing benchmarks focus on widespread tasks translated from Englis...
- Accelerating HDC-CNN Hybrid Models Using Custom Instructions on RISC-V GPUs : Abstract: Machine learning based on neural networks has advanced rapidly, but the high energy consumption required for training and inference remains a major challenge. Hyperdimensional Computing (HDC...
- No Pose Estimation? No Problem: Pose-Agnostic and Instance-Aware Test-Time Adaptation for Monocular Depth Estimation : Abstract: Monocular depth estimation (MDE), inferring pixel-level depths in single RGB images from a monocular camera, plays a crucial and pivotal role in a variety of AI applications demanding a thre...
- Deep learning models are vulnerable, but adversarial examples are even more vulnerable : Abstract: Understanding intrinsic differences between adversarial examples and clean samples is key to enhancing DNN robustness and detection against adversarial attacks. This study first empirically ...
- DL101 Neural Network Outputs and Loss Functions : Abstract: The loss function used to train a neural network is strongly connected to its output layer from a statistical point of view. This technical report analyzes common activation functions for a ...
- From Linear Probing to Joint-Weighted Token Hierarchy: A Foundation Model Bridging Global and Cellular Representations in Biomarker Detection : Abstract: AI-based biomarkers can infer molecular features directly from hematoxylin & eosin (H&E) slides, yet most pathology foundation models (PFMs) rely on global patch-level embeddings and overloo...
- SmartSecChain-SDN: A Blockchain-Integrated Intelligent Framework for Secure and Efficient Software-Defined Networks : Abstract: With more and more existing networks being transformed to Software-Defined Networking (SDN), they need to be more secure and demand smarter ways of traffic control. This work, SmartSecChain-...
- Generating Software Architecture Description from Source Code using Reverse Engineering and Large Language Model : Abstract: Software Architecture Descriptions (SADs) are essential for managing the inherent complexity of modern software systems. They enable high-level architectural reasoning, guide design decision...
- Model Merging Improves Zero-Shot Generalization in Bioacoustic Foundation Models : Abstract: Foundation models capable of generalizing across species and tasks represent a promising new frontier in bioacoustics, with NatureLM being one of the most prominent examples. While its domai...
- No One-Model-Fits-All: Uncovering Spatio-Temporal Forecasting Trade-offs with Graph Neural Networks and Foundation Models : Abstract: Modern IoT deployments for environmental sensing produce high volume spatiotemporal data to support downstream tasks such as forecasting, typically powered by machine learning models. While ...
- 4D3R: Motion-Aware Neural Reconstruction and Rendering of Dynamic Scenes from Monocular Videos : Abstract: Novel view synthesis from monocular videos of dynamic scenes with unknown camera poses remains a fundamental challenge in computer vision and graphics. While recent advances in 3D representa...
- Accurate online action and gesture recognition system using detectors and Deep SPD Siamese Networks : Abstract: Online continuous motion recognition is a hot topic of research since it is more practical in real-life application cases. Recently, skeleton-based approaches have become increasingly popula...
- A Gate-Based Quantum Genetic Algorithm for Real-Valued Global Optimization : Abstract: We propose a gate-based Quantum Genetic Algorithm (QGA) for real-valued global optimization. In this model, individuals are represented by quantum circuits whose measurement outcomes are dec...
- OregairuChar: A Benchmark Dataset for Character Appearance Frequency Analysis in My Teen Romantic Comedy SNAFU : Abstract: The analysis of character appearance frequency is essential for understanding narrative structure, character prominence, and story progression in anime. In this work, we introduce OregairuCh...
- An End-to-End Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drones : Abstract: The emergence of truck-drone collaborative systems in last-mile logistics has positioned the Traveling Salesman Problem with Drones (TSP-D) as a pivotal extension of classical routing optimi...
- Integrating Score-Based Diffusion Models with Machine Learning-Enhanced Localization for Advanced Data Assimilation in Geological Carbon Storage : Abstract: Accurate characterization of subsurface heterogeneity is important for the safe and effective implementation of geological carbon storage (GCS) projects. This paper explores how machine lear...
- TAMAS: Benchmarking Adversarial Risks in Multi-Agent LLM Systems : Abstract: Large Language Models (LLMs) have demonstrated strong capabilities as autonomous agents through tool use, planning, and decision-making abilities, leading to their widespread adoption across...
- DeepEyesV2: Toward Agentic Multimodal Model : Abstract: Agentic multimodal models should not only comprehend text and images, but also actively invoke external tools, such as code execution environments and web search, and integrate these operati...
- LiveStar: Live Streaming Assistant for Real-World Online Video Understanding : Abstract: Despite significant progress in Video Large Language Models (Video-LLMs) for offline video understanding, existing online Video-LLMs typically struggle to simultaneously process continuous f...
Research Sources: 344 | Generated: 11/10/2025
