AI RESEARCH PAPERS & ACADEMIC SOURCES
- SuperFlow: Training Flow Matching Models with RL on the Fly : Abstract: Recent progress in flow-based generative models and reinforcement learning (RL) has improved text-image alignment and visual quality. However, current RL training for flow models still has t...
- Developing a Model for Detecting Coral Reef Damage with Image Classification (original Indonesian title: Pengembangan Model untuk Mendeteksi Kerusakan pada Terumbu Karang dengan Klasifikasi Citra) : Abstract: The rich biodiversity of coral reefs in Indonesian waters represents a valuable asset that must be preserved. Rapid climate change and uncontrolled human activities have caused significant d...
- SpectralKAN: Weighted Activation Distribution Kolmogorov-Arnold Network for Hyperspectral Image Change Detection : Abstract: Kolmogorov-Arnold networks (KANs) represent data features by learning the activation functions and demonstrate superior accuracy with fewer parameters, FLOPs, GPU memory usage (Memory), shor...
- HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models : Abstract: High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities. To reduce the training and computation costs c...
- TLRN: Temporal Latent Residual Networks For Large Deformation Image Registration : Abstract: This paper presents a novel approach, termed Temporal Latent Residual Network (TLRN), to predict a sequence of deformation fields in time-series image registration. The challenge of re...
- TRASE: Tracking-free 4D Segmentation and Editing : Abstract: Understanding dynamic 3D scenes is crucial for extended reality (XR) and autonomous driving. Incorporating semantic information into 3D reconstruction enables holistic scene representations,...
- SuperGSeg: Open-Vocabulary 3D Segmentation with Structured Super-Gaussians : Abstract: 3D Gaussian Splatting has recently gained traction for its efficient training and real-time rendering. While the vanilla Gaussian Splatting representation is mainly designed for view synthes...
- CLIP-GS: Unifying Vision-Language Representation with 3D Gaussian Splatting : Abstract: Recent works in 3D multimodal learning have made remarkable progress. However, typically 3D multimodal models are only capable of handling point clouds. Compared to the emerging 3D represent...
- GM-MoE: Low-Light Enhancement with Gated-Mechanism Mixture-of-Experts : Abstract: Low-light enhancement has wide applications in autonomous driving, 3D reconstruction, remote sensing, surveillance, and so on, which can significantly improve information utilization. Howeve...
- PanoSAMic: Panoramic Image Segmentation from SAM Feature Encoding and Dual View Fusion : Abstract: Existing image foundation models are not optimized for spherical images having been trained primarily on perspective images. PanoSAMic integrates the pre-trained Segment Anything (SAM) encod...
- From Sketch to Fresco: Efficient Diffusion Transformer with Progressive Resolution : Abstract: Diffusion Transformers achieve impressive generative quality but remain computationally expensive due to iterative sampling. Recently, dynamic resolution sampling has emerged as a promising ...
- FocalOrder: Focal Preference Optimization for Reading Order Detection : Abstract: Reading order detection is the foundation of document understanding. Most existing methods rely on uniform supervision, implicitly assuming a constant difficulty distribution across layout r...
- Anatomy Aware Cascade Network: Bridging Epistemic Uncertainty and Geometric Manifold for 3D Tooth Segmentation : Abstract: Accurate three-dimensional (3D) tooth segmentation from Cone-Beam Computed Tomography (CBCT) is a prerequisite for digital dental workflows. However, achieving high-fidelity segmentation rem...
- Mon3tr: Monocular 3D Telepresence with Pre-built Gaussian Avatars as Amortization : Abstract: Immersive telepresence aims to transform human interaction in AR/VR applications by enabling lifelike full-body holographic representations for enhanced remote collaboration. However, existi...
- ViewMorpher3D: A 3D-aware Diffusion Framework for Multi-Camera Novel View Synthesis in Autonomous Driving : Abstract: Autonomous driving systems rely heavily on multi-view images to ensure accurate perception and robust decision-making. To effectively develop and evaluate perception stacks and planning algo...
- BenchSeg: A Large-Scale Dataset and Benchmark for Multi-View Food Video Segmentation : Abstract: Food image segmentation is a critical task for dietary analysis, enabling accurate estimation of food volume and nutrients. However, current methods suffer from limited multi-view data and p...
- Robust Multicentre Detection and Classification of Colorectal Liver Metastases on CT: Application of Foundation Models : Abstract: Colorectal liver metastases (CRLM) are a major cause of cancer-related mortality, and reliable detection on CT remains challenging in multi-centre settings. We developed a foundation model-b...
- Diffusion in SPAD Signals : Abstract: We derive the likelihood of a raw signal in a single photon avalanche diode (SPAD), given a fixed photon flux. The raw signal comprises timing of detection events, which are nonlinearly rela...
- UIKA: Fast Universal Head Avatar from Pose-Free Images : Abstract: We present UIKA, a feed-forward animatable Gaussian head model from an arbitrary number of unposed inputs, including a single image, multi-view captures, and smartphone-captured videos. Unli...
- PARL: Position-Aware Relation Learning Network for Document Layout Analysis : Abstract: Document layout analysis aims to detect and categorize structural elements (e.g., titles, tables, figures) in scanned or digital documents. Popular methods often rely on high-quality Optical...
- GeoMotionGPT: Geometry-Aligned Motion Understanding with Large Language Models : Abstract: Discrete motion tokenization has recently enabled Large Language Models (LLMs) to serve as versatile backbones for motion understanding and motion-language reasoning. However, existing pipel...
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation : Abstract: We present StdGEN++, a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. Existing 3D generative methods often produce mo...
- Variational Contrastive Learning for Skeleton-based Action Recognition : Abstract: In recent years, self-supervised representation learning for skeleton-based action recognition has advanced with the development of contrastive learning methods. However, most of contrastive...
- Advancing Multinational License Plate Recognition Through Synthetic and Real Data Fusion: A Comprehensive Evaluation : Abstract: Automatic License Plate Recognition is a frequent research topic due to its wide-ranging practical applications. While recent studies use synthetic images to improve License Plate Recognitio...
- Leveraging 3D Representation Alignment and RGB Pretrained Priors for LiDAR Scene Generation : Abstract: LiDAR scene synthesis is an emerging solution to scarcity in 3D data for robotic tasks such as autonomous driving. Recent approaches employ diffusion or flow matching models to generate real...
- Smooth Operator: Smooth Verifiable Reward Activates Spatial Reasoning Ability of Vision-Language Model : Abstract: Vision-Language Models (VLMs) face a critical bottleneck in achieving precise numerical prediction for 3D scene understanding. Traditional reinforcement learning (RL) approaches, primarily b...
- FMAC: a Fair Fiducial Marker Accuracy Comparison Software : Abstract: This paper presents a method for carrying out fair comparisons of the accuracy of pose estimation using fiducial markers. These comparisons rely on large sets of high-fidelity synthetic images e...
- Evaluating the encoding competence of visual language models using uncommon actions : Abstract: We propose UAIT (Uncommon-sense Action Image-Text) dataset, a new evaluation benchmark designed to test the semantic understanding ability of visual language models (VLMs) in uncommon-sense ...
- On the application of the Wasserstein metric to 2D curves classification : Abstract: In this work we analyse a number of variants of the Wasserstein distance which allow the classification to focus on the prescribed parts (fragments) of classified 2D curves. These variants a...
- Video Evidence to Reasoning: Efficient Video Understanding via Explicit Evidence Grounding : Abstract: Large Vision-Language Models (LVLMs) face a fundamental dilemma in video reasoning: they are caught between the prohibitive computational costs of verbose reasoning and the hallucination ris...
- Beyond External Guidance: Unleashing the Semantic Richness Inside Diffusion Transformers for Improved Training : Abstract: Recent works such as REPA have shown that guiding diffusion models with external semantic features (e.g., DINO) can significantly accelerate the training of diffusion transformers (DiTs). Ho...
- Vision-Language Model for Accurate Crater Detection : Abstract: The European Space Agency (ESA), driven by its ambitions on planned lunar missions with the Argonaut lander, has a profound interest in reliable crater detection, since craters pose a risk t...
- Exchange Is All You Need for Remote Sensing Change Detection : Abstract: Remote sensing change detection fundamentally relies on the effective fusion and discrimination of bi-temporal features. Prevailing paradigms typically utilize Siamese encoders bridged by ex...
- More Images, More Problems? A Controlled Analysis of VLM Failure Modes : Abstract: Large Vision Language Models (LVLMs) have demonstrated remarkable capabilities, yet their proficiency in understanding and reasoning over multiple images remains largely unexplored. While ex...
- SDHSI-Net: Learning Better Representations for Hyperspectral Images via Self-Distillation : Abstract: Hyperspectral image (HSI) classification presents unique challenges due to its high spectral dimensionality and limited labeled data. Traditional deep learning models often suffer from overf...
- Tuning-free Visual Effect Transfer across Videos : Abstract: We present RefVFX, a new framework that transfers complex temporal effects from a reference video onto a target video or image in a feed-forward manner. While existing methods excel at promp...
- Investigating Anthropometric Fidelity in SAM 3D Body : Abstract: The recent release of SAM 3D Body [sam3dbody2025] marks a significant milestone in human mesh recovery, demonstrating state-of-the-art performance in producing clean, topologically cohe...
- Using street view images and visual LLMs to predict heritage values for governance support: Risks, ethics, and policy implications : Abstract: During 2025 and 2026, the Energy Performance of Buildings Directive is being implemented in the European Union member states, requiring all member states to have National Building Renovation...
- Deep Joint Source-Channel Coding for Wireless Video Transmission with Asymmetric Context : Abstract: In this paper, we propose a high-efficiency deep joint source-channel coding (JSCC) method for video transmission based on conditional coding with asymmetric context. The conditional coding-...
- Leveraging Membership Inference Attacks for Privacy Measurement in Federated Learning for Remote Sensing Images : Abstract: Federated Learning (FL) enables collaborative model training while keeping training data localized, allowing us to preserve privacy in various domains including remote sensing. However, rece...
- Real-Time Image Processing Algorithms for Embedded Systems : Abstract: Embedded vision systems need efficient and robust image processing algorithms to perform in real time on resource-constrained hardware. This research investigates image processing algorithms...
- Gamma2Patterns: Deep Cognitive Attention Region Identification and Gamma-Alpha Pattern Analysis : Abstract: Deep cognitive attention is characterized by heightened gamma oscillations and coordinated visual behavior. Despite the physiological importance of these mechanisms, computational studies ra...
- Performance Analysis of DCT, Hadamard, and PCA in Block-Based Image Compression : Abstract: Block-based image compression relies on transform coding to concentrate signal energy into a small number of coefficients. While classical codecs use fixed transforms such as the Discrete Co...
- From Easy to Hard++: Promoting Differentially Private Image Synthesis Through Spatial-Frequency Curriculum : Abstract: To improve the quality of differentially private (DP) synthetic images, most studies have focused on improving the core optimization techniques (e.g., DP-SGD). Recently, we have witnessed a ...
- Semantic Enrichment of CAD-Based Industrial Environments via Scene Graphs for Simulation and Reasoning : Abstract: Utilizing functional elements in an industrial environment, such as displays and interactive valves, provides effective possibilities for robot training. When preparing simulations for robots...
- CulinaryCut-VLAP: A Vision-Language-Action-Physics Framework for Food Cutting via a Force-Aware Material Point Method : Abstract: Food cutting is a highly practical yet underexplored application at the intersection of vision and robotic manipulation. The task remains challenging because interactions between the knife a...
- VIPER Strike: Defeating Visual Reasoning CAPTCHAs via Structured Vision-Language Inference : Abstract: Visual Reasoning CAPTCHAs (VRCs) combine visual scenes with natural-language queries that demand compositional inference over objects, attributes, and spatial relations. They are increasingl...
- R$^3$D: Regional-guided Residual Radar Diffusion : Abstract: Millimeter-wave radar enables robust environment perception in autonomous systems under adverse conditions yet suffers from sparse, noisy point clouds with low angular resolution. Existing d...
- Precision Meets Art: Autonomous Multi-UAV System for Large Scale Mural Drawing : Abstract: The integration of autonomous unmanned aerial vehicles (UAVs) into large-scale artistic projects has emerged as a new application in robotics. This paper presents the design, deployment, and...
- Hard Thresholding Pursuit Algorithms for Least Absolute Deviations Problem : Abstract: Least absolute deviations (LAD) is a statistical optimality criterion widely utilized in scenarios where a minority of measurements are contaminated by outliers of arbitrary magnitudes. In t...
- USFetal: Tools for Fetal Brain Ultrasound Compounding : Abstract: Ultrasound offers a safe, cost-effective, and widely accessible technology for fetal brain imaging, making it especially suitable for routine clinical use. However, it suffers from view-depe...
- AutoTour: Automatic Photo Tour Guide with Smartphones and LLMs : Abstract: We present AutoTour, a system that enhances user exploration by automatically generating fine-grained landmark annotations and descriptive narratives for photos captured by users. The key id...
- Revisiting the Ordering of Channel and Spatial Attention: A Comprehensive Study on Sequential and Parallel Designs : Abstract: Attention mechanisms have become a core component of deep learning models, with Channel Attention and Spatial Attention being the two most representative architectures. Current research on t...
- SC-MII: Infrastructure LiDAR-based 3D Object Detection on Edge Devices for Split Computing with Multiple Intermediate Outputs Integration : Abstract: 3D object detection using LiDAR-based point cloud data and deep neural networks is essential in autonomous driving technology. However, deploying state-of-the-art models on edge devices pres...
- BlindU: Blind Machine Unlearning without Revealing Erasing Data : Abstract: Machine unlearning enables data holders to remove the contribution of their specified samples from trained models to protect their privacy. However, it is paradoxical that most unlearning me...
- HERE: Hierarchical Active Exploration of Radiance Field with Epistemic Uncertainty Minimization : Abstract: We present HERE, an active 3D scene reconstruction framework based on neural radiance fields, enabling high-fidelity implicit mapping. Our approach centers around an active learning strategy...
- Fast Multi-Stack Slice-to-Volume Reconstruction via Multi-Scale Unrolled Optimization : Abstract: Fully convolutional networks have become the backbone of modern medical imaging due to their ability to learn multi-scale representations and perform end-to-end inference. Yet their potentia...
- A Multimodal Dataset of Student Oral Presentations with Sensors and Evaluation Data : Abstract: Oral presentation skills are a critical component of higher education, yet comprehensive datasets capturing real-world student performance across multiple modalities remain scarce. To addres...
- SecureCAI: Injection-Resilient LLM Assistants for Cybersecurity Operations : Abstract: Large Language Models have emerged as transformative tools for Security Operations Centers, enabling automated log analysis, phishing triage, and malware explanation; however, deployment in ...
- JoIN: Joint GANs Inversion for Intrinsic Image Decomposition : Abstract: Intrinsic Image Decomposition (IID) is a challenging inverse problem that seeks to decompose a natural image into its underlying intrinsic components such as albedo and shading. While recent...
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head : Abstract: While the Transformer architecture dominates many fields, its quadratic self-attention complexity hinders its use in large-scale applications. Linear attention offers an efficient alternativ...
- OSCAR: Open-Set CAD Retrieval from a Language Prompt and a Single Image : Abstract: 6D object pose estimation plays a crucial role in scene understanding for applications such as robotics and augmented reality. To support the needs of ever-changing object sets in such conte...
- Reconstruction Guided Few-shot Network For Remote Sensing Image Classification : Abstract: Few-shot remote sensing image classification is challenging due to limited labeled samples and high variability in land-cover types. We propose a reconstruction-guided few-shot network (RGFS...
- PulseMind: A Multi-Modal Medical Model for Real-World Clinical Diagnosis : Abstract: Recent advances in medical multi-modal models focus on specialized image analysis like dermatology, pathology, or radiology. However, they do not fully capture the complexity of real-world c...
- Seeing Right but Saying Wrong: Inter- and Intra-Layer Refinement in MLLMs without Training : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated strong capabilities across a variety of vision-language tasks. However, their internal reasoning often exhibits a critical inconsis...
- HiVid-Narrator: Hierarchical Video Narrative Generation with Scene-Primed ASR-anchored Compression : Abstract: Generating structured narrations for real-world e-commerce videos requires models to perceive fine-grained visual details and organize them into coherent, high-level stories--capabilities th...
- Learning Dynamic Collaborative Network for Semi-supervised 3D Vessel Segmentation : Abstract: In this paper, we present a new dynamic collaborative network for semi-supervised 3D vessel segmentation, termed DiCo. Conventional mean teacher (MT) methods typically employ a static approa...
- Forecast the Principal, Stabilize the Residual: Subspace-Aware Feature Caching for Efficient Diffusion Transformers : Abstract: Diffusion Transformer (DiT) models have achieved unprecedented quality in image and video generation, yet their iterative sampling process remains computationally prohibitive. To accelerate ...
- ObjSplat: Geometry-Aware Gaussian Surfels for Active Object Reconstruction : Abstract: Autonomous high-fidelity object reconstruction is fundamental for creating digital assets and bridging the simulation-to-reality gap in robotics. We present ObjSplat, an active reconstructio...
- Motion Focus Recognition in Fast-Moving Egocentric Video : Abstract: From Vision-Language-Action (VLA) systems to robotics, existing egocentric datasets primarily focus on action recognition tasks, while largely overlooking the inherent role of motion analysi...
- Test-time Adaptive Hierarchical Co-enhanced Denoising Network for Reliable Multimodal Classification : Abstract: Reliable learning on low-quality multimodal data is a widely concerning issue, especially in safety-critical applications. However, multimodal noise poses a major challenge in this domain an...
- DIVER: Dynamic Iterative Visual Evidence Reasoning for Multimodal Fake News Detection : Abstract: Multimodal fake news detection is crucial for mitigating adversarial misinformation. Existing methods, relying on static fusion or LLMs, face computational redundancy and hallucination risks...
- ShowUI-Aloha: Human-Taught GUI Agent : Abstract: Graphical User Interfaces (GUIs) are central to human-computer interaction, yet automating complex GUI tasks remains a major challenge for autonomous agents, largely due to a lack of scalabl...
- SIRR-LMM: Single-image Reflection Removal via Large Multimodal Model : Abstract: Glass surfaces create complex interactions of reflected and transmitted light, making single-image reflection removal (SIRR) challenging. Existing datasets suffer from limited physical reali...
- SceneNAT: Masked Generative Modeling for Language-Guided Indoor Scene Synthesis : Abstract: We present SceneNAT, a single-stage masked non-autoregressive Transformer that synthesizes complete 3D indoor scenes from natural language instructions through only a few parallel decoding p...
- VENUS: Visual Editing with Noise Inversion Using Scene Graphs : Abstract: State-of-the-art text-based image editing models often struggle to balance background preservation with semantic consistency, frequently resulting either in the synthesis of entirely new ima...
- Language-Grounded Multi-Domain Image Translation via Semantic Difference Guidance : Abstract: Multi-domain image-to-image translation requires grounding semantic differences expressed in natural language prompts into corresponding visual transformations, while preserving unrelated ...
- Universal Adversarial Purification with DDIM Metric Loss for Stable Diffusion : Abstract: Stable Diffusion (SD) often produces degraded outputs when the training dataset contains adversarial noise. Adversarial purification offers a promising solution by removing adversarial noise...
- From Landslide Conditioning Factors to Satellite Embeddings: Evaluating the Utilisation of Google AlphaEarth for Landslide Susceptibility Mapping using Deep Learning : Abstract: Data-driven landslide susceptibility mapping (LSM) typically relies on landslide conditioning factors (LCFs), whose availability, heterogeneity, and preprocessing-related uncertainties can c...
- PALUM: Part-based Attention Learning for Unified Motion Retargeting : Abstract: Retargeting motion between characters with different skeleton structures is a fundamental challenge in computer animation. When source and target characters have vastly different bone arrang...
- GenDet: Painting Colored Bounding Boxes on Images via Diffusion Model for Object Detection : Abstract: This paper presents GenDet, a novel framework that redefines object detection as an image generation task. In contrast to traditional approaches, GenDet adopts a pioneering approach by lever...
- Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models : Abstract: The task of Image-to-Video (I2V) generation aims to synthesize a video from a reference image and a text prompt. This requires diffusion models to reconcile high-frequency visual constraints...
- VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding : Abstract: This paper presents VideoLoom, a unified Video Large Language Model (Video LLM) for joint spatial-temporal understanding. To facilitate the development of fine-grained spatial and temporal l...
- A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model : Abstract: Watermarking has emerged as a pivotal solution for content traceability and intellectual property protection in Large Vision-Language Models (LVLMs). However, vision-agnostic watermarks intr...
- Inference-Time Scaling for Visual AutoRegressive modeling by Searching Representative Samples : Abstract: While inference-time scaling has significantly enhanced generative quality in large language and diffusion models, its application to vector-quantized (VQ) visual autoregressive modeling (VA...
- Mimic Human Cognition, Master Multi-Image Reasoning: A Meta-Action Framework for Enhanced Visual Understanding : Abstract: While Multimodal Large Language Models (MLLMs) excel at single-image understanding, they exhibit significantly degraded performance in multi-image reasoning scenarios. Multi-image reasoning ...
- Low-Back Pain Physical Rehabilitation by Movement Analysis in Clinical Trial : Abstract: To allow the development and assessment of physical rehabilitation by an intelligent tutoring system, we propose a medical dataset of clinical patients carrying out low back-pain rehabilitat...
- What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models : Abstract: Current vision-language benchmarks predominantly feature well-structured questions with clear, explicit prompts. However, real user queries are often informal and underspecified. Users natur...
- B-FIRE: Binning-Free Diffusion Implicit Neural Representation for Hyper-Accelerated Motion-Resolved MRI : Abstract: Accelerated dynamic volumetric magnetic resonance imaging (4DMRI) is essential for applications relying on motion resolution. Existing 4DMRI produces acceptable artifacts of averaged breathi...
- Analyzing the Structure of Handwritten Digits: A Comparative Study of PCA, Factor Analysis, and UMAP : Abstract: Handwritten digit images lie in a high-dimensional pixel space but exhibit strong geometric and statistical structure. This paper investigates the latent organization of handwritten digits i...
- Think Bright, Diffuse Nice: Enhancing T2I-ICL via Inductive-Bias Hint Instruction and Query Contrastive Decoding : Abstract: Text-to-Image In-Context Learning (T2I-ICL) enables customized image synthesis via interleaved text-image examples but faces two mutually reinforcing bottlenecks, compliance failure and prio...
- TIR-Flow: Active Video Search and Reasoning with Frozen VLMs : Abstract: While Large Video-Language Models (Video-LLMs) have achieved remarkable progress in perception, their reasoning capabilities remain a bottleneck. Existing solutions typically resort to a hea...
- A Unified Attention U-Net Framework for Cross-Modality Tumor Segmentation in MRI and CT : Abstract: This study presents a unified Attention U-Net architecture trained jointly on MRI (BraTS 2021) and CT (LIDC-IDRI) datasets to investigate the generalizability of a single model across divers...
- How Does India Cook Biryani? : Abstract: Biryani, one of India's most celebrated dishes, exhibits remarkable regional diversity in its preparation, ingredients, and presentation. With the growing availability of online cooking vide...
- QwenStyle: Content-Preserving Style Transfer with Qwen-Image-Edit : Abstract: Content-preserving style transfer, given content and style references, remains challenging for Diffusion Transformers (DiTs) due to their internally entangled content and style features. In this...
- Cascading multi-agent anomaly detection in surveillance systems via vision-language models and embedding-based classification : Abstract: Intelligent anomaly detection in dynamic visual environments requires reconciling real-time performance with semantic interpretability. Conventional approaches address only fragments of this...
- When Imbalance Comes Twice: Active Learning under Simulated Class Imbalance and Label Shift in Binary Semantic Segmentation : Abstract: The aim of Active Learning is to select the most informative samples from an unlabelled set of data. This is useful in cases where the amount of data is large and labelling is expensive, suc...
- Akasha 2: Hamiltonian State Space Duality and Visual-Language Joint Embedding Predictive Architecture : Abstract: We present Akasha 2, a state-of-the-art multimodal architecture that integrates Hamiltonian State Space Duality (H-SSD) with Visual-Language Joint Embedding Predictive Architecture (VL-JEPA)...
- Two-step Authentication: Multi-biometric System Using Voice and Facial Recognition : Abstract: We present a cost-effective two-step authentication system that integrates face identification and speaker verification using only a camera and microphone available on common devices. The pi...
- SAPL: Semantic-Agnostic Prompt Learning in CLIP for Weakly Supervised Image Manipulation Localization : Abstract: Malicious image manipulation threatens public safety and requires efficient localization methods. Existing approaches depend on costly pixel-level annotations which make training expensive. ...
- Ground What You See: Hallucination-Resistant MLLMs via Caption Feedback, Diversity-Aware Sampling, and Conflict Regularization : Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable success across diverse tasks, their practical deployment is severely hindered by hallucination issues, which become pa...
- Synthetic FMCW Radar Range Azimuth Maps Augmentation with Generative Diffusion Model : Abstract: The scarcity and low diversity of well-annotated automotive radar datasets often limit the performance of deep-learning-based environmental perception. To overcome these challenges, we propo...
- A survey of facial recognition techniques : Abstract: As multimedia content is quickly growing, the field of facial recognition has become one of the major research fields, particularly in recent years. The most problematic area to research...
- EyeTheia: A Lightweight and Accessible Eye-Tracking Toolbox : Abstract: We introduce EyeTheia, a lightweight and open deep learning pipeline for webcam-based gaze estimation, designed for browser-based experimental platforms and real-world cognitive and clinical...
- NAS-GS: Noise-Aware Sonar Gaussian Splatting : Abstract: Underwater sonar imaging plays a crucial role in various applications, including autonomous navigation in murky water, marine archaeology, and environmental monitoring. However, the unique c...
- Perception Test 2025: Challenge Summary and a Unified VQA Extension : Abstract: The Third Perception Test challenge was organised as a full-day workshop alongside the IEEE/CVF International Conference on Computer Vision (ICCV) 2025. Its primary goal is to benchmark stat...
- VideoWeave: A Data-Centric Approach for Efficient Video Understanding : Abstract: Training video-language models is often prohibitively expensive due to the high cost of processing long frame sequences and the limited availability of annotated long videos. We present Vide...
- Object-WIPER : Training-Free Object and Associated Effect Removal in Videos : Abstract: In this paper, we introduce Object-WIPER, a training-free framework for removing dynamic objects and their associated visual effects from videos, and inpainting them with semantically consis...
- Context Matters: Peer-Aware Student Behavioral Engagement Measurement via VLM Action Parsing and LLM Sequence Classification : Abstract: Understanding student behavior in the classroom is essential to improve both pedagogical quality and student engagement. Existing methods for predicting student engagement typically require ...
- GlobalPaint: Spatiotemporal Coherent Video Outpainting with Global Feature Guidance : Abstract: Video outpainting extends a video beyond its original boundaries by synthesizing missing border content. Compared with image outpainting, it requires not only per-frame spatial plausibility ...
- WHU-PCPR: A cross-platform heterogeneous point cloud dataset for place recognition in complex urban scenes : Abstract: Point Cloud-based Place Recognition (PCPR) demonstrates considerable potential in applications such as autonomous driving, robot localization and navigation, and map update. In practical app...
- How to Build Robust, Scalable Models for GSV-Based Indicators in Neighborhood Research : Abstract: A substantial body of health research demonstrates a strong link between neighborhood environments and health outcomes. Recently, there has been increasing interest in leveraging advances in...
- On the Adversarial Robustness of 3D Large Vision-Language Models : Abstract: 3D Vision-Language Models (VLMs), such as PointLLM and GPT4Point, have shown strong reasoning and generalization abilities in 3D understanding tasks. However, their adversarial robustness re...
- SparseOccVLA: Bridging Occupancy and Vision-Language Models via Sparse Queries for Unified 4D Scene Understanding and Planning : Abstract: In autonomous driving, Vision Language Models (VLMs) excel at high-level reasoning, whereas semantic occupancy provides fine-grained details. Despite significant progress in individual fiel...
- VVTRec: Radio Interferometric Reconstruction through Visual and Textual Modality Enrichment : Abstract: Radio astronomy is an indispensable discipline for observing distant celestial objects. Measurements of wave signals from radio telescopes, called visibility, need to be transformed into ima...
- SRFlow: A Dataset and Regularization Model for High-Resolution Facial Optical Flow via Splatting Rasterization : Abstract: Facial optical flow supports a wide range of tasks in facial motion analysis. However, the lack of high-resolution facial optical flow datasets has hindered progress in this area. In this pa...
- Learning Domain Agnostic Latent Embeddings of 3D Faces for Zero-shot Animal Expression Transfer : Abstract: We present a zero-shot framework for transferring human facial expressions to 3D animal face meshes. Our method combines intrinsic geometric descriptors (HKS/WKS) with a mesh-agnostic latent...
- 3D CoCa v2: Contrastive Learners with Test-Time Search for Generalizable Spatial Intelligence : Abstract: Spatial intelligence refers to the ability to perceive, reason about, and describe objects and their relationships within three-dimensional environments, forming a foundation for embodied pe...
- Bridging Robustness and Efficiency: Real-Time Low-Light Enhancement via Attention U-Net GAN : Abstract: Recent advancements in Low-Light Image Enhancement (LLIE) have focused heavily on Diffusion Probabilistic Models, which achieve high perceptual quality but suffer from significant computatio...
- Toward Generalizable Deblurring: Leveraging Massive Blur Priors with Linear Attention for Real-World Scenarios : Abstract: Image deblurring has advanced rapidly with deep learning, yet most methods exhibit poor generalization beyond their training datasets, with performance dropping significantly in real-world s...
- Towards Egocentric 3D Hand Pose Estimation in Unseen Domains : Abstract: We present V-HPOT, a novel approach for improving the cross-domain performance of 3D hand pose estimation from egocentric images across diverse, unseen domains. State-of-the-art methods demo...
- LLMTrack: Semantic Multi-Object Tracking with Multi-modal Large Language Models : Abstract: Traditional Multi-Object Tracking (MOT) systems have achieved remarkable precision in localization and association, effectively answering *where* and *who*. However, they often...
- ArrowGEV: Grounding Events in Video via Learning the Arrow of Time : Abstract: Grounding events in videos serves as a fundamental capability in video analysis. While Vision-Language Models (VLMs) are increasingly employed for this task, existing approaches predominantl...
- QCaption: Video Captioning and Q&A through Fusion of Large Multimodal Models : Abstract: This paper introduces QCaption, a novel video captioning and Q&A pipeline that enhances video analytics by fusing three models: key frame extraction, a Large Multimodal Model (LMM) for image...
- APEX: Learning Adaptive Priorities for Multi-Objective Alignment in Vision-Language Generation : Abstract: Multi-objective alignment for text-to-image generation is commonly implemented via static linear scalarization, but fixed weights often fail under heterogeneous rewards, leading to optimizat...
- Sissi: Zero-shot Style-guided Image Synthesis via Semantic-style Integration : Abstract: Text-guided image generation has advanced rapidly with large-scale diffusion models, yet achieving precise stylization with visual exemplars remains difficult. Existing approaches often depe...
- Boosting Overlapping Organoid Instance Segmentation Using Pseudo-Label Unmixing and Synthesis-Assisted Learning : Abstract: Organoids, sophisticated in vitro models of human tissues, are crucial for medical research due to their ability to simulate organ functions and assess drug responses accurately. Accurate or...
- eSkiTB: A Synthetic Event-based Dataset for Tracking Skiers : Abstract: Tracking skiers in RGB broadcast footage is challenging due to motion blur, static overlays, and clutter that obscure the fast-moving athlete. Event cameras, with their asynchronous contrast...
- Quantification and Classification of Carbon Nanotubes in Electron Micrographs using Vision Foundation Models : Abstract: Accurate characterization of carbon nanotube morphologies in electron microscopy images is vital for exposure assessment and toxicological studies, yet current workflows rely on slow, subjec...
- When Humans Judge Irises: Pupil Size Normalization as an Aid and Synthetic Irises as a Challenge : Abstract: Iris recognition is a mature biometric technology offering remarkable precision and speed, and allowing for large-scale deployments to populations exceeding a billion enrolled users (e.g., A...
- The Normalized Difference Layer: A Differentiable Spectral Index Formulation for Deep Learning : Abstract: Normalized difference indices have been a staple in remote sensing for decades. They stay reliable under lighting changes, produce bounded values, and connect well to biophysical signals. Even...
- SpatialNav: Leveraging Spatial Scene Graphs for Zero-Shot Vision-and-Language Navigation : Abstract: Although learning-based vision-and-language navigation (VLN) agents can learn spatial knowledge implicitly from large-scale training data, zero-shot VLN agents lack this process, relying pri...
- SARA: Scene-Aware Reconstruction Accelerator : Abstract: We present SARA (Scene-Aware Reconstruction Accelerator), a geometry-driven pair selection module for Structure-from-Motion (SfM). Unlike conventional pipelines that select pairs based on vi...
- Enhancing Low-resolution Image Representation Through Normalizing Flows : Abstract: Low-resolution image representation is a special form of sparse representation that retains only low-frequency information while discarding high-frequency components. This property reduces s...
- OSCAR: Optical-aware Semantic Control for Aleatoric Refinement in Sar-to-Optical Translation : Abstract: Synthetic Aperture Radar (SAR) provides robust all-weather imaging capabilities; however, translating SAR observations into photo-realistic optical images remains a fundamentally ill-posed p...
- PRISM: Color-Stratified Point Cloud Sampling : Abstract: We present PRISM, a novel color-guided stratified sampling method for RGB-LiDAR point clouds. Our approach is motivated by the observation that unique scene features often exhibit chromatic ...
- MedGround: Bridging the Evidence Gap in Medical Vision-Language Models with Verified Grounding Data : Abstract: Vision-Language Models (VLMs) can generate convincing clinical narratives, yet frequently struggle to visually ground their statements. We posit this limitation arises from the scarcity of h...
- MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation : Abstract: Most existing 3D referring expression segmentation (3DRES) methods rely on dense, high-quality point clouds, while real-world agents such as robots and mobile phones operate with only a few ...
- Unsupervised Domain Adaptation with SAM-RefiSeR for Enhanced Brain Tumor Segmentation
- MixRI: Mixing Features of Reference Images for Novel Object Pose Estimation : Abstract: We present MixRI, a lightweight network that solves the CAD-based novel object pose estimation problem in RGB images. It can be instantly applied to a novel object at test time without finet...
- CLIMP: Contrastive Language-Image Mamba Pretraining : Abstract: Contrastive Language-Image Pre-training (CLIP) relies on Vision Transformers whose attention mechanism is susceptible to spurious correlations, and scales quadratically with resolution. To a...
- UDPNet: Unleashing Depth-based Priors for Robust Image Dehazing : Abstract: Image dehazing has witnessed significant advancements with the development of deep learning models. However, a few methods predominantly focus on single-modal RGB features, neglecting the in...
- RenderFlow: Single-Step Neural Rendering via Flow Matching : Abstract: Conventional physically based rendering (PBR) pipelines generate photorealistic images through computationally intensive light transport simulations. Although recent deep learning approaches...
- Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning : Abstract: In real-world video question answering scenarios, videos often provide only localized visual cues, while verifiable answers are distributed across the open web; models therefore need to join...
- SketchJudge: A Diagnostic Benchmark for Grading Hand-drawn Diagrams with Multimodal Large Language Models : Abstract: While Multimodal Large Language Models (MLLMs) have achieved remarkable progress in visual understanding, they often struggle when faced with the unstructured and ambiguous nature of human-g...
- Unified Personalized Understanding, Generating and Editing : Abstract: Unified large multimodal models (LMMs) have achieved remarkable progress in general-purpose multimodal understanding and generation. However, they still operate under a "one-size-fits-all"...
- Can Textual Reasoning Improve the Performance of MLLMs on Fine-grained Visual Classification? : Abstract: Multi-modal large language models (MLLMs) exhibit strong general-purpose capabilities, yet still struggle with Fine-Grained Visual Classification (FGVC), a core perception task that requires s...
- Spatial Multi-Task Learning for Breast Cancer Molecular Subtype Prediction from Single-Phase DCE-MRI : Abstract: Accurate molecular subtype classification is essential for personalized breast cancer treatment, yet conventional immunohistochemical analysis relies on invasive biopsies and is prone to sam...
- Adversarial Attacks on Medical Hyperspectral Imaging Exploiting Spectral-Spatial Dependencies and Multiscale Features : Abstract: Medical hyperspectral imaging (HSI) enables accurate disease diagnosis by capturing rich spectral-spatial tissue information, but recent advances in deep learning have exposed its vulnerabil...
- Billboard in Focus: Estimating Driver Gaze Duration from a Single Image : Abstract: Roadside billboards represent a central element of outdoor advertising, yet their presence may contribute to driver distraction and accident risk. This study introduces a fully automated pip...
- Efficient Visual Question Answering Pipeline for Autonomous Driving via Scene Region Compression : Abstract: Autonomous driving increasingly relies on Visual Question Answering (VQA) to enable vehicles to understand complex surroundings by analyzing visual inputs and textual queries. Currently, a p...
- 3D Wavelet-Based Structural Priors for Controlled Diffusion in Whole-Body Low-Dose PET Denoising : Abstract: Low-dose Positron Emission Tomography (PET) imaging reduces patient radiation exposure but suffers from increased noise that degrades image quality and diagnostic reliability. Although diffu...
- MEDVISTAGYM: A Scalable Training Environment for Thinking with Medical Images via Tool-Integrated Reinforcement Learning : Abstract: Vision language models (VLMs) achieve strong performance on general image understanding but struggle to think with medical images, especially when performing multi-step reasoning through ite...
- Few-shot Class-Incremental Learning via Generative Co-Memory Regularization : Abstract: Few-shot class-incremental learning (FSCIL) aims to incrementally learn models from a small amount of novel data, which requires strong representation and adaptation ability of models learne...
- L-RAG: Balancing Context and Retrieval with Entropy-Based Lazy Loading : Abstract: Retrieval-Augmented Generation (RAG) has emerged as the predominant paradigm for grounding Large Language Model outputs in factual knowledge, effectively mitigating hallucinations. However, ...
- Benchmarking Egocentric Clinical Intent Understanding Capability for Medical Multimodal Large Language Models : Abstract: Medical Multimodal Large Language Models (Med-MLLMs) require egocentric clinical intent understanding for real-world deployment, yet existing benchmarks fail to evaluate this critical capabi...
- Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have achieved strong performance across many tasks, yet most systems remain limited to offline inference, requiring complete inputs before generating...
- An Ubuntu-Guided Large Language Model Framework for Cognitive Behavioral Mental Health Dialogue : Abstract: South Africa's escalating mental health crisis, compounded by limited access to culturally responsive care, calls for innovative and contextually grounded interventions. While large language...
- TagSpeech: End-to-End Multi-Speaker ASR and Diarization with Fine-Grained Temporal Grounding : Abstract: We present TagSpeech, a unified LLM-based framework that utilizes Temporal Anchor Grounding for joint multi-speaker ASR and diarization. The framework is built on two key designs: (1) decoup...
- Measuring Social Bias in Vision-Language Models with Face-Only Counterfactuals from Real Photos : Abstract: Vision-Language Models (VLMs) are increasingly deployed in socially consequential settings, raising concerns about social bias driven by demographic cues. A central challenge in measuring su...
- FinCARDS: Card-Based Analyst Reranking for Financial Document Question Answering : Abstract: Financial question answering (QA) over long corporate filings requires evidence to satisfy strict constraints on entities, financial metrics, fiscal periods, and numeric values. However, exi...
- ReinPool: Reinforcement Learning Pooling Multi-Vector Embeddings for Retrieval System : Abstract: Multi-vector embedding models have emerged as a powerful paradigm for document retrieval, preserving fine-grained visual and textual details through token-level representations. However, thi...
- Rewarding Creativity: A Human-Aligned Generative Reward Model for Reinforcement Learning in Storytelling : Abstract: While Large Language Models (LLMs) can generate fluent text, producing high-quality creative stories remains challenging. Reinforcement Learning (RL) offers a promising solution but faces tw...
- Lost in the Noise: How Reasoning Models Fail with Contextual Distractors : Abstract: Recent advances in reasoning models and agentic AI systems have led to an increased reliance on diverse external information. However, this shift introduces input contexts that are inherentl...
- LRAS: Advanced Legal Reasoning with Agentic Search : Abstract: While Large Reasoning Models (LRMs) have demonstrated exceptional logical capabilities in mathematical domains, their application to the legal field remains hindered by the strict requiremen...
- On Narrative: The Rhetorical Mechanisms of Online Polarisation : Abstract: Polarisation research has demonstrated how people cluster in homogeneous groups with opposing opinions. However, this effect emerges not only through interaction between people, limiting com...
- Beyond Static Tools: Test-Time Tool Evolution for Scientific Reasoning : Abstract: The central challenge of AI for Science is not reasoning alone, but the ability to create computational methods in an open-ended scientific world. Existing LLM-based agents rely on static, p...
- Reasoning Models Will Blatantly Lie About Their Reasoning : Abstract: It has been shown that Large Reasoning Models (LRMs) may not *say what they think*: they do not always volunteer information about how certain parts of the input influence their reasoning. B...
- OS-Symphony: A Holistic Framework for Robust and Generalist Computer-Using Agent : Abstract: While Vision-Language Models (VLMs) have significantly advanced Computer-Using Agents (CUAs), current frameworks struggle with robustness in long-horizon workflows and generalization in nove...
- Efficient Continual Pre-training for Building Domain Specific Large Language Models : Abstract: Large language models (LLMs) have demonstrated remarkable open-domain capabilities. LLMs tailored for a domain are typically trained entirely on a domain corpus to excel at handling domain-spe...
- Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion : Abstract: With the bloom of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that incorporate LLMs with pre-trained vision models have recently demonstrated impressive performanc...
- Correcting misinformation on social media with a large language model : Abstract: Real-world information, often multimodal, can be misinformed or potentially misleading due to factual errors, outdated claims, missing context, misinterpretation, and more. Such "misinformat...
- Perspectives in Play: A Multi-Perspective Approach for More Inclusive NLP Systems : Abstract: In the realm of Natural Language Processing (NLP), common approaches for handling human disagreement consist of aggregating annotators' viewpoints to establish a single ground truth. However...
- Cross-Domain Transfer and Few-Shot Learning for Personal Identifiable Information Recognition : Abstract: Accurate recognition of personally identifiable information (PII) is central to automated text anonymization. This paper investigates the effectiveness of cross-domain model transfer, multi-...
- HyperTopo-Adapters: Geometry- and Topology-Aware Segmentation of Leaf Lesions on Frozen Encoders : Abstract: Leaf-lesion segmentation is topology-sensitive: small merges, splits, or false holes can be biologically meaningful descriptors of biochemical pathways, yet they are weakly penalized by stan...
- OptFormer: Optical Flow-Guided Attention and Phase Space Reconstruction for SST Forecasting : Abstract: Sea Surface Temperature (SST) prediction plays a vital role in climate modeling and disaster forecasting. However, it remains challenging due to its nonlinear spatiotemporal dynamics and ext...
- Semantic Event Graphs for Long-Form Video Question Answering : Abstract: Long-form video question answering remains challenging for modern vision-language models, which struggle to reason over hour-scale footage without exceeding practical token and compute budge...
- ReMIND: Orchestrating Modular Large Language Models for Controllable Serendipity: A REM-Inspired System Design for Emergent Creative Ideation : Abstract: Large language models (LLMs) are used not only for problem solving but also for creative ideation; however, eliciting serendipitous insights that are both novel and internally coherent remai...
- Measuring Iterative Temporal Reasoning with TimePuzzles : Abstract: We introduce TimePuzzles, a constraint-based date inference task for evaluating iterative temporal reasoning. Each puzzle combines factual temporal anchors with (cross-cultural) calendar rel...
- Can Large Language Models Understand, Reason About, and Generate Code-Switched Text? : Abstract: Code-switching is a pervasive phenomenon in multilingual communication, yet the robustness of large language models (LLMs) in mixed-language settings remains insufficiently understood. In th...
- Structured Reasoning for Large Language Models : Abstract: Large language models (LLMs) achieve strong performance by generating long chains of thought, but longer traces always introduce redundant or ineffective reasoning steps. One typical behavio...
- Relink: Constructing Query-Driven Evidence Graph On-the-Fly for GraphRAG : Abstract: Graph-based Retrieval-Augmented Generation (GraphRAG) mitigates hallucinations in Large Language Models (LLMs) by grounding them in structured knowledge. However, current GraphRAG methods ar...
- MI-PRUN: Optimize Large Language Model Pruning via Mutual Information : Abstract: Large Language Models (LLMs) have become indispensable across various domains, but this comes at the cost of substantial computational and memory resources. Model pruning addresses this by r...
- The Roots of Performance Disparity in Multilingual Language Models: Intrinsic Modeling Difficulty or Design Choices? : Abstract: Multilingual language models (LMs) promise broader NLP access, yet current systems deliver uneven performance across the world's languages. This survey examines why these gaps persist and wh...
- ActiShade: Activating Overshadowed Knowledge to Guide Multi-Hop Reasoning in Large Language Models : Abstract: In multi-hop reasoning, multi-round retrieval-augmented generation (RAG) methods typically rely on LLM-generated content as the retrieval query. However, these approaches are inherently vuln...
- The Confidence Dichotomy: Analyzing and Mitigating Miscalibration in Tool-Use Agents : Abstract: Autonomous agents based on large language models (LLMs) are rapidly evolving to handle multi-turn tasks, but ensuring their trustworthiness remains a critical challenge. A fundamental pillar...
- Document-Level Zero-Shot Relation Extraction with Entity Side Information : Abstract: Document-Level Zero-Shot Relation Extraction (DocZSRE) aims to predict unseen relation labels in text documents without prior training on specific relations. Existing approaches rely on Larg...
- Towards Comprehensive Semantic Speech Embeddings for Chinese Dialects : Abstract: Despite having hundreds of millions of speakers, Chinese dialects lag behind Mandarin in speech and language technologies. Most varieties are primarily spoken, making dialect-to-Mandarin spe...
- ReasonTabQA: A Comprehensive Benchmark for Table Question Answering from Real World Industrial Scenarios : Abstract: Recent advancements in Large Language Models (LLMs) have significantly catalyzed table-based question answering (TableQA). However, existing TableQA benchmarks often overlook the intricacies...
- PsyCLIENT: Client Simulation via Conversational Trajectory Modeling for Trainee Practice and Model Evaluation in Mental Health Counseling : Abstract: LLM-based client simulation has emerged as a promising tool for training novice counselors and evaluating automated counseling systems. However, existing client simulation approaches face th...
- Mitrasamgraha: A Comprehensive Classical Sanskrit Machine Translation Dataset : Abstract: While machine translation is regarded as a "solved problem" for many high-resource languages, close analysis quickly reveals that this is not the case for content that shows challenges such ...
- How to predict creativity ratings from written narratives: A comparison of co-occurrence and textual forma mentis networks : Abstract: This tutorial paper provides a step-by-step workflow for building and analysing semantic networks from short creative texts. We introduce and compare two widely used text-to-network approach...
- BayesRAG: Probabilistic Mutual Evidence Corroboration for Multimodal Retrieval-Augmented Generation : Abstract: Retrieval-Augmented Generation (RAG) has become a pivotal paradigm for Large Language Models (LLMs), yet current approaches struggle with visually rich documents by treating text and images ...
- Beyond Literal Mapping: Benchmarking and Improving Non-Literal Translation Evaluation : Abstract: Large Language Models (LLMs) have significantly advanced Machine Translation (MT), enabling its application to linguistically complex domains such as Social Network Services and literature. In these s...
- DiffER: Diffusion Entity-Relation Modeling for Reversal Curse in Diffusion Large Language Models : Abstract: The "reversal curse" refers to the phenomenon where large language models (LLMs) exhibit predominantly unidirectional behavior when processing logically bidirectional relationships. Prior wo...
- Controlled Self-Evolution for Algorithmic Code Optimization : Abstract: Self-evolution methods enhance code generation through iterative "generate-verify-refine" cycles, yet existing approaches suffer from low exploration efficiency, failing to discover solution...
- Reward Modeling from Natural Language Human Feedback : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) on preference data has become the mainstream approach for training Generative Reward Models (GRMs). Typically in pairwise rewarding tasks...
- Beyond Hard Masks: Progressive Token Evolution for Diffusion Language Models : Abstract: Diffusion Language Models (DLMs) offer a promising alternative for language modeling by enabling parallel decoding through iterative refinement. However, most DLMs rely on hard binary maskin...
- TALON: Confidence-Aware Speculative Decoding with Adaptive Token Trees : Abstract: Speculative decoding (SD) has become a standard technique for accelerating LLM inference without sacrificing output quality. Recent advances in speculative decoding have shifted from sequent...
- Semantic Compression of LLM Instructions via Symbolic Metalanguages : Abstract: We introduce MetaGlyph, a symbolic language for compressing prompts by encoding instructions as mathematical symbols rather than prose. Unlike systems requiring explicit decoding rules, Meta...
- Interpretable Text Classification Applied to the Detection of LLM-generated Creative Writing : Abstract: We consider the problem of distinguishing human-written creative fiction (excerpts from novels) from similar text generated by an LLM. Our results show that, while human observers perform po...
- Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models : Abstract: While Mixture-of-Experts (MoE) scales capacity via conditional computation, Transformers lack a native primitive for knowledge lookup, forcing them to inefficiently simulate retrieval throug...
- GROKE: Vision-Free Navigation Instruction Evaluation via Graph Reasoning on OpenStreetMap : Abstract: The evaluation of navigation instructions remains a persistent challenge in Vision-and-Language Navigation (VLN) research. Traditional reference-based metrics such as BLEU and ROUGE fail to ...
- Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations : Abstract: Despite their impressive capabilities, large language models (LLMs) frequently generate hallucinations. Previous work shows that their internal states encode rich signals of truthfulness, ye...
- SAD: A Large-Scale Strategic Argumentative Dialogue Dataset : Abstract: Argumentation generation has attracted substantial research interest due to its central role in human reasoning and decision-making. However, most existing argumentative corpora focus on non...
- KALE: Enhancing Knowledge Manipulation in Large Language Models via Knowledge-aware Learning : Abstract: Despite the impressive performance of large language models (LLMs) pretrained on vast knowledge corpora, advancing their knowledge manipulation, the ability to effectively recall, reason, and...
- Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation : Abstract: While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability...
- High-Rank Structured Modulation for Parameter-Efficient Fine-Tuning : Abstract: As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) us...
- Thinking Before Constraining: A Unified Decoding Framework for Large Language Models : Abstract: Natural generation allows Language Models (LMs) to produce free-form responses with rich reasoning, but the lack of guaranteed structure makes outputs difficult to parse or verify. Structure...
- From RAG to Agentic RAG for Faithful Islamic Question Answering : Abstract: LLMs are increasingly used for Islamic question answering, where ungrounded responses may carry serious religious consequences. Yet standard MCQ/MRC-style evaluations do not capture key real...
- A Unified Framework for Emotion Recognition and Sentiment Analysis via Expert-Guided Multimodal Fusion with Large Language Models : Abstract: Multimodal emotion understanding requires effective integration of text, audio, and visual modalities for both discrete emotion recognition and continuous sentiment analysis. We present EGMF...
- ES-Mem: Event Segmentation-Based Memory for Long-Term Dialogue Agents : Abstract: Memory is critical for dialogue agents to maintain coherence and enable continuous adaptation in long-term interactions. While existing memory mechanisms offer basic storage and retrieval ca...
- Proof of Time: A Benchmark for Evaluating Scientific Idea Judgments : Abstract: Large language models are increasingly being used to assess and forecast research ideas, yet we lack scalable ways to evaluate the quality of models' judgments about these scientific ideas. ...
- Integrating Machine-Generated Short Descriptions into the Wikipedia Android App: A Pilot Deployment of Descartes : Abstract: Short descriptions are a key part of the Wikipedia user experience, but their coverage remains uneven across languages and topics. In previous work, we introduced Descartes, a multilingual m...
- PlaM: Training-Free Plateau-Guided Model Merging for Better Visual Grounding in MLLMs : Abstract: Multimodal Large Language Models (MLLMs) rely on strong linguistic reasoning inherited from their base language models. However, multimodal instruction fine-tuning paradoxically degrades thi...
- Order in the Evaluation Court: A Critical Analysis of NLG Evaluation Trends : Abstract: Despite advances in Natural Language Generation (NLG), evaluation remains challenging. Although various new metrics and LLM-as-a-judge (LaaJ) methods have been proposed, human judgment persists as...
- Exploring the Meta-level Reasoning of Large Language Models via a Tool-based Multi-hop Tabular Question Answering Task : Abstract: Recent advancements in Large Language Models (LLMs) are increasingly focused on "reasoning" ability, a concept with many overlapping definitions in the LLM discourse. We take a more structur...
- Emotional Support Evaluation Framework via Controllable and Diverse Seeker Simulator : Abstract: As emotional support chatbots have recently gained significant traction across both research and industry, a common evaluation strategy has emerged: use help-seeker simulators to interact wi...
- Is Agentic RAG worth it? An experimental comparison of RAG approaches : Abstract: Retrieval-Augmented Generation (RAG) systems are usually defined by the combination of a generator and a retrieval component that extracts textual context from a knowledge base to answer use...
- Structure First, Reason Next: Enhancing a Large Language Model using Knowledge Graph for Numerical Reasoning in Financial Documents : Abstract: Numerical reasoning is an important task in the analysis of financial documents. It helps in understanding and performing numerical predictions with logical conclusions for the given query s...
- Contrastive Learning with Narrative Twins for Modeling Story Salience : Abstract: Understanding narratives requires identifying which events are most salient for a story's progression. We present a contrastive learning framework for modeling narrative salience that learns...
- Enhancing Self-Correction in Large Language Models through Multi-Perspective Reflection : Abstract: While Chain-of-Thought (CoT) prompting advances LLM reasoning, challenges persist in consistency, accuracy, and self-correction, especially for complex or ethically sensitive tasks. Existing...
- Beyond Single-Shot: Multi-step Tool Retrieval via Query Planning : Abstract: LLM agents operating over massive, dynamic tool libraries rely on effective retrieval, yet standard single-shot dense retrievers struggle with complex requests. These failures primarily stem...
- Kinship Data Benchmark for Multi-hop Reasoning : Abstract: Large language models (LLMs) are increasingly evaluated on their ability to perform multi-hop reasoning, i.e., to combine multiple pieces of information into a coherent inference. We introdu...
- Learning Through Dialogue: Unpacking the Dynamics of Human-LLM Conversations on Political Issues : Abstract: Large language models (LLMs) are increasingly used as conversational partners for learning, yet the interactional dynamics supporting users' learning and engagement are understudied. We anal...
- Reference Games as a Testbed for the Alignment of Model Uncertainty and Clarification Requests : Abstract: In human conversation, both interlocutors play an active role in maintaining mutual understanding. When addressees are uncertain about what speakers mean, for example, they can request clari...
- "They parted illusions -- they parted disclaim marinade": Misalignment as structural fidelity in LLMs : Abstract: The prevailing technical literature in AI Safety interprets scheming and sandbagging behaviors in large language models (LLMs) as indicators of deceptive agency or hidden objectives. This tr...
- Why Slop Matters : Abstract: AI-generated "slop" is often seen as digital pollution. We argue that this dismissal of the topic risks missing important aspects of AI Slop that deserve rigorous study. AI Slop serves a soc...
- La norme technique comme catalyseur de transfert de connaissances : la francophonie à l'œuvre dans le domaine de l'éducation (The Technical Standard as a Catalyst for Knowledge Transfer: La Francophonie at Work in the Field of Education) : Abstract: Standards are adopted in a wide range of fields, both technical and industrial, as well as socio-economic, cultural and linguistic. They are presented explicitly as laws and regulations, tec...
- Comment on arXiv:2511.21731v1: Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition : Abstract: This note is a friendly technical check of arXiv:2511.21731v1. I highlight a few places where the manuscript's interpretation of (i) the reported CHSH/Bell-type calculations and (ii) Bose–E...
- From RLHF to Direct Alignment: A Theoretical Unification of Preference Learning for Large Language Models : Abstract: Aligning large language models (LLMs) with human preferences has become essential for safe and beneficial AI deployment. While Reinforcement Learning from Human Feedback (RLHF) established t...
- Structure-Aware Diversity Pursuit as an AI Safety Strategy against Homogenization : Abstract: Generative AI models reproduce the biases in the training data and can further amplify them through mode collapse. We refer to the resulting harmful loss of diversity as homogenization. Our ...
- An evaluation of LLMs for political bias in Western media: Israel-Hamas and Ukraine-Russia wars : Abstract: Political bias in media plays a critical role in shaping public opinion, voter behaviour, and broader democratic discourse. Subjective opinions and political bias can be found in media sourc...
- Attention Mechanism and Heuristic Approach: Context-Aware File Ranking Using Multi-Head Self-Attention : Abstract: The identification and ranking of impacted files within software repositories is a key challenge in change impact analysis. Existing deterministic approaches that combine heuristic signals,...
- Political Alignment in Large Language Models: A Multidimensional Audit of Psychometric Identity and Behavioral Bias : Abstract: As large language models (LLMs) are increasingly integrated into social decision-making, understanding their political positioning and alignment behavior is critical for safety and fairness....
- Classroom AI: Large Language Models as Grade-Specific Teachers : Abstract: Large Language Models (LLMs) offer a promising solution to complement traditional teaching and address global teacher shortages that affect hundreds of millions of children, but they fail to...
- BizFinBench.v2: A Unified Dual-Mode Bilingual Benchmark for Expert-Level Financial Capability Alignment : Abstract: Large language models have undergone rapid evolution, emerging as a pivotal technology for intelligence in financial operations. However, existing benchmarks are often constrained by pitfall...
- Tone Matters: The Impact of Linguistic Tone on Hallucination in VLMs : Abstract: Vision-Language Models (VLMs) are increasingly used in safety-critical applications that require reliable visual grounding. However, these models often hallucinate details that are not prese...
- BabyVision: Visual Reasoning Beyond Language : Abstract: While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual un...
- Task Arithmetic with Support Languages for Low-Resource ASR : Abstract: The development of resource-constrained approaches to automatic speech recognition (ASR) is of great interest due to its broad applicability to many low-resource languages for which there is...
- When Abundance Conceals Weakness: Knowledge Conflict in Multilingual Models : Abstract: Large Language Models (LLMs) encode vast world knowledge across multiple languages, yet their internal beliefs are often unevenly distributed across linguistic spaces. When external evidence...
- Engineering of Hallucination in Generative AI: It's not a Bug, it's a Feature : Abstract: Generative artificial intelligence (AI) is conquering our lives at lightning speed. Large language models such as ChatGPT answer our questions or write texts for us, large computer vision mo...
- The Need for a Socially-Grounded Persona Framework for User Simulation : Abstract: Synthetic personas are widely used to condition large language models (LLMs) for social simulation, yet most personas are still constructed from coarse sociodemographic attributes or summari...
- IndRegBias: A Dataset for Studying Indian Regional Biases in English and Code-Mixed Social Media Comments : Abstract: Warning: This paper consists of examples representing regional biases in Indian regions that might be offensive towards a particular region. While social biases corresponding to gender, race...
- Spec-o3: A Tool-Augmented Vision-Language Agent for Rare Celestial Object Candidate Vetting via Automated Spectral Inspection : Abstract: Due to the limited generalization and interpretability of deep learning classifiers, the final vetting of rare celestial object candidates still relies on expert visual inspection, a manuall...
- MedRAGChecker: Claim-Level Verification for Biomedical Retrieval-Augmented Generation : Abstract: Biomedical retrieval-augmented generation (RAG) can ground LLM answers in medical literature, yet long-form outputs often contain isolated unsupported or contradictory claims with safety imp...
- Atomic-SNLI: Fine-Grained Natural Language Inference through Atomic Fact Decomposition : Abstract: Current Natural Language Inference (NLI) systems primarily operate at the sentence level, providing black-box decisions that lack explanatory power. While atomic-level NLI offers a promising...
- Exposía: Academic Writing Assessment of Exposés and Peer Feedback : Abstract: We present Exposía, the first public dataset that connects writing and feedback assessment in higher education, enabling research on educationally grounded approaches to academic writing eva...
- CSR-RAG: An Efficient Retrieval System for Text-to-SQL on the Enterprise Scale : Abstract: Natural language to SQL translation (Text-to-SQL) is one of the long-standing problems that has recently benefited from advances in Large Language Models (LLMs). While most academic Text-to-...
- EVM-QuestBench: An Execution-Grounded Benchmark for Natural-Language Transaction Code Generation : Abstract: Large language models are increasingly applied to various development scenarios. However, in on-chain transaction scenarios, even a minor error can cause irreversible loss for users. Existin...
- Are Emotions Arranged in a Circle? Geometric Analysis of Emotion Representations via Hyperspherical Contrastive Learning : Abstract: Psychological research has long utilized circumplex models to structure emotions, placing similar emotions adjacently and opposing ones diagonally. Although frequently used to interpret deep...
- Stylistic Evolution and LLM Neutrality in Singlish Language : Abstract: Singlish is a creole rooted in Singapore's multilingual environment and continues to evolve alongside social and technological change. This study investigates the evolution of Singlish over ...
- How Context Shapes Truth: Geometric Transformations of Statement-level Truth Representations in LLMs : Abstract: Large Language Models (LLMs) often encode whether a statement is true as a vector in their residual stream activations. These vectors, also known as truth vectors, have been studied in prior...
- Probing Multimodal Large Language Models on Cognitive Biases in Chinese Short-Video Misinformation : Abstract: Short-video platforms have become major channels for misinformation, where deceptive claims frequently leverage visual experiments and social cues. While Multimodal Large Language Models (ML...
- N2N-GQA: Noise-to-Narrative for Graph-Based Table-Text Question Answering Using LLMs : Abstract: Multi-hop question answering over hybrid table-text data requires retrieving and reasoning across multiple evidence pieces from large corpora, but standard Retrieval-Augmented Generation (RA...
- Efficient and Reliable Estimation of Named Entity Linking Quality: A Case Study on GutBrainIE : Abstract: Named Entity Linking (NEL) is a core component of biomedical Information Extraction (IE) pipelines, yet assessing its quality at scale is challenging due to the high cost of expert annotatio...
- Labels have Human Values: Value Calibration of Subjective Tasks : Abstract: Building NLP systems for subjective tasks requires one to ensure their alignment to contrasting human values. We propose the MultiCalibrated Subjective Task Learner framework (MC-STL), which...
- MedEinst: Benchmarking the Einstellung Effect in Medical LLMs through Counterfactual Differential Diagnosis : Abstract: Despite achieving high accuracy on medical benchmarks, LLMs exhibit the Einstellung Effect in clinical diagnosis, relying on statistical shortcuts rather than patient-specific evidence, caus...
- Efficient Aspect Term Extraction using Spiking Neural Network : Abstract: Aspect Term Extraction (ATE) identifies aspect terms in review sentences, a key subtask of sentiment analysis. While most existing approaches use energy-intensive deep neural networks (DNNs)...
- Do Language Models Reason Across Languages? : Abstract: Real-world information sources are inherently multilingual, which naturally raises the question of whether language models can synthesize information across languages. In this paper, we...
- What makes for an enjoyable protagonist? An analysis of character warmth and competence : Abstract: Drawing on psychological and literary theory, we investigated whether the warmth and competence of movie protagonists predict IMDb ratings, and whether these effects vary across genres. Usin...
- InFi-Check: Interpretable and Fine-Grained Fact-Checking of LLMs : Abstract: Large language models (LLMs) often hallucinate, yet most existing fact-checking methods treat factuality evaluation as a binary classification problem, offering limited interpretability and ...
- Will it Merge? On The Causes of Model Mergeability : Abstract: Model merging has emerged as a promising technique for combining multiple fine-tuned models into a single multitask model without retraining. However, the factors that determine whether merg...
- Evaluating Cross-Lingual Unlearning in Multilingual Language Models : Abstract: We present the first comprehensive evaluation of cross-lingual unlearning in multilingual LLMs. Using translated TOFU benchmarks in seven language/script variants, we test major unlearning a...
- IDRBench: Interactive Deep Research Benchmark : Abstract: Deep research agents powered by Large Language Models (LLMs) can perform multi-step reasoning, web exploration, and long-form report generation. However, most existing systems operate in an ...
- Characterising Toxicity in Generative Large Language Models : Abstract: In recent years, the advent of the attention mechanism has significantly advanced the field of natural language processing (NLP), revolutionizing text processing and text generation. This ha...
- GRASP LoRA: GRPO Guided Adapter Sparsity Policy for Cross Lingual Transfer : Abstract: Parameter-efficient fine-tuning is a way to adapt LLMs to new languages when compute or data are limited, yet adapter pipelines usually choose a global prune ratio by grid search. This pract...
- Evaluating Accounting Reasoning Capabilities of Large Language Models : Abstract: Large language models are transforming learning, cognition, and research across many fields. Effectively integrating them into professional domains, such as accounting, is a key challenge fo...
- Towards Computational Chinese Paleography : Abstract: Chinese paleography, the study of ancient Chinese writing, is undergoing a computational turn powered by artificial intelligence. This position paper charts the trajectory of this emerging f...
- MTMCS-Bench: Evaluating Contextual Safety of Multimodal Large Language Models in Multi-Turn Dialogues : Abstract: Multimodal large language models (MLLMs) are increasingly deployed as assistants that interact through text and images, making it crucial to evaluate contextual safety when risk depends on b...
- Multi-Stage Evolutionary Model Merging with Meta Data Driven Curriculum Learning for Sentiment-Specialized Large Language Modeling : Abstract: The emergence of large language models (LLMs) has significantly transformed natural language processing (NLP), enabling more generalized models to perform various tasks with minimal training...
- EpiCaR: Knowing What You Don't Know Matters for Better Reasoning in LLMs : Abstract: Improving the reasoning abilities of large language models (LLMs) has largely relied on iterative self-training with model-generated data. While effective at boosting accuracy, existing appr...
- Garbage Attention in Large Language Models: BOS Sink Heads and Sink-aware Pruning : Abstract: Large Language Models (LLMs) are known to contain significant redundancy, yet a systematic explanation for why certain components, particularly in higher layers, are more redundant has remai...
- CIRAG: Construction-Integration Retrieval and Adaptive Generation for Multi-hop Question Answering : Abstract: Triple-based Iterative Retrieval-Augmented Generation (iRAG) mitigates document-level noise for multi-hop question answering. However, existing methods still face limitations: (i) greedy sin...
- Doing More with Less: Data Augmentation for Sudanese Dialect Automatic Speech Recognition : Abstract: Although many Automatic Speech Recognition (ASR) systems have been developed for Modern Standard Arabic (MSA) and Dialectal Arabic (DA), few studies have focused on dialect-specific implemen...
- Forest Before Trees: Latent Superposition for Efficient Visual Reasoning : Abstract: While Chain-of-Thought empowers Large Vision-Language Models with multi-step reasoning, explicit textual rationales suffer from an information bandwidth bottleneck, where continuous visual d...
- AgentHallu: Benchmarking Automated Hallucination Attribution of LLM-based Agents : Abstract: As LLM-based agents operate over sequential multi-step reasoning, hallucinations arising at intermediate steps risk propagating along the trajectory, thus degrading overall reliability. Unli...
- PDR: A Plug-and-Play Positional Decay Framework for LLM Pre-training Data Detection : Abstract: Detecting pre-training data in Large Language Models (LLMs) is crucial for auditing data privacy and copyright compliance, yet it remains challenging in black-box, zero-shot settings where c...
- Explainable Multimodal Aspect-Based Sentiment Analysis with Dependency-guided Large Language Model : Abstract: Multimodal aspect-based sentiment analysis (MABSA) aims to identify aspect-level sentiments by jointly modeling textual and visual information, which is essential for fine-grained opinion un...
- BiasLab: A Multilingual, Dual-Framing Framework for Robust Measurement of Output-Level Bias in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes contexts where their outputs influence real-world decisions. However, evaluating bias in LLM outputs remains methodologi...
- Fine-grained Verbal Attack Detection via a Hierarchical Divide-and-Conquer Framework : Abstract: In the digital era, effective identification and analysis of verbal attacks are essential for maintaining online civility and ensuring social security. However, existing research is limited ...
- TreePS-RAG: Tree-based Process Supervision for Reinforcement Learning in Agentic RAG : Abstract: Agentic retrieval-augmented generation (RAG) formulates question answering as a multi-step interaction between reasoning and information retrieval, and has recently been advanced by reinforc...
- Symphonym: Universal Phonetic Embeddings for Cross-Script Toponym Matching via Teacher-Student Distillation : Abstract: Linking place names across languages and writing systems is a fundamental challenge in digital humanities and geographic information retrieval. Existing approaches rely on language-specific ...
- RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction : Abstract: As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchm...
- Categorize Early, Integrate Late: Divergent Processing Strategies in Automatic Speech Recognition : Abstract: In speech language modeling, two architectures dominate the frontier: the Transformer and the Conformer. However, it remains unknown whether their comparable performance stems from convergen...
- LLMs Can't Play Hangman: On the Necessity of a Private Working Memory for Language Agents : Abstract: As LLMs move from text completion toward autonomous agents, they remain constrained by the standard chat interface, which lacks private working memory. This raises a fundamental question: ca...
- UETQuintet at BioCreative IX - MedHopQA: Enhancing Biomedical QA with Selective Multi-hop Reasoning and Contextual Retrieval : Abstract: Biomedical Question Answering systems play a critical role in processing complex medical queries, yet they often struggle with the intricate nature of medical data and the demand for multi-h...
- MedTutor: A Retrieval-Augmented LLM System for Case-Based Medical Education : Abstract: The learning process for medical residents presents significant challenges, demanding both the ability to interpret complex case reports and the rapid acquisition of accurate medical knowled...
- Lexicalized Constituency Parsing for Middle Dutch: Low-resource Training and Cross-Domain Generalization : Abstract: Recent years have seen growing interest in applying neural networks and contextualized word embeddings to the parsing of historical languages. However, most advances have focused on dependen...
- TurkBench: A Benchmark for Evaluating Turkish Large Language Models : Abstract: With the recent surge in the development of large language models, the need for comprehensive and language-specific evaluation benchmarks has become critical. While significant progress has ...
- Solar Open Technical Report : Abstract: We introduce Solar Open, a 102B-parameter bilingual Mixture-of-Experts language model for underserved languages. Solar Open demonstrates a systematic methodology for building competitive LLM...
- Codified Foreshadowing-Payoff Text Generation : Abstract: Foreshadowing and payoff are ubiquitous narrative devices through which authors introduce commitments early in a story and resolve them through concrete, observable outcomes. However, despit...
- TeleMem: Building Long-Term and Multimodal Memory for Agentic AI : Abstract: Large language models (LLMs) excel at many NLP tasks but struggle to sustain long-term interactions due to limited attention over extended dialogue histories. Retrieval-augmented generation ...
- Operation Veja: Fixing Fundamental Concepts Missing from Modern Roleplaying Training Paradigms : Abstract: Modern roleplaying models are increasingly sophisticated, yet they consistently struggle to capture the essence of believable, engaging characters. We argue this failure stems from training ...
- Lexical and Statistical Analysis of Bangla Newspaper and Literature: A Corpus-Driven Study on Diversity, Readability, and NLP Adaptation : Abstract: In this paper, we present a comprehensive corpus-driven analysis of Bangla literary and newspaper texts to investigate their lexical diversity, structural complexity and readability. We unde...
- A Multi-Stage Workflow for the Review of Marketing Content with Reasoning Large Language Models : Abstract: Reasoning Large Language Models (LLMs) have shown promising results when tasked with solving complex problems. In this paper, we propose and evaluate a multi-stage workflow that leverages th...
- AzeroS: Extending LLM to Speech with Self-Generated Instruction-Free Tuning : Abstract: Extending large language models (LLMs) to the speech domain has recently gained significant attention. A typical approach connects a pretrained LLM with an audio encoder through a projection...
- Amory: Building Coherent Narrative-Driven Agent Memory through Agentic Reasoning : Abstract: Long-term conversational agents face a fundamental scalability challenge as interactions extend over time: repeatedly processing entire conversation histories becomes computationally prohibi...
- Why LoRA Fails to Forget: Regularized Low-Rank Adaptation Against Backdoors in Language Models : Abstract: Low-Rank Adaptation (LoRA) is widely used for parameter-efficient fine-tuning of large language models, but it is notably ineffective at removing backdoor behaviors from poisoned pretrained ...
- SyntaxMind at BLP-2025 Task 1: Leveraging Attention Fusion of CNN and GRU for Hate Speech Detection : Abstract: This paper describes our system used in the BLP-2025 Task 1: Hate Speech Detection. We participated in Subtask 1A and Subtask 1B, addressing hate speech classification in Bangla text. Our ap...
- A Rising Tide Lifts All Boats: MTQE Rewards for Idioms Improve General Translation Quality : Abstract: Non-compositional expressions (e.g., idioms, proverbs, and metaphors) pose significant challenges for neural machine translation systems because their meanings cannot be derived from individ...
- Annotating Dimensions of Social Perception in Text: The First Sentence-Level Dataset of Warmth and Competence : Abstract: Warmth (W) (often further broken down into Trust (T) and Sociability (S)) and Competence (C) are central dimensions along which people evaluate individuals and social groups (Fiske, 2018). W...
- On the Fallacy of Global Token Perplexity in Spoken Language Model Evaluation : Abstract: Generative spoken language models pretrained on large-scale raw audio can continue a speech prompt with appropriate content while preserving attributes like speaker and emotion, serving as f...
- What Matters When Building Universal Multilingual Named Entity Recognition Models? : Abstract: Recent progress in universal multilingual named entity recognition (NER) has been driven by advances in multilingual transformer models and task-specific architectures, loss functions, and t...
- Average shortest-path length in word-adjacency networks: Chinese versus English : Abstract: Complex networks provide powerful tools for analyzing and understanding the intricate structures present in various systems, including natural language. Here, we analyze the topology of growing ...
- Talking to Extraordinary Objects: Folktales Offer Analogies for Interacting with Technology : Abstract: Speech and language are valuable for interacting with technology. It would be ideal to be able to decouple their use from anthropomorphization, which has recently met an important moment of ...
- AfriqueLLM: How Data Mixing and Model Architecture Impact Continued Pre-training for African Languages : Abstract: Large language models (LLMs) are increasingly multilingual, yet open models continue to underperform relative to proprietary systems, with the gap most pronounced for African languages. Cont...
- MITRA: A Large-Scale Parallel Corpus and Multilingual Pretrained Language Model for Machine Translation and Semantic Retrieval for Pāli, Sanskrit, Buddhist Chinese, and Tibetan : Abstract: Ancient Buddhist literature features frequent, yet often unannotated, textual parallels spread across diverse languages: Sanskrit, Pāli, Buddhist Chinese, Tibetan, and more. The scale of thi...
- Steer Model beyond Assistant: Controlling System Prompt Strength via Contrastive Decoding : Abstract: Large language models excel at complex instructions yet struggle to deviate from their helpful assistant persona, as post-training instills strong priors that resist conflicting instructions...
- Value of Information: A Framework for Human-Agent Communication : Abstract: Large Language Model (LLM) agents deployed for real-world tasks face a fundamental dilemma: user requests are underspecified, yet agents must decide whether to act on incomplete information ...
- Structured Episodic Event Memory : Abstract: Current approaches to memory in Large Language Models (LLMs) predominantly rely on static Retrieval-Augmented Generation (RAG), which often results in scattered retrieval and fails to captur...
- Can a Unimodal Language Agent Provide Preferences to Tune a Multimodal Vision-Language Model? : Abstract: To explore a more scalable path for adding multimodal capabilities to existing LLMs, this paper addresses a fundamental question: Can a unimodal LLM, relying solely on text, reason about its...
- NC-Bench: An LLM Benchmark for Evaluating Conversational Competence : Abstract: The Natural Conversation Benchmark (NC-Bench) introduces a new approach to evaluating the general conversational competence of large language models (LLMs). Unlike prior benchmarks that focus...
- Time Travel Engine: A Shared Latent Chronological Manifold Enables Historical Navigation in Large Language Models : Abstract: Time functions as a fundamental dimension of human cognition, yet the mechanisms by which Large Language Models (LLMs) encode chronological progression remain opaque. We demonstrate that tem...
- LitVISTA: A Benchmark for Narrative Orchestration in Literary Text : Abstract: Computational narrative analysis aims to capture rhythm, tension, and emotional dynamics in literary texts. Existing large language models can generate long stories but overly focus on causa...
- Accumulation of Sub-Sampling Matrices with Applications to Statistical Computation : Abstract: With appropriately chosen sampling probabilities, sampling-based random projection can be used to implement large-scale statistical methods, substantially reducing computational cost while m...
- Approximating Persistent Homology for Large Datasets : Abstract: Persistent homology is an important methodology in topological data analysis which adapts theory from algebraic topology to data settings. Computing persistent homology produces persistence ...
- Generative Modeling via Hierarchical Tensor Sketching : Abstract: We propose a hierarchical tensor-network approach for approximating high-dimensional probability density via empirical distribution. This leverages randomized singular value decomposition (S...
- The Interpolating Information Criterion for Overparameterized Models : Abstract: The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset. Classical information criteria...
- A Convex Framework for Confounding Robust Inference : Abstract: We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case c...
- Learning Operators with Stochastic Gradient Descent in General Hilbert Spaces : Abstract: This study investigates leveraging stochastic gradient descent (SGD) to learn operators between general Hilbert spaces. We propose weak and strong regularity conditions for the target operat...
- Reimagining Anomalies: What If Anomalies Were Normal? : Abstract: Deep learning-based methods have achieved a breakthrough in image anomaly detection, but their complexity introduces a considerable challenge to understanding why an instance is predicted to...
- Low-Rank Online Dynamic Assortment with Dual Contextual Information : Abstract: As e-commerce expands, delivering real-time personalized recommendations from vast catalogs poses a critical challenge for retail platforms. Maximizing revenue requires careful consideration...
- Hierarchic Flows to Estimate and Sample High-dimensional Probabilities : Abstract: Finding low-dimensional interpretable models of complex physical fields such as turbulence remains an open question, 80 years after the pioneering work of Kolmogorov. Estimating high-dimensiona...
- Low-Dimensional Federated Knowledge Graph Embedding via Knowledge Distillation : Abstract: Federated Knowledge Graph Embedding (FKGE) aims to facilitate collaborative learning of entity and relation embeddings from distributed Knowledge Graphs (KGs) across multiple clients, while ...
- Point processes with event time uncertainty : Abstract: Point processes are widely used statistical models for continuous-time discrete event data, such as medical records, crime reports, and social network interactions, to capture the influence ...
- Memory-Efficient Training for Text-Dependent SV with Independent Pre-trained Models : Abstract: This paper presents our submission to the Iranian division of the Text-Dependent Speaker Verification Challenge (TdSV) 2024. Conventional TdSV approaches typically jointly model speaker and ...
- Berezinskii–Kosterlitz–Thouless transition in a context-sensitive random language model : Abstract: Several power-law critical properties involving different statistics in natural languages, reminiscent of scaling properties of physical systems at or near phase transitions, have been d...
- DB3 Team's Solution For Meta KDD Cup '25 : Abstract: This paper presents the db3 team's winning solution for the Meta CRAG-MM Challenge 2025 at KDD Cup '25. Addressing the challenge's unique multi-modal, multi-turn question answering benchmark ...
- Convergence Rate Analysis of the AdamW-Style Shampoo: Unifying One-sided and Two-Sided Preconditioning : Abstract: This paper studies the AdamW-style Shampoo optimizer, an effective implementation of classical Shampoo that notably won the external tuning track of the AlgoPerf neural network training algo...
- SEE: Signal Embedding Energy for Quantifying Noise Interference in Large Audio Language Models : Abstract: Large Audio Language Models (LALMs) have been widely applied in real-time scenarios, such as in-car assistants and online meeting comprehension. In practice, audio inputs are often corrupted...
- Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning : Abstract: Group Relative Policy Optimization (GRPO) has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. However, standard GRPO employs a coarse-grained credit a...
- Position: Don't be Afraid of Over-Smoothing And Over-Squashing : Abstract: Over-smoothing and over-squashing have been extensively studied in the literature on Graph Neural Networks (GNNs) over the past years. We challenge this prevailing focus in GNN research, arg...
- PIDT: Physics-Informed Digital Twin for Optical Fiber Parameter Estimation : Abstract: We propose physics-informed digital twin (PIDT): a fiber parameter estimation approach that combines a parameterized split-step method with a physics-informed loss. PIDT improves accuracy an...
- Improving Video Question Answering through query-based frame selection : Abstract: Video Question Answering (VideoQA) models enhance understanding and interaction with audiovisual content, making it more accessible, searchable, and useful for a wide range of fields such as...
- Puzzle it Out: Local-to-Global World Model for Offline Multi-Agent Reinforcement Learning : Abstract: Offline multi-agent reinforcement learning (MARL) aims to solve cooperative decision-making problems in multi-agent systems using pre-collected datasets. Existing offline MARL methods primar...
- The Secretary Problem with Predictions and a Chosen Order : Abstract: We study a learning-augmented variant of the secretary problem, recently introduced by Fujii and Yoshida (2023), in which the decision-maker has access to machine-learned predictions of cand...
- Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions : Abstract: Vision-language models are increasingly employed as multimodal conversational agents (MCAs) for diverse conversational tasks. Recently, reinforcement learning (RL) has been widely explored f...
- Nonparametric Kernel Clustering with Bandit Feedback : Abstract: Clustering with bandit feedback refers to the problem of partitioning a set of items, where the clustering algorithm can sequentially query the items to receive noisy observations. The probl...
- An adjoint method for training data-driven reduced-order models : Abstract: Reduced-order modeling lies at the interface of numerical analysis and data-driven scientific computing, providing principled ways to compress high-fidelity simulations in science and engine...
- Large Language Models for Physics Instrument Design : Abstract: We study the use of large language models (LLMs) for physics instrument design and compare their performance to reinforcement learning (RL). Using only prompting, LLMs are given task constra...
- Machine learning nonequilibrium phase transitions in charge-density wave insulators : Abstract: Nonequilibrium electronic forces play a central role in voltage-driven phase transitions but are notoriously expensive to evaluate in dynamical simulations. Here we develop a machine learnin...
- Temporal-Aligned Meta-Learning for Risk Management: A Stacking Approach for Multi-Source Credit Scoring : Abstract: This paper presents a meta-learning framework for credit risk assessment of Italian Small and Medium Enterprises (SMEs) that explicitly addresses the temporal misalignment of credit scoring ...
- GRPO with State Mutations: Improving LLM-Based Hardware Test Plan Generation : Abstract: RTL design often relies heavily on ad-hoc testbench creation early in the design cycle. While large language models (LLMs) show promise for RTL code generation, their ability to reason about...
- Learning About Learning: A Physics Path from Spin Glasses to Artificial Intelligence : Abstract: The Hopfield model, originally inspired by spin-glass physics, occupies a central place at the intersection of statistical mechanics, neural networks, and modern artificial intelligence. Des...
- Reinforcement Learning for Micro-Level Claims Reserving : Abstract: Outstanding claim liabilities are revised repeatedly as claims develop, yet most modern reserving models are trained as one-shot predictors and typically learn only from settled claims. We f...
- Dual-Level Models for Physics-Informed Multi-Step Time Series Forecasting : Abstract: This paper develops an approach for multi-step forecasting of dynamical systems by integrating probabilistic input forecasting with physics-informed output prediction. Accurate multi-step fo...
- Studying the Role of Synthetic Data for Machine Learning-based Wireless Networks Traffic Forecasting : Abstract: Synthetic data generation is an appealing tool for augmenting and enriching datasets, playing a crucial role in advancing artificial intelligence (AI) and machine learning (ML). Not only doe...
- Active Evaluation of General Agents: Problem Definition and Comparison of Baseline Algorithms : Abstract: As intelligent agents become more generally capable, i.e., able to master a wide variety of tasks, the complexity and cost of properly evaluating them rise significantly. Tasks that assess s...
- Learning to accelerate Krasnosel'skii-Mann fixed-point iterations with guarantees : Abstract: We introduce a principled learning to optimize (L2O) framework for solving fixed-point problems involving general nonexpansive mappings. Our idea is to deliberately inject summable perturbat...
- Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference : Abstract: Due to the prevalence of large language models (LLMs), key-value (KV) cache reduction for LLM inference has received remarkable attention. Among numerous works that have been proposed in rec...
- Self-Creating Random Walks for Decentralized Learning under Pac-Man Attacks : Abstract: Random walk (RW)-based algorithms have long been popular in distributed systems due to low overheads and scalability, with recent growing applications in decentralized learning. However, the...
- Physics-Informed Singular-Value Learning for Cross-Covariances Forecasting in Financial Markets : Abstract: A new wave of work on covariance cleaning and nonlinear shrinkage has delivered asymptotically optimal analytical solutions for large covariance matrices. Building on this progress, these id...
- A Framework for Feature Discovery in Intracranial Pressure Monitoring Data Using Neural Network Attention : Abstract: We present a novel framework for analyzing intracranial pressure monitoring data by applying interpretability principles. Intracranial pressure monitoring data was collected from 60 patients...
- Hidden Monotonicity: Explaining Deep Neural Networks via their DC Decomposition : Abstract: It has been demonstrated in various contexts that monotonicity leads to better explainability in neural networks. However, not every function can be well approximated by a monotone neural ne...
- Backward Reconstruction of the Chafee--Infante Equation via Physics-Informed WGAN-GP : Abstract: We present a physics-informed Wasserstein GAN with gradient penalty (WGAN-GP) for solving the inverse Chafee--Infante problem on two-dimensional domains with Dirichlet boundary conditions. T...
- PFT: Phonon Fine-tuning for Machine Learned Interatomic Potentials : Abstract: Many materials properties depend on higher-order derivatives of the potential energy surface, yet machine learned interatomic potentials (MLIPs) trained with a standard loss on ener...
- Riesz Representer Fitting under Bregman Divergence: A Unified Framework for Debiased Machine Learning : Abstract: Estimating the Riesz representer is a central problem in debiased machine learning for causal and structural parameter estimation. Various methods for Riesz representer estimation have been ...
- Learning to bin: differentiable and Bayesian optimization for multi-dimensional discriminants in high-energy physics : Abstract: Categorizing events using discriminant observables is central to many high-energy physics analyses. Yet, bin boundaries are often chosen by hand. A simple, popular choice is to apply argmax ...
- The Confidence Trap: Gender Bias and Predictive Certainty in LLMs : Abstract: The increased use of Large Language Models (LLMs) in sensitive domains leads to growing interest in how their confidence scores correspond to fairness and bias. This study examines the align...
- Failure-Aware RL: Reliable Offline-to-Online Reinforcement Learning with Self-Recovery for Real-World Manipulation : Abstract: Post-training algorithms based on deep reinforcement learning can push the limits of robotic models for specific objectives, such as generalizability, accuracy, and robustness. However, Inte...
- A Complete Decomposition of Stochastic Differential Equations : Abstract: We show that any stochastic differential equation with prescribed time-dependent marginal distributions admits a decomposition into three components: a unique scalar field governing marginal...
- Multiple-policy Evaluation via Density Estimation : Abstract: We study the multiple-policy evaluation problem where we are given a set of $K$ policies and the goal is to evaluate their performance (expected total reward over a fixed horizon) to an accu...
- Finite-Time Analysis of Simultaneous Double Q-learning : Abstract: $Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-lear...
- EMP: Enhance Memory in Data Pruning : Abstract: Recently, large language and vision models have shown strong performance, but due to high pre-training and fine-tuning costs, research has shifted towards faster training via dataset pruning...
- $\texttt{skwdro}$: a library for Wasserstein distributionally robust machine learning : Abstract: We present skwdro, a Python library for training robust machine learning models. The library is based on distributionally robust optimization using Wasserstein distances, popular in optimal ...
- Canopy: Property-Driven Learning for Congestion Control : Abstract: Learning-based congestion controllers offer better adaptability compared to traditional heuristics. However, the unreliability of learning techniques can cause learning-based controllers to ...
- Integrated Multivariate Segmentation Tree for Heterogeneous Credit Data Analysis in Small- and Medium-Sized Enterprises : Abstract: Traditional decision tree models, which rely exclusively on numerical variables, often face challenges in handling high-dimensional data and are limited in their ability to incorporate textu...
- PixRec: Leveraging Visual Context for Next-Item Prediction in Sequential Recommendation : Abstract: Large Language Models (LLMs) have recently shown strong potential for usage in sequential recommendation tasks through text-only models, which combine advanced prompt design, contrastive ali...
- Physics-informed Gaussian Process Regression in Solving Eigenvalue Problem of Linear Operators : Abstract: Applying Physics-Informed Gaussian Process Regression to the eigenvalue problem $(\mathcal{L}-\lambda)u = 0$ poses a fundamental challenge, where the null source term results in a trivial predicti...
- PRISP: Privacy-Safe Few-Shot Personalization via Lightweight Adaptation : Abstract: Large language model (LLM) personalization aims to adapt general-purpose models to individual users. Most existing methods, however, are developed under data-rich and resource-abundant setti...
- Hybrid LSTM-UKF Framework: Ankle Angle and Ground Reaction Force Estimation : Abstract: Accurate prediction of joint kinematics and kinetics is essential for advancing gait analysis and developing intelligent assistive systems such as prosthetics and exoskeletons. This study pr...
- Inference-Time Alignment for Diffusion Models via Doob's Matching : Abstract: Inference-time alignment for diffusion models aims to adapt a pre-trained diffusion model toward a target distribution without retraining the base score network, thereby preserving the gener...
- Pareto-Optimal Model Selection for Low-Cost, Single-Lead EMG Control in Embedded Systems : Abstract: Consumer-grade biosensors offer a cost-effective alternative to medical-grade electromyography (EMG) systems, reducing hardware costs from thousands of dollars to approximately $13. However,...
- SimLLM: Fine-Tuning Code LLMs for SimPy-Based Queueing System Simulation : Abstract: The Python package SimPy is widely used for modeling queueing systems due to its flexibility, simplicity, and smooth integration with modern data analysis and optimization frameworks. Recent...
- Detecting LLM-Generated Text with Performance Guarantees : Abstract: Large language models (LLMs) such as GPT, Claude, Gemini, and Grok have been deeply integrated into our daily life. They now support a wide range of tasks -- from dialogue and email drafting...
- UMLoc: Uncertainty-Aware Map-Constrained Inertial Localization with Quantified Bounds : Abstract: Inertial localization is particularly valuable in GPS-denied environments such as indoors. However, localization using only Inertial Measurement Units (IMUs) suffers from drift caused by mot...
- Object-Centric World Models Meet Monte Carlo Tree Search : Abstract: In this paper, we introduce ObjectZero, a novel reinforcement learning (RL) algorithm that leverages the power of object-level representations to model dynamic environments more effectively....
- Pragya: An AI-Based Semantic Recommendation System for Sanskrit Subhasitas : Abstract: Sanskrit Subhasitas encapsulate centuries of cultural and philosophical wisdom, yet remain underutilized in the digital age due to linguistic and contextual barriers. In this work, we presen...
- Cross-Border Data Security and Privacy Risks in Large Language Models and IoT Systems : Abstract: The reliance of Large Language Models and Internet of Things systems on massive, globally distributed data flows creates systemic security and privacy challenges. When data traverses borders...
- Lower Bounds for the Algorithmic Complexity of Learned Indexes : Abstract: Learned index structures aim to accelerate queries by training machine learning models to approximate the rank function associated with a database attribute. While effective in practice, the...
- A Multimodal Deep Learning Framework for Predicting ICU Deterioration: Integrating ECG Waveforms with Clinical Data and Clinician Benchmarking : Abstract: Artificial intelligence holds strong potential to support clinical decision making in intensive care units where timely and accurate risk assessment is critical. However, many existing model...
- Diffusion Models with Heavy-Tailed Targets: Score Estimation and Sampling Guarantees : Abstract: Score-based diffusion models have become a powerful framework for generative modeling, with score estimation as a central statistical bottleneck. Existing guarantees for score estimation lar...
- DS-CIM: Digital Stochastic Computing-In-Memory Featuring Accurate OR-Accumulation via Sample Region Remapping for Edge AI Models : Abstract: Stochastic computing (SC) offers hardware simplicity but suffers from low throughput, while high-throughput Digital Computing-in-Memory (DCIM) is bottlenecked by costly adder logic for matri...
- Logic-Driven Semantic Communication for Resilient Multi-Agent Systems : Abstract: The advent of 6G networks is accelerating autonomy and intelligence in large-scale, decentralized multi-agent systems (MAS). While this evolution enables adaptive behavior, it also heightens...
- A Backpropagation-Free Feedback-Hebbian Network for Continual Learning Dynamics : Abstract: Feedback-rich neural architectures can regenerate earlier representations and inject temporal context, making them a natural setting for strictly local synaptic plasticity. We ask whether a ...
- Comparative Separation: Evaluating Separation on Comparative Judgment Test Data : Abstract: This research seeks to benefit the software engineering community by proposing comparative separation, a novel group fairness notion to evaluate the fairness of machine learning software on co...
- GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO : Abstract: We present a Bengali mathematical reasoning model called GanitLLM (named after the Bangla word for mathematics, "Ganit"), together with a new difficulty-aware Bengali math corpus and a curri...
- ALFA: A Safe-by-Design Approach to Mitigate Quishing Attacks Launched via Fancy QR Codes : Abstract: Phishing with Quick Response (QR) codes is termed Quishing. The attackers exploit this method to manipulate individuals into revealing their confidential data. Recently, we see the colorf...
- Dimension-reduced outcome-weighted learning for estimating individualized treatment regimes in observational studies : Abstract: Individualized treatment regimes (ITRs) aim to improve clinical outcomes by assigning treatment based on patient-specific characteristics. However, existing methods often struggle with high-...
- CliffordNet: All You Need is Geometric Algebra : Abstract: Modern computer vision architectures, from CNNs to Transformers, predominantly rely on the stacking of heuristic modules: spatial mixers (Attention/Conv) followed by channel mixers (FFNs). I...
- Thinking with Deltas: Incentivizing Reinforcement Learning via Differential Visual Reasoning Policy : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has significantly advanced reasoning capabilities in Large Language Models. However, adapting RLVR to multimodal domains suffers from a ...
- Constrained Density Estimation via Optimal Transport : Abstract: A novel framework for density estimation under expectation constraints is proposed. The framework minimizes the Wasserstein distance between the estimated density and a prior, subject to the...
- †DAGGER: Distractor-Aware Graph Generation for Executable Reasoning in Math Problems : Abstract: Chain-of-Thought (CoT) prompting is widely adopted for mathematical problem solving, including in low-resource languages, yet its behavior under irrelevant context remains underexplored. To ...
- Deep Learning Based Channel Extrapolation for Dual-Band Massive MIMO Systems : Abstract: Future wireless communication systems will increasingly rely on the integration of millimeter wave (mmWave) and sub-6 GHz bands to meet heterogeneous demands on high-speed data transmission ...
- qAttCNN - Self Attention Mechanism for Video QoE Prediction in Encrypted Traffic : Abstract: The rapid growth of multimedia consumption, driven by major advances in mobile devices since the mid-2000s, has led to widespread use of video conferencing applications (VCAs) such as Zoom a...
- Applying Embedding-Based Retrieval to Airbnb Search : Abstract: The goal of Airbnb search is to match guests with the ideal accommodation that fits their travel needs. This is a challenging problem, as popular search locations can have around a hundred t...
- Paraphrasing Adversarial Attack on LLM-as-a-Reviewer : Abstract: The use of large language models (LLMs) in peer review systems has attracted growing attention, making it essential to examine their potential vulnerabilities. Prior attacks rely on prompt i...
- Distributional Clarity: The Hidden Driver of RL-Friendliness in Large Language Models : Abstract: Language model families exhibit a striking disparity in their capacity to benefit from reinforcement learning: under identical training, models like Qwen achieve substantial gains, while other...
- mind_call: A Dataset for Mental Health Function Calling with Large Language Models : Abstract: Large Language Model (LLM)-based systems increasingly rely on function calling to enable structured and controllable interaction with external data sources, yet existing datasets do not addr...
- X-Coder: Advancing Competitive Programming with Fully Synthetic Tasks, Solutions, and Tests : Abstract: Competitive programming presents great challenges for Code LLMs due to its intensive reasoning demands and high logical complexity. However, current Code LLMs still rely heavily on real-worl...
- The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks : Abstract: The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the ...
- Generalization Bounds for Transformer Channel Decoders : Abstract: Transformer channel decoders, such as the Error Correction Code Transformer (ECCT), have shown strong empirical performance in channel decoding, yet their generalization behavior remains the...
- Match Made with Matrix Completion: Efficient Learning under Matching Interference : Abstract: Matching markets face increasing needs to learn the matching qualities between demand and supply for effective design of matching policies. In practice, the matching rewards are high-dimensi...
- Unity Forests: Improving Interaction Modelling and Interpretability in Random Forests : Abstract: Random forests (RFs) are widely used for prediction and variable importance analysis and are often believed to capture any type of interaction via recursive splitting. However, since the s...
- Conditional Normalizing Flows for Forward and Backward Joint State and Parameter Estimation : Abstract: Traditional filtering algorithms for state estimation, such as classical Kalman filtering, unscented Kalman filtering, and particle filters, show performance degradation when applied to n...
- Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers : Abstract: Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we found that such mode switching is largely drive...
- Fine-Tuning vs. RAG for Multi-Hop Question Answering with Novel Knowledge : Abstract: Multi-hop question answering is widely used to evaluate the reasoning capabilities of large language models (LLMs), as it requires integrating multiple pieces of supporting knowledge to arri...
- Local EGOP for Continuous Index Learning : Abstract: We introduce the setting of continuous index learning, in which a function of many variables varies only along a small number of directions at each point. For efficient estimation, it is ben...
- Robust Mean Estimation under Quantization : Abstract: We consider the problem of mean estimation under quantization and adversarial corruption. We construct multivariate robust estimators that are optimal up to logarithmic factors in two differ...
- XBTorch: A Unified Framework for Modeling and Co-Design of Crossbar-Based Deep Learning Accelerators : Abstract: Emerging memory technologies have gained significant attention as a promising pathway to overcome the limitations of conventional computing architectures in deep learning applications. By en...
- Robust Bayesian Optimization via Tempered Posteriors : Abstract: Bayesian optimization (BO) iteratively fits a Gaussian process (GP) surrogate to accumulated evaluations and selects new queries via an acquisition function such as expected improvement (EI)...
- Enhancing Cloud Network Resilience via a Robust LLM-Empowered Multi-Agent Reinforcement Learning Framework : Abstract: While virtualization and resource pooling empower cloud networks with structural flexibility and elastic scalability, they inevitably expand the attack surface and challenge cyber resilience...
- Proof of Reasoning for Privacy Enhanced Federated Blockchain Learning at the Edge : Abstract: Consensus mechanisms are the core of any blockchain system. However, the majority of these mechanisms do not target federated learning directly nor do they aid in the aggregation step. This ...
- Optimal Transport under Group Fairness Constraints : Abstract: Ensuring fairness in matching algorithms is a key challenge in allocating scarce resources and positions. Focusing on Optimal Transport (OT), we introduce a novel notion of group fairness re...
- AscendKernelGen: A Systematic Study of LLM-Based Kernel Generation for Neural Processing Units : Abstract: To meet the ever-increasing demand for computational efficiency, Neural Processing Units (NPUs) have become critical in modern AI infrastructure. However, unlocking their full potential requ...
- On Lie Groups Preserving Subspaces of Degenerate Clifford Algebras : Abstract: This paper introduces Lie groups in degenerate geometric (Clifford) algebras that preserve four fundamental subspaces determined by the grade involution and reversion under the adjoint and t...
- Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration : Abstract: While Hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for data allocation between t...
- Learning to Trust the Crowd: A Multi-Model Consensus Reasoning Engine for Large Language Models : Abstract: Large language models (LLMs) achieve strong average performance yet remain unreliable at the instance level, with frequent hallucinations, brittle failures, and poorly calibrated confidenc...
- Multi-environment Invariance Learning with Missing Data : Abstract: Learning models that can handle distribution shifts is a key challenge in domain generalization. Invariance learning, an approach that focuses on identifying features invariant across enviro...
- A High-Recall Cost-Sensitive Machine Learning Framework for Real-Time Online Banking Transaction Fraud Detection : Abstract: Fraudulent activities on digital banking services are becoming more intricate by the day, challenging existing defenses. While older rule-driven methods struggle to keep pace, even precision...
- Covariance-Driven Regression Trees: Reducing Overfitting in CART : Abstract: Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART ...
- ARM: Role-Conditioned Neuron Transplantation for Training-Free Generalist LLM Agent Merging : Abstract: Interactive large language model agents have advanced rapidly, but most remain specialized to a single environment and fail to adapt robustly to other environments. Model merging offers a tr...
- Variational Approximations for Robust Bayesian Inference via Rho-Posteriors : Abstract: The $\rho$-posterior framework provides universal Bayesian estimation with explicit contamination rates and optimal convergence guarantees, but has remained computationally difficult due to an ...
- Circuit Mechanisms for Spatial Relation Generation in Diffusion Transformers : Abstract: Diffusion Transformers (DiTs) have greatly advanced text-to-image generation, but models still struggle to generate the correct spatial relations between objects as specified in the text pro...
- Computational Mapping of Reactive Stroma in Prostate Cancer Yields Interpretable, Prognostic Biomarkers : Abstract: Current histopathological grading of prostate cancer relies primarily on glandular architecture, largely overlooking the tumor microenvironment. Here, we present PROTAS, a deep learning fram...
- Supervised and Unsupervised Neural Network Solver for First Order Hyperbolic Nonlinear PDEs : Abstract: We present a neural network-based method for learning scalar hyperbolic conservation laws. Our method replaces the traditional numerical flux in finite volume schemes with a trainable neural...
- Continual Quantum Architecture Search with Tensor-Train Encoding: Theory and Applications to Signal Processing : Abstract: We introduce CL-QAS, a continual quantum architecture search framework that mitigates the challenges of costly amplitude encoding and catastrophic forgetting in variational quantum circuits....
- On a Gradient Approach to Chebyshev Center Problems with Applications to Function Learning : Abstract: We introduce $\textsf{gradOL}$, the first gradient-based optimization framework for solving Chebyshev center problems, a fundamental challenge in optimal function learning and geometric opti...
- PRPO: Aligning Process Reward with Outcome Reward in Policy Optimization : Abstract: Policy optimization for large language models often suffers from sparse reward signals in multi-step reasoning tasks. Critic-free methods like GRPO assign a single normalized outcome reward ...
- Standardization of Post-Publication Code Verification by Journals is Possible with the Support of the Community : Abstract: Reproducibility remains a challenge in machine learning research. While code and data availability requirements have become increasingly common, post-publication verification in journals is ...
- Beyond Variance: Knowledge-Aware LLM Compression via Fisher-Aligned Subspace Diagnostics : Abstract: Post-training activation compression is essential for deploying Large Language Models (LLMs) on resource-constrained hardware. However, standard methods like Singular Value Decomposition (SV...
- Forward versus Backward: Comparing Reasoning Objectives in Direct Preference Optimization : Abstract: Large language models exhibit impressive reasoning capabilities yet frequently generate plausible but incorrect solutions, a phenomenon commonly termed hallucination. This paper investigates...
- Safeguarding LLM Fine-tuning via Push-Pull Distributional Alignment : Abstract: The inherent safety alignment of Large Language Models (LLMs) is prone to erosion during fine-tuning, even when using seemingly innocuous datasets. While existing defenses attempt to mitigat...
- CalPro: Prior-Aware Evidential--Conformal Prediction with Structure-Aware Guarantees for Protein Structures : Abstract: Deep protein structure predictors such as AlphaFold provide confidence estimates (e.g., pLDDT) that are often miscalibrated and degrade under distribution shifts across experimental modaliti...
- MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization : Abstract: Group-Relative Policy Optimization (GRPO) has emerged as an efficient paradigm for aligning Large Language Models (LLMs), yet its efficacy is primarily confined to domains with verifiable gr...
- DDT: A Dual-Masking Dual-Expert Transformer for Energy Time-Series Forecasting : Abstract: Accurate energy time-series forecasting is crucial for ensuring grid stability and promoting the integration of renewable energy, yet it faces significant challenges from complex temporal de...
- Innovation Capacity of Dynamical Learning Systems : Abstract: In noisy physical reservoirs, the classical information-processing capacity $C_{\mathrm{ip}}$ quantifies how well a linear readout can realize tasks measurable from the input history, yet $C...
- Simulated Annealing-based Candidate Optimization for Batch Acquisition Functions : Abstract: Bayesian Optimization with multi-objective acquisition functions such as q-Expected Hypervolume Improvement (qEHVI) requires efficient candidate optimization to maximize acquisition function...
- Pseudodata-guided Invariant Representation Learning Boosts the Out-of-Distribution Generalization in Enzymatic Kinetic Parameter Prediction : Abstract: Accurate prediction of enzyme kinetic parameters is essential for understanding catalytic mechanisms and guiding enzyme engineering. However, existing deep learning-based enzyme-substrate int...
- Kernel Alignment-based Multi-view Unsupervised Feature Selection with Sample-level Adaptive Graph Learning : Abstract: Although multi-view unsupervised feature selection (MUFS) has demonstrated success in dimensionality reduction for unlabeled multi-view data, most existing methods reduce feature redundancy ...
- Explaining Machine Learning Predictive Models through Conditional Expectation Methods : Abstract: The rapid adoption of complex Artificial Intelligence (AI) and Machine Learning (ML) models has led to their characterization as black boxes due to the difficulty of explaining their interna...
- BEAT-Net: Injecting Biomimetic Spatio-Temporal Priors for Interpretable ECG Classification : Abstract: Although deep learning has advanced automated electrocardiogram (ECG) diagnosis, prevalent supervised methods typically treat recordings as undifferentiated one-dimensional (1D) signals or t...
- Segmental Advantage Estimation: Enhancing PPO for Long-Context LLM Training : Abstract: Training Large Language Models (LLMs) for reasoning tasks is increasingly driven by Reinforcement Learning with Verifiable Rewards (RLVR), where Proximal Policy Optimization (PPO) provides a...
- CompNO: A Novel Foundation Model approach for solving Partial Differential Equations : Abstract: Partial differential equations (PDEs) govern a wide range of physical phenomena, but their numerical solution remains computationally demanding, especially when repeated simulations are requ...
- Computing patient similarity based on unstructured clinical notes : Abstract: Clinical notes hold rich yet unstructured details about diagnoses, treatments, and outcomes that are vital to precision medicine but hard to exploit at scale. We introduce a method that repr...
- On the Non-decoupling of Supervised Fine-tuning and Reinforcement Learning in Post-training : Abstract: Post-training of large language models routinely interleaves supervised fine-tuning (SFT) with reinforcement learning (RL). These two methods have different objectives: SFT minimizes the cro...
- OceanSAR-2: A Universal Feature Extractor for SAR Ocean Observation : Abstract: We present OceanSAR-2, the second generation of our foundation model for SAR-based ocean observation. Building on our earlier release, which pioneered self-supervised learning on Sentinel-1 ...
- SCALPEL: Selective Capability Ablation via Low-rank Parameter Editing for Large Language Model Interpretability Analysis : Abstract: Large language models excel across diverse domains, yet their deployment in healthcare, legal systems, and autonomous decision-making remains limited by incomplete understanding of their int...
- The Practicality of Normalizing Flow Test-Time Training in Bayesian Inference for Agent-Based Models : Abstract: Agent-Based Models (ABMs) are gaining great popularity in economics and social science because of their strong flexibility to describe the realistic and heterogeneous decisions and interacti...
- PLANET v2.0: A comprehensive Protein-Ligand Affinity Prediction Model Based on Mixture Density Network : Abstract: Drug discovery represents a time-consuming and financially intensive process, and virtual screening can accelerate it. Scoring functions, as one of the tools guiding virtual screening, have ...
- Variational Autoencoder with Normalizing flow for X-ray spectral fitting : Abstract: Black hole X-ray binaries (BHBs) can be studied with spectral fitting to provide physical constraints on accretion in extreme gravitational environments. Traditional methods of spectral fitt...
- Surrogate-based Optimization via Clustering for Box-Constrained Problems : Abstract: Global optimization of large-scale, complex systems such as multi-physics black-box simulations and real-world industrial systems is important but challenging. This work presents a novel Sur...
- AntiPaSTO: Self-Supervised Steering of Moral Reasoning : Abstract: As models grow more capable, human supervision breaks down: labels don't scale, outputs can be gamed, and training doesn't generalize. Scalable oversight requires steering methods that are i...
- Task Prototype-Based Knowledge Retrieval for Multi-Task Learning from Partially Annotated Data : Abstract: Multi-task learning (MTL) is critical in real-world applications such as autonomous driving and robotics, enabling simultaneous handling of diverse tasks. However, obtaining fully annotated ...
- ARCQuant: Boosting NVFP4 Quantization with Augmented Residual Channels for LLMs : Abstract: The emergence of fine-grained numerical formats like NVFP4 presents new opportunities for efficient Large Language Model (LLM) inference. However, it is difficult to adapt existing Post-Trai...
- Graph Inference Towards ICD Coding : Abstract: Automated ICD coding involves assigning standardized diagnostic codes to clinical narratives. The vast label space and extreme class imbalance continue to challenge precise prediction. To ad...
- FROAV: A Framework for RAG Observation and Agent Verification - Lowering the Barrier to LLM Agent Research : Abstract: The rapid advancement of Large Language Models (LLMs) and their integration into autonomous agent systems has created unprecedented opportunities for document analysis, decision support, and...
- Land-then-transport: A Flow Matching-Based Generative Decoder for Wireless Image Transmission : Abstract: Due to strict rate and reliability demands, wireless image transmission remains difficult for both classical layered designs and joint source-channel coding (JSCC), especially under low late...
- Stagewise Reinforcement Learning and the Geometry of the Regret Landscape : Abstract: Singular learning theory characterizes Bayesian learning as an evolving tradeoff between accuracy and complexity, with transitions between qualitatively different solutions as sample size in...
- Near-Optimal Private Linear Regression via Iterative Hessian Mixing : Abstract: We study differentially private ordinary least squares (DP-OLS) with bounded data. The dominant approach, adaptive sufficient-statistics perturbation (AdaSSP), adds an adaptively chosen pert...
- Contextual Discrepancy-Aware Contrastive Learning for Robust Medical Time Series Diagnosis in Small-Sample Scenarios : Abstract: Medical time series data, such as EEG and ECG, are vital for diagnosing neurological and cardiovascular diseases. However, their precise interpretation faces significant challenges due to hi...
- TFEC: Multivariate Time-Series Clustering via Temporal-Frequency Enhanced Contrastive Learning : Abstract: Multivariate Time-Series (MTS) clustering is crucial for signal processing and data analysis. Although deep learning approaches, particularly those leveraging Contrastive Learning (CL), are ...
- d3LLM: Ultra-Fast Diffusion LLM using Pseudo-Trajectory Distillation : Abstract: Diffusion large language models (dLLMs) offer capabilities beyond those of autoregressive (AR) LLMs, such as parallel decoding and random-order generation. However, realizing these benefits ...
- Neural Architecture for Fast and Reliable Coagulation Assessment in Clinical Settings: Leveraging Thromboelastography : Abstract: In an ideal medical environment, real-time coagulation monitoring can enable early detection and prompt remediation of risks. However, traditional Thromboelastography (TEG), a widely employe...
- Beyond Sharpness: A Flatness Decomposition Framework for Efficient Continual Learning : Abstract: Continual Learning (CL) aims to enable models to sequentially learn multiple tasks without forgetting previous knowledge. Recent studies have shown that optimizing towards flatter loss minim...
- Tab-TRM: Tiny Recursive Model for Insurance Pricing on Tabular Data : Abstract: We introduce Tab-TRM (Tabular-Tiny Recursive Model), a network architecture that adapts the recursive latent reasoning paradigm of Tiny Recursive Models (TRMs) to insurance modeling. Drawing...
- Improving Domain Generalization in Contrastive Learning using Adaptive Temperature Control : Abstract: Self-supervised pre-training with contrastive learning is a powerful method for learning from sparsely labeled data. However, performance can drop considerably when there is a shift in the d...
- Free-RBF-KAN: Kolmogorov-Arnold Networks with Adaptive Radial Basis Functions for Efficient Function Learning : Abstract: Kolmogorov-Arnold Networks (KANs) have shown strong potential for efficiently approximating complex nonlinear functions. However, the original KAN formulation relies on B-spline basis functi...
- Are LLM Decisions Faithful to Verbal Confidence? : Abstract: Large Language Models (LLMs) can produce surprisingly sophisticated estimates of their own uncertainty. However, it remains unclear to what extent this expressed confidence is tied to the re...
- DT-ICU: Towards Explainable Digital Twins for ICU Patient Monitoring via Multi-Modal and Multi-Task Iterative Inference : Abstract: We introduce DT-ICU, a multimodal digital twin framework for continuous risk estimation in intensive care. DT-ICU integrates variable-length clinical time series with static patient informat...
- Optimal Learning Rate Schedule for Balancing Effort and Performance : Abstract: Learning how to learn efficiently is a fundamental challenge for biological agents and a growing concern for artificial ones. To learn effectively, an agent must regulate its learning speed,...
- Leveraging Foundation Models for Calibration-Free c-VEP BCIs : Abstract: Foundation Models (FMs) have surged in popularity over the past five years, with applications spanning fields from computer vision to natural language processing. Brain-Computer Interfaces (...
- Reinforcement Learning for Chain of Thought Compression with One-Domain-to-All Generalization : Abstract: Chain-of-thought reasoning in large language models often creates an "overthinking trap," leading to excessive computational cost and latency for unreliable accuracy gains. Prior work has ty...
- One if by Land, Two if by Sea, Three if by Four Seas, and More to Come -- Values of Perception, Prediction, Communication, and Common Sense in Decision Making : Abstract: This work aims to rigorously define the values of perception, prediction, communication, and common sense in decision making. The defined quantities are decision-theoretic, but have informat...
- PriceSeer: Evaluating Large Language Models in Real-Time Stock Prediction : Abstract: Stock prediction, a subject closely related to people's investment activities in fully dynamic and live environments, has been widely studied. Current large language models (LLMs) have shown...
- Dynamic Intelligence Ceilings: Measuring Long-Horizon Limits of Planning and Creativity in Artificial Systems : Abstract: Recent advances in artificial intelligence have produced systems capable of remarkable performance across a wide range of tasks. These gains, however, are increasingly accompanied by concern...
- CBMAS: Cognitive Behavioral Modeling via Activation Steering : Abstract: Large language models (LLMs) often encode cognitive behaviors unpredictably across prompts, layers, and contexts, making them difficult to diagnose and control. We present CBMAS, a diagnosti...
- COVR: Collaborative Optimization of VLMs and RL Agent for Visual-Based Control : Abstract: Visual reinforcement learning (RL) suffers from poor sample efficiency due to high-dimensional observations in complex tasks. While existing works have shown that vision-language models (VLM...
- Is Sanskrit the most token-efficient language? A quantitative study using GPT, Gemini, and SentencePiece : Abstract: Tokens are the basic units of Large Language Models (LLMs). LLMs rely on tokenizers to segment text into these tokens, and tokenization is the primary determinant of computational and infere...
- Forget-It-All: Multi-Concept Machine Unlearning via Concept-Aware Neuron Masking : Abstract: The widespread adoption of text-to-image (T2I) diffusion models has raised concerns about their potential to generate copyrighted, inappropriate, or sensitive imagery learned from massive tr...
- Performance of models for monitoring sustainable development goals from remote sensing: A three-level meta-regression : Abstract: Machine learning (ML) is a tool to exploit remote sensing data for the monitoring and implementation of the United Nations' Sustainable Development Goals (SDGs). In this paper, we report on ...
- Neuro-Symbolic Compliance: Integrating LLMs and SMT Solvers for Automated Financial Legal Analysis : Abstract: Financial regulations are increasingly complex, hindering automated compliance, especially the maintenance of logical consistency with minimal human oversight. We introduce a Neuro-Symbolic C...
- Rational Synthesizers or Heuristic Followers? Analyzing LLMs in RAG-based Question-Answering : Abstract: Retrieval-Augmented Generation (RAG) is the prevailing paradigm for grounding Large Language Models (LLMs), yet the mechanisms governing how models integrate groups of conflicting retrieved ...
- Towards Public Administration Research Based on Interpretable Machine Learning : Abstract: Causal relationships play a pivotal role in research within the field of public administration. Ensuring reliable causal inference requires validating the predictability of these relationshi...
- Cyber Threat Detection and Vulnerability Assessment System using Generative AI and Large Language Model : Abstract: Background: Cyber-attacks have evolved rapidly in recent years, and many individuals and business owners have been affected by cyber-attacks in various ways. Cyber-attacks include various threat...
- Hard Constraint Projection in a Physics Informed Neural Network : Abstract: In this work, we embed hard constraints in a physics-informed neural network (PINN) that predicts solutions to the 2D incompressible Navier-Stokes equations. We extend the hard constraint m...
- How well can off-the-shelf LLMs elucidate molecular structures from mass spectra using chain-of-thought reasoning? : Abstract: Mass spectrometry (MS) is a powerful analytical technique for identifying small molecules, yet determining complete molecular structures directly from tandem mass spectra (MS/MS) remains a l...
- AMEND++: Benchmarking Eligibility Criteria Amendments in Clinical Trials : Abstract: Clinical trial amendments frequently introduce delays, increased costs, and administrative burden, with eligibility criteria being the most commonly amended component. We introduce e...
- When Smaller Wins: Dual-Stage Distillation and Pareto-Guided Compression of Liquid Neural Networks for Edge Battery Prognostics : Abstract: Battery management systems increasingly require accurate battery health prognostics under strict on-device constraints. This paper presents DLNet, a practical framework with dual-stage disti...
- Triadic Concept Analysis for Logic Interpretation of Simple Artificial Networks : Abstract: An artificial neural network (ANN) is a numerical method used to solve complex classification problems. Due to its high classification power, the ANN method often outperforms other classific...
- SPINAL -- Scaling-law and Preference Integration in Neural Alignment Layers : Abstract: Direct Preference Optimization (DPO) is a principled, scalable alternative to RLHF for aligning large language models from pairwise preferences, but its internal geometric footprint remains ...
- AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving : Abstract: Optimizing Large Language Model (LLM) inference in production systems is increasingly difficult due to dynamic workloads, stringent latency/throughput targets, and a rapidly expanding config...
- SourceNet: Interpretable Sim-to-Real Inference on Variable-Geometry Sensor Arrays for Earthquake Source Inversion : Abstract: Inferring high-dimensional physical states from sparse, ad-hoc sensor arrays is a fundamental challenge across AI for Science, as they are complicated by irregular geometries and the profoun...
- Future-as-Label: Scalable Supervision from Real-World Outcomes : Abstract: Many real-world prediction problems lack labels observable at prediction time, creating a temporal gap between prediction and outcome that yields supervision only after events resolve. To ad...
- Evaluating Robustness of Large Language Models in Enterprise Applications: Benchmarks for Perturbation Consistency Across Formats and Languages : Abstract: Enterprise LLM applications require consistently high quality and reliable performance across diverse scenarios, demanding robustness to minor variations. Existing research shows that even s...
- Federated Learning and Class Imbalances : Abstract: Federated Learning (FL) enables collaborative model training across decentralized devices while preserving data privacy. However, real-world FL deployments face critical challenges such as d...
- A Fast and Effective Method for Euclidean Anticlustering: The Assignment-Based-Anticlustering Algorithm : Abstract: The anticlustering problem is to partition a set of objects into K equal-sized anticlusters such that the sum of distances within anticlusters is maximized. The anticlustering problem is NP-...
- Monkey Jump: MoE-Style PEFT for Efficient Multi-Task Learning : Abstract: Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage...
- Hierarchical Pooling and Explainability in Graph Neural Networks for Tumor and Tissue-of-Origin Classification Using RNA-seq Data : Abstract: This study explores the use of graph neural networks (GNNs) with hierarchical pooling and multiple convolution layers for cancer classification based on RNA-seq data. We combine gene express...
- One-Shot Hierarchical Federated Clustering : Abstract: Driven by the growth of Web-scale decentralized services, Federated Clustering (FC) aims to extract knowledge from heterogeneous clients in an unsupervised manner while preserving the client...
- Teach Diffusion Language Models to Learn from Their Own Mistakes : Abstract: Masked Diffusion Language Models (DLMs) achieve significant speed by generating multiple tokens in parallel. However, this parallel sampling approach, especially when using fewer inference s...
- A Unified Shape-Aware Foundation Model for Time Series Classification : Abstract: Foundation models pre-trained on large-scale source datasets are reshaping the traditional training paradigm for time series classification. However, existing time series foundation models p...
- Certified Unlearning in Decentralized Federated Learning : Abstract: Driven by the right to be forgotten (RTBF), machine unlearning has become an essential requirement for privacy-preserving machine learning. However, its realization in decentralized federate...
- FlexAct: Why Learn when you can Pick? : Abstract: Learning activation functions has emerged as a promising direction in deep learning, allowing networks to adapt activation mechanisms to task-specific demands. In this work, we introduce a n...
- Physics-Informed Tree Search for High-Dimensional Computational Design : Abstract: High-dimensional design spaces underpin a wide range of physics-based modeling and computational design tasks in science and engineering. These problems are commonly formulated as constraine...
- Gecko: An Efficient Neural Architecture Inherently Processing Sequences with Arbitrary Lengths : Abstract: Designing a unified neural network to efficiently and inherently process sequential data with arbitrary lengths is a central and challenging problem in sequence modeling. The design choices ...
- StablePDENet: Enhancing Stability of Operator Learning for Solving Differential Equations : Abstract: Learning solution operators for differential equations with neural networks has shown great potential in scientific computing, but ensuring their stability under input perturbations remains ...
- Deriving Decoder-Free Sparse Autoencoders from First Principles : Abstract: Gradient descent on log-sum-exp (LSE) objectives performs implicit expectation--maximization (EM): the gradient with respect to each component output equals its responsibility. The same theo...
- ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking : Abstract: Reinforcement learning has substantially improved the performance of LLM agents on tasks with verifiable outcomes, but it still struggles on open-ended agent tasks with vast solution spaces ...
- Neural Nonmyopic Bayesian Optimization in Dynamic Cost Settings : Abstract: Bayesian optimization (BO) is a common framework for optimizing black-box functions, yet most existing methods assume static query costs and rely on myopic acquisition strategies. We introdu...
- A novel RF-enabled Non-Destructive Inspection Method through Machine Learning and Programmable Wireless Environments : Abstract: Contemporary industrial Non-Destructive Inspection (NDI) methods require sensing capabilities that operate in occluded, hazardous, or access restricted environments. Yet, the current visual ...
- Improving Day-Ahead Grid Carbon Intensity Forecasting by Joint Modeling of Local-Temporal and Cross-Variable Dependencies Across Different Frequencies : Abstract: Accurate forecasting of the grid carbon intensity factor (CIF) is critical for enabling demand-side management and reducing emissions in modern electricity systems. Leveraging multiple inter...
- Short-term electricity load forecasting with multi-frequency reconstruction diffusion : Abstract: Diffusion models have emerged as a powerful method in various applications. However, their application to Short-Term Electricity Load Forecasting (STELF) -- a typical scenario in energy syst...
- Mosaic: Unlocking Long-Context Inference for Diffusion LLMs via Global Memory Planning and Dynamic Peak Taming : Abstract: Diffusion-based large language models (dLLMs) have emerged as a promising paradigm, utilizing simultaneous denoising to enable global planning and iterative refinement. While these capabilit...
- Hellinger Multimodal Variational Autoencoders : Abstract: Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions ...
- Softly Induced Functional Simplicity Implications for Neural Network Generalisation, Robustness, and Distillation : Abstract: Learning robust and generalisable abstractions from high-dimensional input data is a central challenge in machine learning and its applications to high-energy physics (HEP). Solutions of low...
- Implicit bias as a Gauge correction: Theory and Inverse Design : Abstract: A central problem in machine learning theory is to characterize how learning dynamics select particular solutions among the many compatible with the training objective, a phenomenon, called ...
- CEDAR: Context Engineering for Agentic Data Science : Abstract: We demonstrate CEDAR, an application for automating data science (DS) tasks with an agentic setup. Solving DS problems with LLMs is an underexplored area that has immense market value. The c...
- KASER: Knowledge-Aligned Student Error Simulator for Open-Ended Coding Tasks : Abstract: Open-ended tasks, such as coding problems that are common in computer science education, provide detailed insights into student knowledge. However, training large language models (LLMs) to s...
- Leveraging Soft Prompts for Privacy Attacks in Federated Prompt Tuning : Abstract: Membership inference attack (MIA) poses a significant privacy threat in federated learning (FL) as it allows adversaries to determine whether a client's private dataset contains a specific d...
- Revisiting Training Scale: An Empirical Study of Token Count, Power Consumption, and Parameter Efficiency : Abstract: Research in machine learning has questioned whether increases in training token counts reliably produce proportional performance gains in large language models. Building on prior work introd...
- Reinforcement Learning-Guided Dynamic Multi-Graph Fusion for Evacuation Traffic Prediction : Abstract: Real-time traffic prediction is critical for managing transportation systems during hurricane evacuations. Although data-driven graph-learning models have demonstrated strong capabilities in...
- Plasticity vs. Rigidity: The Impact of Low-Rank Adapters on Reasoning on a Micro-Budget : Abstract: Recent advances in mathematical reasoning typically rely on massive scale, yet the question remains: can strong reasoning capabilities be induced in small language models ($\leq1.5\text{B}$)...
- Explainability of Complex AI Models with Correlation Impact Ratio : Abstract: Complex AI systems make better predictions but often lack transparency, limiting trustworthiness, interpretability, and safe deployment. Common post hoc AI explainers, such as LIME, SHAP, HS...
- Beyond Perfect Scores: Proof-by-Contradiction for Trustworthy Machine Learning : Abstract: Machine learning (ML) models show strong promise for new biomedical prediction tasks, but concerns about trustworthiness have hindered their clinical adoption. In particular, it is often unc...
- Predicting Student Success with Heterogeneous Graph Deep Learning and Machine Learning Models : Abstract: Early identification of student success is crucial for enabling timely interventions, reducing dropout rates, and promoting on-time graduation. In educational settings, AI-powered systems ha...
- Why are there many equally good models? An Anatomy of the Rashomon Effect : Abstract: The Rashomon effect -- the existence of multiple, distinct models that achieve nearly equivalent predictive performance -- has emerged as a fundamental phenomenon in modern machine learning ...
- Federated Continual Learning for Privacy-Preserving Hospital Imaging Classification : Abstract: Deep learning models for radiology interpretation increasingly rely on multi-institutional data, yet privacy regulations and distribution shift across hospitals limit central data pooling. F...
- Structure-preserving learning and prediction in optimal control of collective motion : Abstract: Widespread adoption of unmanned vehicle technologies requires the ability to predict the motion of the combined vehicle operation from observations. While the general prediction of such mot...
- Artificial Entanglement in the Fine-Tuning of Large Language Models : Abstract: Large language models (LLMs) can be adapted to new tasks using parameter-efficient fine-tuning (PEFT) methods that modify only a small number of trainable parameters, often through low-rank ...
- Cross-Modal Computational Model of Brain-Heart Interactions via HRV and EEG Feature : Abstract: The electroencephalogram (EEG) has been the gold standard for quantifying mental workload; however, due to its complexity and non-portability, it can be constraining. ECG signals, which are ...
- Graph Neural Network with One-side Edge Sampling for Fraud Detection : Abstract: Financial fraud is always a major problem in the field of finance, as it can cause significant consequences. As a result, many approaches have been designed to detect it, and lately Graph Ne...
- WFR-FM: Simulation-Free Dynamic Unbalanced Optimal Transport : Abstract: The Wasserstein-Fisher-Rao (WFR) metric extends dynamic optimal transport (OT) by coupling displacement with change of mass, providing a principled geometry for modeling unbalanced snapshot ...
- Analyzing the effect of prediction accuracy on the distributionally-robust competitive ratio : Abstract: The field of algorithms with predictions aims to improve algorithm performance by integrating machine learning predictions into algorithm design. A central question in this area is how predi...
- Variational decomposition autoencoding improves disentanglement of latent representations : Abstract: Understanding the structure of complex, nonstationary, high-dimensional time-evolving signals is a central challenge in scientific data analysis. In many domains, such as speech and biomedic...
- MoE-DisCo: Low Economy Cost Training Mixture-of-Experts Models : Abstract: Training large-scale Mixture-of-Experts (MoE) models typically requires high-memory, high-bandwidth GPUs (e.g., A100), and their high cost has become a major barrier to large-model training....
- U-MASK: User-adaptive Spatio-Temporal Masking for Personalized Mobile AI Applications : Abstract: Personalized mobile artificial intelligence applications are widely deployed, yet they are expected to infer user behavior from sparse and irregular histories under a continuously evolving s...
- DaQ-MSA: Denoising and Qualifying Diffusion Augmentations for Multimodal Sentiment Analysis : Abstract: Multimodal large language models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their effectiveness on multimodal sentiment analysis remains constrained by the sc...
- Tractable Multinomial Logit Contextual Bandits with Non-Linear Utilities : Abstract: We study the multinomial logit (MNL) contextual bandit problem for sequential assortment selection. Although most existing research assumes utility functions to be linear in item features, t...
- Active Learning Strategies for Efficient Machine-Learned Interatomic Potentials Across Diverse Material Systems : Abstract: Efficient discovery of new materials demands strategies to reduce the number of costly first-principles calculations required to train predictive machine learning models. We develop and vali...
- Forgetting Similar Samples: Can Machine Unlearning Do it Better? : Abstract: Machine unlearning, a process enabling pre-trained models to remove the influence of specific training samples, has attracted significant attention in recent years. Although extensive resear...
- Towards Operational Streamflow Forecasting in the Limpopo River Basin using Long Short-Term Memory Networks : Abstract: Robust hydrological simulation is key for sustainable development, water management strategies, and climate change adaptation. In recent years, deep learning methods have been demonstrated t...
- HAS-VQ: Hessian-Adaptive Sparse Vector Quantization for High-Fidelity LLM Compression : Abstract: Post-training quantization is essential for deploying Large Language Models (LLMs) on resource-constrained devices. However, standard integer quantization (e.g., INT4) fundamentally degrade...
- A Robust Certified Machine Unlearning Method Under Distribution Shift : Abstract: The Newton method has been widely adopted to achieve certified unlearning. A critical assumption in existing approaches is that the data requested for unlearning are selected i.i.d. (independ...
- Tight Analysis of Decentralized SGD: A Markov Chain Perspective : Abstract: We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show t...
- Explainable Deep Radiogenomic Molecular Imaging for MGMT Methylation Prediction in Glioblastoma : Abstract: Glioblastoma (GBM) is a highly aggressive primary brain tumor with limited therapeutic options and poor prognosis. The methylation status of the O6-methylguanine-DNA methyltransferase (MGMT)...
- Hallucinations Live in Variance : Abstract: Benchmarks measure whether a model is correct. They do not measure whether a model is reliable. This distinction is largely academic for single-shot inference, but becomes critical for agent...
- When Should We Introduce Safety Interventions During Pretraining? : Abstract: Ensuring the safety of language models in high-stakes settings remains a pressing challenge, as aligned behaviors are often brittle and easily undone by adversarial pressure or downstream fi...
- Reward-Preserving Attacks For Robust Reinforcement Learning : Abstract: Adversarial robustness in RL is difficult because perturbations affect entire trajectories: strong attacks can break learning, while weak attacks yield little robustness, and the appropriate...
- Towards Automated Diagnosis of Inherited Arrhythmias: Combined Arrhythmia Classification Using Lead-Aware Spatial Attention Networks : Abstract: Arrhythmogenic right ventricular cardiomyopathy (ARVC) and long QT syndrome (LQTS) are inherited arrhythmia syndromes associated with sudden cardiac death. Deep learning shows promise for EC...
- Generating readily synthesizable small molecule fluorophore scaffolds with reinforcement learning : Abstract: Developing new fluorophores for advanced imaging techniques requires exploring new chemical space. While generative AI approaches have shown promise in designing novel dye scaffolds, prior e...
- Stable On-Policy Distillation through Adaptive Target Reformulation : Abstract: Knowledge distillation (KD) is a widely adopted technique for transferring knowledge from large language models to smaller student models; however, conventional supervised KD often suffers f...
- Offline Meta-Reinforcement Learning with Flow-Based Task Inference and Adaptive Correction of Feature Overgeneralization : Abstract: Offline meta-reinforcement learning (OMRL) combines the strengths of learning from diverse datasets in offline RL with the adaptability to new tasks of meta-RL, promising safe and efficient ...
- Tree-Preconditioned Differentiable Optimization and Axioms as Layers : Abstract: This paper introduces a differentiable framework that embeds the axiomatic structure of Random Utility Models (RUM) directly into deep neural networks. Although projecting empirical choice d...
- CrossTrafficLLM: A Human-Centric Framework for Interpretable Traffic Intelligence via Large Language Model : Abstract: While accurate traffic forecasting is vital for Intelligent Transportation Systems (ITS), effectively communicating predicted conditions via natural language for human-centric decision suppo...
- Enabling Long FFT Convolutions on Memory-Constrained FPGAs via Chunking : Abstract: The need for long-context reasoning has led to alternative neural network architectures besides Transformers and self-attention, a popular model being Hyena, which employs causal 1D-convolut...
- The Hessian of tall-skinny networks is easy to invert : Abstract: We describe an exact algorithm for solving linear systems $Hx=b$ where $H$ is the Hessian of a deep net. The method computes Hessian-inverse-vector products without storing the Hessian or it...
- Filtering Beats Fine Tuning: A Bayesian Kalman View of In Context Learning in LLMs : Abstract: We present a theory-first framework that interprets inference-time adaptation in large language models (LLMs) as online Bayesian state estimation. Rather than modeling rapid adaptation as im...
- The Impact of Post-training on Data Contamination : Abstract: We present a controlled study of how dataset contamination interacts with the post-training stages now standard in large language model training pipelines. Starting from clean checkpoints of...
- Australian Bushfire Intelligence with AI-Driven Environmental Analytics : Abstract: Bushfires are among the most destructive natural hazards in Australia, causing significant ecological, economic, and social damage. Accurate prediction of bushfire intensity is therefore ess...
- Judge Model for Large-scale Multimodality Benchmarks : Abstract: We propose a dedicated multimodal Judge Model designed to provide reliable, explainable evaluation across a diverse suite of tasks. Our benchmark spans text, audio, image, and video modaliti...
- GroupSegment-SHAP: Shapley Value Explanations with Group-Segment Players for Multivariate Time Series : Abstract: Multivariate time-series models achieve strong predictive performance in healthcare, industry, energy, and finance, but how they combine cross-variable interactions with temporal dynamics re...
- Stress Testing Machine Learning at $10^{10}$ Scale: A Comprehensive Study of Adversarial Robustness on Algebraically Structured Integer Streams : Abstract: This paper presents a large-scale stress test of machine learning systems using structured mathematical data as a benchmark. We evaluate the robustness of tree-based classifiers at an unprec...
- L2CU: Learning to Complement Unseen Users : Abstract: Recent research highlights the potential of machine learning models to learn to complement (L2C) human strengths; however, generalizing this capability to unseen users remains a significant ...
- Latent Space Communication via K-V Cache Alignment : Abstract: Solving increasingly complex problems with large language models (LLMs) necessitates a move beyond individual models and towards multi-model systems that can effectively collaborate. While t...
- Learning Minimally-Congested Drive Times from Sparse Open Networks: A Lightweight RF-Based Estimator for Urban Roadway Operations : Abstract: Accurate roadway travel-time prediction is foundational to transportation systems analysis, yet widespread reliance on either data-intensive congestion models or overly naïve heuristics limi...
- AIS-CycleGen: A CycleGAN-Based Framework for High-Fidelity Synthetic AIS Data Generation and Augmentation : Abstract: Automatic Identification System (AIS) data are vital for maritime domain awareness, yet they often suffer from domain shifts, data sparsity, and class imbalance, which hinder the performance...
- A Review of Online Diffusion Policy RL Algorithms for Scalable Robotic Control : Abstract: Diffusion policies have emerged as a powerful approach for robotic control, demonstrating superior expressiveness in modeling multimodal action distributions compared to conventional policy ...
- DeeperBrain: A Neuro-Grounded EEG Foundation Model Towards Universal BCI : Abstract: Electroencephalography (EEG) foundation models hold significant promise for universal Brain-Computer Interfaces (BCIs). However, existing approaches often rely on end-to-end fine-tuning and ...
- Attention in Geometry: Scalable Spatial Modeling via Adaptive Density Fields and FAISS-Accelerated Kernels : Abstract: This work introduces Adaptive Density Fields (ADF), a geometric attention framework that formulates spatial aggregation as a query-conditioned, metric-induced attention operator in continuou...
- RainBalance: Alleviating Dual Imbalance in GNSS-based Precipitation Nowcasting via Continuous Probability Modeling : Abstract: Global navigation satellite systems (GNSS) station-based Precipitation Nowcasting aims to predict rainfall within the next 0-6 hours by leveraging a GNSS station's historical observations of...
- Causal and Federated Multimodal Learning for Cardiovascular Risk Prediction under Heterogeneous Populations : Abstract: Cardiovascular disease (CVD) continues to be the major cause of death globally, calling for predictive models that not only handle diverse and high-dimensional biomedical signals but also ma...
- LLM Flow Processes for Text-Conditioned Regression : Abstract: Meta-learning methods for regression like Neural (Diffusion) Processes achieve impressive results, but with these models it can be difficult to incorporate expert prior knowledge and informa...
- A Foundation Model Approach for Fetal Stress Prediction During Labor From cardiotocography (CTG) recordings : Abstract: Intrapartum cardiotocography (CTG) is widely used for fetal monitoring during labor, yet its interpretation suffers from high inter-observer variability and limited predictive accuracy. Deep...
- PromptPort: A Reliability Layer for Cross-Model Structured Extraction : Abstract: Structured extraction with LLMs fails in production not because models lack understanding, but because output formatting is unreliable across models and prompts. A prompt that returns clean ...
- ECLIPTICA - A Framework for Switchable LLM Alignment via CITA - Contrastive Instruction-Tuned Alignment : Abstract: Alignment in large language models (LLMs) is still largely static: after training, the policy is frozen. DPO, GRPO methods typically imprint one behavior into the weights, leaving little run...
- Can we Improve Prediction of Psychotherapy Outcomes Through Pretraining With Simulated Data? : Abstract: In the context of personalized medicine, machine learning algorithms are growing in popularity. These algorithms require substantial information, which can be acquired effectively through th...
- Forget Many, Forget Right: Scalable and Precise Concept Unlearning in Diffusion Models : Abstract: Text-to-image diffusion models have achieved remarkable progress, yet their use raises copyright and misuse concerns, prompting research into machine unlearning. However, extending multi-con...
- Parent-Guided Adaptive Reliability (PGAR): A Behavioural Meta-Learning Framework for Stable and Trustworthy AI : Abstract: Parent-Guided Adaptive Reliability (PGAR) is a lightweight behavioural meta-learning framework that adds a supervisory "parent" layer on top of a standard learner to improve stability, calib...
- MixDPO: Modeling Preference Strength for Pluralistic Alignment : Abstract: Preference-based alignment objectives implicitly assume that all human preferences are expressed with equal strength. In practice, however, preference strength varies across individuals and ...
- Data-Driven Reduced-Complexity Modeling of Fluid Flows: A Community Challenge : Abstract: We introduce a community challenge designed to facilitate direct comparisons between data-driven methods for compression, forecasting, and sensing of complex aerospace flows. The challenge i...
- Time-Series Anomaly Classification for Launch Vehicle Propulsion Systems: Fast Statistical Detectors Enhancing LSTM Accuracy and Data Quality : Abstract: Supporting Go/No-Go decisions prior to launch requires assessing real-time telemetry data against redline limits established during the design qualification phase. Family data from ground te...
- TimeGNN-Augmented Hybrid-Action MARL for Fine-Grained Task Partitioning and Energy-Aware Offloading in MEC : Abstract: With the rapid growth of IoT devices and latency-sensitive applications, the demand for both real-time and energy-efficient computing has surged, placing significant pressure on traditional ...
- MLB: A Scenario-Driven Benchmark for Evaluating Large Language Models in Clinical Applications : Abstract: The proliferation of Large Language Models (LLMs) presents transformative potential for healthcare, yet practical deployment is hindered by the absence of frameworks that assess real-world c...
- EntroLnn: Entropy-Guided Liquid Neural Networks for Operando Refinement of Battery Capacity Fade Trajectories : Abstract: Battery capacity degradation prediction has long been a central topic in battery health analytics, and most studies focus on state of health (SoH) estimation and end of life (EoL) prediction...
- Manifold-based Sampling for In-Context Hallucination Detection in Large Language Models : Abstract: Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval au...
- Dynamics-inspired Structure Hallucination for Protein-protein Interaction Modeling : Abstract: Protein-protein interaction (PPI) represents a central challenge within the biology field, and accurately predicting the consequences of mutations in this context is crucial for drug design ...
- CEEMDAN-Based Multiscale CNN for Wind Turbine Gearbox Fault Detection : Abstract: Wind turbines play a critical role in the shift toward sustainable energy generation. Their operation relies on multiple interconnected components, and a failure in any of these can compromi...
- Breaking Model Lock-in: Cost-Efficient Zero-Shot LLM Routing via a Universal Latent Space : Abstract: The rapid proliferation of Large Language Models (LLMs) has led to a fragmented and inefficient ecosystem, a state of "model lock-in" where seamlessly integrating novel models remains a si...
- LDTC: Lifelong deep temporal clustering for multivariate time series : Abstract: Clustering temporal and dynamically changing multivariate time series from real-world fields, called temporal clustering for short, has been a major challenge due to inherent complexities. A...
- Projecting Out the Malice: A Global Subspace Approach to LLM Detoxification : Abstract: Large language models (LLMs) exhibit exceptional performance but pose inherent risks of generating toxic content, restricting their safe deployment. While traditional methods (e.g., alignmen...
Research Sources: 592 | Generated: 1/13/2026
