AI RESEARCH PAPERS & ACADEMIC SOURCES
- SAMCL: Empowering SAM to Continually Learn from Dynamic Domains with Extreme Storage Efficiency : Abstract: Segment Anything Model (SAM) struggles in open-world scenarios with diverse domains. In such settings, naive fine-tuning with a well-designed learning module is inadequate and often causes c...
- Towards Unsupervised Domain Bridging via Image Degradation in Semantic Segmentation : Abstract: Semantic segmentation suffers from significant performance degradation when the trained network is applied to a different domain. To address this issue, unsupervised domain adaptation (UDA) ...
- Causal Interpretability for Adversarial Robustness: A Hybrid Generative Classification Approach : Abstract: Deep learning-based discriminative classifiers, despite their remarkable success, remain vulnerable to adversarial examples that can mislead model predictions. While adversarial training can...
- Asynchronous Bioplausible Neuron for SNN for Event Vision : Abstract: Spiking Neural Networks (SNNs) offer a biologically inspired approach to computer vision that can lead to more efficient processing of visual data with reduced energy consumption. However, m...
- Tyche: Stochastic In-Context Learning for Medical Image Segmentation : Abstract: Existing learning-based solutions to medical image segmentation have two important shortcomings. First, for most new segmentation tasks, a new model has to be trained or fine-tuned. This requ...
- Self-supervised Learning-based Reconstruction of High-resolution 4D Light Fields : Abstract: Hand-held light field (LF) cameras often exhibit low spatial resolution due to the inherent trade-off between spatial and angular dimensions. Existing supervised learning-based LF spatial su...
- CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation : Abstract: Despite the significant success achieved by deep learning methods in medical image segmentation, researchers still struggle in the computer-aided diagnosis of abdominal lymph nodes due to th...
- RAVE: Rate-Adaptive Visual Encoding for 3D Gaussian Splatting : Abstract: Recent advances in neural scene representations have transformed immersive multimedia, with 3D Gaussian Splatting (3DGS) enabling real-time photorealistic rendering. Despite its efficiency, ...
- Persistent Homology-Guided Frequency Filtering for Image Compression : Abstract: Feature extraction in noisy image datasets presents many challenges in model reliability. In this paper, we use the discrete Fourier transform in conjunction with persistent homology analysi...
- Context-measure: Contextualizing Metric for Camouflage : Abstract: Camouflage is primarily context-dependent yet current metrics for camouflaged scenarios overlook this critical factor. Instead, these metrics are originally designed for evaluating general o...
- COREA: Coarse-to-Fine 3D Representation Alignment Between Relightable 3D Gaussians and SDF via Bidirectional 3D-to-3D Supervision : Abstract: We present COREA, the first unified framework that jointly learns relightable 3D Gaussians and a Signed Distance Field (SDF) for accurate geometry reconstruction and faithful relighting. Whi...
- MSN: Multi-directional Similarity Network for Hand-crafted and Deep-synthesized Copy-Move Forgery Detection : Abstract: Copy-move image forgery aims to duplicate certain objects or to hide specific contents with copy-move operations, which can be achieved by a sequence of manual manipulations as well as up-to...
- Training-free Clothing Region of Interest Self-correction for Virtual Try-On : Abstract: VTON (Virtual Try-ON) aims at synthesizing the target clothing on a certain person, preserving the details of the target clothing while keeping the rest of the person unchanged. Existing met...
- MulCLIP: A Multi-level Alignment Framework for Enhancing Fine-grained Long-context CLIP : Abstract: Vision-language models like CLIP show impressive ability to align images and text, but their training on short, concise captions makes them struggle with lengthy, detailed descriptions. Rece...
- CHIMERA: Adaptive Cache Injection and Semantic Anchor Prompting for Zero-shot Image Morphing with Morphing-oriented Metrics : Abstract: Diffusion models exhibit remarkable generative ability, yet achieving smooth and semantically consistent image morphing remains a challenge. Existing approaches often yield abrupt transition...
- MuSASplat: Efficient Sparse-View 3D Gaussian Splats via Lightweight Multi-Scale Adaptation : Abstract: Sparse-view 3D Gaussian splatting seeks to render high-quality novel views of 3D scenes from a limited set of input images. While recent pose-free feed-forward methods leveraging pre-trained...
- When Privacy Meets Recovery: The Overlooked Half of Surrogate-Driven Privacy Preservation for MLLM Editing : Abstract: Privacy leakage in Multimodal Large Language Models (MLLMs) has long been an intractable problem. Existing studies, though effectively obscuring private information in MLLMs, often overlook th...
- TIDE: Two-Stage Inverse Degradation Estimation with Guided Prior Disentanglement for Underwater Image Restoration : Abstract: Underwater image restoration is essential for marine applications ranging from ecological monitoring to archaeological surveys, but effectively addressing the complex and spatially varying n...
- Integrating Multi-scale and Multi-filtration Topological Features for Medical Image Classification : Abstract: Modern deep neural networks have shown remarkable performance in medical image classification. However, such networks either emphasize pixel-intensity features instead of fundamental anatomi...
- RefLSM: Linearized Structural-Prior Reflectance Model for Medical Image Segmentation and Bias-Field Correction : Abstract: Medical image segmentation remains challenging due to intensity inhomogeneity, noise, blurred boundaries, and irregular structures. Traditional level set methods, while effective in certain ...
- HVQ-CGIC: Enabling Hyperprior Entropy Modeling for VQ-Based Controllable Generative Image Compression : Abstract: Generative learned image compression methods using Vector Quantization (VQ) have recently shown impressive potential in balancing distortion and perceptual quality. However, these methods ty...
- SUCCESS-GS: Survey of Compactness and Compression for Efficient Static and Dynamic Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful explicit representation enabling real-time, high-fidelity 3D reconstruction and novel view synthesis. However, its practical use is hin...
- MMRPT: MultiModal Reinforcement Pre-Training via Masked Vision-Dependent Reasoning : Abstract: Multimodal pre-training remains constrained by the descriptive bias of image-caption pairs, leading models to favor surface linguistic cues over grounded visual understanding. We introduce M...
- Object Pose Distribution Estimation for Determining Revolution and Reflection Uncertainty in Point Clouds : Abstract: Object pose estimation is crucial to robotic perception and typically provides a single-pose estimate. However, a single estimate cannot capture pose uncertainty deriving from visual ambigui...
- ReLKD: Inter-Class Relation Learning with Knowledge Distillation for Generalized Category Discovery : Abstract: Generalized Category Discovery (GCD) faces the challenge of categorizing unlabeled data containing both known and novel classes, given only labels for known classes. Previous studies often t...
- STRinGS: Selective Text Refinement in Gaussian Splatting : Abstract: Text as signs, labels, or instructions is a critical element of real-world scenes as they can convey important contextual information. 3D representations such as 3D Gaussian Splatting (3DGS)...
- Unified Camera Positional Encoding for Controlled Video Generation : Abstract: Transformers have emerged as a universal backbone across 3D perception, video generation, and world models for autonomous driving and embodied AI, where understanding camera geometry is esse...
- Squeezed-Eff-Net: Edge-Computed Boost of Tomography Based Brain Tumor Classification leveraging Hybrid Neural Network Architecture : Abstract: Brain tumors are one of the most common and dangerous neurological diseases which require a timely and correct diagnosis to provide the right treatment procedures. Even with the promotion of...
- Zero-Shot Textual Explanations via Translating Decision-Critical Features : Abstract: Textual explanations make image classifier decisions transparent by describing the prediction rationale in natural language. Large vision-language models can generate captions but are design...
- See More, Change Less: Anatomy-Aware Diffusion for Contrast Enhancement : Abstract: Image enhancement improves visual quality and helps reveal details that are hard to see in the original image. In medical imaging, it can support clinical decision-making, but current models...
- RVLF: A Reinforcing Vision-Language Framework for Gloss-Free Sign Language Translation : Abstract: Gloss-free sign language translation (SLT) is hindered by two key challenges: inadequate sign representation that fails to capture nuanced visual cues, and sentence-level semantic misa...
- Geo3DVQA: Evaluating Vision-Language Models for 3D Geospatial Reasoning from Aerial Imagery : Abstract: Three-dimensional geospatial analysis is critical to applications in urban planning, climate adaptation, and environmental assessment. Current methodologies depend on costly, specialized sen...
- Reevaluating Automated Wildlife Species Detection: A Reproducibility Study on a Custom Image Dataset : Abstract: This study revisits the findings of Carl et al., who evaluated the pre-trained Google Inception-ResNet-v2 model for automated detection of European wild mammal species in camera trap images....
- The Inductive Bottleneck: Data-Driven Emergence of Representational Sparsity in Vision Transformers : Abstract: Vision Transformers (ViTs) lack the hierarchical inductive biases inherent to Convolutional Neural Networks (CNNs), theoretically allowing them to maintain high-dimensional representations t...
- Generalized Referring Expression Segmentation on Aerial Photos : Abstract: Referring expression segmentation is a fundamental task in computer vision that integrates natural language understanding with precise visual localization of target regions. Considering aeri...
- Debiasing Diffusion Priors via 3D Attention for Consistent Gaussian Splatting : Abstract: Versatile 3D tasks (e.g., generation or editing) that distill from Text-to-Image (T2I) diffusion models have attracted significant research interest for not relying on extensive 3D training ...
- MICo-150K: A Comprehensive Dataset Advancing Multi-Image Composition : Abstract: In controllable image generation, synthesizing coherent and consistent images from multiple reference inputs, i.e., Multi-Image Composition (MICo), remains a challenging problem, partly hind...
- Enhancing Small Object Detection with YOLO: A Novel Framework for Improved Accuracy and Efficiency : Abstract: This paper investigates and develops methods for detecting small objects in large-scale aerial images. Current approaches for detecting small objects in aerial images often involve image cro...
- Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects : Abstract: 3D Gaussian Splatting (GS) enables highly photorealistic scene reconstruction from posed image sequences but struggles with viewpoint extrapolation due to its anisotropic nature, leading to ...
- LogicCBMs: Logic-Enhanced Concept-Based Learning : Abstract: Concept Bottleneck Models (CBMs) provide a basis for semantic abstractions within a neural network architecture. Such models have primarily been seen through the lens of interpretability so ...
- How Far are Modern Trackers from UAV-Anti-UAV? A Million-Scale Benchmark and New Baseline : Abstract: Unmanned Aerial Vehicles (UAVs) offer wide-ranging applications but also pose significant safety and privacy violation risks in areas like airport and infrastructure inspection, spurring the...
- GlimmerNet: A Lightweight Grouped Dilated Depthwise Convolutions for UAV-Based Emergency Monitoring : Abstract: Convolutional Neural Networks (CNNs) have proven highly effective for edge and mobile vision tasks due to their computational efficiency. While many recent works seek to enhance CNNs with gl...
- Reconstructing Objects along Hand Interaction Timelines in Egocentric Video : Abstract: We introduce the task of Reconstructing Objects along Hand Interaction Timelines (ROHIT). We first define the Hand Interaction Timeline (HIT) from a rigid object's perspective. In a HIT, an ...
- InterAgent: Physics-based Multi-agent Command Execution via Diffusion on Interaction Graphs : Abstract: Humanoid agents are expected to emulate the complex coordination inherent in human social behaviors. However, existing methods are largely confined to single-agent scenarios, overlooking the...
- Unified Video Editing with Temporal Reasoner : Abstract: Existing video editing methods face a critical trade-off: expert models offer precision but rely on task-specific priors like masks, hindering unification; conversely, unified temporal in-co...
- Single-step Diffusion-based Video Coding with Semantic-Temporal Guidance : Abstract: While traditional and neural video codecs (NVCs) have achieved remarkable rate-distortion performance, improving perceptual quality at low bitrates remains challenging. Some NVCs incorporate...
- Towards Robust DeepFake Detection under Unstable Face Sequences: Adaptive Sparse Graph Embedding with Order-Free Representation and Explicit Laplacian Spectral Prior : Abstract: Ensuring the authenticity of video content remains challenging as DeepFake generation becomes increasingly realistic and robust against detection. Most existing detectors implicitly assume t...
- MultiMotion: Multi Subject Video Motion Transfer via Video Diffusion Transformer : Abstract: Multi-object video motion transfer poses significant challenges for Diffusion Transformer (DiT) architectures due to inherent motion entanglement and lack of object-level control. We present...
- SJD++: Improved Speculative Jacobi Decoding for Training-free Acceleration of Discrete Auto-regressive Text-to-Image Generation : Abstract: Large autoregressive models can generate high-quality, high-resolution images but suffer from slow generation speed, because these models require hundreds to thousands of sequential forward ...
- ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points : Abstract: Recent text-to-image models, such as Stable Diffusion, have achieved impressive visual quality, yet they often suffer from geometric inconsistencies that undermine the structural realism of ...
- MeshRipple: Structured Autoregressive Generation of Artist-Meshes : Abstract: Meshes serve as a primary representation for 3D assets. Autoregressive mesh generators serialize faces into sequences and train on truncated segments with sliding-window inference to cope wi...
- From Orbit to Ground: Generative City Photogrammetry from Extreme Off-Nadir Satellite Images : Abstract: City-scale 3D reconstruction from satellite imagery presents the challenge of extreme viewpoint extrapolation, where our goal is to synthesize ground-level novel views from sparse orbital im...
- All You Need Are Random Visual Tokens? Demystifying Token Pruning in VLLMs : Abstract: Vision Large Language Models (VLLMs) incur high computational costs due to their reliance on hundreds of visual tokens to represent images. While token pruning offers a promising solution fo...
- LongCat-Image Technical Report : Abstract: We introduce LongCat-Image, a pioneering open-source and bilingual (Chinese-English) foundation model for image generation, designed to address core challenges in multilingual text rendering...
- Robust Variational Model Based Tailored UNet: Leveraging Edge Detector and Mean Curvature for Improved Image Segmentation : Abstract: To address the challenge of segmenting noisy images with blurred or fragmented boundaries, this paper presents a robust version of Variational Model Based Tailored UNet (VM_TUNet), a hybrid ...
- More than Segmentation: Benchmarking SAM 3 for Segmentation, 3D Perception, and Reconstruction in Robotic Surgery : Abstract: The recent Segment Anything Model (SAM) 3 has introduced significant advancements over its predecessor, SAM 2, particularly with the integration of language-based segmentation and enhanced 3...
- Online Segment Any 3D Thing as Instance Tracking : Abstract: Online, real-time, and fine-grained 3D segmentation constitutes a fundamental capability for embodied intelligent agents to perceive and comprehend their operational environments. Recent adv...
- Decomposition Sampling for Efficient Region Annotations in Active Learning : Abstract: Active learning improves annotation efficiency by selecting the most informative samples for annotation and model training. While most prior work has focused on selecting informative images ...
- MoCA: Mixture-of-Components Attention for Scalable Compositional 3D Generation : Abstract: Compositionality is critical for 3D object and scene generation, but existing part-aware 3D generation methods suffer from poor scalability due to quadratic global attention costs when incre...
- Liver Fibrosis Quantification and Analysis: The LiQA Dataset and Baseline Method : Abstract: Liver fibrosis represents a significant global health burden, necessitating accurate staging for effective clinical management. This report introduces the LiQA (Liver Fibrosis Quantification...
- Optimization-Guided Diffusion for Interactive Scene Generation : Abstract: Realistic and diverse multi-agent driving scenes are crucial for evaluating autonomous vehicles, but safety-critical events which are essential for this task are rare and underrepresented in...
- EgoCampus: Egocentric Pedestrian Eye Gaze Model and Dataset : Abstract: We address the challenge of predicting human visual attention during real-world navigation by measuring and modeling egocentric pedestrian eye gaze in an outdoor campus setting. We introduce...
- sim2art: Accurate Articulated Object Modeling from a Single Video using Synthetic Training Data Only : Abstract: Understanding articulated objects is a fundamental challenge in robotics and digital twin creation. To effectively model such objects, it is essential to recover both part segmentation and t...
- UnCageNet: Tracking and Pose Estimation of Caged Animal : Abstract: Animal tracking and pose estimation systems, such as STEP (Simultaneous Tracking and Pose Estimation) and ViTPose, experience substantial performance drops when processing images and videos ...
- ViSA: 3D-Aware Video Shading for Real-Time Upper-Body Avatar Creation : Abstract: Generating high-fidelity upper-body 3D avatars from a one-shot input image remains a significant challenge. Current 3D avatar generation methods, which rely on large reconstruction models, are...
- SpatialDreamer: Incentivizing Spatial Reasoning via Active Mental Imagery : Abstract: Despite advancements in Multi-modal Large Language Models (MLLMs) for scene understanding, their performance on complex spatial reasoning tasks requiring mental simulation remains significan...
- HLTCOE Evaluation Team at TREC 2025: VQA Track : Abstract: The HLTCOE Evaluation team participated in TREC VQA's Answer Generation (AG) task, for which we developed a listwise learning framework that aims to improve semantic precision and ranking co...
- DiffusionDriveV2: Reinforcement Learning-Constrained Truncated Diffusion Modeling in End-to-End Autonomous Driving : Abstract: Generative diffusion models for end-to-end autonomous driving often suffer from mode collapse, tending to generate conservative and homogeneous behaviors. While DiffusionDrive employs predef...
- Unison: A Fully Automatic, Task-Universal, and Low-Cost Framework for Unified Understanding and Generation : Abstract: Unified understanding and generation is a highly appealing research direction in multimodal learning. There exist two approaches: one trains a transformer via an auto-regressive paradigm, an...
- UltrasODM: A Dual Stream Optical Flow Mamba Network for 3D Freehand Ultrasound Reconstruction : Abstract: Clinical ultrasound acquisition is highly operator-dependent, where rapid probe motion and brightness fluctuations often lead to reconstruction errors that reduce trust and clinical utility....
- Modality-Aware Bias Mitigation and Invariance Learning for Unsupervised Visible-Infrared Person Re-Identification : Abstract: Unsupervised visible-infrared person re-identification (USVI-ReID) aims to match individuals across visible and infrared cameras without relying on any annotation. Given the significant gap ...
- GorillaWatch: An Automated System for In-the-Wild Gorilla Re-Identification and Population Monitoring : Abstract: Monitoring critically endangered western lowland gorillas is currently hampered by the immense manual effort required to re-identify individuals from vast archives of camera trap footage. Th...
- Distribution Matching Variational AutoEncoder : Abstract: Most visual generative models compress images into a latent space before applying diffusion or autoregressive modelling. Yet, existing approaches such as VAEs and foundation model aligned en...
- OneStory: Coherent Multi-Shot Video Generation with Adaptive Memory : Abstract: Storytelling in real-world videos often unfolds through multiple shots -- discontinuous yet semantically connected clips that together convey a coherent narrative. However, existing multi-sh...
- Multi-view Pyramid Transformer: Look Coarser to See Broader : Abstract: We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward...
- Lang3D-XL: Language Embedded 3D Gaussians for Large-scale Scenes : Abstract: Embedding a language field in a 3D representation enables richer semantic understanding of spatial environments by linking geometry with descriptive meaning. This allows for a more intuitive...
- OpenVE-3M: A Large-Scale High-Quality Dataset for Instruction-Guided Video Editing : Abstract: The quality and diversity of instruction-based image editing datasets are continuously increasing, yet large-scale, high-quality datasets for instruction-based video editing remain scarce. T...
- UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation : Abstract: Recent video generation models demonstrate impressive synthesis capabilities but remain limited by single-modality conditioning, constraining their holistic world understanding. This stems f...
- Voxify3D: Pixel Art Meets Volumetric Rendering : Abstract: Voxel art is a distinctive stylization widely used in games and digital media, yet automated generation from 3D meshes remains challenging due to conflicting requirements of geometric abstra...
- Stronger is not better: Better Augmentations in Contrastive Learning for Medical Image Segmentation : Abstract: Self-supervised contrastive learning is among the recent representation learning methods that have shown performance gains in several downstream tasks including semantic segmentation. This p...
- Semantic Temporal Single-photon LiDAR : Abstract: Temporal single-photon (TSP-) LiDAR presents a promising solution for imaging-free target recognition over long distances with reduced size, cost, and power consumption. However, existing TS...
- GuideNav: User-Informed Development of a Vision-Only Robotic Navigation Assistant For Blind Travelers : Abstract: While commendable progress has been made in user-centric research on mobile assistive systems for blind and low-vision (BLV) individuals, references that directly inform robot navigation des...
- OmniSafeBench-MM: A Unified Benchmark and Toolbox for Multimodal Jailbreak Attack-Defense Evaluation : Abstract: Recent advances in multi-modal large language models (MLLMs) have enabled unified perception-reasoning capabilities, yet these systems remain highly vulnerable to jailbreak attacks that bypa...
- MIND-V: Hierarchical Video Generation for Long-Horizon Robotic Manipulation with RL-based Physical Alignment : Abstract: Embodied imitation learning is constrained by the scarcity of diverse, long-horizon robotic manipulation data. Existing video generation models for this domain are limited to synthesizing sh...
- XM-ALIGN: Unified Cross-Modal Embedding Alignment for Face-Voice Association : Abstract: This paper introduces our solution, XM-ALIGN (Unified Cross-Modal Embedding Alignment Framework), proposed for the FAME challenge at ICASSP 2026. Our framework combines explicit and implicit...
- Dynamic Visual SLAM using a General 3D Prior : Abstract: Reliable incremental estimation of camera poses and 3D reconstruction is key to enable various applications including robotics, interactive visualization, and augmented reality. However, thi...
- Mimir: Hierarchical Goal-Driven Diffusion with Uncertainty Propagation for End-to-End Autonomous Driving : Abstract: End-to-end autonomous driving has emerged as a pivotal direction in the field of autonomous systems. Recent works have demonstrated impressive performance by incorporating high-level guidanc...
- Affine Subspace Models and Clustering for Patch-Based Image Denoising : Abstract: Image tile-based approaches are popular in many image processing applications such as denoising (e.g., non-local means). A key step in their use is grouping the images into clusters, which u...
- Human Geometry Distribution for 3D Animation Generation : Abstract: Generating realistic human geometry animations remains a challenging task, as it requires modeling natural clothing dynamics with fine-grained geometric details under limited data. To addres...
- Precise Liver Tumor Segmentation in CT Using a Hybrid Deep Learning-Radiomics Framework : Abstract: Accurate three-dimensional delineation of liver tumors on contrast-enhanced CT is a prerequisite for treatment planning, navigation and response assessment, yet manual contouring is slow, ob...
- Bimodal SegNet: Instance Segmentation Fusing Events and RGB Frames for Robotic Grasping : Abstract: Object segmentation for robotic grasping under dynamic conditions often faces challenges such as occlusion, low light conditions, motion blur and object size variance. To address these chall...
- Diffusion Models for Image Restoration and Enhancement: A Comprehensive Survey : Abstract: Image restoration (IR) has been an indispensable and challenging task in the low-level vision field, which strives to improve the subjective quality of images distorted by various forms of d...
- Event-Customized Image Generation : Abstract: Customized Image Generation, generating customized images with user-specified concepts, has raised significant attention due to its creativity and novelty. With impressive progress achieved ...
- RepLDM: Reprogramming Pretrained Latent Diffusion Models for High-Quality, High-Efficiency, High-Resolution Image Generation : Abstract: While latent diffusion models (LDMs), such as Stable Diffusion, are designed for high-resolution (HR) image generation, they often struggle with significant structural distortions when gener...
- TinyViM: Frequency Decoupling for Tiny Hybrid Vision Mamba : Abstract: Mamba has shown great potential for computer vision due to its linear complexity in modeling the global context with respect to the input length. However, existing lightweight Mamba-based ba...
- Explaining Object Detectors via Collective Contribution of Pixels : Abstract: Visual explanations for object detectors are crucial for enhancing their reliability. Object detectors identify and localize instances by assessing multiple visual features collectively. Whe...
- Golden Touchstone: A Comprehensive Bilingual Benchmark for Evaluating Financial Large Language Models : Abstract: As large language models (LLMs) increasingly permeate the financial sector, there is a pressing need for a standardized method to comprehensively assess their performance. Existing financial...
- Is Self-Supervised Learning Enough to Fill in the Gap? A Study on Speech Inpainting : Abstract: Speech inpainting consists in reconstructing corrupted or missing speech segments using surrounding context, a process that closely resembles the pretext tasks in Self-Supervised Learning (S...
- EmoDiffTalk: Emotion-aware Diffusion for Editable 3D Gaussian Talking Head : Abstract: Recent photo-realistic 3D talking heads via 3D Gaussian Splatting still have significant shortcomings in emotional expression manipulation, especially for fine-grained and expansive dynamics em...
- FishDetector-R1: Unified MLLM-Based Framework with Reinforcement Fine-Tuning for Weakly Supervised Fish Detection, Segmentation, and Counting : Abstract: Analyzing underwater fish imagery is critical for ecological monitoring but remains difficult due to visual degradation and costly annotations. We introduce FishDetector-R1, a unified MLLM-b...
- PrunedCaps: A Case For Primary Capsules Discrimination : Abstract: Capsule Networks (CapsNets) are a generation of image classifiers with proven advantages over Convolutional Neural Networks (CNNs). Better robustness to affine transformation and overlapping...
- Fast and Flexible Robustness Certificates for Semantic Segmentation : Abstract: Deep Neural Networks are vulnerable to small perturbations that can drastically alter their predictions for perceptually unchanged inputs. The literature on adversarially robust Deep Learnin...
- High-Throughput Unsupervised Profiling of the Morphology of 316L Powder Particles for Use in Additive Manufacturing : Abstract: Selective Laser Melting (SLM) is a powder-bed additive manufacturing technique whose part quality depends critically on feedstock morphology. However, conventional powder characterization me...
- VAT: Vision Action Transformer by Unlocking Full Representation of ViT : Abstract: In robot learning, Vision Transformers (ViTs) are standard for visual perception, yet most methods discard valuable information by using only the final layer's features. We argue this provid...
- Benchmarking CXR Foundation Models With Publicly Available MIMIC-CXR and NIH-CXR14 Datasets : Abstract: Recent foundation models have demonstrated strong performance in medical image representation learning, yet their comparative behaviour across datasets remains underexplored. This work bench...
- Neural reconstruction of 3D ocean wave hydrodynamics from camera sensing : Abstract: Precise three-dimensional (3D) reconstruction of wave free surfaces and associated velocity fields is essential for developing a comprehensive understanding of ocean physics. To address the ...
- Representation Learning for Point Cloud Understanding : Abstract: With the rapid advancement of technology, 3D data acquisition and utilization have become increasingly prevalent across various fields, including computer vision, robotics, and geospatial an...
- Shoot-Bounce-3D: Single-Shot Occlusion-Aware 3D from Lidar by Decomposing Two-Bounce Light : Abstract: 3D scene reconstruction from a single measurement is challenging, especially in the presence of occluded regions and specular materials, such as mirrors. We address these challenges by lever...
- BeLLA: End-to-End Birds Eye View Large Language Assistant for Autonomous Driving : Abstract: The rapid development of Vision-Language models (VLMs) and Multimodal Language Models (MLLMs) in autonomous driving research has significantly reshaped the landscape by enabling richer scene...
- SpectraIrisPAD: Leveraging Vision Foundation Models for Spectrally Conditioned Multispectral Iris Presentation Attack Detection : Abstract: Iris recognition is widely recognized as one of the most accurate biometric modalities. However, its growing deployment in real-world applications raises significant concerns regarding its v...
- Tracking-Guided 4D Generation: Foundation-Tracker Motion Priors for 3D Model Animation : Abstract: Generating dynamic 4D objects from sparse inputs is difficult because it demands joint preservation of appearance and motion coherence across views and time while suppressing artifacts and t...
- Automated Annotation of Shearographic Measurements Enabling Weakly Supervised Defect Detection : Abstract: Shearography is an interferometric technique sensitive to surface displacement gradients, providing high sensitivity for detecting subsurface defects in safety-critical components. A key lim...
- Physics-Grounded Shadow Generation from Monocular 3D Geometry Priors and Approximate Light Direction : Abstract: Shadow generation aims to produce photorealistic shadows that are visually consistent with object geometry and scene illumination. In the physics of shadow formation, the occluder blocks som...
- Physics-Grounded Attached Shadow Detection Using Approximate 3D Geometry and Light Direction : Abstract: Attached shadows occur on the surface of the occluder where light cannot reach because of self-occlusion. They are crucial for defining the three-dimensional structure of objects and enhanci...
- SPOOF: Simple Pixel Operations for Out-of-Distribution Fooling : Abstract: Deep neural networks (DNNs) excel across image recognition tasks, yet continue to exhibit overconfidence on inputs that bear no resemblance to natural images. Revisiting the "fooling images"...
- Revisiting SVD and Wavelet Difference Reduction for Lossy Image Compression: A Reproducibility Study : Abstract: This work presents an independent reproducibility study of a lossy image compression technique that integrates singular value decomposition (SVD) and wavelet difference reduction (WDR). The ...
- GPU-GLMB: Assessing the Scalability of GPU-Accelerated Multi-Hypothesis Tracking : Abstract: Much recent research on multi-target tracking has focused on multi-hypothesis approaches leveraging random finite sets. Of particular interest are labeled random finite set methods that main...
- NexusFlow: Unifying Disparate Tasks under Partial Supervision via Invertible Flow Networks : Abstract: Partially Supervised Multi-Task Learning (PS-MTL) aims to leverage knowledge across tasks when annotations are incomplete. Existing approaches, however, have largely focused on the simpler s...
- Language-driven Fine-grained Retrieval : Abstract: Existing fine-grained image retrieval (FGIR) methods learn discriminative embeddings by adopting semantically sparse one-hot labels derived from category names as supervision. While effectiv...
- Knowing the Answer Isn't Enough: Fixing Reasoning Path Failures in LVLMs : Abstract: We reveal a critical yet underexplored flaw in Large Vision-Language Models (LVLMs): even when these models know the correct answer, they frequently arrive there through incorrect reasoning ...
- TriaGS: Differentiable Triangulation-Guided Geometric Consistency for 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting is crucial for real-time novel view synthesis due to its efficiency and ability to render photorealistic images. However, building a 3D Gaussian is guided solely by pho...
- FacePhys: State of the Heart Learning : Abstract: Vital sign measurement using cameras presents opportunities for comfortable, ubiquitous health monitoring. Remote photoplethysmography (rPPG), a foundational technology, enables cardiac meas...
- A Sleep Monitoring System Based on Audio, Video and Depth Information : Abstract: For quantitative evaluation of sleep disturbances, a noninvasive monitoring system is developed by introducing an event-based method. We observe sleeping in home context and classify the sle...
- StrokeNet: Unveiling How to Learn Fine-Grained Interactions in Online Handwritten Stroke Classification : Abstract: Stroke classification remains challenging due to variations in writing style, ambiguous content, and dynamic writing positions. The core challenge in stroke classification is modeling the se...
- ReCAD: Reinforcement Learning Enhanced Parametric CAD Model Generation with Vision-Language Models : Abstract: We present ReCAD, a reinforcement learning (RL) framework that bootstraps pretrained large models (PLMs) to generate precise parametric computer-aided design (CAD) models from multimodal inp...
- S2WMamba: A Spectral-Spatial Wavelet Mamba for Pansharpening : Abstract: Pansharpening fuses a high-resolution PAN image with a low-resolution multispectral (LRMS) image to produce an HRMS image. A key difficulty is that jointly processing PAN and MS often entang...
- CryoHype: Reconstructing a thousand cryo-EM structures with transformer-based hypernetworks : Abstract: Cryo-electron microscopy (cryo-EM) is an indispensable technique for determining the 3D structures of dynamic biomolecular complexes. While typically applied to image a single molecular spec...
- Beyond Hallucinations: A Multimodal-Guided Task-Aware Generative Image Compression for Ultra-Low Bitrate : Abstract: Generative image compression has recently shown impressive perceptual quality, but often suffers from semantic deviations caused by generative hallucinations at ultra-low bitrate (bpp < 0.05...
- CLUENet: Cluster Attention Makes Neural Networks Have Eyes : Abstract: Despite the success of convolution- and attention-based models in vision tasks, their rigid receptive fields and complex architectures limit their ability to model irregular spatial patterns...
- TreeQ: Pushing the Quantization Boundary of Diffusion Transformer via Tree-Structured Mixed-Precision Search : Abstract: Diffusion Transformers (DiTs) have emerged as a highly scalable and effective backbone for image generation, outperforming U-Net architectures in both scalability and performance. However, t...
- Rectifying Latent Space for Generative Single-Image Reflection Removal : Abstract: Single-image reflection removal is a highly ill-posed problem, where existing methods struggle to reason about the composition of corrupted regions, causing them to fail at recovery and gene...
- Spoofing-aware Prompt Learning for Unified Physical-Digital Facial Attack Detection : Abstract: Real-world face recognition systems are vulnerable to both physical presentation attacks (PAs) and digital forgery attacks (DFs). We aim to achieve comprehensive protection of biometric data...
- Human3R: Incorporating Human Priors for Better 3D Dynamic Reconstruction from Monocular Videos : Abstract: Monocular dynamic video reconstruction faces significant challenges in dynamic human scenes due to geometric inconsistencies and resolution degradation issues. Existing methods lack 3D human...
- VG-Refiner: Towards Tool-Refined Referring Grounded Reasoning via Agentic Reinforcement Learning : Abstract: Tool-integrated visual reasoning (TiVR) has demonstrated great potential in enhancing multimodal problem-solving. However, existing TiVR paradigms mainly focus on integrating various visual ...
- Are AI-Generated Driving Videos Ready for Autonomous Driving? A Diagnostic Evaluation Framework : Abstract: Recent text-to-video models have enabled the generation of high-resolution driving scenes from natural language prompts. These AI-generated driving videos (AIGVs) offer a low-cost, scalable ...
- VAD-Net: Multidimensional Facial Expression Recognition in Intelligent Education System : Abstract: Current FER (Facial Expression Recognition) datasets are mostly labeled by emotion categories, such as happy, angry, sad, fear, disgust, surprise, and neutral, which are limited in expressivene...
- OCFER-Net: Recognizing Facial Expression in Online Learning System : Abstract: Recently, online learning has become very popular, especially under the global epidemic of COVID-19. Besides knowledge distribution, emotion interaction is also very important. It can be obtained by...
- Perceptual Region-Driven Infrared-Visible Co-Fusion for Extreme Scene Enhancement : Abstract: In photogrammetry, accurately fusing infrared (IR) and visible (VIS) spectra while preserving the geometric fidelity of visible features and incorporating thermal radiation is a significant ...
- A Perception CNN for Facial Expression Recognition : Abstract: Convolutional neural networks (CNNs) can automatically learn data patterns to express face images for facial expression recognition (FER). However, they may ignore the effect of facial segmentat...
- DragMesh: Interactive 3D Generation Made Easy : Abstract: While generative models have excelled at creating static 3D content, the pursuit of systems that understand how objects move and respond to interactions remains a fundamental challenge. Curr...
- AGORA: Adversarial Generation Of Real-time Animatable 3D Gaussian Head Avatars : Abstract: The generation of high-fidelity, animatable 3D human avatars remains a core challenge in computer graphics and vision, with applications in VR, telepresence, and entertainment. Existing appr...
- Towards Stable Cross-Domain Depression Recognition under Missing Modalities : Abstract: Depression poses serious public health risks, including suicide, underscoring the urgency of timely and scalable screening. Multimodal automatic depression detection (ADD) offers a promising...
- Sanvaad: A Multimodal Accessibility Framework for ISL Recognition and Voice-Based Interaction : Abstract: Communication between deaf users, visually impaired users, and the general hearing population often relies on tools that support only one direction of interaction. To address this limitatio...
- Bridging spatial awareness and global context in medical image segmentation : Abstract: Medical image segmentation is a fundamental task in computer-aided diagnosis, requiring models that balance segmentation accuracy and computational efficiency. However, existing segmentation...
- GNC-Pose: Geometry-Aware GNC-PnP for Accurate 6D Pose Estimation : Abstract: We present GNC-Pose, a fully learning-free monocular 6D object pose estimation pipeline for textured objects that combines rendering-based initialization, geometry-aware correspondence weigh...
- MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding : Abstract: Large vision-language models struggle with medical video understanding, where spatial precision, temporal reasoning, and clinical semantics are critical. To address this, we first introduce ...
- From Remote Sensing to Multiple Time Horizons Forecasts: Transformers Model for CyanoHAB Intensity in Lake Champlain : Abstract: Cyanobacterial Harmful Algal Blooms (CyanoHABs) pose significant threats to aquatic ecosystems and public health globally. Lake Champlain is particularly vulnerable to recurring CyanoHAB eve...
- Learning Relative Gene Expression Trends from Pathology Images in Spatial Transcriptomics : Abstract: Gene expression estimation from pathology images has the potential to reduce the RNA sequencing cost. Point-wise loss functions have been widely used to minimize the discrepancy between pred...
- Hierarchical Deep Learning for Diatom Image Classification: A Multi-Level Taxonomic Approach : Abstract: Accurate taxonomic identification of diatoms is essential for aquatic ecosystem monitoring, yet conventional methods depend heavily on expert taxonomists. Recent deep learning approaches imp...
- Personalized Image Descriptions from Attention Sequences : Abstract: People can view the same image differently: they focus on different regions, objects, and details in varying orders and describe them in distinct linguistic styles. This leads to substantial...
- CoT4Det: A Chain-of-Thought Framework for Perception-Oriented Vision-Language Tasks : Abstract: Large Vision-Language Models (LVLMs) have demonstrated remarkable success in a broad range of vision-language tasks, such as general visual question answering and optical character recogniti...
- 1 + 1 > 2: Detector-Empowered Video Large Language Model for Spatio-Temporal Grounding and Reasoning : Abstract: Spatio-temporal grounding and reasoning aims to locate the temporal segment and spatial region of an event in a video given a user query, while also reasoning about semantics such as causali...
- RunawayEvil: Jailbreaking the Image-to-Video Generative Models : Abstract: Image-to-Video (I2V) generation synthesizes dynamic visual content from image and text inputs, providing significant creative control. However, the security of such multimodal systems, parti...
- EMGauss: Continuous Slice-to-3D Reconstruction via Dynamic Gaussian Modeling in Volume Electron Microscopy : Abstract: Volume electron microscopy (vEM) enables nanoscale 3D imaging of biological structures but remains constrained by acquisition trade-offs, leading to anisotropic volumes with limited axial re...
- Lightweight Wasserstein Audio-Visual Model for Unified Speech Enhancement and Separation : Abstract: Speech Enhancement (SE) and Speech Separation (SS) have traditionally been treated as distinct tasks in speech processing. However, real-world audio often involves both background noise and ...
- Graph Convolutional Long Short-Term Memory Attention Network for Post-Stroke Compensatory Movement Detection Based on Skeleton Data : Abstract: Most stroke patients experience upper limb motor dysfunction. Compensatory movements are prevalent during rehabilitation training, which is detrimental to patients' long-term recovery. There...
- FedSCAl: Leveraging Server and Client Alignment for Unsupervised Federated Source-Free Domain Adaptation : Abstract: We address the Federated source-Free Domain Adaptation (FFreeDA) problem, with clients holding unlabeled data with significant inter-client domain gaps. The FFreeDA setup constrains the FL f...
- UARE: A Unified Vision-Language Model for Image Quality Assessment, Restoration, and Enhancement : Abstract: Image quality assessment (IQA) and image restoration are fundamental problems in low-level vision. Although IQA and restoration are closely connected conceptually, most existing work treats ...
- JOCA: Task-Driven Joint Optimisation of Camera Hardware and Adaptive Camera Control Algorithms : Abstract: The quality of captured images strongly influences the performance of downstream perception tasks. Recent works on co-designing camera systems with perception tasks have shown improved task ...
- Physics Informed Human Posture Estimation Based on 3D Landmarks from Monocular RGB-Videos : Abstract: Applications providing automated coaching for physical training are increasing in popularity, for example physical therapy. These applications rely on accurate and robust pose estimation usi...
- Generalized Geometry Encoding Volume for Real-time Stereo Matching : Abstract: Real-time stereo matching methods primarily focus on enhancing in-domain performance but often overlook the critical importance of generalization in real-world applications. In contrast, rec...
- VDOT: Efficient Unified Video Creation via Optimal Transport Distillation : Abstract: The rapid development of generative models has significantly advanced image and video applications. Among these, video creation, aimed at generating videos under various conditions, has gain...
- MeshSplatting: Differentiable Rendering with Opaque Meshes : Abstract: Primitive-based splatting methods like 3D Gaussian Splatting have revolutionized novel view synthesis with real-time rendering. However, their point-based representations remain incompatible...
- SparseCoop: Cooperative Perception with Kinematic-Grounded Queries : Abstract: Cooperative perception is critical for autonomous driving, overcoming the inherent limitations of a single vehicle, such as occlusions and constrained fields-of-view. However, current approa...
- CADE: Continual Weakly-supervised Video Anomaly Detection with Ensembles : Abstract: Video anomaly detection (VAD) has long been studied as a crucial problem in public security and crime prevention. In recent years, weakly-supervised VAD (WVAD) has attracted considerable at...
- Pseudo Anomalies Are All You Need: Diffusion-Based Generation for Weakly-Supervised Video Anomaly Detection : Abstract: Deploying video anomaly detection in practice is hampered by the scarcity and collection cost of real abnormal footage. We address this by training without any real abnormal videos while eva...
- Omni-Referring Image Segmentation : Abstract: In this paper, we propose a novel task termed Omni-Referring Image Segmentation (OmniRIS) towards highly generalized image segmentation. Compared with existing unimodally conditioned segment...
- Boosting Unsupervised Video Instance Segmentation with Automatic Quality-Guided Self-Training : Abstract: Video Instance Segmentation (VIS) faces significant annotation challenges due to its dual requirements of pixel-level masks and temporal consistency labels. While recent unsupervised methods...
- Spatial Retrieval Augmented Autonomous Driving : Abstract: Existing autonomous driving systems rely on onboard sensors (cameras, LiDAR, IMU, etc) for environmental perception. However, this paradigm is limited by the drive-time perception horizon an...
- Towards Robust Pseudo-Label Learning in Semantic Segmentation: An Encoding Perspective : Abstract: Pseudo-label learning is widely used in semantic segmentation, particularly in label-scarce scenarios such as unsupervised domain adaptation (UDA) and semisupervised learning (SSL). Despite ...
- SceneMixer: Exploring Convolutional Mixing Networks for Remote Sensing Scene Classification : Abstract: Remote sensing scene classification plays a key role in Earth observation by enabling the automatic identification of land use and land cover (LULC) patterns from aerial and satellite imager...
- Hierarchical Image-Guided 3D Point Cloud Segmentation in Industrial Scenes via Multi-View Bayesian Fusion : Abstract: Reliable 3D segmentation is critical for understanding complex scenes with dense layouts and multi-scale objects, as commonly seen in industrial environments. In such scenarios, heavy occlus...
- Balanced Learning for Domain Adaptive Semantic Segmentation : Abstract: Unsupervised domain adaptation (UDA) for semantic segmentation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Despite the effectiveness of self-traini...
- Overcoming Small Data Limitations in Video-Based Infant Respiration Estimation : Abstract: The development of contactless respiration monitoring for infants could enable advances in the early detection and treatment of breathing irregularities, which are associated with neurodevel...
- Scaling Zero-Shot Reference-to-Video Generation : Abstract: Reference-to-video (R2V) generation aims to synthesize videos that align with a text prompt while preserving the subject identity from reference images. However, current R2V methods are hind...
- Can We Go Beyond Visual Features? Neural Tissue Relation Modeling for Relational Graph Analysis in Non-Melanoma Skin Histology : Abstract: Histopathology image segmentation is essential for delineating tissue structures in skin cancer diagnostics, but modeling spatial context and inter-tissue relationships remains a challenge, ...
- SSP-GNN: Learning to Track via Bilevel Optimization : Abstract: We propose a graph-based tracking formulation for multi-object tracking (MOT) where target detections contain kinematic information and re-identification features (attributes). Our method ap...
- Alpha-VI DeepONet: A prior-robust variational Bayesian approach for enhancing DeepONets with uncertainty quantification : Abstract: We introduce a novel deep operator network (DeepONet) framework that incorporates generalised variational inference (GVI) using Rényi's $α$-divergence to learn complex operators while quanti...
- Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Handy Appetizer : Abstract: This book explores the role of Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) in driving the progress of big data analytics and management. The book focuses on s...
- Deep Learning and Machine Learning, Advancing Big Data Analytics and Management: Unveiling AI's Potential Through Tools, Techniques, and Applications : Abstract: Artificial intelligence (AI), machine learning, and deep learning have become transformative forces in big data analytics and management, enabling groundbreaking advancements across diverse ...
- A Data Envelopment Analysis Approach for Assessing Fairness in Resource Allocation: Application to Kidney Exchange Programs : Abstract: Kidney exchange programs have substantially increased transplantation rates but also raise critical concerns about fairness in organ allocation. We propose a novel framework leveraging Data ...
- Data Taggants: Dataset Ownership Verification via Harmless Targeted Data Poisoning : Abstract: Dataset ownership verification, the process of determining if a dataset is used in a model's training data, is necessary for detecting unauthorized data usage and data contamination. Existin...
- Can GNNs Learn Link Heuristics? A Concise Review and Evaluation of Link Prediction Methods : Abstract: This paper explores the ability of Graph Neural Networks (GNNs) to learn various forms of information for link prediction, alongside a brief review of existing link prediction methods. Ou...
- Bi-ICE: An Inner Interpretable Framework for Image Classification via Bi-directional Interactions between Concept and Input Embeddings : Abstract: Inner interpretability is a promising field aiming to uncover the internal mechanisms of AI systems through scalable, automated methods. While significant research has been conducted on larg...
- Graceful forgetting: Memory as a process : Abstract: A rational framework is proposed to explain how we accommodate unbounded sensory input within bounded memory. Memory is stored as statistics organized into structures that are repeatedly sum...
- HPC-Driven Modeling with ML-Based Surrogates for Magnon-Photon Dynamics in Hybrid Quantum Systems : Abstract: Simulating hybrid magnonic quantum systems remains a challenge due to the large disparity between the timescales of the two systems. We present a massively parallel GPU-based simulation fram...
- Morphologically-Informed Tokenizers for Languages with Non-Concatenative Morphology: A case study of Yoloxóchitl Mixtec ASR : Abstract: This paper investigates the impact of using morphologically-informed tokenizers to aid and streamline the interlinear gloss annotation of an audio corpus of Yoloxóchitl Mixtec (YM) using a c...
- Policy-based Sentence Simplification: Replacing Parallel Corpora with LLM-as-a-Judge : Abstract: Sentence simplification aims to modify a sentence to make it easier to read and understand while preserving the meaning. Different applications require distinct simplification policies, such...
- LOCUS: A System and Method for Low-Cost Customization for Universal Specialization : Abstract: We present LOCUS (LOw-cost Customization for Universal Specialization), a pipeline that consumes few-shot data to streamline the construction and training of NLP models through targeted retr...
- Nanbeige4-3B Technical Report: Exploring the Frontier of Small Language Models : Abstract: We present Nanbeige4-3B, a family of small-scale but high-performing language models. Pretrained on 23T high-quality tokens and finetuned on over 30 million diverse instructions, we extend t...
- Modeling Contextual Passage Utility for Multihop Question Answering : Abstract: Multihop Question Answering (QA) requires systems to identify and synthesize information from multiple text passages. While most prior retrieval methods assist in identifying relevant passag...
- Knowing What's Missing: Assessing Information Sufficiency in Question Answering : Abstract: Determining whether a provided context contains sufficient information to answer a question is a critical challenge for building reliable question-answering systems. While simple prompting s...
- ProSocialAlign: Preference Conditioned Test Time Alignment in Language Models : Abstract: Current language model safety paradigms often fall short in emotionally charged or high-stakes settings, where refusal-only approaches may alienate users and naive compliance can amplify ris...
- Adapting AlignScore Metric for Factual Consistency Evaluation of Text in Russian: A Student Abstract : Abstract: Ensuring factual consistency in generated text is crucial for reliable natural language processing applications. However, there is a lack of evaluation tools for factual consistency in Russi...
- The Online Discourse of Virtual Reality and Anxiety : Abstract: VR in the treatment of clinical concerns such as generalized anxiety disorder or social anxiety. VR has created additional pathways to support patient well-being and care. Understanding onli...
- CMV-Fuse: Cross Modal-View Fusion of AMR, Syntax, and Knowledge Representations for Aspect Based Sentiment Analysis : Abstract: Natural language understanding inherently depends on integrating multiple complementary perspectives spanning from surface syntax to deep semantics and world knowledge. However, current Aspe...
- PersonaMem-v2: Towards Personalized Intelligence via Learning Implicit User Personas and Agentic Memory : Abstract: Personalization is one of the next milestones in advancing AI capability and alignment. We introduce PersonaMem-v2, the state-of-the-art dataset for LLM personalization that simulates 1,000 ...
- Think-While-Generating: On-the-Fly Reasoning for Personalized Long-Form Generation : Abstract: Preference alignment has enabled large language models (LLMs) to better reflect human expectations, but current methods mostly optimize for population-level preferences, overlooking individu...
- TopiCLEAR: Topic extraction by CLustering Embeddings with Adaptive dimensional Reduction : Abstract: Rapid expansion of social media platforms such as X (formerly Twitter), Facebook, and Reddit has enabled large-scale analysis of public perceptions on diverse topics, including social issues...
- Parameter-Efficient Fine-Tuning with Differential Privacy for Robust Instruction Adaptation in Large Language Models : Abstract: This study addresses the issues of privacy protection and efficiency in instruction fine-tuning of large-scale language models by proposing a parameter-efficient method that integrates diffe...
- One Word Is Not Enough: Simple Prompts Improve Word Embeddings : Abstract: Text embedding models are designed for sentence-level applications like retrieval and semantic similarity, and are primarily evaluated on sentence-level benchmarks. Their behavior on isolate...
- LLM4SFC: Sequential Function Chart Generation via Large Language Models : Abstract: While Large Language Models (LLMs) are increasingly used for synthesizing textual PLC programming languages like Structured Text (ST) code, other IEC 61131-3 standard graphical languages lik...
- Large Language Model-Based Generation of Discharge Summaries : Abstract: Discharge Summaries are documents written by medical professionals that detail a patient's visit to a care facility. They contain a wealth of information crucial for patient care, and automa...
- AquaFusionNet: Lightweight Vision-Sensor Fusion Framework for Real-Time Pathogen Detection and Water Quality Anomaly Prediction on Edge Devices : Abstract: Evidence from many low and middle income regions shows that microbial contamination in small scale drinking water systems often fluctuates rapidly, yet existing monitoring tools capture only...
- Rhea: Role-aware Heuristic Episodic Attention for Conversational LLMs : Abstract: Large Language Models (LLMs) have achieved remarkable performance on single-turn tasks, yet their effectiveness deteriorates in multi-turn conversations. We define this phenomenon as cumulat...
- An Analysis of Large Language Models for Simulating User Responses in Surveys : Abstract: Using Large Language Models (LLMs) to simulate user opinions has received growing attention. Yet LLMs, especially trained with reinforcement learning from human feedback (RLHF), are known to...
- Automated PRO-CTCAE Symptom Selection based on Prior Adverse Event Profiles : Abstract: The PRO-CTCAE is an NCI-developed patient-reported outcome system for capturing symptomatic adverse events in oncology trials. It comprises a large library drawn from the CTCAE vocabulary, a...
- Large Language Models and Forensic Linguistics: Navigating Opportunities and Threats in the Age of Generative AI : Abstract: Large language models (LLMs) present a dual challenge for forensic linguistics. They serve as powerful analytical tools enabling scalable corpus analysis and embedding-based authorship attri...
- IXAM: Interactive Explainability for Authorship Attribution Models : Abstract: We present IXAM, an Interactive eXplainability framework for Authorship Attribution Models. Given an authorship attribution (AA) task and an embedding-based AA model, our tool enables users ...
- Progress Ratio Embeddings: An Impatience Signal for Robust Length Control in Neural Text Generation : Abstract: Modern neural language models achieve high accuracy in text generation, yet precise control over generation length remains underdeveloped. In this paper, we first investigate a recent length...
- Replicating TEMPEST at Scale: Multi-Turn Adversarial Attacks Against Trillion-Parameter Frontier Models : Abstract: Despite substantial investment in safety alignment, the vulnerability of large language models to sophisticated multi-turn adversarial attacks remains poorly characterized, and whether model...
- SETUP: Sentence-level English-To-Uniform Meaning Representation Parser : Abstract: Uniform Meaning Representation (UMR) is a novel graph-based semantic representation which captures the core meaning of a text, with flexibility incorporated into the annotation schema such t...
- Do Large Language Models Truly Understand Cross-cultural Differences? : Abstract: In recent years, large language models (LLMs) have demonstrated strong performance on multilingual tasks. Given its wide range of applications, cross-cultural understanding capability is a c...
- GUMBridge: a Corpus for Varieties of Bridging Anaphora : Abstract: Bridging is an anaphoric phenomenon where the referent of an entity in a discourse is dependent on a previous, non-identical entity for interpretation, such as in "There is 'a house'. 'The d...
- Ensembling LLM-Induced Decision Trees for Explainable and Robust Error Detection : Abstract: Error detection (ED), which aims to identify incorrect or inconsistent cell values in tabular data, is important for ensuring data quality. Recent state-of-the-art ED methods leverage the pr...
- TeluguST-46: A Benchmark Corpus and Comprehensive Evaluation for Telugu-English Speech Translation : Abstract: Despite Telugu being spoken by over 80 million people, speech translation research for this morphologically rich language remains severely underexplored. We address this gap by developing a ...
- Efficient ASR for Low-Resource Languages: Leveraging Cross-Lingual Unlabeled Data : Abstract: Automatic speech recognition for low-resource languages remains fundamentally constrained by the scarcity of labeled data and computational resources required by state-of-the-art models. We ...
- Investigating Training and Generalization in Faithful Self-Explanations of Large Language Models : Abstract: Large language models have the potential to generate explanations for their own predictions in a variety of styles based on user instructions. Recent research has examined whether these self...
- Multilingual corpora for the study of new concepts in the social sciences and humanities : Abstract: This article presents a hybrid methodology for building a multilingual corpus designed to support the study of emerging concepts in the humanities and social sciences (HSS), illustrated here...
- Training Language Models to Use Prolog as a Tool : Abstract: Ensuring reliable tool use is critical for safe agentic AI systems. Language models frequently produce unreliable reasoning with plausible but incorrect solutions that are difficult to verif...
- Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning : Abstract: We introduce Native Parallel Reasoner (NPR), a teacher-free framework that enables Large Language Models (LLMs) to self-evolve genuine parallel reasoning capabilities. NPR transforms the mod...
- Enhancing Agentic RL with Progressive Reward Shaping and Value-based Sampling Policy Optimization : Abstract: Large Language Models (LLMs) empowered with Tool-Integrated Reasoning (TIR) can iteratively plan, call external tools, and integrate returned information to solve complex, long-horizon reaso...
- Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs : Abstract: Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Stan...
- SwissGov-RSD: A Human-annotated, Cross-lingual Benchmark for Token-level Recognition of Semantic Differences Between Related Documents : Abstract: Recognizing semantic differences across documents, especially in different languages, is crucial for text generation evaluation and multilingual content alignment. However, as a standalone t...
- Most over-representation of phonological features in basic vocabulary disappears when controlling for spatial and phylogenetic effects : Abstract: The statistical over-representation of phonological features in the basic vocabulary of languages is often interpreted as reflecting potentially universal sound symbolic patterns. However, m...
- Performance of the SafeTerm AI-Based MedDRA Query System Against Standardised MedDRA Queries : Abstract: In pre-market drug safety review, grouping related adverse event terms into SMQs or OCMQs is critical for signal detection. We assess the performance of SafeTerm Automated Medical Query (AMQ...
- A Simple Method to Enhance Pre-trained Language Models with Speech Tokens for Classification : Abstract: This paper presents a simple method for enhancing textual pre-trained large language models with speech information when fine-tuned for a specific classification task. A cla...
- Bridging Code Graphs and Large Language Models for Better Code Understanding : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance in code intelligence tasks such as code generation, summarization, and translation. However, their reliance on linearize...
- HalluShift++: Bridging Language and Vision through Internal Representation Shifts for Hierarchical Hallucinations in MLLMs : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable capabilities in vision-language understanding tasks. While these models often produce linguistically coherent output, th...
- Automated Generation of Custom MedDRA Queries Using SafeTerm Medical Map : Abstract: In pre-market drug safety review, grouping related adverse event terms into standardised MedDRA queries or the FDA Office of New Drugs Custom Medical Queries (OCMQs) is critical for signal d...
- Mary, the Cheeseburger-Eating Vegetarian: Do LLMs Recognize Incoherence in Narratives? : Abstract: Leveraging a dataset of paired narratives, we investigate the extent to which large language models (LLMs) can reliably separate incoherent and coherent stories. A probing study finds that L...
- On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models : Abstract: Recent reinforcement learning (RL) techniques have yielded impressive reasoning improvements in language models, yet it remains unclear whether post-training truly extends a model's reasonin...
- AI-Generated Compromises for Coalition Formation: Modeling, Simulation, and a Textual Case Study : Abstract: The challenge of finding compromises between agent proposals is fundamental to AI sub-fields such as argumentation, mediation, and negotiation. Building on this tradition, Elkind et al. (202...
- Small Language Models Reshape Higher Education: Courses, Textbooks, and Teaching : Abstract: While large language models (LLMs) have introduced novel paradigms in science and education, their adoption in higher education is constrained by inherent limitations. These include a tenden...
- An Index-based Approach for Efficient and Effective Web Content Extraction : Abstract: As web agents (e.g., Deep Research) routinely consume massive volumes of web pages to gather and analyze information, LLM context management -- under large token budgets and low signal densi...
- Look Twice before You Leap: A Rational Agent Framework for Localized Adversarial Anonymization : Abstract: Current LLM-based text anonymization frameworks usually rely on remote API services from powerful LLMs, which creates an inherent "privacy paradox": users must somehow disclose data to untru...
- MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning : Abstract: Recent advances in video multimodal large language models (Video MLLMs) have significantly enhanced video understanding and multi-modal interaction capabilities. While most existing systems ...
- MATEX: A Multi-Agent Framework for Explaining Ethereum Transactions : Abstract: Understanding a complicated Ethereum transaction remains challenging: multi-hop token flows, nested contract calls, and opaque execution paths routinely lead users to blind signing. Based on...
- Think-Reflect-Revise: A Policy-Guided Reflective Framework for Safety Alignment in Large Vision Language Models : Abstract: As multimodal reasoning improves the overall capabilities of Large Vision Language Models (LVLMs), recent studies have begun to explore safety-oriented reasoning, aiming to enhance safety aw...
- Generating Storytelling Images with Rich Chains-of-Reasoning : Abstract: An image can convey a compelling story by presenting rich, logically connected visual clues. These connections form Chains-of-Reasoning (CoRs) within the image, enabling viewers to infer eve...
- Living the Novel: A System for Generating Self-Training Timeline-Aware Conversational Agents from Novels : Abstract: We present the Living Novel, an end-to-end system that transforms any literary work into an immersive, multi-character conversational experience. This system is designed to solve two fundame...
- Surveying the MLLM Landscape: A Meta-Review of Current Surveys : Abstract: The rise of Multimodal Large Language Models (MLLMs) has become a transformative force in the field of artificial intelligence, enabling machines to process and generate content across multi...
- LLMs are Biased Evaluators But Not Biased for Retrieval Augmented Generation : Abstract: Recent studies have demonstrated that large language models (LLMs) exhibit significant biases in evaluation tasks, particularly in preferentially rating and favoring self-generated content. ...
- A Systematic Assessment of Language Models with Linguistic Minimal Pairs in Chinese : Abstract: We present ZhoBLiMP, the largest linguistic minimal pair benchmark for Chinese, with over 100 paradigms, ranging from topicalization to the Ba construction. We then train from scrat...
- A Latent Variable Framework for Scaling Laws in Large Language Models : Abstract: We propose a statistical framework built on latent variable modeling for scaling laws of large language models (LLMs). Our work is motivated by the rapid emergence of numerous new LLM famili...
- Proof of Concept for Mammography Classification with Enhanced Compactness and Separability Modules : Abstract: This study presents a validation and extension of a recent methodological framework for medical image classification. While an improved ConvNeXt Tiny architecture, integrating Global Average...
- Latent Nonlinear Denoising Score Matching for Enhanced Learning of Structured Distributions : Abstract: We present latent nonlinear denoising score matching (LNDSM), a novel training objective for score-based generative models that integrates nonlinear forward dynamics with the VAE-based laten...
- Learning to Hedge Swaptions : Abstract: This paper investigates the deep hedging framework, based on reinforcement learning (RL), for the dynamic hedging of swaptions, contrasting its performance with traditional sensitivity-based...
- FedDSR: Federated Deep Supervision and Regularization Towards Autonomous Driving : Abstract: Federated Learning (FL) enables collaborative training of autonomous driving (AD) models across distributed vehicles while preserving data privacy. However, FL encounters critical challenges...
- ADAM Optimization with Adaptive Batch Selection : Abstract: Adam is a widely used optimizer in neural network training due to its adaptive learning rate. However, because different data samples influence model updates to varying degrees, treating the...
- A Physics-Aware Attention LSTM Autoencoder for Early Fault Diagnosis of Battery Systems : Abstract: Battery safety is paramount for electric vehicles. Early fault diagnosis remains a challenge due to the subtle nature of anomalies and the interference of dynamic operating noise. Existing d...
- Hide-and-Seek Attribution: Weakly Supervised Segmentation of Vertebral Metastases in CT : Abstract: Accurate segmentation of vertebral metastasis in CT is clinically important yet difficult to scale, as voxel-level annotations are scarce and both lytic and blastic lesions often resemble be...
- MINES: Explainable Anomaly Detection through Web API Invariant Inference : Abstract: Detecting the anomalies of web applications, important infrastructures for running modern companies and governments, is crucial for providing reliable web services. Many modern web applicati...
- Energy-Efficient Navigation for Surface Vehicles in Vortical Flow Fields : Abstract: For centuries, khalasi have skillfully harnessed ocean currents to navigate vast waters with minimal effort. Emulating this intuition in autonomous systems remains a significant challenge, p...
- Symmetric Aggregation of Conformity Scores for Efficient Uncertainty Sets : Abstract: Access to multiple predictive models trained for the same task, whether in regression or classification, is increasingly common in many applications. Aggregating their predictive uncertainti...
- PARIS: Pruning Algorithm via the Representer theorem for Imbalanced Scenarios : Abstract: The challenge of imbalanced regression arises when standard Empirical Risk Minimization (ERM) biases models toward high-frequency regions of the data distribution, causing severe de...
- Statistical analysis of Inverse Entropy-regularized Reinforcement Learning : Abstract: Inverse reinforcement learning aims to infer the reward function that explains expert behavior observed through trajectories of state-action pairs. A long-standing difficulty in classical I...
- Learning Conditional Independence Differential Graphs From Time-Dependent Data : Abstract: Estimation of differences in conditional independence graphs (CIGs) of two time series Gaussian graphical models (TSGGMs) is investigated where the two TSGGMs are known to have similar struc...
- Joint Learning of Feasibility-Aware Signal Temporal Logic and BarrierNet for Robust and Correct Control : Abstract: Control Barrier Functions (CBFs) have emerged as a powerful tool for enforcing safety in optimization-based controllers, and their integration with Signal Temporal Logic (STL) has enabled th...
- Physics-Guided Diffusion Priors for Multi-Slice Reconstruction in Scientific Imaging : Abstract: Accurate multi-slice reconstruction from limited measurement data is crucial to speed up the acquisition process in medical and scientific imaging. However, it remains challenging due to the...
- Selective Masking based Self-Supervised Learning for Image Semantic Segmentation : Abstract: This paper proposes a novel self-supervised learning method for semantic segmentation using selective masking image reconstruction as the pretraining task. Our proposed method replaces the r...
- Evaluating and Preserving High-level Fidelity in Super-Resolution : Abstract: Recent image Super-Resolution (SR) models are achieving impressive effects in reconstructing details and delivering visually pleasant outputs. However, the overpowering generative ability ca...
- Ideal Attribution and Faithful Watermarks for Language Models : Abstract: We introduce ideal attribution mechanisms, a formal abstraction for reasoning about attribution decisions over strings. At the core of this abstraction lies the ledger, an append-only log of...
- DFIR-DETR: Frequency Domain Enhancement and Dynamic Feature Aggregation for Cross-Scene Small Object Detection : Abstract: Detecting small objects in UAV remote sensing images and identifying surface defects in industrial inspection remain difficult tasks. These applications face common obstacles: features are s...
- Chromatic Feature Vectors for 2-Trees: Exact Formulas for Partition Enumeration with Network Applications : Abstract: We establish closed-form enumeration formulas for chromatic feature vectors of 2-trees under the bichromatic triangle constraint. These efficiently computable structural features derive from...
- DeepSVM: Learning Stochastic Volatility Models with Physics-Informed Deep Operator Networks : Abstract: Real-time calibration of stochastic volatility models (SVMs) is computationally bottlenecked by the need to repeatedly solve coupled partial differential equations (PDEs). In this work, we p...
- Understanding Diffusion Models via Code Execution : Abstract: Diffusion models have achieved remarkable performance in generative modeling, yet their theoretical foundations are often intricate, and the gap between mathematical formulations in papers a...
- AutoLugano: A Deep Learning Framework for Fully Automated Lymphoma Segmentation and Lugano Staging on FDG-PET/CT : Abstract: Purpose: To develop a fully automated deep learning system, AutoLugano, for end-to-end lymphoma classification by performing lesion segmentation, anatomical localization, and automated Lugan...
- Coherent Audio-Visual Editing via Conditional Audio Generation Following Video Edits : Abstract: We introduce a novel pipeline for joint audio-visual editing that enhances the coherence between edited video and its accompanying audio. Our approach first applies state-of-the-art video ed...
- MUSE: A Simple Yet Effective Multimodal Search-Based Framework for Lifelong User Interest Modeling : Abstract: Lifelong user interest modeling is crucial for industrial recommender systems, yet existing approaches rely predominantly on ID-based features, suffering from poor generalization on long-tai...
- Clinical Interpretability of Deep Learning Segmentation Through Shapley-Derived Agreement and Uncertainty Metrics : Abstract: Segmentation is the identification of anatomical regions of interest, such as organs, tissue, and lesions, serving as a fundamental task in computer-aided diagnosis in medical imaging. Altho...
- AdLift: Lifting Adversarial Perturbations to Safeguard 3D Gaussian Splatting Assets Against Instruction-Driven Editing : Abstract: Recent studies have extended diffusion-based instruction-driven 2D image editing pipelines to 3D Gaussian Splatting (3DGS), enabling faithful manipulation of 3DGS assets and greatly advancin...
- Non-negative DAG Learning from Time-Series Data : Abstract: This work aims to learn the directed acyclic graph (DAG) that captures the instantaneous dependencies underlying a multivariate time series. The observed data follow a linear structural vect...
- A graph generation pipeline for critical infrastructures based on heuristics, images and depth data : Abstract: Virtual representations of physical critical infrastructures, such as water or energy plants, are used for simulations and digital twins to ensure resilience and continuity of their services...
- Verifiable Deep Quantitative Group Testing : Abstract: We present a neural network-based framework for solving the quantitative group testing (QGT) problem that achieves both high decoding accuracy and structural verifiability. In QGT, the objec...
- Equivariant Diffusion for Crystal Structure Prediction : Abstract: In addressing the challenge of Crystal Structure Prediction (CSP), symmetry-aware deep learning models, particularly diffusion models, have been extensively studied, which treat CSP as a con...
- Two-dimensional RMSD projections for reaction path visualization and validation : Abstract: Transition state or minimum energy path finding methods constitute a routine component of the computational chemistry toolkit. Standard analysis involves trajectories conventionally plotted ...
- Machine learning in an expectation-maximisation framework for nowcasting : Abstract: Decision making often occurs in the presence of incomplete information, leading to the under- or overestimation of risk. Leveraging the observable information to learn the complete informati...
- PrivORL: Differentially Private Synthetic Dataset for Offline Reinforcement Learning : Abstract: Recently, offline reinforcement learning (RL) has become a popular RL paradigm. In offline RL, data providers share pre-collected datasets -- either as individual transitions or sequences of...
- Microseismic event classification with a lightweight Fourier Neural Operator model : Abstract: Real-time monitoring of induced seismicity is crucial for mitigating operational hazards, relying on the rapid and accurate classification of microseismic events from continuous data streams...
- Optimized Machine Learning Methods for Studying the Thermodynamic Behavior of Complex Spin Systems : Abstract: This paper presents a systematic study of the application of convolutional neural networks (CNNs) as an efficient and versatile tool for the analysis of critical and low-temperature phase st...
- Affordance Field Intervention: Enabling VLAs to Escape Memory Traps in Robotic Manipulation : Abstract: Vision-Language-Action (VLA) models have shown great performance in robotic manipulation by mapping visual observations and language instructions directly to actions. However, they remain br...
- High-Dimensional Change Point Detection using Graph Spanning Ratio : Abstract: Inspired by graph-based methodologies, we introduce a novel graph-spanning algorithm designed to identify changes in both offline and online data across low to high dimensions. This versatil...
- On Conditional Independence Graph Learning From Multi-Attribute Gaussian Dependent Time Series : Abstract: Estimation of the conditional independence graph (CIG) of high-dimensional multivariate Gaussian time series from multi-attribute data is considered. Existing methods for graph estimation fo...
- $\phi$-test: Global Feature Selection and Inference for Shapley Additive Explanations : Abstract: We propose $\phi$-test, a global feature-selection and significance procedure for black-box predictors that combines Shapley attributions with selective inference. Given a trained model and an ...
- Exploring Test-time Scaling via Prediction Merging on Large-Scale Recommendation : Abstract: Inspired by the success of language models (LM), scaling up deep learning recommendation systems (DLRS) has become a recent trend in the community. All previous methods tend to scale up the ...
- Delay-Aware Diffusion Policy: Bridging the Observation-Execution Gap in Dynamic Tasks : Abstract: As a robot senses and selects actions, the world keeps changing. This inference delay creates a gap of tens to hundreds of milliseconds between the observed state and the state at execution....
- PVeRA: Probabilistic Vector-Based Random Matrix Adaptation : Abstract: Large foundation models have emerged in the last years and are pushing performance boundaries for a variety of tasks. Training or even finetuning such models demands vast datasets and comput...
- A scalable and real-time neural decoder for topological quantum codes : Abstract: Fault-tolerant quantum computing will require error rates far below those achievable with physical qubits. Quantum error correction (QEC) bridges this gap, but depends on decoders being simu...
- Physics-Informed Neural Networks for Source Inversion and Parameters Estimation in Atmospheric Dispersion : Abstract: Recent studies have shown the success of deep learning in solving forward and inverse problems in engineering and scientific computing domains, such as physics-informed neural networks (PINN...
- Distribution-informed Online Conformal Prediction : Abstract: Conformal prediction provides a pivotal and flexible technique for uncertainty quantification by constructing prediction sets with a predefined coverage rate. Many online conformal predictio...
- LUNA: LUT-Based Neural Architecture for Fast and Low-Cost Qubit Readout : Abstract: Qubit readout is a critical operation in quantum computing systems, which maps the analog response of qubits into discrete classical states. Deep neural networks (DNNs) have recently emerged...
- Graph-Based Learning of Spectro-Topographical EEG Representations with Gradient Alignment for Brain-Computer Interfaces : Abstract: We present a novel graph-based learning of EEG representations with gradient alignment (GEEGA) that leverages multi-domain information to learn EEG representations for brain-computer interfa...
- An Adaptive Multi-Layered Honeynet Architecture for Threat Behavior Analysis via Deep Learning : Abstract: The escalating sophistication and variety of cyber threats have rendered static honeypots inadequate, necessitating adaptive, intelligence-driven deception. In this work, ADLAH is introduced...
- Do Generalisation Results Generalise? : Abstract: A large language model's (LLM's) out-of-distribution (OOD) generalisation ability is crucial to its deployment. Previous work assessing LLMs' generalisation performance, however, typically f...
- The Optimal Approximation Factor in Density Estimation : Abstract: Consider the following problem: given two arbitrary densities $q_1,q_2$ and a sample-access to an unknown target density $p$, find which of the $q_i$'s is closer to $p$ in total variation. ...
- Attacking All Tasks at Once Using Adversarial Examples in Multi-Task Learning : Abstract: Visual content understanding frequently relies on multi-task models to extract robust representations of a single visual input for multiple downstream tasks. However, in comparison to extens...
- Hidden Minima in Two-Layer ReLU Networks : Abstract: We consider the optimization problem arising from fitting two-layer ReLU networks with $d$ inputs under the square loss, where labels are generated by a target network. Two infinite families...
- Ensemble Learning of Machine Learning Force Fields : Abstract: Machine learning force fields (MLFFs) are a promising approach to balance the accuracy of quantum mechanics with the efficiency of classical potentials, yet selecting an optimal model amid i...
- SDT-GNN: Streaming-based Distributed Training Framework for Graph Neural Networks : Abstract: Recently, distributed GNN training frameworks, such as DistDGL and PyG, have been developed to enable training GNN models on large graphs by leveraging multiple GPUs in a distributed manner....
- Covariate-Elaborated Robust Partial Information Transfer with Conditional Spike-and-Slab Prior : Abstract: The popularity of transfer learning stems from the fact that it can borrow information from useful auxiliary datasets. Existing statistical transfer learning methods usually adopt a global s...
- Fast training and sampling of Restricted Boltzmann Machines : Abstract: Restricted Boltzmann Machines (RBMs) are powerful tools for modeling complex systems and extracting insights from data, but their training is hindered by the slow mixing of Markov Chain Mont...
- Evaluating Model Performance Under Worst-case Subpopulations : Abstract: The performance of ML models degrades when the training population is different from that seen under operation. Towards assessing distributional robustness, we study the worst-case performan...
- Mastering AI: Big Data, Deep Learning, and the Evolution of Large Language Models -- AutoML from Basics to State-of-the-Art Techniques : Abstract: A comprehensive guide to Automated Machine Learning (AutoML) is presented, covering fundamental principles, practical implementations, and future trends. The paper is structured to assist bo...
- SeqProFT: Sequence-only Protein Property Prediction with LoRA Finetuning : Abstract: Protein language models (PLMs) have demonstrated remarkable capabilities in learning relationships between protein sequences and functions. However, finetuning these large models requires su...
- JaGuard: Jamming Correction of GNSS Deviation with Deep Temporal Graphs : Abstract: Global Navigation Satellite Systems (GNSS) are increasingly exposed to intentional jamming, threatening reliability when accurate positioning and timing are most critical. We address this pr...
- K-DAREK: Distance Aware Error for Kurkova Kolmogorov Networks : Abstract: Neural networks are powerful parametric function approximators, while Gaussian processes (GPs) are nonparametric probabilistic models that place distributions over functions via kernel-defin...
- Real-time Air Pollution prediction model based on Spatiotemporal Big data : Abstract: Air pollution is one of the main concerns for urban areas. Many countries have constructed monitoring stations to collect pollution values hourly. Recently, there is a research in Daegu city...
- Exploiting Supply Chain Interdependencies for Stock Return Prediction: A Full-State Graph Convolutional LSTM : Abstract: Stock return prediction is fundamental to financial decision-making, yet traditional time series models fail to capture the complex interdependencies between companies in modern markets. We ...
- Zero Generalization Error Theorem for Random Interpolators via Algebraic Geometry : Abstract: We theoretically demonstrate that the generalization error of interpolators for machine learning models under teacher-student settings becomes 0 once the number of training samples exceeds a...
- LLM-Upgraded Graph Reinforcement Learning for Carbon-Aware Job Scheduling in Smart Manufacturing : Abstract: This paper presents Luca, a large language model (LLM)-upgraded graph reinforcement learning framework for carbon-aware flexible job ...
- DDFI: Diverse and Distribution-aware Missing Feature Imputation via Two-step Reconstruction : Abstract: Incomplete node features are ubiquitous in real-world scenarios, e.g., the attributes of web users may be partly private, which causes the performance of Graph Neural Networks (GNNs) to decl...
- Optimizing Optimizers for Fast Gradient-Based Learning : Abstract: We lay the theoretical foundation for automating optimizer design in gradient-based learning. Based on the greedy principle, we formulate the problem of designing optimizers as maximizing th...
- Hankel-FNO: Fast Underwater Acoustic Charting Via Physics-Encoded Fourier Neural Operator : Abstract: Fast and accurate underwater acoustic charting is crucial for downstream tasks such as environment-aware sensor placement optimization and autonomous vehicle path planning. Conventional meth...
- A new initialisation to Control Gradients in Sinusoidal Neural network : Abstract: Proper initialisation strategy is of primary importance to mitigate gradient explosion or vanishing when training neural networks. Yet, the impact of initialisation parameters still lacks a ...
- Neural expressiveness for beyond importance model compression : Abstract: Neural Network Pruning has been established as a driving force in the exploration of memory and energy efficient solutions with high throughput both during training and at test time. In this p...
- BitStopper: An Efficient Transformer Attention Accelerator via Stage-fusion and Early Termination : Abstract: Attention-based large language models (LLMs) have transformed modern AI applications, but the quadratic cost of self-attention imposes significant compute and memory overhead. Dynamic sparsi...
- Optimizing LLMs Using Quantization for Mobile Execution : Abstract: Large Language Models (LLMs) offer powerful capabilities, but their significant size and computational requirements hinder deployment on resource-constrained mobile devices. This paper inves...
- Diagnosis-based mortality prediction for intensive care unit patients via transfer learning : Abstract: In the intensive care unit, the underlying causes of critical illness vary substantially across diagnoses, yet prediction models accounting for diagnostic heterogeneity have not been systema...
- Hierarchical geometric deep learning enables scalable analysis of molecular dynamics : Abstract: Molecular dynamics simulations can generate atomically detailed trajectories of complex systems, but analyzing these dynamics can be challenging when systems lack well-established quantitati...
- On fine-tuning Boltz-2 for protein-protein affinity prediction : Abstract: Accurate prediction of protein-protein binding affinity is vital for understanding molecular interactions and designing therapeutics. We adapt Boltz-2, a state-of-the-art structure-based pro...
- A Fast and Effective Solution to the Problem of Look-ahead Bias in LLMs : Abstract: Applying LLMs to predictive tasks in finance is challenging due to look-ahead bias resulting from their training on long time-series data. This precludes the backtests typically employed in ...
- Vector Quantization using Gaussian Variational Autoencoder : Abstract: Vector quantized variational autoencoder (VQ-VAE) is a discrete auto-encoder that compresses images into discrete tokens. It is difficult to train due to discretization. In this paper, we pr...
- Quantum Temporal Convolutional Neural Networks for Cross-Sectional Equity Return Prediction: A Comparative Benchmark Study : Abstract: Quantum machine learning offers a promising pathway for enhancing stock market prediction, particularly under complex, noisy, and highly dynamic financial environments. However, many classic...
- The Impact of Data Characteristics on GNN Evaluation for Detecting Fake News : Abstract: Graph neural networks (GNNs) are widely used for the detection of fake news by modeling the content and propagation structure of news articles on social media. We show that two of the most c...
- Estimating Black Carbon Concentration from Urban Traffic Using Vision-Based Machine Learning : Abstract: Black carbon (BC) emissions in urban areas are primarily driven by traffic, with hotspots near major roads disproportionately affecting marginalized communities. Because BC monitoring is typ...
- The Meta-Learning Gap: Combining Hydra and Quant for Large-Scale Time Series Classification : Abstract: Time series classification faces a fundamental trade-off between accuracy and computational efficiency. While comprehensive ensembles like HIVE-COTE 2.0 achieve state-of-the-art accuracy, th...
- State Diversity Matters in Offline Behavior Distillation : Abstract: Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be...
- Mitigating Barren plateaus in quantum denoising diffusion probabilistic models : Abstract: Quantum generative models leverage quantum superposition and entanglement to enhance learning efficiency for both classical and quantum data. The quantum denoising diffusion probabilistic mo...
- Pathway to $O(\sqrt{d})$ Complexity bound under Wasserstein metric of flow-based models : Abstract: We provide attainable analytical tools to estimate the error of flow-based generative models under the Wasserstein metric and to establish the optimal sampling iteration complexity bound wit...
- Decoding Motor Behavior Using Deep Learning and Reservoir Computing : Abstract: We present a novel approach to EEG decoding for non-invasive brain machine interfaces (BMIs), with a focus on motor-behavior classification. While conventional convolutional architectures su...
- KV-CAR: KV Cache Compression using Autoencoders and KV Reuse in Large Language Models : Abstract: As Large Language Models (LLMs) scale in size and context length, the memory requirements of the key value (KV) cache have emerged as a major bottleneck during autoregressive decoding. The K...
- Enhancing Interpretability of AR-SSVEP-Based Motor Intention Recognition via CNN-BiLSTM and SHAP Analysis on EEG Data : Abstract: Patients with motor dysfunction show low subjective engagement in rehabilitation training. Traditional SSVEP-based brain-computer interface (BCI) systems rely heavily on external visual stim...
- Multi-Scale Protein Structure Modelling with Geometric Graph U-Nets : Abstract: Geometric Graph Neural Networks (GNNs) and Transformers have become state-of-the-art for learning from 3D protein structures. However, their reliance on message passing prevents them from ca...
- Optimal Analysis for Bandit Learning in Matching Markets with Serial Dictatorship : Abstract: The problem of two-sided matching markets is well-studied in computer science and economics, owing to its diverse applications across numerous domains. Since market participants are usually ...
- Measuring Over-smoothing beyond Dirichlet energy : Abstract: While Dirichlet energy serves as a prevalent metric for quantifying over-smoothing, it is inherently restricted to capturing first-order feature derivatives. To address this limitation, we p...
- Small-Gain Nash: Certified Contraction to Nash Equilibria in Differentiable Games : Abstract: Classical convergence guarantees for gradient-based learning in games require the pseudo-gradient to be (strongly) monotone in Euclidean geometry as shown by Rosen (1965), a condition that of...
- Neural Factorization-based Bearing Fault Diagnosis : Abstract: This paper studies the key problems of bearing fault diagnosis for high-speed trains. As the core component of the train operation system, the health of bearings is directly related to the saf...
- Know your Trajectory -- Trustworthy Reinforcement Learning deployment through Importance-Based Trajectory Analysis : Abstract: As Reinforcement Learning (RL) agents are increasingly deployed in real-world applications, ensuring their behavior is transparent and trustworthy is paramount. A key component of trust is e...
- Parent-Guided Semantic Reward Model (PGSRM): Embedding-Based Reward Functions for Reinforcement Learning of Transformer Language Models : Abstract: We introduce the Parent-Guided Semantic Reward Model (PGSRM), a lightweight reward framework for reinforcement learning (RL) of transformer language models. PGSRM replaces binary correctness...
- Prediction with Expert Advice under Local Differential Privacy : Abstract: We study the classic problem of prediction with expert advice under the constraint of local differential privacy (LDP). In this context, we first show that a classical algorithm naturally sa...
- LLM-Driven Composite Neural Architecture Search for Multi-Source RL State Encoding : Abstract: Designing state encoders for reinforcement learning (RL) with multiple information sources -- such as sensor measurements, time-series signals, image observations, and textual instructions -...
- OXtal: An All-Atom Diffusion Model for Organic Crystal Structure Prediction : Abstract: Accurately predicting experimentally-realizable 3D molecular crystal structures from their 2D chemical graphs is a long-standing open challenge in computational chemistry called crystal stru...
- Toward Reliable Machine Unlearning: Theory, Algorithms, and Evaluation : Abstract: We propose new methodologies for both unlearning a random set of samples and class unlearning and show that they outperform existing methods. The main driver of our unlearning methods is the s...
- Always Keep Your Promises: DynamicLRP, A Model-Agnostic Solution To Layer-Wise Relevance Propagation : Abstract: Layer-wise Relevance Propagation (LRP) provides principled attribution for neural networks through conservation properties and foundations in Deep Taylor Decomposition. However, existing imp...
- Block Sparse Flash Attention : Abstract: Modern large language models increasingly require long contexts for reasoning and multi-document tasks, but attention's quadratic complexity creates a severe computational bottleneck. We pre...
- Transformation of Biological Networks into Images via Semantic Cartography for Visual Interpretation and Scalable Deep Analysis : Abstract: Complex biological networks are fundamental to biomedical science, capturing interactions among molecules, cells, genes, and tissues. Deciphering these networks is critical for understanding...
- TRACE: A Generalizable Drift Detector for Streaming Data-Driven Optimization : Abstract: Many optimization tasks involve streaming data with unknown concept drifts, posing a significant challenge as Streaming Data-Driven Optimization (SDDO). Existing methods, while leveraging su...
- Dual Refinement Cycle Learning: Unsupervised Text Classification of Mamba and Community Detection on Text Attributed Graph : Abstract: Pretrained language models offer strong text understanding capabilities but remain difficult to deploy in real-world text-attributed networks due to their heavy dependence on labeled data. M...
- PlantBiMoE: A Bidirectional Foundation Model with SparseMoE for Plant Genomes : Abstract: Understanding the underlying linguistic rules of plant genomes remains a fundamental challenge in computational biology. Recent advances including AgroNT and PDLLMs have made notable progres...
- Improving the Throughput of Diffusion-based Large Language Models via a Training-Free Confidence-Aware Calibration : Abstract: We present CadLLM, a training-free method to accelerate the inference throughput of diffusion-based LLMs (dLLMs). We first investigate the dynamic nature of token unmasking confidence across...
- SPACE: Noise Contrastive Estimation Stabilizes Self-Play Fine-Tuning for Large Language Models : Abstract: Self-play fine-tuning has demonstrated promising abilities in adapting large language models (LLMs) to downstream tasks with limited real-world data. The basic principle is to iteratively re...
- UniDiff: A Unified Diffusion Framework for Multimodal Time Series Forecasting : Abstract: As multimodal data proliferates across diverse real-world applications, leveraging heterogeneous information such as texts and timestamps for accurate time series forecasting (TSF) has becom...
- Less is More: Non-uniform Road Segments are Efficient for Bus Arrival Prediction : Abstract: In bus arrival time prediction, the process of organizing road infrastructure network data into homogeneous entities is known as segmentation. Segmenting a road network is widely recognized ...
- Pay Less Attention to Function Words for Free Robustness of Vision-Language Models : Abstract: To address the trade-off between robustness and performance for robust VLMs, we observe that function words can make VLMs vulnerable to cross-modal adversarial attacks, and prop...
- PINE: Pipeline for Important Node Exploration in Attributed Networks : Abstract: A graph with semantically attributed nodes is a common data structure in a wide range of domains. It could be interlinked web data or citation networks of scientific publications. The essen...
- Towards a Relationship-Aware Transformer for Tabular Data : Abstract: Deep learning models for tabular data typically do not allow for imposing a graph of external dependencies between samples, which can be useful for accounting for relatedness in tasks such a...
- Learning-Augmented Ski Rental with Discrete Distributions: A Bayesian Approach : Abstract: We revisit the classic ski rental problem through the lens of Bayesian decision-making and machine-learned predictions. While traditional algorithms minimize worst-case cost without assumpti...
- Recover-to-Forget: Gradient Reconstruction from LoRA for Efficient LLM Unlearning : Abstract: Unlearning in large foundation models (e.g., LLMs) is essential for enabling dynamic knowledge updates, enforcing data deletion rights, and correcting model behavior. However, existing unlea...
- LUNE: Efficient LLM Unlearning via LoRA Fine-Tuning with Negative Examples : Abstract: Large language models (LLMs) possess vast knowledge acquired from extensive training corpora, but they often cannot remove specific pieces of information when needed, which makes it hard to ...
- Towards Reliable Test-Time Adaptation: Style Invariance as a Correctness Likelihood : Abstract: Test-time adaptation (TTA) enables efficient adaptation of deployed models, yet it often leads to poorly calibrated predictive uncertainty - a critical issue in high-stakes domains such as a...
- Empirical Results for Adjusting Truncated Backpropagation Through Time while Training Neural Audio Effects : Abstract: This paper investigates the optimization of Truncated Backpropagation Through Time (TBPTT) for training neural networks in digital audio effect modeling, with a focus on dynamic range compre...
- Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning : Abstract: Effective traffic control is essential for mitigating congestion in transportation networks. Conventional traffic management strategies, including route guidance, ramp metering, and traffic ...
- Revolutionizing Mixed Precision Quantization: Towards Training-free Automatic Proxy Discovery via Large Language Models : Abstract: Mixed-Precision Quantization (MPQ) frees Deep Neural Networks (DNNs) from the Out-Of-Memory (OOM) bottleneck and has garnered increasing research attention. However, conventional meth...
- Mitigating Bias in Graph Hyperdimensional Computing : Abstract: Graph hyperdimensional computing (HDC) has emerged as a promising paradigm for cognitive tasks, emulating brain-like computation with high-dimensional vectors known as hypervectors. While HD...
- Parallel Algorithms for Combined Regularized Support Vector Machines: Application in Music Genre Classification : Abstract: In the era of rapid development of artificial intelligence, its applications span across diverse fields, relying heavily on effective data processing and model optimization. Combined Regular...
- Materium: An Autoregressive Approach for Material Generation : Abstract: We present Materium: an autoregressive transformer for generating crystal structures that converts 3D material representations into token sequences. These sequences include elements with oxi...
- Efficient Low-Tubal-Rank Tensor Estimation via Alternating Preconditioned Gradient Descent : Abstract: The problem of low-tubal-rank tensor estimation is a fundamental task with wide applications across high-dimensional signal processing, machine learning, and image science. Traditional appro...
- Machine Learning: Progress and Prospects : Abstract: This Inaugural Lecture was given at Royal Holloway University of London in 1996. It covers an introduction to machine learning and describes various theoretical advances and practical projec...
- FRWKV: Frequency-Domain Linear Attention for Long-Term Time Series Forecasting : Abstract: Traditional Transformers face a major bottleneck in long-sequence time series forecasting due to their quadratic complexity $(\mathcal{O}(T^2))$ and their limited ability to effectively expl...
- RRAEDy: Adaptive Latent Linearization of Nonlinear Dynamical Systems : Abstract: Most existing latent-space models for dynamical systems require fixing the latent dimension in advance, they rely on complex loss balancing to approximate linear dynamics, and they don't reg...
- ReLaX: Reasoning with Latent Exploration for Large Reasoning Models : Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has recently demonstrated remarkable potential in enhancing the reasoning capability of Large Reasoning Models (LRMs). However, RLVR oft...
- Depth-Wise Activation Steering for Honest Language Models : Abstract: Large language models sometimes assert falsehoods despite internally representing the correct answer, failures of honesty rather than accuracy, which undermines auditability and safety. Exis...
- A Bootstrap Perspective on Stochastic Gradient Descent : Abstract: Machine learning models trained with stochastic gradient descent (SGD) can generalize better than those trained with deterministic gradient descent (GD). In this work, we study SGD's ...
- A multimodal Bayesian Network for symptom-level depression and anxiety prediction from voice and speech data : Abstract: During psychiatric assessment, clinicians observe not only what patients report, but important nonverbal signs such as tone, speech rate, fluency, responsiveness, and body language. Weighing...
- Formalized Hopfield Networks and Boltzmann Machines : Abstract: Neural networks are widely used, yet their analysis and verification remain challenging. In this work, we present a Lean 4 formalization of neural networks, covering both deterministic and s...
- GatedFWA: Linear Flash Windowed Attention with Gated Associative Memory : Abstract: Modern autoregressive models rely on attention, yet the Softmax full attention in Transformers scales quadratically with sequence length. Sliding Window Attention (SWA) achieves linear-time ...
- The Adoption and Usage of AI Agents: Early Evidence from Perplexity : Abstract: This paper presents the first large-scale field study of the adoption, usage intensity, and use cases of general-purpose AI agents operating in open-world web environments. Our analysis cent...
- Intrusion Detection on Resource-Constrained IoT Devices with Hardware-Aware ML and DL : Abstract: This paper proposes a hardware-aware intrusion detection system (IDS) for Internet of Things (IoT) and Industrial IoT (IIoT) networks; it targets scenarios where classification is essential ...
- Physics Enhanced Deep Surrogates for the Phonon Boltzmann Transport Equation : Abstract: Designing materials with controlled heat flow at the nano-scale is central to advances in microelectronics, thermoelectrics, and energy-conversion technologies. At these scales, phonon trans...
- Multi-resolution Physics-Aware Recurrent Convolutional Neural Network for Complex Flows : Abstract: We present MRPARCv2, Multi-resolution Physics-Aware Recurrent Convolutional Neural Network, designed to model complex flows by embedding the structure of advection-diffusion-reaction equatio...
- Closed-Loop Robotic Manipulation of Transparent Substrates for Self-Driving Laboratories using Deep Learning Micro-Error Correction : Abstract: Self-driving laboratories (SDLs) have accelerated the throughput and automation capabilities for discovering and improving chemistries and materials. Although these SDLs have automated many ...
- Unifying Entropy Regularization in Optimal Control: From and Back to Classical Objectives via Iterated Soft Policies and Path Integral Solutions : Abstract: This paper develops a unified perspective on several stochastic optimal control formulations through the lens of Kullback-Leibler regularization. We propose a central problem that separates ...
- Hardware Software Optimizations for Fast Model Recovery on Reconfigurable Architectures : Abstract: Model Recovery (MR) is a core primitive for physical AI and real-time digital twins, but GPUs often execute MR inefficiently due to iterative dependencies, kernel-launch overheads, underutil...
- Beyond Lux thresholds: a systematic pipeline for classifying biologically relevant light contexts from wearable data : Abstract: Background: Wearable spectrometers enable field quantification of biologically relevant light, yet reproducible pipelines for contextual classification remain under-specified. Objective: T...
- The MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024: Efficient and Robust Aggregation Methods for Federated Learning : Abstract: We present the design and results of the MICCAI Federated Tumor Segmentation (FeTS) Challenge 2024, which focuses on federated learning (FL) for glioma sub-region segmentation in multi-param...
- SparsePixels: Efficient Convolution for Sparse Data on FPGAs : Abstract: Inference of standard CNNs on FPGAs often incurs high latency and a long initiation interval due to the deep nested loops required to densely convolve every input pixel regardless of its fea...
- Forests of Uncertaint(r)ees: Using tree-based ensembles to estimate probability distributions of future conflict : Abstract: Predictions of fatalities from violent conflict on the PRIO-GRID-month (pgm) level are characterized by high levels of uncertainty, limiting their usefulness in practical applications. We di...
- A Broader View on Clustering under Cluster-Aware Norm Objectives : Abstract: We revisit the $(f,g)$-clustering problem that we introduced in a recent work [SODA'25], and which subsumes fundamental clustering problems such as $k$-Center, $k$-Median, Min-Sum of Radii, ...
- Automated Data Enrichment using Confidence-Aware Fine-Grained Debate among Open-Source LLMs for Mental Health and Online Safety : Abstract: Real-world indicators are important for improving natural language processing (NLP) tasks such as life events for mental health analysis and risky behaviour for online safety, yet labelling ...
- Opinion: Learning Intuitive Physics May Require More than Visual Data : Abstract: Humans expertly navigate the world by building rich internal models founded on an intuitive understanding of physics. Meanwhile, despite training on vast quantities of internet video data, s...
- Contextual Strongly Convex Simulation Optimization: Optimize then Predict with Inexact Solutions : Abstract: In this work, we study contextual strongly convex simulation optimization and adopt an "optimize then predict" (OTP) approach for real-time decision making. In the offline stage, simulation ...
- Interpretable Neural Approximation of Stochastic Reaction Dynamics with Guaranteed Reliability : Abstract: Stochastic Reaction Networks (SRNs) are a fundamental modeling framework for systems ranging from chemical kinetics and epidemiology to ecological and synthetic biological processes. A centr...
- Modeling Spatio-temporal Extremes via Conditional Variational Autoencoders : Abstract: Extreme weather events are widely studied in fields such as agriculture, ecology, and meteorology. The spatio-temporal co-occurrence of extreme events can strengthen or weaken under changing...
- Automated Deep Learning Estimation of Anthropometric Measurements for Preparticipation Cardiovascular Screening : Abstract: Preparticipation cardiovascular examination (PPCE) aims to prevent sudden cardiac death (SCD) by identifying athletes with structural or electrical cardiac abnormalities. Anthropometric meas...
- Canonical Tail Dependence for Soft Extremal Clustering of Multichannel Brain Signals : Abstract: We develop a novel characterization of extremal dependence between two cortical regions of the brain when its signals display extremely large amplitudes. We show that connectivity in the tai...
- On The Role of K-Space Acquisition in MRI Reconstruction Domain-Generalization : Abstract: Recent work has established learned k-space acquisition patterns as a promising direction for improving reconstruction quality in accelerated Magnetic Resonance Imaging (MRI). Despite encour...
- Approximate Multiplier Induced Error Propagation in Deep Neural Networks : Abstract: Deep Neural Networks (DNNs) rely heavily on dense arithmetic operations, motivating the use of Approximate Multipliers (AxMs) to reduce energy consumption in hardware accelerators. However, ...
- DataGovBench: Benchmarking LLM Agents for Real-World Data Governance Workflows : Abstract: Data governance ensures data quality, security, and compliance through policies and standards, a critical foundation for scaling modern AI development. Recently, large language models (LLMs)...
- AI-Assisted Game Management Decisions: A Fuzzy Logic Approach to Real-Time Soccer Substitutions : Abstract: In elite soccer, substitution decisions entail significant financial and sporting consequences yet remain heavily reliant on intuition or predictive models that merely mimic historical biase...
- Model-Based and Sample-Efficient AI-Assisted Math Discovery in Sphere Packing : Abstract: Sphere packing, Hilbert's eighteenth problem, asks for the densest arrangement of congruent spheres in n-dimensional Euclidean space. Although relevant to areas such as cryptography, crystal...
- Deep transfer learning for image classification: a survey : Abstract: Deep neural networks such as convolutional neural networks (CNNs) and transformers have achieved many successes in image classification in recent years. It has been consistently demonstrated...
- A Unified Perspective for Loss-Oriented Imbalanced Learning via Localization : Abstract: Due to the inherent imbalance in real-world datasets, naïve Empirical Risk Minimization (ERM) tends to bias the learning process towards the majority classes, hindering generalization to min...
- Deep Learning Meets Mechanism Design: Key Results and Some Novel Applications : Abstract: Mechanism design is essentially reverse engineering of games and involves inducing a game among strategic agents in a way that the induced game satisfies a set of desired properties in an eq...
- I Learn Better If You Speak My Language: Understanding the Superior Performance of Fine-Tuning Large Language Models with LLM-Generated Responses : Abstract: This paper explores an intriguing observation: fine-tuning a large language model (LLM) with responses generated by an LLM often yields better results than using responses generated by humans...
- Generative AI and Copyright: A Dynamic Perspective : Abstract: The rapid advancement of generative AI is poised to disrupt the creative industry. Amidst the immense excitement for this new technology, its future development and applications in the creat...
- TimeAutoDiff: A Unified Framework for Generation, Imputation, Forecasting, and Time-Varying Metadata Conditioning of Heterogeneous Time Series Tabular Data : Abstract: We present TimeAutoDiff, a unified latent-diffusion framework for four fundamental time-series tasks: unconditional generation, missing-data imputation, forecasting, and time-varying-metadat...
- Bridging Weighted First Order Model Counting and Graph Polynomials : Abstract: The Weighted First-Order Model Counting Problem (WFOMC) asks to compute the weighted sum of models of a given first-order logic sentence over a given domain. It can be solved in time polynom...
- Moyun: A Diffusion-Based Model for Style-Specific Chinese Calligraphy Generation : Abstract: Although Chinese calligraphy generation has achieved style transfer, generating calligraphy by specifying the calligrapher, font, and character style remains challenging. To address this, we...
- Rethinking Normalization Strategies and Convolutional Kernels for Multimodal Image Fusion : Abstract: Multimodal image fusion (MMIF) integrates information from different modalities to obtain a comprehensive image, aiding downstream tasks. However, existing research focuses on complementary ...
- Twisted Convolutional Networks (TCNs): Enhancing Feature Interactions for Non-Spatial Data Classification : Abstract: Twisted Convolutional Networks (TCNs) are proposed as a novel deep learning architecture for classifying one-dimensional data with arbitrary feature order and minimal spatial relationships. ...
- Charting the Shapes of Stories with Game Theory : Abstract: Stories are records of our experiences and their analysis reveals insights into the nature of being human. Successful analyses are often interdisciplinary, leveraging mathematical tools to e...
- MeshA*: Efficient Path Planning With Motion Primitives : Abstract: We study a path planning problem where the possible move actions are represented as a finite set of motion primitives aligned with the grid representation of the environment. That is, each p...
- Quantum-Classical Hybrid Quantized Neural Network : Abstract: In this work, we introduce a novel Quadratic Binary Optimization (QBO) framework for training a quantized neural network. The framework enables the use of arbitrary activation and loss funct...
- Ghost in the Transformer: Detecting Model Reuse with Invariant Spectral Signatures : Abstract: Large Language Models (LLMs) are widely adopted, but their high training cost leads many developers to fine-tune existing open-source models. While most adhere to open-source licenses, some ...
- When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models : Abstract: Frontier large language models (LLMs) such as ChatGPT, Grok and Gemini are increasingly used for mental-health support with anxiety, trauma and self-worth. Most work treats them as tools or ...
- CryptoTensors: A Light-Weight Large Language Model File Format for Highly-Secure Model Distribution : Abstract: To enhance the performance of large language models (LLMs) in various domain-specific applications, sensitive data such as healthcare, law, and finance are being used to privately customize ...
- A self-driving lab for solution-processed electrochromic thin films : Abstract: Solution-processed electrochromic materials offer high potential for energy-efficient smart windows and displays. Their performance varies with material choice and processing conditions. Ele...
- Memory-Amortized Inference: A Topological Unification of Search, Closure, and Structure : Abstract: Contemporary ML separates the static structure of parameters from the dynamic flow of inference, yielding systems that lack the sample efficiency and thermodynamic frugality of biological co...
- Deep learning recognition and analysis of Volatile Organic Compounds based on experimental and synthetic infrared absorption spectra : Abstract: Volatile Organic Compounds (VOCs) are organic molecules that have low boiling points and therefore easily evaporate into the air. They pose significant risks to human health, making their ac...
- ARC-AGI Without Pretraining : Abstract: Conventional wisdom in the age of LLMs dictates that solving IQ-test-like visual puzzles from the ARC-AGI-1 benchmark requires capabilities derived from massive pretraining. To counter this,...
- A Prescriptive Framework for Determining Optimal Days for Short-Term Traffic Counts : Abstract: The Federal Highway Administration (FHWA) mandates that state Departments of Transportation (DOTs) collect reliable Annual Average Daily Traffic (AADT) data. However, many U.S. DOTs struggle...
- gp2Scale: A Class of Compactly-Supported Non-Stationary Kernels and Distributed Computing for Exact Gaussian Processes on 10 Million Data Points : Abstract: Despite a large corpus of recent work on scaling up Gaussian processes, a stubborn trade-off between computational speed, prediction and uncertainty quantification accuracy, and customizabil...
- PMA-Diffusion: A Physics-guided Mask-Aware Diffusion Framework for TSE from Sparse Observations : Abstract: High-resolution highway traffic state information is essential for Intelligent Transportation Systems, but typical traffic data acquired from loop detectors and probe vehicles are often too ...
- How Should We Evaluate Data Deletion in Graph-Based ANN Indexes? : Abstract: Approximate Nearest Neighbor Search (ANNS) has recently gained significant attention due to its many applications, such as Retrieval-Augmented Generation. Such applications require ANNS algo...
- K2-V2: A 360-Open, Reasoning-Enhanced LLM : Abstract: We introduce K2-V2, a 360-open LLM built from scratch as a superior base for reasoning adaptation, in addition to functions such as conversation and knowledge retrieval from general LLMs. It...
- Average-reward reinforcement learning in semi-Markov decision processes via relative value iteration : Abstract: This paper applies the authors' recent results on asynchronous stochastic approximation (SA) in the Borkar-Meyn framework to reinforcement learning in average-reward semi-Markov decision pro...
- Empowering GNNs for Domain Adaptation via Denoising Target Graph : Abstract: We explore the node classification task in the context of graph domain adaptation, which uses both source and target graph structures along with source labels to enhance the generalization c...
- Quantization Blindspots: How Model Compression Breaks Backdoor Defenses : Abstract: Backdoor attacks embed input-dependent malicious behavior into neural networks while preserving high clean accuracy, making them a persistent threat for deployed ML systems. At the same time...
- Learning When to Switch: Adaptive Policy Selection via Reinforcement Learning : Abstract: Autonomous agents often require multiple strategies to solve complex tasks, but determining when to switch between strategies remains challenging. This research introduces a reinforcement le...
- Learning Without Time-Based Embodiment Resets in Soft-Actor Critic : Abstract: When creating new reinforcement learning tasks, practitioners often accelerate the learning process by incorporating into the task several accessory components, such as breaking the environm...
- Theoretical Compression Bounds for Wide Multilayer Perceptrons : Abstract: Pruning and quantization techniques have been broadly successful in reducing the number of parameters needed for large neural networks, yet theoretical justification for their empirical succ...
- Importance-aware Topic Modeling for Discovering Public Transit Risk from Noisy Social Media : Abstract: Urban transit agencies increasingly turn to social media to monitor emerging service risks such as crowding, delays, and safety incidents, yet the signals of concern are sparse, short, and e...
- Multimodal Graph Neural Networks for Prognostic Modeling of Brain Network Reorganization : Abstract: Understanding the dynamic reorganization of brain networks is critical for predicting cognitive decline, neurological progression, and individual variability in clinical outcomes. This work ...
- Interpretive Efficiency: Information-Geometric Foundations of Data Usefulness : Abstract: Interpretability is central to trustworthy machine learning, yet existing metrics rarely quantify how effectively data support an interpretive representation. We propose Interpretive Efficie...
- TrajMoE: Scene-Adaptive Trajectory Planning with Mixture of Experts and Reinforcement Learning : Abstract: Current autonomous driving systems often favor end-to-end frameworks, which take sensor inputs like images and learn to map them into trajectory space via neural networks. Previous work has ...
- A Large-Scale Multimodal Dataset and Benchmarks for Human Activity Scene Understanding and Reasoning : Abstract: Multimodal human action recognition (HAR) leverages complementary sensors for activity classification. Beyond recognition, recent advances in large language models (LLMs) enable detailed des...
- Winning the Lottery by Preserving Network Training Dynamics with Concrete Ticket Search : Abstract: The Lottery Ticket Hypothesis asserts the existence of highly sparse, trainable subnetworks ('winning tickets') within dense, randomly initialized neural networks. However, state-of-the-art ...
- FlowLPS: Langevin-Proximal Sampling for Flow-based Inverse Problem Solvers : Abstract: Deep generative models have become powerful priors for solving inverse problems, and various training-free methods have been developed. However, when applied to latent flow models, existing ...
- JEPA as a Neural Tokenizer: Learning Robust Speech Representations with Density Adaptive Attention : Abstract: We introduce a two-stage self-supervised framework that combines the Joint-Embedding Predictive Architecture (JEPA) with a Density Adaptive Attention Mechanism (DAAM) for learning robust spe...
- Towards Unified Semantic and Controllable Image Fusion: A Diffusion Transformer Approach : Abstract: Image fusion aims to blend complementary information from multiple sensing modalities, yet existing approaches remain limited in robustness, adaptability, and controllability. Most current f...
- START: Spatial and Textual Learning for Chart Understanding : Abstract: Chart understanding is crucial for deploying multimodal large language models (MLLMs) in real-world scenarios such as analyzing scientific papers and technical reports. Unlike natural images...
- MASim: Multilingual Agent-Based Simulation for Social Science : Abstract: Multi-agent role-playing has recently shown promise for studying social behavior with language agents, but existing simulations are mostly monolingual and fail to model cross-lingual interac...
- Geometric Prior-Guided Federated Prompt Calibration : Abstract: Federated Prompt Learning (FPL) offers a parameter-efficient solution for collaboratively training large models, but its performance is severely hindered by data heterogeneity, which causes ...
- VFM-VLM: Vision Foundation Model and Vision Language Model based Visual Comparison for 3D Pose Estimation : Abstract: Vision Foundation Models (VFMs) and Vision Language Models (VLMs) have revolutionized computer vision by providing rich semantic and geometric representations. This paper presents a comprehe...
- NeSTR: A Neuro-Symbolic Abductive Framework for Temporal Reasoning in Large Language Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, temporal reasoning, particularly under complex tempor...
- Towards Robust Protective Perturbation against DeepFake Face Swapping : Abstract: DeepFake face swapping enables highly realistic identity forgeries, posing serious privacy and security risks. A common defence embeds invisible perturbations into images, but these are frag...
- Dropout Prompt Learning: Towards Robust and Adaptive Vision-Language Models : Abstract: Dropout is a widely used regularization technique which improves the generalization ability of a model by randomly dropping neurons. In light of this, we propose Dropout Prompt Learning, whi...
- IFFair: Influence Function-driven Sample Reweighting for Fair Classification : Abstract: Because machine learning has significantly improved efficiency and convenience in society, it is increasingly used to assist or replace human decision-making. However, the data-based patt...
- DGGAN: Degradation Guided Generative Adversarial Network for Real-time Endoscopic Video Enhancement : Abstract: Endoscopic surgery relies on intraoperative video, making image quality a decisive factor for surgical safety and efficacy. Yet, endoscopic videos are often degraded by uneven illumination, ...
- SINRL: Socially Integrated Navigation with Reinforcement Learning using Spiking Neural Networks : Abstract: Integrating autonomous mobile robots into human environments requires human-like decision-making and energy-efficient, event-based computation. Despite progress, neuromorphic methods are rar...
- Effective Attention-Guided Multi-Scale Medical Network for Skin Lesion Segmentation : Abstract: In the field of healthcare, precise skin lesion segmentation is crucial for the early detection and accurate diagnosis of skin diseases. Despite significant advances in deep learning for ima...
- SIT-Graph: State Integrated Tool Graph for Multi-Turn Agents : Abstract: Despite impressive advances in agent systems, multi-turn tool-use scenarios remain challenging. This is mainly because intent is clarified progressively and the environment evolves with each t...
- Towards Accurate UAV Image Perception: Guiding Vision-Language Models with Stronger Task Prompts : Abstract: Existing image perception methods based on VLMs generally follow a paradigm wherein models extract and analyze image content based on user-provided textual task prompts. However, such method...
- Exact Synthetic Populations for Scalable Societal and Market Modeling : Abstract: We introduce a constraint-programming framework for generating synthetic populations that reproduce target statistics with high precision while enforcing full individual consistency. Unlike ...
- Radiance-Field Reinforced Pretraining: Scaling Localization Models with Unlabeled Wireless Signals : Abstract: Radio frequency (RF)-based indoor localization offers significant promise for applications such as indoor navigation, augmented reality, and pervasive computing. While deep learning has grea...
- DCO: Dynamic Cache Orchestration for LLM Accelerators through Predictive Management : Abstract: The rapid adoption of large language models (LLMs) is pushing AI accelerators toward increasingly powerful and specialized designs. Instead of further complicating software development with ...
- ContextAnyone: Context-Aware Diffusion for Character-Consistent Text-to-Video Generation : Abstract: Text-to-video (T2V) generation has advanced rapidly, yet maintaining consistent character identities across scenes remains a major challenge. Existing personalization methods often focus on ...
- Local-Curvature-Aware Knowledge Graph Embedding: An Extended Ricci Flow Approach : Abstract: Knowledge graph embedding (KGE) relies on the geometry of the embedding space to encode semantic and structural relations. Existing methods place all entities on one homogeneous manifold, Eu...
- Venus: An Efficient Edge Memory-and-Retrieval System for VLM-based Online Video Understanding : Abstract: Vision-language models (VLMs) have demonstrated impressive multimodal comprehension capabilities and are being deployed in an increasing number of online video understanding applications. Wh...
- DeepAgent: A Dual Stream Multi Agent Fusion for Robust Multimodal Deepfake Detection : Abstract: The increasing use of synthetic media, particularly deepfakes, is an emerging challenge for digital content verification. Although recent studies use both audio and visual information, most ...
- Structure-Aware Feature Rectification with Region Adjacency Graphs for Training-Free Open-Vocabulary Semantic Segmentation : Abstract: Benefiting from the inductive biases learned from large-scale datasets, open-vocabulary semantic segmentation (OVSS) leverages the power of vision-language models, such as CLIP, to achieve r...
- ESPADA: Execution Speedup via Semantics Aware Demonstration Data Downsampling for Imitation Learning : Abstract: Behavior-cloning based visuomotor policies enable precise manipulation but often inherit the slow, cautious tempo of human demonstrations, limiting practical deployment. However, prior studi...
- Asymptotic analysis of shallow and deep forgetting in replay with Neural Collapse : Abstract: A persistent paradox in continual learning (CL) is that neural networks often retain linearly separable representations of past tasks even when their output predictions fail. We formalize th...
- Do LLMs Trust the Code They Write? : Abstract: Despite the effectiveness of large language models (LLMs) for code generation, they often output incorrect code. One reason is that model output probabilities are often not well-correlated w...
- Data-driven Exploration of Mobility Interaction Patterns : Abstract: Understanding the movement behaviours of individuals and the way they react to the external world is a key component of any problem that involves the modelling of human dynamics at a physica...
- When normalization hallucinates: unseen risks in AI-powered whole slide image processing : Abstract: Whole slide image (WSI) normalization remains a vital preprocessing step in computational pathology. Increasingly driven by deep learning, these models learn to approximate data distribution...
- MIDG: Mixture of Invariant Experts with knowledge injection for Domain Generalization in Multimodal Sentiment Analysis : Abstract: Existing methods in domain generalization for Multimodal Sentiment Analysis (MSA) often overlook inter-modal synergies during invariant feature extraction, which prevents the accurate captu...
- KAN-Dreamer: Benchmarking Kolmogorov-Arnold Networks as Function Approximators in World Models : Abstract: DreamerV3 is a state-of-the-art online model-based reinforcement learning (MBRL) algorithm known for remarkable sample efficiency. Concurrently, Kolmogorov-Arnold Networks (KANs) have emerge...
- Forget and Explain: Transparent Verification of GNN Unlearning : Abstract: Graph neural networks (GNNs) are increasingly used to model complex patterns in graph-structured data. However, enabling them to "forget" designated information remains challenging, especial...
- Social welfare optimisation in well-mixed and structured populations : Abstract: Research on promoting cooperation among autonomous, self-regarding agents has often focused on the bi-objective optimisation problem: minimising the total incentive cost while maximising the...
- Persian-Phi: Efficient Cross-Lingual Adaptation of Compact LLMs via Curriculum Learning : Abstract: The democratization of AI is currently hindered by the immense computational costs required to train Large Language Models (LLMs) for low-resource languages. This paper presents Persian-Phi,...
- Understanding LLM Agent Behaviours via Game Theory: Strategy Recognition, Biases and Multi-Agent Dynamics : Abstract: As Large Language Models (LLMs) increasingly operate as autonomous decision-makers in interactive and multi-agent systems and human societies, understanding their strategic behaviour has pro...
- From Real-World Traffic Data to Relevant Critical Scenarios : Abstract: The reliable operation of autonomous vehicles, automated driving functions, and advanced driver assistance systems across a wide range of relevant scenarios is critical for their development...
- Artificial Intelligence and Nuclear Weapons Proliferation: The Technological Arms Race for (In)visibility : Abstract: A robust nonproliferation regime has contained the spread of nuclear weapons to just nine states. Yet, emerging and disruptive technologies are reshaping the landscape of nuclear risks, pres...
- AutoICE: Automatically Synthesizing Verifiable C Code via LLM-driven Evolution : Abstract: Automatically synthesizing verifiable code from natural language requirements ensures software correctness and reliability while significantly lowering the barrier to adopting the techniques...
- Exploring possible vector systems for faster training of neural networks with preconfigured latent spaces : Abstract: The overall neural network (NN) performance is closely related to the properties of its embedding distribution in latent space (LS). It has recently been shown that predefined vector systems...
- SPAD: Seven-Source Token Probability Attribution with Syntactic Aggregation for Detecting Hallucinations in RAG : Abstract: Detecting hallucinations in Retrieval-Augmented Generation (RAG) remains a challenge. Prior approaches attribute hallucinations to a binary conflict between internal knowledge (stored in FFN...
- LIME: Making LLM Data More Efficient with Linguistic Metadata Embeddings : Abstract: Pre-training decoder-only language models relies on vast amounts of high-quality data, yet the availability of such data is increasingly reaching its limits. While metadata is commonly used ...
- Model-Based Reinforcement Learning Under Confounding : Abstract: We investigate model-based reinforcement learning in contextual Markov decision processes (C-MDPs) in which the context is unobserved and induces confounding in the offline dataset. In such ...
- VulnLLM-R: Specialized Reasoning LLM with Agent Scaffold for Vulnerability Detection : Abstract: We propose VulnLLM-R, the first specialized reasoning LLM for vulnerability detection. Our key insight is that LLMs can reason about program states and analyze the potential vulnerabi...
- Minimum Bayes Risk Decoding for Error Span Detection in Reference-Free Automatic Machine Translation Evaluation : Abstract: Error Span Detection (ESD) is a subtask of automatic machine translation evaluation that localizes error spans in translations and labels their severity. State-of-the-art generative ESD meth...
- MoCoRP: Modeling Consistent Relations between Persona and Response for Persona-based Dialogue : Abstract: As dialogue systems become increasingly important across various domains, a key challenge in persona-based dialogue is generating engaging and context-specific interactions while ensuring th...
- Toward More Reliable Artificial Intelligence: Reducing Hallucinations in Vision-Language Models : Abstract: Vision-language models (VLMs) frequently generate hallucinated content: plausible but incorrect claims about image content. We propose a training-free self-correction framework enabling VLMs ...
- Dual-Stream Cross-Modal Representation Learning via Residual Semantic Decorrelation : Abstract: Cross-modal learning has become a fundamental paradigm for integrating heterogeneous information sources such as images, text, and structured attributes. However, multimodal representations ...
- Weighted Contrastive Learning for Anomaly-Aware Time-Series Forecasting : Abstract: Reliable forecasting of multivariate time series under anomalous conditions is crucial in applications such as ATM cash logistics, where sudden demand shifts can disrupt operations. Modern d...
- R2MF-Net: A Recurrent Residual Multi-Path Fusion Network for Robust Multi-directional Spine X-ray Segmentation : Abstract: Accurate segmentation of spinal structures in X-ray images is a prerequisite for quantitative scoliosis assessment, including Cobb angle measurement, vertebral translation estimation and cur...
- Complementary Learning Approach for Text Classification using Large Language Models : Abstract: In this study, we propose a structured methodology that utilizes large language models (LLMs) in a cost-efficient and parsimonious manner, integrating the strengths of scholars and machines ...
- Metric-Fair Prompting: Treating Similar Samples Similarly : Abstract: We introduce Metric-Fair Prompting, a fairness-aware prompting framework that guides large language models (LLMs) to make decisions under metric-fairness constraints. In the applicati...
- PCMind-2.1-Kaiyuan-2B Technical Report : Abstract: The rapid advancement of Large Language Models (LLMs) has resulted in a significant knowledge gap between the open-source community and industry, primarily because the latter relies on close...
- Time Series Foundation Models for Process Model Forecasting : Abstract: Process Model Forecasting (PMF) aims to predict how the control-flow structure of a process evolves over time by modeling the temporal dynamics of directly-follows (DF) relations, complement...
- Incorporating Structure and Chord Constraints in Symbolic Transformer-based Melodic Harmonization : Abstract: Transformer architectures offer significant advantages regarding the generation of symbolic music; their capabilities for incorporating user preferences toward what they generate is being st...
- A Mathematical Theory of Top-$k$ Sparse Attention via Total Variation Distance : Abstract: We develop a unified mathematical framework for certified Top-$k$ attention truncation that quantifies approximation error at both the distribution and output levels. For a single attention ...
- An AI-Powered Autonomous Underwater System for Sea Exploration and Scientific Research : Abstract: Traditional sea exploration faces significant challenges due to extreme conditions, limited visibility, and high costs, resulting in vast unexplored ocean regions. This paper presents an inn...
- DIST-CLIP: Arbitrary Metadata and Image Guided MRI Harmonization via Disentangled Anatomy-Contrast Representations : Abstract: Deep learning holds immense promise for transforming medical image analysis, yet its clinical generalization remains profoundly limited. A major barrier is data heterogeneity. This is partic...
- When Large Language Models Do Not Work: Online Incivility Prediction through Graph Neural Networks : Abstract: Online incivility has emerged as a widespread and persistent problem in digital communities, imposing substantial social and psychological burdens on users. Although many platforms attempt t...
- Guiding What Not to Generate: Automated Negative Prompting for Text-Image Alignment : Abstract: Despite substantial progress in text-to-image generation, achieving precise text-image alignment remains challenging, particularly for prompts with rich compositional structure or imaginativ...
- In-Context and Few-Shots Learning for Forecasting Time Series Data based on Large Language Models : Abstract: Existing data-driven approaches in modeling and predicting time series data include ARIMA (Autoregressive Integrated Moving Average), Transformer-based models, LSTM (Long Short-Term Memory) ...
- Enabling Delayed-Full Charging Through Transformer-Based Real-Time-to-Departure Modeling for EV Battery Longevity : Abstract: Electric vehicles (EVs) are key to sustainable mobility, yet their lithium-ion batteries (LIBs) degrade more rapidly under prolonged high states of charge (SOC). This can be mitigated by del...
- The Native Spiking Microarchitecture: From Iontronic Primitives to Bit-Exact FP8 Arithmetic : Abstract: The 2025 Nobel Prize in Chemistry for Metal-Organic Frameworks (MOFs) and recent breakthroughs by Huanting Wang's team at Monash University establish angstrom-scale channels as promising pos...
- Improving action classification with brain-inspired deep networks : Abstract: Action recognition is also key for applications ranging from robotics to healthcare monitoring. Action information can be extracted from the body pose and movements, as well as from the back...
- SAVE: Sparse Autoencoder-Driven Visual Information Enhancement for Mitigating Object Hallucination : Abstract: Although Multimodal Large Language Models (MLLMs) have advanced substantially, they remain vulnerable to object hallucination caused by language priors and visual information loss. To addres...
- Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support : Abstract: LLM-based agents are rapidly being plugged into expert decision-support, yet in messy, high-stakes settings they rarely make the team smarter: human-AI teams often underperform the best indi...
- Group Representational Position Encoding : Abstract: We present GRAPE (Group RepresentAtional Position Encoding), a unified framework for positional encoding based on group actions. GRAPE brings together two families of mechanisms: (i) multipl...
- Understanding Privacy Risks in Code Models Through Training Dynamics: A Causal Approach : Abstract: Large language models for code (LLM4Code) have greatly improved developer productivity but also raise privacy concerns due to their reliance on open-source repositories containing abundant p...
- Provable Long-Range Benefits of Next-Token Prediction : Abstract: Why do modern language models, trained to do well on next-word prediction, appear to generate coherent documents and capture long-range structure? Here we show that next-token prediction is ...
- WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling : Abstract: Recent video generators achieve striking photorealism, yet remain fundamentally inconsistent in 3D. We present WorldReel, a 4D video generator that is natively spatio-temporally consistent. ...
- One Layer Is Enough: Adapting Pretrained Visual Encoders for Image Generation : Abstract: Visual generative models (e.g., diffusion models) typically operate in compressed latent spaces to balance training efficiency and sample quality. In parallel, there has been growing interes...
- Relational Visual Similarity : Abstract: Humans do not just see attribute similarity -- we also see relational similarity. An apple is like a peach because both are reddish fruit, but the Earth is also like a peach: its crust, mant...
- When Gender is Hard to See: Multi-Attribute Support for Long-Range Recognition : Abstract: Accurate gender recognition from extreme long-range imagery remains a challenging problem due to limited spatial resolution, viewpoint variability, and loss of facial cues. For such purpose,...
- Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices : Abstract: Large language models (LLMs) are increasingly deployed on edge devices. To meet strict resource constraints, real-world deployment has pushed LLM quantization from 8-bit to 4-bit, 2-bit, and...
- Instance Dependent Testing of Samplers using Interval Conditioning : Abstract: Sampling algorithms play a pivotal role in probabilistic AI. However, verifying if a sampler program indeed samples from the claimed distribution is a notoriously hard problem. Provably corr...
- Why Goal-Conditioned Reinforcement Learning Works: Relation to Dual Control : Abstract: Goal-conditioned reinforcement learning (RL) concerns the problem of training an agent to maximize the probability of reaching target goal states. This paper presents an analysis of the goal...
- Classifying German Language Proficiency Levels Using Large Language Models : Abstract: Assessing language proficiency is essential for education, as it enables instruction tailored to learners' needs. This paper investigates the use of Large Language Models (LLMs) for automatic...
- PRIMRose: Insights into the Per-Residue Energy Metrics of Proteins with Double InDel Mutations using Deep Learning : Abstract: Understanding how protein mutations affect protein structure is essential for advancements in computational biology and bioinformatics. We introduce PRIMRose, a novel approach that predicts ...
- Method of UAV Inspection of Photovoltaic Modules Using Thermal and RGB Data Fusion : Abstract: The subject of this research is the development of an intelligent, integrated framework for the automated inspection of photovoltaic (PV) infrastructure that addresses the critical shortcomi...
- AI as "Co-founder": GenAI for Entrepreneurship : Abstract: This paper studies whether, how, and for whom generative artificial intelligence (GenAI) facilitates firm creation. Our identification strategy exploits the November 2022 release of ChatGPT ...
- ShadowWolf -- Automatic Labelling, Evaluation and Model Training Optimised for Camera Trap Wildlife Images : Abstract: The continuous growth of the global human population is leading to the expansion of human habitats, resulting in decreasing wildlife spaces and increasing human-wildlife interactions. These ...
- Novel Deep Learning Architectures for Classification and Segmentation of Brain Tumors from MRI Images : Abstract: Brain tumors pose a significant threat to human life, so it is necessary to detect them accurately in the early stages for better diagnosis and treatment. Brain tumors can b...
- Beyond Token-level Supervision: Unlocking the Potential of Decoding-based Regression via Reinforcement Learning : Abstract: Decoding-based regression, which reformulates regression as a sequence generation task, has emerged as a promising paradigm of applying large language models for numerical prediction. Howeve...
- A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation : Abstract: Decoupled loss has been a successful reinforcement learning (RL) algorithm to deal with the high data staleness under the asynchronous RL setting. Decoupled loss improves coupled-loss style ...
- BEACON: A Unified Behavioral-Tactical Framework for Explainable Cybercrime Analysis with Large Language Models : Abstract: Cybercrime increasingly exploits human cognitive biases in addition to technical vulnerabilities, yet most existing analytical frameworks focus primarily on operational aspects and overlook ...
- Securing the Model Context Protocol: Defending LLMs Against Tool Poisoning and Adversarial Attacks : Abstract: The Model Context Protocol (MCP) enables Large Language Models to integrate external tools through structured descriptors, increasing autonomy in decision-making, task execution, and multi-a...
- SUGAR: A Sweeter Spot for Generative Unlearning of Many Identities : Abstract: Recent advances in 3D-aware generative models have enabled high-fidelity image synthesis of human identities. However, this progress raises urgent questions around user consent and the abili...
- Deep Manifold Part 2: Neural Network Mathematics : Abstract: This work develops the global equations of neural networks through stacked piecewise manifolds, fixed-point theory, and boundary-conditioned iteration. Once fixed coordinates and operators a...
- QL-LSTM: A Parameter-Efficient LSTM for Stable Long-Sequence Modeling : Abstract: Recurrent neural architectures such as LSTM and GRU remain widely used in sequence modeling, but they continue to face two core limitations: redundant gate-specific parameters and reduced ab...
- Towards Efficient Hypergraph and Multi-LLM Agent Recommender Systems : Abstract: Recommender Systems (RSs) have become the cornerstone of various applications such as e-commerce and social media platforms. The evolution of RSs is paramount in the digital era, in which pe...
- Beyond Satisfaction: From Placebic to Actionable Explanations For Enhanced Understandability : Abstract: Explainable AI (XAI) presents useful tools to facilitate transparency and trustworthiness in machine learning systems. However, current evaluations of system explainability often rely heavil...
- ChargingBoul: A Competitive Negotiating Agent with Novel Opponent Modeling : Abstract: Automated negotiation has emerged as a critical area of research in multiagent systems, with applications spanning e-commerce, resource allocation, and autonomous decision-making. This paper...
- Memory Power Asymmetry in Human-AI Relationships: Preserving Mutual Forgetting in the Digital Age : Abstract: As artificial intelligence (AI) becomes embedded in personal and professional relationships, a new kind of power imbalance emerges from asymmetric memory capabilities. Human relationships ha...
- Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution : Abstract: Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In...
- Financial Fraud Identification and Interpretability Study for Listed Companies Based on Convolutional Neural Network : Abstract: Since the emergence of joint-stock companies, financial fraud by listed firms has repeatedly undermined capital markets. Fraud is difficult to detect because of covert tactics and the high l...
- Adaptive Test-Time Training for Predicting Need for Invasive Mechanical Ventilation in Multi-Center Cohorts : Abstract: Accurate prediction of the need for invasive mechanical ventilation (IMV) in intensive care unit (ICU) patients is crucial for timely interventions and resource allocation. However, variab...
- GSAE: Graph-Regularized Sparse Autoencoders for Robust LLM Safety Steering : Abstract: Large language models (LLMs) face critical safety challenges, as they can be manipulated to generate harmful content through adversarial prompts and jailbreak attacks. Many defenses are typi...
- TextMamba: Scene Text Detector with Mamba : Abstract: In scene text detection, Transformer-based methods have addressed the global feature extraction limitations inherent in traditional convolutional neural network-based methods. However, most di...
- Towards Small Language Models for Security Query Generation in SOC Workflows : Abstract: Analysts in Security Operations Centers routinely query massive telemetry streams using Kusto Query Language (KQL). Writing correct KQL requires specialized expertise, and this dependency cr...
- Rethinking Robustness: A New Approach to Evaluating Feature Attribution Methods : Abstract: This paper studies the robustness of feature attribution methods for deep neural networks. It challenges the current notion of attributional robustness that largely ignores the difference in...
- GradientSpace: Unsupervised Data Clustering for Improved Instruction Tuning : Abstract: Instruction tuning is one of the key steps required for adapting large language models (LLMs) to a broad spectrum of downstream applications. However, this procedure is difficult because rea...
- Mechanistic Interpretability of GPT-2: Lexical and Contextual Layers in Sentiment Analysis : Abstract: We present a mechanistic interpretability study of GPT-2 that causally examines how sentiment information is processed across its transformer layers. Using systematic activation patching acr...
- Predictive Modeling of I/O Performance for Machine Learning Training Pipelines: A Data-Driven Approach to Storage Optimization : Abstract: Modern machine learning training is increasingly bottlenecked by data I/O rather than compute. GPUs often sit idle at below 50% utilization waiting for data. This paper presents a machine le...
- A Novel Multimodal RUL Framework for Remaining Useful Life Estimation with Layer-wise Explanations : Abstract: Estimating the Remaining Useful Life (RUL) of mechanical systems is pivotal in Prognostics and Health Management (PHM). Rolling-element bearings are among the most frequent causes of machine...
- A Novel Deep Neural Network Architecture for Real-Time Water Demand Forecasting : Abstract: Short-term water demand forecasting (StWDF) is the foundation stone in the derivation of an optimal plan for controlling water supply systems. Deep learning (DL) approaches provide the most ...
- The Role of Entropy in Visual Grounding: Analysis and Optimization : Abstract: Recent advances in fine-tuning multimodal large language models (MLLMs) using reinforcement learning have achieved remarkable progress, particularly with the introduction of various entropy ...
- "The Dentist is an involved parent, the bartender is not": Revealing Implicit Biases in QA with Implicit BBQ : Abstract: Existing benchmarks evaluating biases in large language models (LLMs) primarily rely on explicit cues, declaring protected attributes like religion, race, gender by name. However, real-world...
- A Patient-Doctor-NLP-System to contest inequality for less privileged : Abstract: Transfer Learning (TL) has accelerated the rapid development and availability of large language models (LLMs) for mainstream natural language processing (NLP) use cases. However, training an...
- Arc Gradient Descent: A Mathematically Derived Reformulation of Gradient Descent with Phase-Aware, User-Controlled Step Dynamics : Abstract: The paper presents the formulation, implementation, and evaluation of the ArcGD optimiser. The evaluation is conducted initially on a non-convex benchmark function and subsequently on a real...
- Task-Model Alignment: A Simple Path to Generalizable AI-Generated Image Detection : Abstract: Vision Language Models (VLMs) are increasingly adopted for AI-generated image (AIGI) detection, yet converting VLMs into detectors requires substantial resources, while the resulting models ...
- PrivLLMSwarm: Privacy-Preserving LLM-Driven UAV Swarms for Secure IoT Surveillance : Abstract: Large Language Models (LLMs) are emerging as powerful enablers for autonomous reasoning and natural-language coordination in unmanned aerial vehicle (UAV) swarms operating within Internet of...
- Becoming Experienced Judges: Selective Test-Time Learning for Evaluators : Abstract: Automatic evaluation with large language models, commonly known as LLM-as-a-judge, is now standard across reasoning and alignment tasks. Despite evaluating many samples in deployment, these ...
- VisChainBench: A Benchmark for Multi-Turn, Multi-Image Visual Reasoning Beyond Language Priors : Abstract: Understanding multi-image, multi-turn scenarios is a critical yet underexplored capability for Large Vision-Language Models (LVLMs). Existing benchmarks predominantly focus on static or hori...
- Stitch and Tell: A Structured Multimodal Data Augmentation Method for Spatial Understanding : Abstract: Existing vision-language models often suffer from spatial hallucinations, i.e., generating incorrect descriptions about the relative positions of objects in an image. We argue that this prob...
- RDSplat: Robust Watermarking Against Diffusion Editing for 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting (3DGS) has enabled the creation of digital assets and downstream applications, underscoring the need for robust copyright protection via digital watermarking. However, ...
- From Next-Token to Next-Block: A Principled Adaptation Path for Diffusion LLMs : Abstract: Large language models (LLMs) excel at generation but dominant autoregressive (AR) decoding is inherently sequential, creating a throughput bottleneck. Diffusion Language Models (DLMs)--espec...
- From Description to Score: Can LLMs Quantify Vulnerabilities? : Abstract: Manual vulnerability scoring, such as assigning Common Vulnerability Scoring System (CVSS) scores, is a resource-intensive process that is often influenced by subjective interpretation. This...
- Angular Regularization for Positive-Unlabeled Learning on the Hypersphere : Abstract: Positive-Unlabeled (PU) learning addresses classification problems where only a subset of positive examples is labeled and the remaining data is unlabeled, making explicit negative supervisi...
- Optimal and Diffusion Transports in Machine Learning : Abstract: Several problems in machine learning are naturally expressed as the design and analysis of time-evolving probability distributions. This includes sampling via diffusion methods, optimizing t...
- RMAdapter: Reconstruction-based Multi-Modal Adapter for Vision-Language Models : Abstract: Pre-trained Vision-Language Models (VLMs), e.g., CLIP, have become essential tools in multimodal transfer learning. However, fine-tuning VLMs in few-shot scenarios poses significant ...
- Partial Inverse Design of High-Performance Concrete Using Cooperative Neural Networks for Constraint-Aware Mix Generation : Abstract: High-performance concrete offers exceptional strength and durability but requires complex mix designs involving many interdependent variables and practical constraints. While data-driven met...
- CAuSE: Decoding Multimodal Classifiers using Faithful Natural Language Explanation : Abstract: Multimodal classifiers function as opaque black box models. While several techniques exist to interpret their predictions, very few of them are as intuitive and accessible as natural languag...
- Leveraging LLMs to support co-evolution between definitions and instances of textual DSLs : Abstract: Software languages evolve over time for various reasons, such as the addition of new features. When the language's grammar definition evolves, textual instances that originally conformed to ...
- Formal that "Floats" High: Formal Verification of Floating Point Arithmetic : Abstract: Formal verification of floating-point arithmetic remains challenging due to non-linear arithmetic behavior and the tight coupling between control and datapath logic. Existing approaches ofte...
- ArchPower: Dataset for Architecture-Level Power Modeling of Modern CPU Design : Abstract: Power is the primary design objective of large-scale integrated circuits (ICs), especially for complex modern processors (i.e., CPUs). Accurate CPU power evaluation requires designers to go ...
- Less Is More, but Where? Dynamic Token Compression via LLM-Guided Keyframe Prior : Abstract: Recent advances in Video Large Language Models (VLLMs) have achieved remarkable video understanding capabilities, yet face critical efficiency bottlenecks due to quadratic computational grow...
- WisPaper: Your AI Scholar Search Engine : Abstract: Researchers struggle to efficiently locate and manage relevant literature within the exponentially growing body of scientific publications. We present WisPaper, an intelligent acade...
- JoPano: Unified Panorama Generation via Joint Modeling : Abstract: Panorama generation has recently attracted growing interest in the research community, with two core tasks, text-to-panorama and view-to-panorama generation. However, existing methods still ...
- BabelCoder: Agentic Code Translation with Specification Alignment : Abstract: As software systems evolve, developers increasingly work across multiple programming languages and often face the need to migrate code from one language to another. While automatic code tran...
- SoK: Trust-Authorization Mismatch in LLM Agent Interactions : Abstract: Large Language Models (LLMs) are rapidly evolving into autonomous agents capable of interacting with the external world, significantly expanding their capabilities through standardized inter...
- NeuroABench: A Multimodal Evaluation Benchmark for Neurosurgical Anatomy Identification : Abstract: Multimodal Large Language Models (MLLMs) have shown significant potential in surgical video understanding. With improved zero-shot performance and more effective human-machine interaction, t...
- Deep Reinforcement Learning for Phishing Detection with Transformer-Based Semantic Features : Abstract: Phishing is a cybercrime in which individuals are deceived into revealing personal information, often resulting in financial loss. These attacks commonly occur through fraudulent messages, m...
- Evaluating the Sensitivity of BiLSTM Forecasting Models to Sequence Length and Input Noise : Abstract: Deep learning (DL) models, a specialized class of multilayer neural networks, have become central to time-series forecasting in critical domains such as environmental monitoring and the Inte...
- Adaptive Normalization Mamba with Multi Scale Trend Decomposition and Patch MoE Encoding : Abstract: Time series forecasting in real-world environments faces significant challenges: non-stationarity, multi-scale temporal patterns, and distributional shifts that degrade model stability and ac...
- Hidden Leaks in Time Series Forecasting: How Data Leakage Affects LSTM Evaluation Across Configurations and Validation Strategies : Abstract: Deep learning models, particularly Long Short-Term Memory (LSTM) networks, are widely used in time series forecasting due to their ability to capture complex temporal dependencies. However, ...
- A Unifying Human-Centered AI Fairness Framework : Abstract: The increasing use of Artificial Intelligence (AI) in critical societal domains has amplified concerns about fairness, particularly regarding unequal treatment across sensitive attributes su...
- Task adaptation of Vision-Language-Action model: 1st Place Solution for the 2025 BEHAVIOR Challenge : Abstract: We present a vision-action policy that won 1st place in the 2025 BEHAVIOR Challenge - a large-scale benchmark featuring 50 diverse long-horizon household tasks in photo-realistic simulation,...
- VideoVLA: Video Generators Can Be Generalizable Robot Manipulators : Abstract: Generalization in robot manipulation is essential for deploying robots in open-world environments and advancing toward artificial general intelligence. While recent Vision-Language-Action (V...
- Comparing BFGS and OGR for Second-Order Optimization : Abstract: Estimating the Hessian matrix, especially for neural network training, is a challenging problem due to high dimensionality and cost. In this work, we compare the classical Sherman-Morrison u...
- Flash Multi-Head Feed-Forward Network : Abstract: We explore Multi-Head FFN (MH-FFN) as a replacement of FFN in the Transformer architecture, motivated by the structural similarity between single-head attention and FFN. While multi-head mec...
- Prompting-in-a-Series: Psychology-Informed Contents and Embeddings for Personality Recognition With Decoder-Only Models : Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various natural language processing tasks. This research introduces a novel "Prompting-in-a-Series" algorithm, t...
- Singing Timbre Popularity Assessment Based on Multimodal Large Foundation Model : Abstract: Automated singing assessment is crucial for education and entertainment. However, existing systems face two fundamental limitations: reliance on reference tracks, which stifles creative expr...
- Benchmarking Deep Neural Networks for Modern Recommendation Systems : Abstract: This paper examines the deployment of seven different neural network architectures (CNN, RNN, GNN, Autoencoder, Transformer, NCF, and Siamese Networks) on three distinct datasets: Retail E-com...
- Multi-Accent Mandarin Dry-Vocal Singing Dataset: Benchmark for Singing Accent Recognition : Abstract: Singing accent research is underexplored compared to speech accent studies, primarily due to the scarcity of suitable datasets. Existing singing datasets often suffer from detail loss, frequ...
- Optimizing video analytics inference pipelines: a case study : Abstract: Cost-effective and scalable video analytics are essential for precision livestock monitoring, where high-resolution footage and near-real-time monitoring needs from commercial farms generate...
- FVA-RAG: Falsification-Verification Alignment for Mitigating Sycophantic Hallucinations : Abstract: Retrieval-Augmented Generation (RAG) systems have significantly reduced hallucinations in Large Language Models (LLMs) by grounding responses in external context. However, standard RAG archi...
- Latency-Response Theory Model: Evaluating Large Language Models via Response Accuracy and Chain-of-Thought Length : Abstract: The proliferation of Large Language Models (LLMs) necessitates valid evaluation methods to provide guidance for both downstream applications and actionable future improvements. The Item Resp...
- Transferring Clinical Knowledge into ECGs Representation : Abstract: Deep learning models have shown high accuracy in classifying electrocardiograms (ECGs), but their black box nature hinders clinical adoption due to a lack of trust and interpretability. To a...
- Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization : Abstract: Bug localization remains a critical yet time-consuming challenge in large-scale software repositories. Traditional information retrieval-based bug localization (IRBL) methods rely on unchang...
- A Comprehensive Study of Supervised Machine Learning Models for Zero-Day Attack Detection: Analyzing Performance on Imbalanced Data : Abstract: Among the various types of cyberattacks, identifying zero-day attacks is problematic because they are unknown to security systems as their pattern and characteristics do not match known blac...
- Power of Boundary and Reflection: Semantic Transparent Object Segmentation using Pyramid Vision Transformer with Transparent Cues : Abstract: Glass is a prevalent material among solid objects in everyday life, yet segmentation methods struggle to distinguish it from opaque materials due to its transparency and reflection. While it...
- DAUNet: A Lightweight UNet Variant with Deformable Convolutions and Parameter-Free Attention for Medical Image Segmentation : Abstract: Medical image segmentation plays a pivotal role in automated diagnostic and treatment planning systems. In this work, we present DAUNet, a novel lightweight UNet variant that integrates Defo...
- $\mathrm{D}^{\mathrm{3}}$-Predictor: Noise-Free Deterministic Diffusion for Dense Prediction : Abstract: Although diffusion models with strong visual priors have emerged as powerful dense prediction backbones, they overlook a core limitation: the stochastic noise at the core of diffusion sampli...
- Self-Supervised Learning on Molecular Graphs: A Systematic Investigation of Masking Design : Abstract: Self-supervised learning (SSL) plays a central role in molecular representation learning. Yet, many recent innovations in masking-based pretraining are introduced as heuristics and lack prin...
- Procrustean Bed for AI-Driven Retrosynthesis: A Unified Framework for Reproducible Evaluation : Abstract: Progress in computer-aided synthesis planning (CASP) is obscured by the lack of standardized evaluation infrastructure and the reliance on metrics that prioritize topological completion over...
- ThinkTrap: Denial-of-Service Attacks against Black-box LLM Services via Infinite Thinking : Abstract: Large Language Models (LLMs) have become foundational components in a wide range of applications, including natural language understanding and generation, embodied intelligence, and scientif...
- Leveraging KV Similarity for Online Structured Pruning in LLMs : Abstract: Pruning has emerged as a promising direction for accelerating large language model (LLM) inference, yet existing approaches often suffer from instability because they rely on offline calibra...
- The Geometry of Persona: Disentangling Personality from Reasoning in Large Language Models : Abstract: Background: The deployment of personalized Large Language Models (LLMs) is currently constrained by the stability-plasticity dilemma. Prevailing alignment methods, such as Supervised Fine-Tu...
- FOAM: Blocked State Folding for Memory-Efficient LLM Training : Abstract: Large language models (LLMs) have demonstrated remarkable performance due to their large parameter counts and extensive training data. However, their scale leads to significant memory bottle...
- RisConFix: LLM-based Automated Repair of Risk-Prone Drone Configurations : Abstract: Flight control software is typically designed with numerous configurable parameters governing multiple functionalities, enabling flexible adaptation to mission diversity and environmental un...
- DART: Leveraging Multi-Agent Disagreement for Tool Recruitment in Multimodal Reasoning : Abstract: Specialized visual tools can augment large language models or vision language models with expert knowledge (e.g., grounding, spatial reasoning, medical knowledge, etc.), but knowing which to...
- Decouple to Generalize: Context-First Self-Evolving Learning for Data-Scarce Vision-Language Reasoning : Abstract: Recent vision-language models (VLMs) achieve remarkable reasoning through reinforcement learning (RL), which provides a feasible solution for realizing continuous self-evolving large vision-...
- JT-DA: Enhancing Data Analysis with Tool-Integrated Table Reasoning Large Language Models : Abstract: In this work, we present JT-DA-8B (JiuTian Data Analyst 8B), a specialized large language model designed for complex table reasoning tasks across diverse real-world scenarios. To address the...
- Do Persona-Infused LLMs Affect Performance in a Strategic Reasoning Game? : Abstract: Although persona prompting in large language models appears to trigger different styles of generated text, it is unclear whether these translate into measurable behavioral differences, much ...
- On Memory: A comparison of memory mechanisms in world models : Abstract: World models enable agents to plan within imagined environments by predicting future states conditioned on past observations and actions. However, their ability to plan over long horizons is...
- Utilizing Multi-Agent Reinforcement Learning with Encoder-Decoder Architecture Agents to Identify Optimal Resection Location in Glioblastoma Multiforme Patients : Abstract: Currently, there is a noticeable lack of AI in the medical field to support doctors in treating heterogeneous brain tumors such as Glioblastoma Multiforme (GBM), the deadliest human cancer in...
- ClinNoteAgents: An LLM Multi-Agent System for Predicting and Interpreting Heart Failure 30-Day Readmission from Clinical Notes : Abstract: Heart failure (HF) is one of the leading causes of rehospitalization among older adults in the United States. Although clinical notes contain rich, detailed patient information and make up a...
- VIGIL: A Reflective Runtime for Self-Healing Agents : Abstract: Agentic LLM frameworks promise autonomous behavior via task decomposition, tool use, and iterative planning, but most deployed systems remain brittle. They lack runtime introspection, cannot...
- A Neural Affinity Framework for Abstract Reasoning: Diagnosing the Compositional Gap in Transformer Architectures via Procedural Task Taxonomy : Abstract: Responding to Hodel et al.'s (2024) call for a formal definition of task relatedness in re-arc, we present the first 9-category taxonomy of all 400 tasks, validated at 97.5% accuracy via rul...
- ContextualSHAP : Enhancing SHAP Explanations Through Contextual Language Generation : Abstract: Explainable Artificial Intelligence (XAI) has become an increasingly important area of research, particularly as machine learning models are deployed in high-stakes domains. Among various XA...
- PICKT: Practical Interlinked Concept Knowledge Tracing for Personalized Learning using Knowledge Map Concept Relations : Abstract: With the recent surge in personalized learning, Intelligent Tutoring Systems (ITS) that can accurately track students' individual knowledge states and provide tailored learning paths based o...
- Sample from What You See: Visuomotor Policy Learning via Diffusion Bridge with Observation-Embedded Stochastic Differential Equation : Abstract: Imitation learning with diffusion models has advanced robotic control by capturing multi-modal action distributions. However, existing approaches typically treat observations as high-level c...
- Cross-platform Product Matching Based on Entity Alignment of Knowledge Graph with RAEA model : Abstract: Product matching aims to identify identical or similar products sold on different platforms. By building knowledge graphs (KGs), the product matching problem can be converted to the Entity A...
- M-STAR: Multi-Scale Spatiotemporal Autoregression for Human Mobility Modeling : Abstract: Modeling human mobility is vital for extensive applications such as transportation planning and epidemic modeling. With the rise of the Artificial Intelligence Generated Content (AIGC) parad...
- A Geometric Unification of Concept Learning with Concept Cones : Abstract: Two traditions of interpretability have evolved side by side but seldom spoken to each other: Concept Bottleneck Models (CBMs), which prescribe what a concept should be, and Sparse Autoencod...
- LocalSearchBench: Benchmarking Agentic Search in Real-World Local Life Services : Abstract: Recent advances in large reasoning models (LRMs) have enabled agentic search systems to perform complex multi-step reasoning across multiple sources. However, most studies focus on general i...
- How Do LLMs Fail In Agentic Scenarios? A Qualitative Analysis of Success and Failure Scenarios of Various LLMs in Agentic Simulations : Abstract: We investigate how large language models (LLMs) fail when operating as autonomous agents with tool-use capabilities. Using the Kamiwaza Agentic Merit Index (KAMI) v0.1 benchmark, we analyze ...
- Comparative Analysis and Parametric Tuning of PPO, GRPO, and DAPO for LLM Reasoning Enhancement : Abstract: This study presents a systematic comparison of three Reinforcement Learning (RL) algorithms (PPO, GRPO, and DAPO) for improving complex reasoning in large language models (LLMs). Our main co...
- The Agent Capability Problem: Predicting Solvability Through Information-Theoretic Bounds : Abstract: When should an autonomous agent commit resources to a task? We introduce the Agent Capability Problem (ACP), a framework for predicting whether an agent can solve a problem under resource co...
- Each Prompt Matters: Scaling Reinforcement Learning Without Wasting Rollouts on Hundred-Billion-Scale MoE : Abstract: We present CompassMax-V3-Thinking, a hundred-billion-scale MoE reasoning model trained with a new RL framework built on one principle: each prompt must matter. Scaling RL to this size expose...
- RL-MTJail: Reinforcement Learning for Automated Black-Box Multi-Turn Jailbreaking of Large Language Models : Abstract: Large language models are vulnerable to jailbreak attacks, threatening their safe deployment in real-world applications. This paper studies black-box multi-turn jailbreaks, aiming to train a...
- ReasonBENCH: Benchmarking the (In)Stability of LLM Reasoning : Abstract: Large language models (LLMs) are increasingly deployed in settings where reasoning, such as multi-step problem solving and chain-of-thought, is essential. Yet, current evaluation practices o...
- Large Causal Models from Large Language Models : Abstract: We introduce a new paradigm for building large causal models (LCMs) that exploits the enormous potential latent in today's large language models (LLMs). We describe our ongoing experiments w...
- Auditing Games for Sandbagging : Abstract: Future AI systems could conceal their capabilities ('sandbagging') during evaluations, potentially misleading developers and auditors. We stress-tested sandbagging detection techniques using...
- Video Models Start to Solve Chess, Maze, Sudoku, Mental Rotation, and Raven' Matrices : Abstract: We show that video generation models could reason now. Testing on tasks such as chess, maze, Sudoku, mental rotation, and Raven's Matrices, leading models such as Sora-2 achieve sixty percen...
- A Multi-objective Optimization Approach for Feature Selection in Gentelligent Systems : Abstract: The integration of advanced technologies, such as Artificial Intelligence (AI), into manufacturing processes is attracting significant attention, paving the way for the development of intell...
- Accelerating Materials Discovery: Learning a Universal Representation of Chemical Processes for Cross-Domain Property Prediction : Abstract: Experimental validation of chemical processes is slow and costly, limiting exploration in materials discovery. Machine learning can prioritize promising candidates, but existing data in pate...
- FlockVote: LLM-Empowered Agent-Based Modeling for Simulating U.S. Presidential Elections : Abstract: Modeling complex human behavior, such as voter decisions in national elections, is a long-standing challenge for computational social science. Traditional agent-based models (ABMs) are limit...
- Adaptive Dataset Quantization: A New Direction for Dataset Pruning : Abstract: This paper addresses the challenges of storage and communication costs for large-scale datasets in resource-constrained edge devices by proposing a novel dataset quantization approach to red...
- VG3T: Visual Geometry Grounded Gaussian Transformer : Abstract: Generating a coherent 3D scene representation from multi-view images is a fundamental yet challenging task. Existing methods often struggle with multi-view fusion, leading to fragmented 3D r...
- Domain-Specific Foundation Model Improves AI-Based Analysis of Neuropathology : Abstract: Foundation models have transformed computational pathology by providing generalizable representations from large-scale histology datasets. However, existing models are predominantly trained ...
- KidSpeak: A General Multi-purpose LLM for Kids' Speech Recognition and Screening : Abstract: With the rapid advancement of conversational and diffusion-based AI, there is a growing adoption of AI in educational services, ranging from grading and assessment tools to personalized lear...
- POrTAL: Plan-Orchestrated Tree Assembly for Lookahead : Abstract: Assigning tasks to robots often involves supplying the robot with an overarching goal, such as through natural language, and then relying on the robot to uncover and execute a plan to achiev...
- Simple Agents Outperform Experts in Biomedical Imaging Workflow Optimization : Abstract: Adapting production-level computer vision tools to bespoke scientific datasets is a critical "last mile" bottleneck. Current solutions are impractical: fine-tuning requires large annotated d...
- Uncovering Students' Inquiry Patterns in GenAI-Supported Clinical Practice: An Integration of Epistemic Network Analysis and Sequential Pattern Mining : Abstract: Assessment of medication history-taking has traditionally relied on human observation, limiting scalability and detailed performance data. While Generative AI (GenAI) platforms enable extens...
- PrefGen: Multimodal Preference Learning for Preference-Conditioned Image Generation : Abstract: Preference-conditioned image generation seeks to adapt generative models to individual users, producing outputs that reflect personal aesthetic choices beyond the given textual prompt. Despi...
- The SAM2-to-SAM3 Gap in the Segment Anything Model Family: Why Prompt-Based Expertise Fails in Concept-Driven Image Segmentation : Abstract: This paper investigates the fundamental discontinuity between the latest two Segment Anything Models: SAM2 and SAM3. We explain why the expertise in prompt-based segmentation of SAM2 does no...
- Physics-Guided Deepfake Detection for Voice Authentication Systems : Abstract: Voice authentication systems deployed at the network edge face dual threats: a) sophisticated deepfake synthesis attacks and b) control-plane poisoning in distributed federated learning prot...
- Auto-SPT: Automating Semantic Preserving Transformations for Code : Abstract: Machine learning (ML) models for code clone detection determine whether two pieces of code are semantically equivalent, which in turn is a key building block for software-engineering tasks l...
- Beyond Prototyping: Autonomous, Enterprise-Grade Frontend Development from Pixel to Production via a Specialized Multi-Agent Framework : Abstract: We present AI4UI, a framework of autonomous front-end development agents purpose-built to meet the rigorous requirements of enterprise-grade application delivery. Unlike general-purpose code...
- The Road of Adaptive AI for Precision in Cybersecurity : Abstract: Cybersecurity's evolving complexity presents unique challenges and opportunities for AI research and practice. This paper shares key lessons and insights from designing, building, and operat...
- Reinforcement Learning Integrated Agentic RAG for Software Test Cases Authoring : Abstract: This paper introduces a framework that integrates reinforcement learning (RL) with autonomous agents to enable continuous improvement in the automated process of software test cases authorin...
- When Privacy Isn't Synthetic: Hidden Data Leakage in Generative AI Models : Abstract: Generative models are increasingly used to produce privacy-preserving synthetic data as a safe alternative to sharing sensitive training datasets. However, we demonstrate that such synthetic...
- EgoEdit: Dataset, Real-Time Streaming Model, and Benchmark for Egocentric Video Editing : Abstract: We study instruction-guided editing of egocentric videos for interactive AR applications. While recent AI video editors perform well on third-person footage, egocentric views present unique ...
- Empathy by Design: Aligning Large Language Models for Healthcare Dialogue : Abstract: General-purpose large language models (LLMs) have demonstrated remarkable generative and reasoning capabilities but remain limited in healthcare and caregiving applications due to two key de...
- JaxWildfire: A GPU-Accelerated Wildfire Simulator for Reinforcement Learning : Abstract: Artificial intelligence methods are increasingly being explored for managing wildfires and other natural hazards. In particular, reinforcement learning (RL) is a promising path towards impro...
- Explainable Melanoma Diagnosis with Contrastive Learning and LLM-based Report Generation : Abstract: Deep learning has demonstrated expert-level performance in melanoma classification, positioning it as a powerful tool in clinical dermatology. However, model opacity and the lack of interpre...
- Future You: Designing and Evaluating Multimodal AI-generated Digital Twins for Strengthening Future Self-Continuity : Abstract: What if users could meet their future selves today? AI-generated future selves simulate meaningful encounters with a digital twin decades in the future. As AI systems advance, combining clon...
- WAM-Flow: Parallel Coarse-to-Fine Motion Planning via Discrete Flow Matching for Autonomous Driving : Abstract: We introduce WAM-Flow, a vision-language-action (VLA) model that casts ego-trajectory planning as discrete flow matching over a structured token space. In contrast to autoregressive decoders...
- Toward Patch Robustness Certification and Detection for Deep Learning Systems Beyond Consistent Samples : Abstract: Patch robustness certification is an emerging kind of provable defense technique against adversarial patch attacks for deep learning systems. Certified detection ensures the detection of all...
- Physics-Informed Neural Koopman Machine for Interpretable Longitudinal Personalized Alzheimer's Disease Forecasting : Abstract: Early forecasting of individual cognitive decline in Alzheimer's disease (AD) is central to disease evaluation and management. Despite advances, it remains challenging for existing meth...
- Learning Invariant Graph Representations Through Redundant Information : Abstract: Learning invariant graph representations for out-of-distribution (OOD) generalization remains challenging because the learned representations often retain spurious components. To address thi...
- DEFEND: Poisoned Model Detection and Malicious Client Exclusion Mechanism for Secure Federated Learning-based Road Condition Classification : Abstract: Federated Learning (FL) has drawn the attention of the Intelligent Transportation Systems (ITS) community. FL can train various models for ITS tasks, notably camera-based Road Condition Clas...
- Multi-Modal Zero-Shot Prediction of Color Trajectories in Food Drying : Abstract: Food drying is widely used to reduce moisture content, ensure safety, and extend shelf life. Color evolution of food samples is an important indicator of product quality in food drying. Alth...
- Do You Feel Comfortable? Detecting Hidden Conversational Escalation in AI Chatbots : Abstract: Large Language Models (LLMs) are increasingly integrated into everyday interactions, serving not only as information assistants but also as emotional companions. Even in the absence of explic...
- Quantifying Memory Use in Reinforcement Learning with Temporal Range : Abstract: How much does a trained RL policy actually use its past observations? We propose Temporal Range, a model-agnostic metric that treats first-order sensitivities of multiple vector outpu...
- Auto-exploration for online reinforcement learning : Abstract: The exploration-exploitation dilemma in reinforcement learning (RL) is a fundamental challenge to efficient RL algorithms. Existing algorithms for finite state and action discounted RL probl...
- DUET: Agentic Design Understanding via Experimentation and Testing : Abstract: AI agents powered by large language models (LLMs) are being used to solve increasingly complex software engineering challenges, but struggle with hardware design tasks. Register Transfer Lev...
- Convergence of Outputs When Two Large Language Models Interact in a Multi-Agentic Setup : Abstract: In this work, we report what happens when two large language models respond to each other for many turns without any outside input in a multi-agent setup. The setup begins with a short seed ...
- Who Will Top the Charts? Multimodal Music Popularity Prediction via Adaptive Fusion of Modality Experts and Temporal Engagement Modeling : Abstract: Predicting a song's commercial success prior to its release remains an open and critical research challenge for the music industry. Early prediction of music popularity informs strategic dec...
- Networked Restless Multi-Arm Bandits with Reinforcement Learning : Abstract: Restless Multi-Armed Bandits (RMABs) are a powerful framework for sequential decision-making, widely applied in resource allocation and intervention optimization challenges in public health....
- RefBench-PRO: Perceptual and Reasoning Oriented Benchmark for Referring Expression Comprehension : Abstract: Referring Expression Comprehension (REC) is a vision-language task that localizes a specific image region based on a textual description. Existing REC benchmarks primarily evaluate perceptua...
- Unleashing the Intrinsic Visual Representation Capability of Multimodal Large Language Models : Abstract: Multimodal Large Language Models (MLLMs) have demonstrated remarkable proficiency in multimodal tasks. Despite their impressive performance, MLLMs suffer from the modality imbalance issue, w...
- Entropic Confinement and Mode Connectivity in Overparameterized Neural Networks : Abstract: Modern neural networks exhibit a striking property: basins of attraction in the loss landscape are often connected by low-loss paths, yet optimization dynamics generally remain confined to a...
- Chemistry Integrated Language Model using Hierarchical Molecular Representation for Polymer Informatics : Abstract: Machine learning has transformed material discovery for inorganic compounds and small molecules, yet polymers remain largely inaccessible to these methods. While data scarcity is often cited...
- Degrading Voice: A Comprehensive Overview of Robust Voice Conversion Through Input Manipulation : Abstract: Identity, accent, style, and emotions are essential components of human speech. Voice conversion (VC) techniques process the speech signals of two input speakers and other modalities of auxi...
- Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation : Abstract: Human pose estimation focuses on predicting body keypoints to analyze human motion. Event cameras provide high temporal resolution and low latency, enabling robust estimation under challengi...
- When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models : Abstract: Reward models are central to Large Language Model (LLM) alignment within the framework of RLHF. The standard objective used in reward modeling is the Bradley-Terry (BT) loss, which learns fr...
- Why They Disagree: Decoding Differences in Opinions about AI Risk on the Lex Fridman Podcast : Abstract: The emergence of transformative technologies often surfaces deep societal divisions, nowhere more evident than in contemporary debates about artificial intelligence (AI). A striking feature ...
- Proportional integral derivative booster for neural networks-based time-series prediction: Case of water demand prediction : Abstract: Multi-step time-series prediction is an essential supportive step for decision-makers in several industrial areas. Artificial intelligence techniques, which use a neural network component in...
- Protecting Bystander Privacy via Selective Hearing in LALMs : Abstract: Large audio language models (LALMs) are increasingly deployed in real-world settings where they inevitably capture speech from unintended nearby bystanders, raising privacy risks that existi...
- Web Technologies Security in the AI Era: A Survey of CDN-Enhanced Defenses : Abstract: The modern web stack, which is dominated by browser-based applications and API-first backends, now operates under an adversarial equilibrium where automated, AI-assisted attacks evolve conti...
- RLAX: Large-Scale, Distributed Reinforcement Learning for Large Language Models on TPUs : Abstract: Reinforcement learning (RL) has emerged as the de-facto paradigm for improving the reasoning capabilities of large language models (LLMs). We have developed RLAX, a scalable RL framework on ...
- AgenticCyber: A GenAI-Powered Multi-Agent System for Multimodal Threat Detection and Adaptive Response in Cybersecurity : Abstract: The increasing complexity of cyber threats in distributed environments demands advanced frameworks for real-time detection and response across multimodal data streams. This paper introduces ...
- Rethinking Training Dynamics in Scale-wise Autoregressive Generation : Abstract: Recent advances in autoregressive (AR) generative models have produced increasingly powerful systems for media synthesis. Among them, next-scale prediction has emerged as a popular paradigm,...
- Going All-In on LLM Accuracy: Fake Prediction Markets, Real Confidence Signals : Abstract: Large language models are increasingly used to evaluate other models, yet these judgments typically lack any representation of confidence. This pilot study tests whether framing an evaluatio...
- Deep learning for autism detection using clinical notes: A comparison of transfer learning for a transparent and black-box approach : Abstract: Autism spectrum disorder (ASD) is a complex neurodevelopmental condition whose rising prevalence places increasing demands on a lengthy diagnostic process. Machine learning (ML) has shown pr...
- ARCANE: A Multi-Agent Framework for Interpretable and Configurable Alignment : Abstract: As agents based on large language models are increasingly deployed to long-horizon tasks, maintaining their alignment with stakeholder preferences becomes critical. Effective alignment in su...
- On measuring grounding and generalizing grounding problems : Abstract: The symbol grounding problem asks how tokens like "cat" can be about cats, as opposed to mere shapes manipulated in a calculus. We recast grounding from a binary judgment into an audit across ...
- AI Application in Anti-Money Laundering for Sustainable and Transparent Financial Systems : Abstract: Money laundering and financial fraud remain major threats to global financial stability, costing trillions annually and challenging regulatory oversight. This paper reviews how artificial in...
- How Sharp and Bias-Robust is a Model? Dual Evaluation Perspectives on Knowledge Graph Completion : Abstract: Knowledge graph completion (KGC) aims to predict missing facts from the observed KG. While a number of KGC models have been studied, the evaluation of KGC still remains underexplored. In this...
- DaGRPO: Rectifying Gradient Conflict in Reasoning via Distinctiveness-Aware Group Relative Policy Optimization : Abstract: The evolution of Large Language Models (LLMs) has catalyzed a paradigm shift from superficial instruction following to rigorous long-horizon reasoning. While Group Relative Policy Optimizati...
- Less Is More for Multi-Step Logical Reasoning of LLM Generalisation Under Rule Removal, Paraphrasing, and Compression : Abstract: Large language models (LLMs) excel across many natural language tasks, yet their generalisation to structural perturbations in logical contexts remains poorly understood. We introduce a cont...
- GENIUS: An Agentic AI Framework for Autonomous Design and Execution of Simulation Protocols : Abstract: Predictive atomistic simulations have propelled materials discovery, yet routine setup and debugging still demand computer specialists. This know-how gap limits Integrated Computational Mate...
- UncertaintyZoo: A Unified Toolkit for Quantifying Predictive Uncertainty in Deep Learning Systems : Abstract: Large language models (LLMs) are increasingly expanding their real-world applications across domains, e.g., question answering, autonomous driving, and automatic software development. Despite...
- Smart Spatial Planning in Egypt: An Algorithm-Driven Approach to Public Service Evaluation in Qena City : Abstract: National planning standards for public services in Egypt often fail to align with unique local characteristics. Addressing this gap, this study develops a tailored planning model for Qena Ci...
- The Effect of Belief Boxes and Open-mindedness on Persuasion : Abstract: As multi-agent systems are increasingly utilized for reasoning and decision-making applications, there is a greater need for LLM-based agents to have something resembling propositional belie...
- FlatFormer: A Flat Transformer Knowledge Tracing Model Based on Cognitive Bias Injection : Abstract: Knowledge Tracing (KT) models face a critical "Performance-Complexity Trap": capturing complex cognitive dynamics like learning sessions and memory decay typically requires deep hierarchic...
- LightSearcher: Efficient DeepSearch via Experiential Memory : Abstract: DeepSearch paradigms have become a core enabler for deep reasoning models, allowing them to invoke external search tools to access up-to-date, domain-specific knowledge beyond parametric bou...
- Academic journals' AI policies fail to curb the surge in AI-assisted academic writing : Abstract: The rapid integration of generative AI into academic writing has prompted widespread policy responses from journals and publishers. However, the effectiveness of these policies remains uncle...
- Stochasticity in Agentic Evaluations: Quantifying Inconsistency with Intraclass Correlation : Abstract: As large language models become components of larger agentic systems, evaluation reliability becomes critical: unreliable sub-agents introduce brittleness into downstream system behavior. Ye...
- Cognitive Control Architecture (CCA): A Lifecycle Supervision Framework for Robustly Aligned AI Agents : Abstract: Autonomous Large Language Model (LLM) agents exhibit significant vulnerability to Indirect Prompt Injection (IPI) attacks. These attacks hijack agent behavior by polluting external informati...
- ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems : Abstract: Large Language Model (LLM) agents are emerging to transform daily life. However, existing LLM agents primarily follow a reactive paradigm, relying on explicit user instructions to initiate s...
- DoVer: Intervention-Driven Auto Debugging for LLM Multi-Agent Systems : Abstract: Large language model (LLM)-based multi-agent systems are challenging to debug because failures often arise from long, branching interaction traces. The prevailing practice is to leverage LLM...
Research Sources: 691 | Generated: 12/10/2025
