AI RESEARCH PAPERS & ACADEMIC SOURCES
- SDT-6D: Fully Sparse Depth-Transformer for Staged End-to-End 6D Pose Estimation in Industrial Multi-View Bin Picking : Abstract: Accurately recovering 6D poses in densely packed industrial bin-picking environments remains a serious challenge, owing to occlusions, reflections, and textureless parts. We introduce a holis...
- LapFM: A Laparoscopic Segmentation Foundation Model via Hierarchical Concept Evolving Pre-training : Abstract: Surgical segmentation is pivotal for scene understanding yet remains hindered by annotation scarcity and semantic inconsistency across diverse procedures. Existing approaches typically fine-...
- Leveraging Multispectral Sensors for Color Correction in Mobile Cameras : Abstract: Recent advances in snapshot multispectral (MS) imaging have enabled compact, low-cost spectral sensors for consumer and mobile devices. By capturing richer spectral information than conventi...
- Team-Aware Football Player Tracking with SAM: An Appearance-Based Approach to Occlusion Recovery : Abstract: Football player tracking is challenged by frequent occlusions, similar appearances, and rapid motion in crowded scenes. This paper presents a lightweight SAM-based tracking method combining ...
- Temporal Concept Dynamics in Diffusion Models via Prompt-Conditioned Interventions : Abstract: Diffusion models are usually evaluated by their final outputs, gradually denoising random noise into meaningful images. Yet, generation unfolds along a trajectory, and analyzing this dynamic...
- On-the-fly Large-scale 3D Reconstruction from Multi-Camera Rigs : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have enabled efficient free-viewpoint rendering and photorealistic scene reconstruction. While on-the-fly extensions of 3DGS have shown promis...
- Beyond the Noise: Aligning Prompts with Latent Representations in Diffusion Models : Abstract: Conditional diffusion models rely on language-to-image alignment methods to steer the generation towards semantically accurate outputs. Despite the success of this architecture, misalignment...
- OCCDiff: Occupancy Diffusion Model for High-Fidelity 3D Building Reconstruction from Noisy Point Clouds : Abstract: A major challenge in reconstructing buildings from LiDAR point clouds lies in accurately capturing building surfaces under varying point densities and noise interference. To flexibly gather ...
- Thinking with Images via Self-Calling Agent : Abstract: Thinking-with-images paradigms have showcased remarkable visual reasoning capability by integrating visual information as dynamic elements into the Chain-of-Thought (CoT). However, optimizin...
- MVP: Multiple View Prediction Improves GUI Grounding : Abstract: GUI grounding, which translates natural language instructions into precise pixel coordinates, is essential for developing practical GUI agents. However, we observe that existing grounding mo...
- PaintFlow: A Unified Framework for Interactive Oil Paintings Editing and Generation : Abstract: Oil painting, as a high-level medium that blends human abstract thinking with artistic expression, poses substantial challenges for digital generation and editing due to its intricate brushs...
- Photo3D: Advancing Photorealistic 3D Generation through Structure-Aligned Detail Enhancement : Abstract: Although recent 3D-native generators have made great progress in synthesizing reliable geometry, they still fall short in achieving realistic appearances. A key obstacle lies in the lack of ...
- Fast-ARDiff: An Entropy-informed Acceleration Framework for Continuous Space Autoregressive Generation : Abstract: Autoregressive (AR)-diffusion hybrid paradigms combine AR's structured modeling with diffusion's photorealistic synthesis, yet suffer from high latency due to sequential AR generation and ite...
- An Iteration-Free Fixed-Point Estimator for Diffusion Inversion : Abstract: Diffusion inversion aims to recover the initial noise corresponding to a given image such that this noise can reconstruct the original image through the denoising diffusion process. The key ...
- SSCATeR: Sparse Scatter-Based Convolution Algorithm with Temporal Data Recycling for Real-Time 3D Object Detection in LiDAR Point Clouds : Abstract: This work leverages the continuous sweeping motion of LiDAR scanning to concentrate object detection efforts on specific regions that receive a change in point data from one frame to another...
- BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain : Abstract: Understanding how the human brain represents visual concepts, and in which brain regions these representations are encoded, remains a long-standing challenge. Decades of work have advanced o...
- Modular Neural Image Signal Processing : Abstract: This paper presents a modular neural image signal processing (ISP) framework that processes raw inputs and renders high-quality display-referred images. Unlike prior neural ISP designs, our ...
- Instance-Aware Test-Time Segmentation for Continual Domain Shifts : Abstract: Continual Test-Time Adaptation (CTTA) enables pre-trained models to adapt to continuously evolving domains. Existing methods have improved robustness but typically rely on fixed or batch-lev...
- From Cells to Survival: Hierarchical Analysis of Cell Inter-Relations in Multiplex Microscopy for Lung Cancer Prognosis : Abstract: The tumor microenvironment (TME) has emerged as a promising source of prognostic biomarkers. To fully leverage its potential, analysis methods must capture complex interactions between diffe...
- Automated Pollen Recognition in Optical and Holographic Microscopy Images : Abstract: This study explores the application of deep learning to improve and automate pollen grain detection and classification in both optical and holographic microscopy images, with a particular fo...
- OpenMonoGS-SLAM: Monocular Gaussian Splatting SLAM with Open-set Semantics : Abstract: Simultaneous Localization and Mapping (SLAM) is a foundational component in robotics, AR/VR, and autonomous systems. With the rising focus on spatial AI in recent years, combining SLAM with ...
- Trajectory Densification and Depth from Perspective-based Blur : Abstract: In the absence of a mechanical stabilizer, the camera undergoes inevitable rotational dynamics during capturing, which induces perspective-based blur especially under long-exposure scenarios...
- Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation : Abstract: While state-of-the-art image generation models achieve remarkable visual quality, their internal generative processes remain a "black box." This opacity limits human observation and interven...
- C-DIRA: Computationally Efficient Dynamic ROI Routing and Domain-Invariant Adversarial Learning for Lightweight Driver Behavior Recognition : Abstract: Driver distraction behavior recognition using in-vehicle cameras demands real-time inference on edge devices. However, lightweight models often fail to capture fine-grained behavioral cues, ...
- Repulsor: Accelerating Generative Modeling with a Contrastive Memory Bank : Abstract: The dominance of denoising generative models (e.g., diffusion, flow-matching) in visual synthesis is tempered by their substantial training costs and inefficiencies in representation learnin...
- Dual-Branch Center-Surrounding Contrast: Rethinking Contrastive Learning for 3D Point Clouds : Abstract: Most existing self-supervised learning (SSL) approaches for 3D point clouds are dominated by generative methods based on Masked Autoencoders (MAE). However, these generative methods have bee...
- What really matters for person re-identification? A Mixture-of-Experts Framework for Semantic Attribute Importance : Abstract: State-of-the-art person re-identification methods achieve impressive accuracy but remain largely opaque, leaving open the question: which high-level semantic attributes do these models actua...
- Scale-invariant and View-relational Representation Learning for Full Surround Monocular Depth : Abstract: Recent foundation models demonstrate strong generalization capabilities in monocular depth estimation. However, directly applying these models to Full Surround Monocular Depth Estimation (FS...
- SegEarth-OV3: Exploring SAM 3 for Open-Vocabulary Semantic Segmentation in Remote Sensing Images : Abstract: Most existing methods for training-free Open-Vocabulary Semantic Segmentation (OVSS) are based on CLIP. While these approaches have made progress, they often face challenges in precise local...
- A Scalable Pipeline Combining Procedural 3D Graphics and Guided Diffusion for Photorealistic Synthetic Training Data Generation in White Button Mushroom Segmentation : Abstract: Industrial mushroom cultivation increasingly relies on computer vision for monitoring and automated harvesting. However, developing accurate detection and segmentation models requires large,...
- Skewness-Guided Pruning of Multimodal Swin Transformers for Federated Skin Lesion Classification on Edge Devices : Abstract: In recent years, high-performance computer vision models have achieved remarkable success in medical imaging, with some skin lesion classification systems even surpassing dermatology special...
- Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance : Abstract: We present Wan-Move, a simple and scalable framework that brings motion control to video generative models. Existing motion-controllable methods typically suffer from coarse control granular...
- LoFA: Learning to Predict Personalized Priors for Fast Adaptation of Visual Generative Models : Abstract: Personalizing visual generative models to meet specific user needs has gained increasing attention, yet current methods like Low-Rank Adaptation (LoRA) remain impractical due to their demand...
- Tri-Bench: Stress-Testing VLM Reliability on Spatial Reasoning under Camera Tilt and Object Interference : Abstract: Verifiable geometric reasoning is a critical component for trustworthy and controllable agentic AI. Despite impressive capabilities, Vision-Language Models (VLMs) often fail under realistic ...
- SATGround: A Spatially-Aware Approach for Visual Grounding in Remote Sensing : Abstract: Vision-language models (VLMs) are emerging as powerful generalist tools for remote sensing, capable of integrating information across diverse tasks and enabling flexible, instruction-based i...
- Accelerated Rotation-Invariant Convolution for UAV Image Segmentation : Abstract: Rotation invariance is essential for precise, object-level segmentation in UAV aerial imagery, where targets can have arbitrary orientations and exhibit fine-scale details. Conventional segm...
- UniLayDiff: A Unified Diffusion Transformer for Content-Aware Layout Generation : Abstract: Content-aware layout generation is a critical task in graphic design automation, focused on creating visually appealing arrangements of elements that seamlessly blend with a given background...
- Self-Evolving 3D Scene Generation from a Single Image : Abstract: Generating high-quality, textured 3D scenes from a single image remains a fundamental challenge in vision and graphics. Recent image-to-3D generators recover reasonable geometry from single ...
- LiDAS: Lighting-driven Dynamic Active Sensing for Nighttime Perception : Abstract: Nighttime environments pose significant challenges for camera-based perception, as existing methods passively rely on the scene lighting. We introduce Lighting-driven Dynamic Active Sensing ...
- Unified Diffusion Transformer for High-fidelity Text-Aware Image Restoration : Abstract: Text-Aware Image Restoration (TAIR) aims to recover high-quality images from low-quality inputs containing degraded textual content. While diffusion models provide strong generative priors f...
- Efficiently Reconstructing Dynamic Scenes One D4RT at a Time : Abstract: Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper introduces D4RT, a simple yet powe...
- Selfi: Self Improving Reconstruction Engine via 3D Geometric Feature Alignment : Abstract: Novel View Synthesis (NVS) has traditionally relied on models with explicit 3D inductive biases combined with known camera parameters from Structure-from-Motion (SfM) beforehand. Recent visi...
- Sparse Variable Projection in Robotic Perception: Exploiting Separable Structure for Efficient Nonlinear Optimization : Abstract: Robotic perception often requires solving large nonlinear least-squares (NLS) problems. While sparsity has been well-exploited to scale solvers, a complementary and underexploited structure ...
- VLD: Visual Language Goal Distance for Reinforcement Learning Navigation : Abstract: Training end-to-end policies from image data to directly predict navigation actions for robotic systems has proven inherently difficult. Existing approaches often suffer from either the sim-...
- DIJIT: A Robotic Head for an Active Observer : Abstract: We present DIJIT, a novel binocular robotic head expressly designed for mobile agents that behave as active observers. DIJIT's unique breadth of functionality enables active vision research ...
- Generalizations of the Normalized Radon Cumulative Distribution Transform for Limited Data Recognition : Abstract: The Radon cumulative distribution transform (R-CDT) exploits one-dimensional Wasserstein transport and the Radon transform to represent prominent features in images. It is closely related to...
- FlowSteer: Conditioning Flow Field for Consistent Image Restoration : Abstract: Flow-based text-to-image (T2I) models excel at prompt-driven image generation, but falter on Image Restoration (IR), often "drifting away" from being faithful to the measurement. Prior work ...
- RAVES-Calib: Robust, Accurate and Versatile Extrinsic Self Calibration Using Optimal Geometric Features : Abstract: In this paper, we present a user-friendly LiDAR-camera calibration toolkit that is compatible with various LiDAR and camera sensors and requires only a single pair of laser points and a came...
- Self-Reinforced Deep Priors for Reparameterized Full Waveform Inversion : Abstract: Full waveform inversion (FWI) has become a widely adopted technique for high-resolution subsurface imaging. However, its inherent strong nonlinearity often results in convergence toward loca...
- Learning to Control Physically-simulated 3D Characters via Generating and Mimicking 2D Motions : Abstract: Video data is more cost-effective than motion capture data for learning 3D character motion controllers, yet synthesizing realistic and diverse behaviors directly from videos remains challen...
- Spike-EVPR: Deep Spiking Residual Networks with SNN-Tailored Representations for Event-Based Visual Place Recognition : Abstract: Event cameras are ideal for visual place recognition (VPR) in challenging environments due to their high temporal resolution and high dynamic range. However, existing methods convert sparse ...
- Deep Learning, Machine Learning -- Digital Signal and Image Processing: From Theory to Application : Abstract: Digital Signal Processing (DSP) and Digital Image Processing (DIP) with Machine Learning (ML) and Deep Learning (DL) are popular research areas in Computer Vision and related fields. We high...
- Astral Space: Convex Analysis at Infinity : Abstract: Not all convex functions on $\mathbb{R}^n$ have finite minimizers; some can only be minimized by a sequence as it heads to infinity. In this work, we aim to develop a theory for understandin...
- BeeTLe: An Imbalance-Aware Deep Sequence Model for Linear B-Cell Epitope Prediction and Classification with Logit-Adjusted Losses : Abstract: The process of identifying and characterizing B-cell epitopes, which are the portions of antigens recognized by antibodies, is important for our understanding of the immune system, and for m...
- Deep generative modelling of canonical ensemble with differentiable thermal properties : Abstract: It is a long-standing challenge to accurately and efficiently compute thermodynamic quantities of many-body systems at thermal equilibrium. The conventional methods, e.g., Markov chain Monte...
- Learning effective pruning at initialization from iterative pruning : Abstract: Pruning at initialization (PaI) reduces training costs by removing weights before training, which becomes increasingly crucial with the growing network size. However, current PaI methods sti...
- Automated Construction of Artificial Lattice Structures with Designer Electronic States : Abstract: Manipulating matter with a scanning tunneling microscope (STM) enables creation of atomically defined artificial structures that host designer quantum states. However, the time-consuming nat...
- Universal Representation of Generalized Convex Functions and their Gradients : Abstract: A wide range of optimization problems can often be written in terms of generalized convex functions (GCFs). When this structure is present, it can convert certain nested bilevel objectives i...
- Adaptation of Embedding Models to Financial Filings via LLM Distillation : Abstract: Despite advances in generative large language models (LLMs), practical application of specialized conversational AI agents remains constrained by computation costs, latency requirements, and...
- Segment, Embed, and Align: A Universal Recipe for Aligning Subtitles to Signing : Abstract: The goal of this work is to develop a universal approach for aligning subtitles (i.e., spoken language text with corresponding timestamps) to continuous sign language videos. Prior approache...
- Universal Adversarial Suffixes Using Calibrated Gumbel-Softmax Relaxation : Abstract: Language models (LMs) are often used as zero-shot or few-shot classifiers by scoring label words, but they remain fragile to adversarial prompts. Prior work typically optimizes task- or mode...
- Universal Adversarial Suffixes for Language Models Using Reinforcement Learning with Calibrated Reward : Abstract: Language models are vulnerable to short adversarial suffixes that can reliably alter predictions. Previous works usually find such suffixes with gradient search or rule-based methods, but th...
- What Triggers my Model? Contrastive Explanations Inform Gender Choices by Translation Models : Abstract: Interpretability can be implemented as a means to understand decisions taken by (black box) models, such as machine translation (MT) or large language models (LLMs). Yet, research in this ar...
- Soft Inductive Bias Approach via Explicit Reasoning Perspectives in Inappropriate Utterance Detection Using Large Language Models : Abstract: Recent incidents in certain online games and communities, where anonymity is guaranteed, show that unchecked inappropriate remarks frequently escalate into verbal abuse and even criminal beh...
- HealthcareNLP: where are we and what is next? : Abstract: This proposed tutorial focuses on Healthcare Domain Applications of NLP, what we have achieved around HealthcareNLP, and the challenges that lie ahead for the future. Existing reviews in thi...
- QSTN: A Modular Framework for Robust Questionnaire Inference with Large Language Models : Abstract: We introduce QSTN, an open-source Python framework for systematically generating responses from questionnaire-style prompts to support in-silico surveys and annotation tasks with large langu...
- Ask, Answer, and Detect: Role-Playing LLMs for Personality Detection with Question-Conditioned Mixture-of-Experts : Abstract: Understanding human personality is crucial for web applications such as personalized recommendation and mental health assessment. Existing studies on personality detection predominantly adop...
- Accelerating Urban Science Research with AI Urban Scientist : Abstract: Cities are complex, adaptive systems whose underlying principles remain difficult to disentangle despite unprecedented data abundance. Urban science therefore faces a fundamental challenge: ...
- Beyond Unified Models: A Service-Oriented Approach to Low Latency, Context Aware Phonemization for Real Time TTS : Abstract: Lightweight, real-time text-to-speech systems are crucial for accessibility. However, the most efficient TTS models often rely on lightweight phonemizers that struggle with context-dependent...
- Ontology-Based Knowledge Graph Framework for Industrial Standard Documents via Hierarchical and Propositional Structuring : Abstract: Ontology-based knowledge graph (KG) construction is a core technology that enables multidimensional understanding and advanced reasoning over domain knowledge. Industrial standards, in parti...
- Beyond Real Weights: Hypercomplex Representations for Stable Quantization : Abstract: Multimodal language models (MLLMs) require large parameter capacity to align high-dimensional visual features with linguistic representations, making them computationally heavy and difficult...
- Pose-Based Sign Language Spotting via an End-to-End Encoder Architecture : Abstract: Automatic Sign Language Recognition (ASLR) has emerged as a vital field for bridging the gap between deaf and hearing communities. However, the problem of sign-to-sign retrieval or detecting...
- Shrinking the Generation-Verification Gap with Weak Verifiers : Abstract: Verifiers can improve language model capabilities by scoring and ranking responses from generated candidates. Currently, high-quality verifiers are either unscalable (e.g., humans) or limite...
- Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality : Abstract: Video face swapping is crucial in film and entertainment production, where achieving high fidelity and temporal consistency over long and complex video sequences remains a significant challe...
- SSplain: Sparse and Smooth Explainer for Retinopathy of Prematurity Classification : Abstract: Neural networks are frequently used in medical diagnosis. However, due to their black-box nature, model explainers are used to help clinicians better understand and trust model outputs. This...
- Lost in Translation, Found in Embeddings: Sign Language Translation and Alignment : Abstract: Our aim is to develop a unified model for sign language understanding that performs sign language translation (SLT) and sign-subtitle alignment (SSA). Together, these two tasks enable the c...
- Towards Sustainable Universal Deepfake Detection with Frequency-Domain Masking : Abstract: Universal deepfake detection aims to identify AI-generated images across a broad range of generative models, including unseen ones. This requires robust generalization to new and unseen deep...
- Mask to Adapt: Simple Random Masking Enables Robust Continual Test-Time Learning : Abstract: Distribution shifts at test time degrade image classifiers. Recent continual test-time adaptation (CTTA) methods use masking to regulate learning, but often depend on calibrated uncertainty ...
- Identification of Deforestation Areas in the Amazon Rainforest Using Change Detection Models : Abstract: The preservation of the Amazon Rainforest is one of the global priorities in combating climate change, protecting biodiversity, and safeguarding indigenous cultures. The Satellite-based Moni...
- CVP: Central-Peripheral Vision-Inspired Multimodal Model for Spatial Reasoning : Abstract: We present a central-peripheral vision-inspired framework (CVP), a simple yet effective multimodal model for spatial reasoning that draws inspiration from the two types of human visual field...
- Fourier-RWKV: A Multi-State Perception Network for Efficient Image Dehazing : Abstract: Image dehazing is crucial for reliable visual perception, yet it remains highly challenging under real-world non-uniform haze conditions. Although Transformer-based methods excel at capturin...
- Accuracy Does Not Guarantee Human-Likeness in Monocular Depth Estimators : Abstract: Monocular depth estimation is a fundamental capability for real-world applications such as autonomous driving and robotics. Although deep neural networks (DNNs) have achieved superhuman accu...
- GeoLoom: High-quality Geometric Diagram Generation from Textual Input : Abstract: High-quality geometric diagram generation presents both a challenge and an opportunity: it demands strict spatial accuracy while offering well-defined constraints to guide generation. Inspir...
- Animal Re-Identification on Microcontrollers : Abstract: Camera-based animal re-identification (Animal Re-ID) can support wildlife monitoring and precision livestock management in large outdoor environments with limited wireless connectivity. In t...
- Blur2Sharp: Human Novel Pose and View Synthesis with Generative Prior Refinement : Abstract: The creation of lifelike human avatars capable of realistic pose variation and viewpoint flexibility remains a fundamental challenge in computer vision and graphics. Current approaches typic...
- VisKnow: Constructing Visual Knowledge Base for Object Understanding : Abstract: Understanding objects is fundamental to computer vision. Beyond object recognition that provides only a category label as typical output, in-depth object understanding represents a comprehen...
- SOP^2: Transfer Learning with Scene-Oriented Prompt Pool on 3D Object Detection : Abstract: With the rise of Large Language Models (LLMs) such as GPT-3, these models exhibit strong generalization capabilities. Through transfer learning techniques such as fine-tuning and prompt tuni...
- New VVC profiles targeting Feature Coding for Machines : Abstract: Modern video codecs have been extensively optimized to preserve perceptual quality, leveraging models of the human visual system. However, in split inference systems, where intermediate featu...
- Geometry-Aware Sparse Depth Sampling for High-Fidelity RGB-D Depth Completion in Robotic Systems : Abstract: Accurate three-dimensional perception is essential for modern industrial robotic systems that perform manipulation, inspection, and navigation tasks. RGB-D and stereo vision sensors are wide...
- FastBEV++: Fast by Algorithm, Deployable by Design : Abstract: The advancement of camera-only Bird's-Eye-View (BEV) perception is currently impeded by a fundamental tension between state-of-the-art performance and on-vehicle deployment tractability. This...
- Query-aware Hub Prototype Learning for Few-Shot 3D Point Cloud Semantic Segmentation : Abstract: Few-shot 3D point cloud semantic segmentation (FS-3DSeg) aims to segment novel classes with only a few labeled samples. However, existing metric-based prototype learning methods generate pro...
- SFP: Real-World Scene Recovery Using Spatial and Frequency Priors : Abstract: Scene recovery serves as a critical task for various computer vision applications. Existing methods typically rely on a single prior, which is inherently insufficient to handle multiple degr...
- RLCNet: An end-to-end deep learning framework for simultaneous online calibration of LiDAR, RADAR, and Camera : Abstract: Accurate extrinsic calibration of LiDAR, RADAR, and camera sensors is essential for reliable perception in autonomous vehicles. Still, it remains challenging due to factors such as mechanica...
- EgoX: Egocentric Video Generation from a Single Exocentric Video : Abstract: Egocentric perception enables humans to experience and understand the world directly from their own point of view. Translating exocentric (third-person) videos into egocentric (first-person)...
- PAVAS: Physics-Aware Video-to-Audio Synthesis : Abstract: Recent advances in Video-to-Audio (V2A) generation have achieved impressive perceptual quality and temporal synchronization, yet most models remain appearance-driven, capturing visual-acoust...
- OpenSubject: Leveraging Video-Derived Identity and Diversity Priors for Subject-driven Image Generation and Manipulation : Abstract: Despite the promising progress in subject-driven image generation, current models often deviate from the reference identities and struggle in complex scenes with multiple subjects. To addres...
- Detecting Dental Landmarks from Intraoral 3D Scans: the 3DTeethLand challenge : Abstract: Teeth landmark detection is a critical task in modern clinical orthodontics. Their precise identification enables advanced diagnostics, facilitates personalized treatment strategies, and sup...
- GeoDiffMM: Geometry-Guided Conditional Diffusion for Motion Magnification : Abstract: Video Motion Magnification (VMM) amplifies subtle macroscopic motions to a perceptible level. Recently, existing mainstream Eulerian approaches address amplification-induced noise via decoup...
- PointDico: Contrastive 3D Representation Learning Guided by Diffusion Models : Abstract: Self-supervised representation learning has shown significant improvement in Natural Language Processing and 2D Computer Vision. However, existing methods face difficulties in representing 3...
- Bi^2MAC: Bimodal Bi-Adaptive Mask-Aware Convolution for Remote Sensing Pansharpening : Abstract: Pansharpening aims to fuse a high-resolution panchromatic (PAN) image with a low-resolution multispectral (LRMS) image to generate a high-resolution multispectral image (HRMS). Conventional ...
- HybridSplat: Fast Reflection-baked Gaussian Tracing using Hybrid Splatting : Abstract: Rendering complex reflections of real-world scenes using 3D Gaussian splatting has been a promising solution for photorealistic novel view synthesis, but still faces bottlenecks especia...
- DINO-BOLDNet: A DINOv3-Guided Multi-Slice Attention Network for T1-to-BOLD Generation : Abstract: Generating BOLD images from T1w images offers a promising solution for recovering missing BOLD information and enabling downstream tasks when BOLD images are corrupted or unavailable. Motiva...
- TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels : Abstract: Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video and has witnessed rapid progress in recent years. However, we argue that the ex...
- SCU-CGAN: Enhancing Fire Detection through Synthetic Fire Image Generation and Dataset Augmentation : Abstract: Fire has long been linked to human life, causing severe disasters and losses. Early detection is crucial, and with the rise of home IoT technologies, household fire detection systems have em...
- The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss : Abstract: Multimodal Large Language Models (MLLMs), which couple pre-trained vision encoders and language models, have shown remarkable capabilities. However, their reliance on the ubiquitous Pre-Norm...
- Simultaneous Enhancement and Noise Suppression under Complex Illumination Conditions : Abstract: Under challenging light conditions, captured images often suffer from various degradations, leading to a decline in the performance of vision-based applications. Although numerous methods ha...
- Detection of Digital Facial Retouching utilizing Face Beauty Information : Abstract: Facial retouching to beautify images is widespread in social media and advertisements, and it is even applied in professional photo studios to let individuals appear younger, remove wrinkles...
- Towards Visual Re-Identification of Fish using Fine-Grained Classification for Electronic Monitoring in Fisheries : Abstract: Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being col...
- SAM-Body4D: Training-Free 4D Human Body Mesh Recovery from Videos : Abstract: Human Mesh Recovery (HMR) aims to reconstruct 3D human pose and shape from 2D observations and is fundamental to human-centric understanding in real-world scenarios. While recent image-based...
- Towards Effective and Efficient Long Video Understanding of Multimodal Large Language Models via One-shot Clip Retrieval : Abstract: Due to excessive memory overhead, most Multimodal Large Language Models (MLLMs) can only process videos of limited frames. In this paper, we propose an effective and efficient paradigm to re...
- Robust Agents in Open-Ended Worlds : Abstract: The growing prevalence of artificial intelligence (AI) in various applications underscores the need for agents that can successfully navigate and adapt to an ever-changing, open-ended world....
- PolyLingua: Margin-based Inter-class Transformer for Robust Cross-domain Language Detection : Abstract: Language identification is a crucial first step in multilingual systems such as chatbots and virtual assistants, enabling linguistically and culturally accurate user experiences. Errors at t...
- MobileFineTuner: A Unified End-to-End Framework for Fine-Tuning LLMs on Mobile Phones : Abstract: Mobile phones are the most ubiquitous end devices, generating vast amounts of human-authored data and serving as the primary platform for end-side applications. As high-quality public data f...
- Correction of Decoupled Weight Decay : Abstract: Decoupled weight decay, solely responsible for the performance advantage of AdamW over Adam, has long been set proportional to the learning rate $γ$ without question. Some researchers have...
- Persistent Topological Structures and Cohomological Flows as a Mathematical Framework for Brain-Inspired Representation Learning : Abstract: This paper presents a mathematically rigorous framework for brain-inspired representation learning founded on the interplay between persistent topological structures and cohomological flows....
- SPROCKET: Extending ROCKET to Distance-Based Time-Series Transformations With Prototypes : Abstract: Classical Time Series Classification algorithms are dominated by feature engineering strategies. One of the most prominent of these transforms is ROCKET, which achieves strong performance th...
- Wavelet-Accelerated Physics-Informed Quantum Neural Network for Multiscale Partial Differential Equations : Abstract: This work proposes a wavelet-based physics-informed quantum neural network framework to efficiently address multiscale partial differential equations that involve sharp gradients, stiffness,...
- Geometric-Stochastic Multimodal Deep Learning for Predictive Modeling of SUDEP and Stroke Vulnerability : Abstract: Sudden Unexpected Death in Epilepsy (SUDEP) and acute ischemic stroke are life-threatening conditions involving complex interactions across cortical, brainstem, and autonomic systems. We pre...
- Mathematical Foundations of Neural Tangents and Infinite-Width Networks : Abstract: We investigate the mathematical foundations of neural networks in the infinite-width regime through the Neural Tangent Kernel (NTK). We propose the NTK-Eigenvalue-Controlled Residual Network...
- SOFA-FL: Self-Organizing Hierarchical Federated Learning with Adaptive Clustered Data Sharing : Abstract: Federated Learning (FL) faces significant challenges in evolving environments, particularly regarding data heterogeneity and the rigidity of fixed network topologies. To address these issues...
- gHAWK: Local and Global Structure Encoding for Scalable Training of Graph Neural Networks on Knowledge Graphs : Abstract: Knowledge Graphs (KGs) are a rich source of structured, heterogeneous data, powering a wide range of applications. A common approach to leverage this data is to train a graph neural network ...
- Jacobian Aligned Random Forests : Abstract: Axis-aligned decision trees are fast and stable but struggle on datasets with rotated or interaction-dependent decision boundaries, where informative splits require linear combinations of fe...
- Minimizing Layerwise Activation Norm Improves Generalization in Federated Learning : Abstract: Federated Learning (FL) is an emerging machine learning framework that enables multiple clients (coordinated by a server) to collaboratively train a global model by aggregating the locally t...
- A Multivariate Bernoulli-Based Sampling Method for Multi-Label Data with Application to Meta-Research : Abstract: Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient obs...
- Fully Decentralized Certified Unlearning : Abstract: Machine unlearning (MU) seeks to remove the influence of specified data from a trained model in response to privacy requests or data poisoning. While certified unlearning has been analyzed i...
- Transformers for Multimodal Brain State Decoding: Integrating Functional Magnetic Resonance Imaging Data and Medical Metadata : Abstract: Decoding brain states from functional magnetic resonance imaging (fMRI) data is vital for advancing neuroscience and clinical applications. While traditional machine learning and deep learni...
- Solving Over-Smoothing in GNNs via Nonlocal Message Passing: Algebraic Smoothing and Depth Scalability : Abstract: The relationship between Layer Normalization (LN) placement and the over-smoothing phenomenon remains underexplored. We identify a critical dilemma: Pre-LN architectures avoid over-smoothing...
- Optimal Perturbation Budget Allocation for Data Poisoning in Offline Reinforcement Learning : Abstract: Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to data poisoning attacks. Existing attack strategies typically rely on loca...
- Long-Sequence LSTM Modeling for NBA Game Outcome Prediction Using a Novel Multi-Season Dataset : Abstract: Predicting the outcomes of professional basketball games, particularly in the National Basketball Association (NBA), has become increasingly important for coaching strategy, fan engagement, ...
- DS FedProxGrad: Asymptotic Stationarity Without Noise Floor in Fair Federated Learning : Abstract: Recent work [arifgroup] introduced Federated Proximal Gradient (FedProxGrad) for solving non-convex composite optimization problems in group fair federated learning. H...
- An Additive Manufacturing Part Qualification Framework: Transferring Knowledge of Stress-strain Behaviors from Additively Manufactured Polymers to Metals : Abstract: Part qualification is crucial in additive manufacturing (AM) because it ensures that additively manufactured parts can be consistently produced and reliably used in critical applications. Pa...
- Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search : Abstract: Text-to-image (TTI) diffusion models have achieved remarkable visual quality, yet they have been repeatedly shown to exhibit social biases across sensitive attributes such as gender, race an...
- Neural Ordinary Differential Equations for Simulating Metabolic Pathway Dynamics from Time-Series Multiomics Data : Abstract: The advancement of human healthspan and bioengineering relies heavily on predicting the behavior of complex biological systems. While high-throughput multiomics data is becoming increasingly...
- Learning and Editing Universal Graph Prompt Tuning via Reinforcement Learning : Abstract: Early graph prompt tuning approaches relied on task-specific designs for Graph Neural Networks (GNNs), limiting their adaptability across diverse pre-training strategies. In contrast, anothe...
- De novo generation of functional terpene synthases using TpsGPT : Abstract: Terpene synthases (TPS) are a key family of enzymes responsible for generating the diverse terpene scaffolds that underpin many natural products, including front-line anticancer drugs such a...
- Identifying counterfactual probabilities using bivariate distributions and uplift modeling : Abstract: Uplift modeling estimates the causal effect of an intervention as the difference between potential outcomes under treatment and control, whereas counterfactual identification aims to recover...
- Forecasting Fails: Unveiling Evasion Attacks in Weather Prediction Models : Abstract: With the increasing reliance on AI models for weather forecasting, it is imperative to evaluate their vulnerability to adversarial perturbations. This work introduces Weather Adaptive Advers...
- Reinforcement Learning From State and Temporal Differences : Abstract: TD($λ$) with function approximation has proved empirically successful for some complex reinforcement learning problems. For linear approximation, TD($λ$) has been shown to minimise the squar...
- Refining Diffusion Models for Motion Synthesis with an Acceleration Loss to Generate Realistic IMU Data : Abstract: We propose a text-to-IMU (inertial measurement unit) motion-synthesis framework to obtain realistic IMU data by fine-tuning a pretrained diffusion model with an acceleration-based second-ord...
- Explainable Anomaly Detection for Industrial IoT Data Streams : Abstract: Industrial maintenance is being transformed by the Internet of Things and edge computing, generating continuous data streams that demand real-time, adaptive decision-making under limited com...
- Unsupervised Learning of Density Estimates with Topological Optimization : Abstract: Kernel density estimation is a key component of a wide variety of algorithms in machine learning, Bayesian inference, stochastic dynamics and signal processing. However, the unsupervised den...
- Open Polymer Challenge: Post-Competition Report : Abstract: Machine learning (ML) offers a powerful path toward discovering sustainable polymer materials, but progress has been limited by the lack of large, high-quality, and openly accessible polymer...
- Fast and Robust Diffusion Posterior Sampling for MR Image Reconstruction Using the Preconditioned Unadjusted Langevin Algorithm : Abstract: Purpose: The Unadjusted Langevin Algorithm (ULA) in combination with diffusion models can generate high quality MRI reconstructions with uncertainty estimation from highly undersampled k-spa...
- Detection of Cyberbullying in GIF using AI : Abstract: Cyberbullying is a well-known social issue, and it is escalating day by day. Due to the vigorous development of the internet, social media provide many different ways for the user to express...
- Integrating LSTM Networks with Neural Levy Processes for Financial Forecasting : Abstract: This paper investigates an optimal integration of deep learning with financial models for robust asset price forecasting. Specifically, we developed a hybrid framework combining a Long Short...
- CrowdLLM: Building LLM-Based Digital Populations Augmented with Generative Models : Abstract: The emergence of large language models (LLMs) has sparked much interest in creating LLM-based digital populations that can be applied to many applications such as social simulation, crowdsou...
- Conformal Defects in Neural Network Field Theories : Abstract: Neural Network Field Theories (NN-FTs) represent a novel construction of arbitrary field theories, including those of conformal fields, through the specification of the network architecture ...
- A Comparative Study of EMG- and IMU-based Gesture Recognition at the Wrist and Forearm : Abstract: Gestures are an integral part of our daily interactions with the environment. Hand gesture recognition (HGR) is the process of interpreting human intent through various input modalities, suc...
- Learning Dynamics from Infrequent Output Measurements for Uncertainty-Aware Optimal Control : Abstract: Reliable optimal control is challenging when the dynamics of a nonlinear system are unknown and only infrequent, noisy output measurements are available. This work addresses this setting of ...
- Provable Diffusion Posterior Sampling for Bayesian Inversion : Abstract: This paper proposes a novel diffusion-based posterior sampling method within a plug-and-play (PnP) framework. Our approach constructs a probability transport from an easy-to-sample terminal ...
- An Introduction to Deep Reinforcement and Imitation Learning : Abstract: Embodied agents, such as robots and virtual characters, must continuously select actions to execute tasks effectively, solving complex sequential decision-making problems. Given the difficul...
- Fairness-aware PageRank via Edge Reweighting : Abstract: Link-analysis algorithms, such as PageRank, are instrumental in understanding the structural dynamics of networks by evaluating the importance of individual vertices based on their connectiv...
- Multi-agent learning under uncertainty: Recurrence vs. concentration : Abstract: In this paper, we examine the convergence landscape of multi-agent learning under uncertainty. Specifically, we analyze two stochastic models of regularized learning in continuous games -- o...
- Robust equilibria in continuous games: From strategic to dynamic robustness : Abstract: In this paper, we examine the robustness of Nash equilibria in continuous games, under both strategic and dynamic uncertainty. Starting with the former, we introduce the notion of a robust e...
- Worst-case generation via minimax optimization in Wasserstein space : Abstract: Worst-case generation plays a critical role in evaluating robustness and stress-testing systems under distribution shifts, in applications ranging from machine learning models to power grids...
- Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation : Abstract: Accurate segmentation of cancerous lesions from 3D computed tomography (CT) scans is essential for automated treatment planning and response assessment. However, even state-of-the-art models...
- Zero-Splat TeleAssist: A Zero-Shot Pose Estimation Framework for Semantic Teleoperation : Abstract: We introduce Zero-Splat TeleAssist, a zero-shot sensor-fusion pipeline that transforms commodity CCTV streams into a shared, 6-DoF world model for multilateral teleoperation. By integrating ...
- FedLAD: A Modular and Adaptive Testbed for Federated Log Anomaly Detection : Abstract: Log-based anomaly detection (LAD) is critical for ensuring the reliability of large-scale distributed systems. However, most existing LAD approaches assume centralized training, which is oft...
- Probabilistic Multi-Agent Aircraft Landing Time Prediction : Abstract: Accurate and reliable aircraft landing time prediction is essential for effective resource allocation in air traffic management. However, the inherent uncertainty of aircraft trajectories an...
- Magnetic activity of ultracool dwarfs in the LAMOST DR11 : Abstract: Ultracool dwarfs consist of the lowest-mass stars and brown dwarfs. Their interiors are fully convective, unlike those of the partly convective Sun-like stars. Magnetic field generation pro...
- Low Rank Support Quaternion Matrix Machine : Abstract: Input features are conventionally represented as vectors, matrices, or third order tensors in the real field, for color image classification. Inspired by the success of quaternion data model...
- Multi-Agent Deep Reinforcement Learning for Collaborative UAV Relay Networks under Jamming Attacks : Abstract: The deployment of Unmanned Aerial Vehicle (UAV) swarms as dynamic communication relays is critical for next-generation tactical networks. However, operating in contested environments require...
- Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging : Abstract: The training and deployment of machine learning (ML) models have become extremely energy-intensive. While existing optimization efforts focus primarily on hardware energy efficiency, a signi...
- Beyond Wave Variables: A Data-Driven Ensemble Approach for Enhanced Teleoperation Transparency and Stability : Abstract: Time delays in communication channels present significant challenges for bilateral teleoperation systems, affecting both transparency and stability. Although traditional wave variable-based ...
- Learned iterative networks: An operator learning perspective : Abstract: Learned image reconstruction has become a pillar in computational imaging and inverse problems. Among the most successful approaches are learned iterative networks, which are formulated by u...
- Uncertainty-Aware Subset Selection for Robust Visual Explainability under Distribution Shifts : Abstract: Subset selection-based methods are widely used to explain deep vision models: they attribute predictions by highlighting the most influential image regions and support object-level explanati...
- Fused Gromov-Wasserstein Contrastive Learning for Effective Enzyme-Reaction Screening : Abstract: Enzymes are crucial catalysts that enable a wide range of biochemical reactions. Efficiently identifying specific enzymes from vast protein libraries is essential for advancing biocatalysis....
- Data-Efficient Learning of Anomalous Diffusion with Wavelet Representations: Enabling Direct Learning from Experimental Trajectories : Abstract: Machine learning (ML) has become a versatile tool for analyzing anomalous diffusion trajectories, yet most existing pipelines are trained on large collections of simulated data. In contrast,...
- Minimax and Bayes Optimal Adaptive Experimental Design for Treatment Choice : Abstract: We consider an adaptive experiment for treatment choice and design a minimax and Bayes optimal adaptive experiment with respect to regret. Given binary treatments, the experimenter's goal is...
- Heuristics for Combinatorial Optimization via Value-based Reinforcement Learning: A Unified Framework and Analysis : Abstract: Since the 1990s, considerable empirical work has been carried out to train statistical models, such as neural networks (NNs), as learned heuristics for combinatorial optimization (CO) proble...
- An Agentic AI System for Multi-Framework Communication Coding : Abstract: Clinical communication is central to patient outcomes, yet large-scale human annotation of patient-provider conversation remains labor-intensive, inconsistent, and difficult to scale. Existi...
- Direct transfer of optimized controllers to similar systems using dimensionless MPC : Abstract: Scaled model experiments are commonly used in various engineering fields to reduce experimentation costs and overcome constraints associated with full-scale systems. The relevance of such ex...
- Gradient-Informed Monte Carlo Fine-Tuning of Diffusion Models for Low-Thrust Trajectory Design : Abstract: Preliminary mission design of low-thrust spacecraft trajectories in the Circular Restricted Three-Body Problem is a global search characterized by a complex objective landscape and numerous ...
- Generation is Required for Data-Efficient Perception : Abstract: It has been hypothesized that human-level visual perception requires a generative approach in which internal representations result from inverting a decoder. Yet today's most successful visi...
- Secure and Privacy-Preserving Federated Learning for Next-Generation Underground Mine Safety : Abstract: Underground mining operations depend on sensor networks to monitor critical parameters such as temperature, gas concentration, and miner movement, enabling timely hazard detection and safety...
- Decentralized Trust for Space AI: Blockchain-Based Federated Learning Across Multi-Vendor LEO Satellite Networks : Abstract: The rise of space AI is reshaping government and industry through applications such as disaster detection, border surveillance, and climate monitoring, powered by massive data from commercia...
- OSMO: Open-Source Tactile Glove for Human-to-Robot Skill Transfer : Abstract: Human video demonstrations provide abundant training data for learning robot policies, but video alone cannot capture the rich contact signals critical for mastering manipulation. We introdu...
- Discovering Influential Factors in Variational Autoencoders : Abstract: In the field of machine learning, it is still a critical issue to identify and supervise the learned representation without manually intervening or intuition assistance to extract useful kno...
- Generative Learning of Heterogeneous Tail Dependence : Abstract: We propose a multivariate generative model to capture the complex dependence structure often encountered in business and financial data. Our model features heterogeneous and asymmetric tail ...
- Freeze then Train: Towards Provable Representation Learning under Spurious Correlations and Feature Noise : Abstract: The existence of spurious correlations such as image backgrounds in the training environment can make empirical risk minimization (ERM) perform badly in the test environment. To address this...
- Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning : Abstract: Federated Learning (FL) is a machine learning paradigm that enables clients to jointly train a global model by aggregating the locally trained models without sharing any local training data....
- BG-HGNN: Toward Efficient Learning for Complex Heterogeneous Graphs : Abstract: Heterogeneous graphs, comprising diverse node and edge types connected through varied relations, are ubiquitous in real-world applications. Message-passing heterogeneous graph neural network...
- Neural Surrogate HMC: On Using Neural Likelihoods for Hamiltonian Monte Carlo in Simulation-Based Inference : Abstract: Bayesian inference methods such as Markov Chain Monte Carlo (MCMC) typically require repeated computations of the likelihood function, but in some scenarios this is infeasible and alternativ...
- Geometry Aware Meta-Learning Neural Network for Joint Phase and Precoder Optimization in RIS : Abstract: In reconfigurable intelligent surface (RIS) aided systems, the joint optimization of the precoder matrix at the base station and the phase shifts of the RIS elements involves significant com...
- GLL: A Differentiable Graph Learning Layer for Neural Networks : Abstract: Standard deep learning architectures used for classification generate label predictions with a projection head and softmax activation function. Although successful, these methods fail to lev...
- TabKAN: Advancing Tabular Data Analysis using Kolmogorov-Arnold Network : Abstract: Tabular data analysis presents unique challenges that arise from heterogeneous feature types, missing values, and complex feature interactions. While traditional machine learning methods lik...
- InfiniteVL: Synergizing Linear and Sparse Attention for Highly-Efficient, Unlimited-Input Vision-Language Models : Abstract: Window attention and linear attention represent two principal strategies for mitigating the quadratic complexity and ever-growing KV cache in Vision-Language Models (VLMs). However, we obser...
- Differentially Private Synthetic Data Generation Using Context-Aware GANs : Abstract: The widespread use of big data across sectors has raised major privacy concerns, especially when sensitive information is shared or analyzed. Regulations such as GDPR and HIPAA impose strict...
- Fed-SE: Federated Self-Evolution for Privacy-Constrained Multi-Environment LLM Agents : Abstract: LLM agents are widely deployed in complex interactive tasks, yet privacy constraints often preclude centralized optimization and co-evolution across dynamic environments. While Federated Lea...
- Siamese-Driven Optimization for Low-Resolution Image Latent Embedding in Image Captioning : Abstract: Image captioning is essential in many fields including assisting visually impaired individuals, improving content management systems, and enhancing human-computer interaction. However, a rec...
- When Tables Leak: Attacking String Memorization in LLM-Based Tabular Data Generation : Abstract: Large Language Models (LLMs) have recently demonstrated remarkable performance in generating high-quality tabular synthetic data. In practice, two primary approaches have emerged for adaptin...
- DAO-GP Drift Aware Online Non-Linear Regression Gaussian-Process : Abstract: Real-world datasets often exhibit temporal dynamics characterized by evolving data distributions. Disregarding this phenomenon, commonly referred to as concept drift, can significantly dimin...
- No Labels, No Problem: Training Visual Reasoners with Multimodal Verifiers : Abstract: Visual reasoning is challenging, requiring both precise object grounding and understanding complex spatial relationships. Existing methods fall into two camps: language-only chain-of-thought...
- Toward Faithful Retrieval-Augmented Generation with Sparse Autoencoders : Abstract: Retrieval-Augmented Generation (RAG) improves the factuality of large language models (LLMs) by grounding outputs in retrieved evidence, but faithfulness failures, where generations contradi...
- Revisiting the Scaling Properties of Downstream Metrics in Large Language Model Training : Abstract: While scaling laws for Large Language Models (LLMs) traditionally focus on proxy metrics like pretraining loss, predicting downstream task performance has been considered unreliable. This pa...
- SAQ: Stabilizer-Aware Quantum Error Correction Decoder : Abstract: Quantum Error Correction (QEC) decoding faces a fundamental accuracy-efficiency tradeoff. Classical methods like Minimum Weight Perfect Matching (MWPM) exhibit variable performance across no...
- Astra: General Interactive World Model with Autoregressive Denoising : Abstract: Recent advances in diffusion transformers have empowered video generation models to generate high-quality video clips from texts or images. However, world models with the ability to predict ...
- NumCoKE: Ordinal-Aware Numerical Reasoning over Knowledge Graphs with Mixture-of-Experts and Contrastive Learning : Abstract: Knowledge graphs (KGs) serve as a vital backbone for a wide range of AI applications, including natural language understanding and recommendation. A promising yet underexplored direction is ...
- No-Regret Strategy Solving in Imperfect-Information Games via Pre-Trained Embedding : Abstract: High-quality information set abstraction remains a core challenge in solving large-scale imperfect-information extensive-form games (IIEFGs)--such as no-limit Texas Hold'em--where the finite...
- ChemLabs on ChemO: A Multi-Agent System for Multimodal Reasoning on IChO 2025 : Abstract: Olympiad-level benchmarks in mathematics and physics are crucial testbeds for advanced AI reasoning, but chemistry, with its unique multimodal symbolic language, has remained an open challen...
- Orchestrator Multi-Agent Clinical Decision Support System for Secondary Headache Diagnosis in Primary Care : Abstract: Unlike most primary headaches, secondary headaches need specialized care and can have devastating consequences if not treated promptly. Clinical guidelines highlight several 'red flag' featu...
- MARL Warehouse Robots : Abstract: We present a comparative study of multi-agent reinforcement learning (MARL) algorithms for cooperative warehouse robotics. We evaluate QMIX and IPPO on the Robotic Warehouse (RWARE) environm...
- SlideGen: Collaborative Multimodal Agents for Scientific Slide Generation : Abstract: Generating academic slides from scientific papers is a challenging multimodal reasoning task that requires both long context understanding and deliberate visual planning. Existing approaches...
- MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens : Abstract: The effectiveness of Multimodal Large Language Models (MLLMs) demonstrates a profound capability in multimodal understanding. However, the simultaneous generation of images with coherent tex...
- Diffusion Models for Wireless Communications : Abstract: A comprehensive study on the applications of denoising diffusion models for wireless systems is provided. The article highlights the capabilities of diffusion models in learning complicated ...
- Score-based Conditional Out-of-Distribution Augmentation for Graph Covariate Shift : Abstract: Distribution shifts between training and testing datasets significantly impair the model performance on graph learning. A commonly-taken causal view in graph invariant learning suggests that...
- The Cream Rises to the Top: Efficient Reranking Method for Verilog Code Generation : Abstract: LLMs face significant challenges in Verilog generation due to limited domain-specific knowledge. While sampling techniques improve pass@k metrics, hardware engineers need one trustworthy sol...
- CarBench: A Comprehensive Benchmark for Neural Surrogates on High-Fidelity 3D Car Aerodynamics : Abstract: Benchmarking has been the cornerstone of progress in computer vision, natural language processing, and the broader deep learning domain, driving algorithmic innovation through standardized d...
- RaX-Crash: A Resource Efficient and Explainable Small Model Pipeline with an Application to City Scale Injury Severity Prediction : Abstract: New York City reports over one hundred thousand motor vehicle collisions each year, creating substantial injury and public health burden. We present RaX-Crash, a resource efficient and expla...
- HSTMixer: A Hierarchical MLP-Mixer for Large-Scale Traffic Forecasting : Abstract: Traffic forecasting is significant to modern urban management. Recently, there has been growing attention on large-scale forecasting, as it better reflects the complexity of real-world traffi...
- LAPA: Log-Domain Prediction-Driven Dynamic Sparsity Accelerator for Transformer Model : Abstract: Attention-based Transformers have revolutionized natural language processing (NLP) and shown strong performance in computer vision (CV) tasks. However, as the input sequence varies, the comp...
- Medical Test-free Disease Detection Based on Big Data : Abstract: Accurate disease detection is of paramount importance for effective medical treatment and patient care. However, the process of disease detection is often associated with extensive medical t...
- SA^2GFM: Enhancing Robust Graph Foundation Models with Structure-Aware Semantic Augmentation : Abstract: We present Graph Foundation Models (GFMs) which have made significant progress in various tasks, but their robustness against domain noise, structural perturbations, and adversarial attacks ...
- FAIM: Frequency-Aware Interactive Mamba for Time Series Classification : Abstract: Time series classification (TSC) is crucial in numerous real-world applications, such as environmental monitoring, medical diagnosis, and posture recognition. TSC tasks require models to eff...
- SetAD: Semi-Supervised Anomaly Learning in Contextual Sets : Abstract: Semi-supervised anomaly detection (AD) has shown great promise by effectively leveraging limited labeled data. However, existing methods are typically structured around scoring individual po...
- Pattern Recognition of Ozone-Depleting Substance Exports in Global Trade Data : Abstract: New methods are needed to monitor environmental treaties, like the Montreal Protocol, by reviewing large, complex customs datasets. This paper introduces a framework using unsupervised machi...
- Using Text-Based Life Trajectories from Swedish Register Data to Predict Residential Mobility with Pretrained Transformers : Abstract: We transform large-scale Swedish register data into textual life trajectories to address two long-standing challenges in data analysis: high cardinality of categorical variables and inconsis...
- Controllable risk scenario generation from human crash data for autonomous vehicle testing : Abstract: Ensuring the safety of autonomous vehicles (AV) requires rigorous testing under both everyday driving and rare, safety-critical conditions. A key challenge lies in simulating environment age...
- Softly Symbolifying Kolmogorov-Arnold Networks : Abstract: Kolmogorov-Arnold Networks (KANs) offer a promising path toward interpretable machine learning: their learnable activations can be studied individually, while collectively fitting complex da...
- Fourier-Enhanced Recurrent Neural Networks for Electrical Load Time Series Downscaling : Abstract: We present a Fourier-enhanced recurrent neural network (RNN) for downscaling electrical loads. The model combines (i) a recurrent backbone driven by low-resolution inputs, (ii) explicit Four...
- Graph Contrastive Learning via Spectral Graph Alignment : Abstract: Given augmented views of each input graph, contrastive learning methods (e.g., InfoNCE) optimize pairwise alignment of graph embeddings across views while providing no mechanism to control t...
- Nonnegative Matrix Factorization through Cone Collapse : Abstract: Nonnegative matrix factorization (NMF) is a widely used tool for learning parts-based, low-dimensional representations of nonnegative data, with applications in vision, text, and bioinformat...
- Semi-Supervised Contrastive Learning with Orthonormal Prototypes : Abstract: Contrastive learning has emerged as a powerful method in deep learning, excelling at learning effective representations through contrasting samples from different distributions. However, dim...
- Towards symbolic regression for interpretable clinical decision scores : Abstract: Medical decision-making makes frequent use of algorithms that combine risk equations with rules, providing clear and standardized treatment pathways. Symbolic regression (SR) traditionally l...
- CIP-Net: Continual Interpretable Prototype-based Network : Abstract: Continual learning constrains models to learn new tasks over time without forgetting what they have already learned. A key challenge in this setting is catastrophic forgetting, where learnin...
- HOLE: Homological Observation of Latent Embeddings for Neural Network Interpretability : Abstract: Deep learning models have achieved remarkable success across various domains, yet their learned representations and decision-making processes remain largely opaque and hard to interpret. Thi...
- Bridging the Clinical Expertise Gap: Development of a Web-Based Platform for Accessible Time Series Forecasting and Analysis : Abstract: Time series forecasting has applications across domains and industries, especially in healthcare, but the technical expertise required to analyze data, build models, and interpret results ca...
- Benchmarking Offline Multi-Objective Reinforcement Learning in Critical Care : Abstract: In critical care settings such as the Intensive Care Unit, clinicians face the complex challenge of balancing conflicting objectives, primarily maximizing patient survival while minimizing r...
- CLARITY: Medical World Model for Guiding Treatment Decisions by Modeling Context-Aware Disease Trajectories in Latent Space : Abstract: Clinical decision-making in oncology requires predicting dynamic disease evolution, a task current static AI predictors cannot perform. While world models (WMs) offer a paradigm for generati...
- LUNA: Linear Universal Neural Attention with Generalization Guarantees : Abstract: Scaling attention faces a critical bottleneck: the $\mathcal{O}(n^2)$ quadratic computational cost of softmax attention, which limits its application in long-sequence domains. While linear a...
- Deep Kernel Aalen-Johansen Estimator: An Interpretable and Flexible Neural Net Framework for Competing Risks : Abstract: We propose an interpretable deep competing risks model called the Deep Kernel Aalen-Johansen (DKAJ) estimator, which generalizes the classical Aalen-Johansen nonparametric estimate of cumula...
- CAMO: Causality-Guided Adversarial Multimodal Domain Generalization for Crisis Classification : Abstract: Crisis classification in social media aims to extract actionable disaster-related information from multimodal posts, which is a crucial task for enhancing situational awareness and facilitat...
- Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders : Abstract: Since the advent of machine learning, interpretability has remained a persistent challenge, becoming increasingly urgent as generative models support high-stakes applications in drug and mat...
- Complexity of One-Dimensional ReLU DNNs : Abstract: We study the expressivity of one-dimensional (1D) ReLU deep neural networks through the lens of their linear regions. For randomly initialized, fully connected 1D ReLU networks (He scaling w...
- Improving the Sensitivity of Backdoor Detectors via Class Subspace Orthogonalization : Abstract: Most post-training backdoor detection methods rely on attacked models exhibiting extreme outlier detection statistics for the target class of an attack, compared to non-target classes. Howev...
- Command & Control (C2) Traffic Detection Via Algorithm Generated Domain (DGA) Classification Using Deep Learning And Natural Language Processing : Abstract: The sophistication of modern malware, specifically regarding communication with Command and Control (C2) servers, has rendered static blacklist-based defenses obsolete. The use of Domain Gen...
- LLM-Generated Counterfactual Stress Scenarios for Portfolio Risk Simulation via Hybrid Prompt-RAG Pipeline : Abstract: We develop a transparent and fully auditable LLM-based pipeline for macro-financial stress testing, combining structured prompting with optional retrieval of country fundamentals and news. T...
- Bayesian Optimization for Function-Valued Responses under Min-Max Criteria : Abstract: Bayesian optimization is widely used for optimizing expensive black box functions, but most existing approaches focus on scalar responses. In many scientific and engineering settings the res...
- Manifolds and Modules: How Function Develops in a Neural Foundation Model : Abstract: Foundation models have shown remarkable success in fitting biological visual systems; however, their black-box nature inherently limits their utility for understanding brain function. Here, ...
- Quantum Circuit Reasoning Models: A Variational Framework for Differentiable Logical Inference : Abstract: This report introduces a novel class of reasoning architectures, termed Quantum Circuit Reasoning Models (QCRM), which extend the concept of Variational Quantum Circuits (VQC) from energy mi...
- Advancing physiological time series reconstruction and imputation via mixture of receptive fields and experts fusion : Abstract: Recent studies show that using diffusion models for time series signal reconstruction holds great promise. However, such approaches remain largely unexplored in the domain of medical time ...
- Artificial Intelligence-Driven Network-on-Chip Design Space Exploration: Neural Network Architectures for Design : Abstract: Network-on-Chip (NoC) design requires exploring a high-dimensional configuration space to satisfy stringent throughput requirements and latency constraints. Traditional design space explorati...
- Referenceless Proton Resonance Frequency Thermometry Using Deep Learning with Self-Attention : Abstract: Background: Accurate proton resonance frequency (PRF) MR thermometry is essential for monitoring temperature rise during thermal ablation with high intensity focused ultrasound (FUS). Conven...
- GSPN-2: Efficient Parallel Sequence Modeling : Abstract: Efficient vision transformer remains a bottleneck for high-resolution images and long-video related real-world applications. Generalized Spatial Propagation Network (GSPN) addresses this by ...
- ByteStorm: a multi-step data-driven approach for Tropical Cyclones detection and tracking : Abstract: Accurate tropical cyclones (TCs) tracking represents a critical challenge in the context of weather and climate science. Traditional tracking schemes mainly rely on subjective thresholds, wh...
- Functional Random Forest with Adaptive Cost-Sensitive Splitting for Imbalanced Functional Data Classification : Abstract: Classification of functional data where observations are curves or trajectories poses unique challenges, particularly under severe class imbalance. Traditional Random Forest algorithms, whil...
- MARINE: Theoretical Optimization and Design for Multi-Agent Recursive IN-context Enhancement : Abstract: Large Language Model (LLM)-based agents demonstrate advanced reasoning capabilities, yet practical constraints frequently limit outputs to single responses, leaving significant performance p...
- The Theory of Strategic Evolution: Games with Endogenous Players and Strategic Replicators : Abstract: This paper develops the Theory of Strategic Evolution, a general model for systems in which the population of players, strategies, and institutional rules evolve together. The theory extends...
- Harmonizing Community Science Datasets to Model Highly Pathogenic Avian Influenza (HPAI) in Birds in the Subantarctic : Abstract: Community science observational datasets are useful in epidemiology and ecology for modeling species distributions, but the heterogeneous nature of the data presents significant challenges f...
- CFD-copilot: leveraging domain-adapted large language model and model context protocol to enhance simulation automation : Abstract: Configuring computational fluid dynamics (CFD) simulations requires significant expertise in physics modeling and numerical methods, posing a barrier to non-specialists. Although automating ...
- DeepCode: Open Agentic Coding : Abstract: Recent advances in large language models (LLMs) have given rise to powerful coding agents, making it possible for code assistants to evolve into code engineers. However, existing methods sti...
- Near-real time fires detection using satellite imagery in Sudan conflict : Abstract: The challenges of ongoing war in Sudan highlight the need for rapid monitoring and analysis of such conflicts. Advances in deep learning and readily available satellite remote sensing imager...
- An Empirical Framework for Evaluating Semantic Preservation Using Hugging Face : Abstract: As machine learning (ML) becomes an integral part of high-autonomy systems, it is critical to ensure the trustworthiness of learning-enabled software systems (LESS). Yet, the nondeterministi...
- Restrictive Hierarchical Semantic Segmentation for Stratified Tooth Layer Detection : Abstract: Accurate understanding of anatomical structures is essential for reliably staging certain dental diseases. A way of introducing this within semantic segmentation models is by utilising hiera...
- A Gray Literature Study on Fairness Requirements in AI-enabled Software Engineering : Abstract: Today, with the growing obsession with applying Artificial Intelligence (AI), particularly Machine Learning (ML), to software across various contexts, much of the focus has been on the effec...
- FRIEDA: Benchmarking Multi-Step Cartographic Reasoning in Vision-Language Models : Abstract: Cartographic reasoning is the skill of interpreting geographic relationships by aligning legends, map scales, compass directions, map texts, and geometries across one or more map images. Alt...
- Joint Activity Design Heuristics for Enhancing Human-Machine Collaboration : Abstract: Joint activity describes when more than one agent (human or machine) contributes to the completion of a task or activity. Designing for joint activity focuses on explicitly supporting the in...
- Short-Context Dominance: How Much Local Context Natural Language Actually Needs? : Abstract: We investigate the short-context dominance hypothesis: that for most sequences, a small local prefix suffices to predict their next tokens. Using large language models as statistical oracles...
- Training LLMs for Honesty via Confessions : Abstract: Large language models (LLMs) can be dishonest when reporting on their actions and beliefs -- for example, they may overstate their confidence in factual claims or cover up evidence of covert...
- Scalable Offline Model-Based RL with Action Chunks : Abstract: In this paper, we study whether model-based reinforcement learning (RL), in particular model-based value expansion, can provide a scalable recipe for tackling complex, long-horizon tasks in ...
- Balanced Accuracy: The Right Metric for Evaluating LLM Judges - Explained through Youden's J statistic : Abstract: Rigorous evaluation of large language models (LLMs) relies on comparing models by the prevalence of desirable or undesirable behaviors, such as task pass rates or policy violations. These pr...
- Long-only cryptocurrency portfolio management by ranking the assets: a neural network approach : Abstract: This paper will propose a novel machine learning based portfolio management method in the context of the cryptocurrency market. Previous researchers mainly focus on the prediction of the mov...
- Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models I: The Task-Query Architecture : Abstract: Both model developers and policymakers seek to quantify and mitigate the risk of rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to fa...
- Chat with UAV -- Human-UAV Interaction Based on Large Language Models : Abstract: The future of UAV interaction systems is evolving from engineer-driven to user-driven, aiming to replace traditional predefined Human-UAV Interaction designs. This shift focuses on enabling ...
- TreeGRPO: Tree-Advantage GRPO for Online RL Post-Training of Diffusion Models : Abstract: Reinforcement learning (RL) post-training is crucial for aligning generative models with human preferences, but its prohibitive computational cost remains a major barrier to widespread adopt...
- LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks : Abstract: In our prior work, LayerPipe, we had introduced an approach to accelerate training of convolutional, fully connected, and spiking neural networks by overlapping forward and backward computat...
- Information-Dense Reasoning for Efficient and Auditable Security Alert Triage : Abstract: Security Operations Centers face massive, heterogeneous alert streams under minute-level service windows, creating the Alert Triage Latency Paradox: verbose reasoning chains ensure accuracy ...
- A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties : Abstract: Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse a...
- Embodied Tree of Thoughts: Deliberate Manipulation Planning with Embodied World Model : Abstract: World models have emerged as a pivotal component in robot manipulation planning, enabling agents to predict future environmental states and reason about the consequences of actions before ex...
- ClinicalTrialsHub: Bridging Registries and Literature for Comprehensive Clinical Trial Access : Abstract: We present ClinicalTrialsHub, an interactive search-focused platform that consolidates all data from ClinicalTrials.gov and augments it by automatically extracting and structuring trial-rele...
- PR-CapsNet: Pseudo-Riemannian Capsule Network with Adaptive Curvature Routing for Graph Learning : Abstract: Capsule Networks (CapsNets) show exceptional graph representation capacity via dynamic routing and vectorized hierarchical representations, but they model the complex geometries of real-wor...
- MM-CoT: A Benchmark for Probing Visual Chain-of-Thought Reasoning in Multimodal Models : Abstract: The ability to perform Chain-of-Thought (CoT) reasoning marks a major milestone for multimodal models (MMs), enabling them to solve complex visual reasoning problems. Yet a critical question...
- SpeechQualityLLM: LLM-Based Multimodal Assessment of Speech Quality : Abstract: Objective speech quality assessment is central to telephony, VoIP, and streaming systems, where large volumes of degraded audio must be monitored and optimized at scale. Classical metrics su...
- HybridToken-VLM: Hybrid Token Compression for Vision-Language Models : Abstract: Vision-language models (VLMs) have transformed multimodal reasoning, but feeding hundreds of visual patch tokens into LLMs incurs quadratic computational costs, straining memory and context ...
- Residual-SwinCA-Net: A Channel-Aware Integrated Residual CNN-Swin Transformer for Malignant Lesion Segmentation in BUSI : Abstract: A novel deep hybrid Residual-SwinCA-Net segmentation framework is proposed in the study for addressing such challenges by extracting locally correlated and robust features, incorporating res...
- Distilling Future Temporal Knowledge with Masked Feature Reconstruction for 3D Object Detection : Abstract: Camera-based temporal 3D object detection has shown impressive results in autonomous driving, with offline models improving accuracy by using future frames. Knowledge distillation (KD) can b...
- Model-Based Diffusion Sampling for Predictive Control in Offline Decision Making : Abstract: Offline decision-making requires synthesizing reliable behaviors from fixed datasets without further interaction, yet existing generative approaches often yield trajectories that are dynamic...
- Empowering smart app development with SolidGPT: an edge-cloud hybrid AI agent framework : Abstract: The integration of Large Language Models (LLMs) into mobile and software development workflows faces a persistent tension among three demands: semantic awareness, developer productivity, and...
- Systematization of Knowledge: Security and Safety in the Model Context Protocol Ecosystem : Abstract: The Model Context Protocol (MCP) has emerged as the de facto standard for connecting Large Language Models (LLMs) to external data and tools, effectively functioning as the "USB-C for Agenti...
- Terrain Diffusion: A Diffusion-Based Successor to Perlin Noise in Infinite, Real-Time Terrain Generation : Abstract: For decades, procedural worlds have been built on procedural noise functions such as Perlin noise, which are fast and infinite, yet fundamentally limited in realism and large-scale coherence...
- GeoDM: Geometry-aware Distribution Matching for Dataset Distillation : Abstract: Dataset distillation aims to synthesize a compact subset of the original data, enabling models trained on it to achieve performance comparable to those trained on the original large dataset....
- Argus: A Multi-Agent Sensitive Information Leakage Detection Framework Based on Hierarchical Reference Relationships : Abstract: Sensitive information leakage in code repositories has emerged as a critical security challenge. Traditional detection methods that rely on regular expressions, fingerprint features, and hig...
- Interpreting Structured Perturbations in Image Protection Methods for Diffusion Models : Abstract: Recent image protection mechanisms such as Glaze and Nightshade introduce imperceptible, adversarially designed perturbations intended to disrupt downstream text-to-image generative models. ...
- Robust Finetuning of Vision-Language-Action Robot Policies via Parameter Merging : Abstract: Generalist robot policies, trained on large and diverse datasets, have demonstrated the ability to generalize across a wide spectrum of behaviors, enabling a single policy to act in varied r...
- Conditional Morphogenesis: Emergent Generation of Structural Digits via Neural Cellular Automata : Abstract: Biological systems exhibit remarkable morphogenetic plasticity, where a single genome can encode various specialized cellular structures triggered by local chemical signals. In the domain of...
- Are generative AI text annotations systematically biased? : Abstract: This paper investigates bias in GLLM annotations by conceptually replicating manual annotations of Boukes (2024). Using various GLLMs (Llama3.1:8b, Llama3.3:70b, GPT4o, Qwen2.5:72b) in combi...
- Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models II: Benchmark Generation Process : Abstract: The potential for rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has gener...
- Biothreat Benchmark Generation Framework for Evaluating Frontier AI Models III: Implementing the Bacterial Biothreat Benchmark (B3) Dataset : Abstract: The potential for rapidly-evolving frontier artificial intelligence (AI) models, especially large language models (LLMs), to facilitate bioterrorism or access to biological weapons has gener...
- ContextDrag: Precise Drag-Based Image Editing via Context-Preserving Token Injection and Position-Consistent Attention : Abstract: Drag-based image editing aims to modify visual content followed by user-specified drag operations. Despite existing methods having made notable progress, they still fail to fully exploit the...
- Visionary: The World Model Carrier Built on WebGPU-Powered Gaussian Splatting Platform : Abstract: Neural rendering, particularly 3D Gaussian Splatting (3DGS), has evolved rapidly and become a key component for building world models. However, existing viewer solutions remain fragmented, h...
- LLM-based Vulnerable Code Augmentation: Generate or Refactor? : Abstract: Vulnerability code-bases often suffer from severe imbalance, limiting the effectiveness of Deep Learning-based vulnerability classifiers. Data Augmentation could help solve this by mitigatin...
- Developing Distance-Aware Uncertainty Quantification Methods in Physics-Guided Neural Networks for Reliable Bearing Health Prediction : Abstract: Accurate and uncertainty-aware degradation estimation is essential for predictive maintenance in safety-critical systems like rotating machinery with rolling-element bearings. Many existing ...
- Disrupting Hierarchical Reasoning: Adversarial Protection for Geographic Privacy in Multimodal Reasoning Models : Abstract: Multi-modal large reasoning models (MLRMs) pose significant privacy risks by inferring precise geographic locations from personal images through hierarchical chain-of-thought reasoning. Exis...
- SensHRPS: Sensing Comfortable Human-Robot Proxemics and Personal Space With Eye-Tracking : Abstract: Social robots must adjust to human proxemic norms to ensure user comfort and engagement. While prior research demonstrates that eye-tracking features reliably estimate comfort in human-human...
- A Novel Wasserstein Quaternion Generative Adversarial Network for Color Image Generation : Abstract: Color image generation has a wide range of applications, but the existing generation models ignore the correlation among color channels, which may lead to chromatic aberration problems. In a...
- Curriculum Guided Massive Multi Agent System Solving For Robust Long Horizon Tasks : Abstract: Large Language Models and multi-agent systems have shown promise in decomposing complex tasks, yet they struggle with long-horizon reasoning tasks and escalating computation cost. This work ...
- Bridging Scale Discrepancies in Robotic Control via Language-Based Action Representations : Abstract: Recent end-to-end robotic manipulation research increasingly adopts architectures inspired by large language models to enable robust manipulation. However, a critical challenge arises from s...
- A Hybrid Model for Stock Market Forecasting: Integrating News Sentiment and Time Series Data with Graph Neural Networks : Abstract: Stock market prediction is a long-standing challenge in finance, as accurate forecasts support informed investment decisions. Traditional models rely mainly on historical prices, but recent ...
- Disturbance-Free Surgical Video Generation from Multi-Camera Shadowless Lamps for Open Surgery : Abstract: Video recordings of open surgeries are greatly required for education and research purposes. However, capturing unobstructed videos is challenging since surgeons frequently block the camera ...
- Mind to Hand: Purposeful Robotic Control via Embodied Reasoning : Abstract: Humans act with context and intention, with reasoning playing a central role. While internet-scale data has enabled broad reasoning capabilities in AI systems, grounding these abilities in p...
- Examining Student Interactions with a Pedagogical AI-Assistant for Essay Writing and their Impact on Students Writing Quality : Abstract: The dynamic nature of interactions between students and GenAI, as well as their relationship to writing quality, remains underexplored. While most research has examined how general-purpose G...
- Decoupling Template Bias in CLIP: Harnessing Empty Prompts for Enhanced Few-Shot Learning : Abstract: The Contrastive Language-Image Pre-Training (CLIP) model excels in few-shot learning by aligning visual and textual representations. Our study shows that template-sample similarity (TSS), de...
- Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning : Abstract: Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard vi...
- Reusability in MLOps: Leveraging Ports and Adapters to Build a Microservices Architecture for the Maritime Domain : Abstract: ML-Enabled Systems (MLES) are inherently complex since they require multiple components to achieve their business goal. This experience report showcases the software architecture reusability...
- Automatic Essay Scoring and Feedback Generation in Basque Language Learning : Abstract: This paper introduces the first publicly available dataset for Automatic Essay Scoring (AES) and feedback generation in Basque, targeting the CEFR C1 proficiency level. The dataset comprises...
- Multi-domain performance analysis with scores tailored to user preferences : Abstract: The performance of algorithms, methods, and models tends to depend heavily on the distribution of cases on which they are applied, this distribution being specific to the applicative domain....
- Mitigating Individual Skin Tone Bias in Skin Lesion Classification through Distribution-Aware Reweighting : Abstract: Skin color has historically been a focal point of discrimination, yet fairness research in machine learning for medical imaging often relies on coarse subgroup categories, overlooking indivi...
- Data-Driven Dynamic Parameter Learning of manipulator robots : Abstract: Bridging the sim-to-real gap remains a fundamental challenge in robotics, as accurate dynamic parameter estimation is essential for reliable model-based control, realistic simulation, and sa...
- Refining Visual Artifacts in Diffusion Models via Explainable AI-based Flaw Activation Maps : Abstract: Diffusion models have achieved remarkable success in image synthesis. However, addressing artifacts and unrealistic regions remains a critical challenge. We propose self-refining diffusion, ...
- Fluent Alignment with Disfluent Judges: Post-training for Lower-resource Languages : Abstract: We propose a post-training method for lower-resource languages that preserves fluency of language models even when aligned by disfluent reward models. Preference-optimization is now a well-r...
- A Systematic Evaluation of Preference Aggregation in Federated RLHF for Pluralistic Alignment of LLMs : Abstract: This paper addresses the challenge of aligning large language models (LLMs) with diverse human preferences within federated learning (FL) environments, where standard methods often fail to a...
- MatteViT: High-Frequency-Aware Document Shadow Removal with Shadow Matte Guidance : Abstract: Document shadow removal is essential for enhancing the clarity of digitized documents. Preserving high-frequency details (e.g., text edges and lines) is critical in this process because shad...
- Can TabPFN Compete with GNNs for Node Classification via Graph Tabularization? : Abstract: Foundation models pretrained on large data have demonstrated remarkable zero-shot generalization capabilities across domains. Building on the success of TabPFN for tabular data and its recen...
- Democratizing ML for Enterprise Security: A Self-Sustained Attack Detection Framework : Abstract: Despite advancements in machine learning for security, rule-based detection remains prevalent in Security Operations Centers due to the resource intensiveness and skill gap associated with M...
- PrivTune: Efficient and Privacy-Preserving Fine-Tuning of Large Language Models via Device-Cloud Collaboration : Abstract: With the rise of large language models, service providers offer language models as a service, enabling users to fine-tune customized models via uploaded private datasets. However, this raise...
- Multicalibration for LLM-based Code Generation : Abstract: As AI-based code generation becomes widespread, researchers are investigating the calibration of code LLMs - ensuring their confidence scores faithfully represent the true likelihood of code...
- Emovectors: assessing emotional content in jazz improvisations for creativity evaluation : Abstract: Music improvisation is fascinating to study, being essentially a live demonstration of a creative process. In jazz, musicians often improvise across predefined chord progressions (leadsheets...
- Do Depth-Grown Models Overcome the Curse of Depth? An In-Depth Analysis : Abstract: Gradually growing the depth of Transformers during training can not only reduce training cost but also lead to improved reasoning performance, as shown by MIDAS (Saunshi et al., 2024). Thus ...
- Training-Free Dual Hyperbolic Adapters for Better Cross-Modal Reasoning : Abstract: Recent research in Vision-Language Models (VLMs) has significantly advanced our capabilities in cross-modal reasoning. However, existing methods suffer from performance degradation with doma...
- Impact of Data-Oriented and Object-Oriented Design on Performance and Cache Utilization with Artificial Intelligence Algorithms in Multi-Threaded CPUs : Abstract: The growing performance gap between multi-core CPUs and main memory necessitates hardware-aware software design paradigms. This study provides a comprehensive performance analysis of Data Or...
- Can AI autonomously build, operate, and use the entire data stack? : Abstract: Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific p...
- SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models : Abstract: Large reasoning models (LRMs) often cost significant key-value (KV) cache overhead, due to their linear growth with the verbose chain-of-thought (CoT) reasoning process. This costs both memo...
- Toward an AI Reasoning-Enabled System for Patient-Clinical Trial Matching : Abstract: Screening patients for clinical trial eligibility remains a manual, time-consuming, and resource-intensive process. We present a secure, scalable proof-of-concept system for Artificial Intel...
- Large Language Models for Education and Research: An Empirical and User Survey-based Analysis : Abstract: Pretrained Large Language Models (LLMs) have achieved remarkable success across diverse domains, with education and research emerging as particularly impactful areas. Among current state-of-...
- Scalable Back-End for an AI-Based Diabetes Prediction Application : Abstract: The rising global prevalence of diabetes necessitates early detection to prevent severe complications. While AI-powered prediction applications offer a promising solution, they require a res...
- Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions : Abstract: Learning about the causal structure of the world is a fundamental problem for human cognition. Causal models and especially causal learning have proved to be difficult for large pretrained m...
- Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes : Abstract: Predicting diseases solely from patient-side information, such as demographics and self-reported symptoms, has attracted significant research attention due to its potential to enhance patien...
- Reasoning Models Ace the CFA Exams : Abstract: Previous research has reported that large language models (LLMs) demonstrate poor performance on the Chartered Financial Analyst (CFA) exams. However, recent reasoning models have achieved s...
- AgentEval: Generative Agents as Reliable Proxies for Human Evaluation of AI-Generated Content : Abstract: Modern businesses are increasingly challenged by the time and expense required to generate and assess high-quality content. Human writers face time constraints, and extrinsic evaluations can...
- Towards a Science of Scaling Agent Systems : Abstract: Agents, language model (LM)-based systems that are capable of reasoning, planning, and acting, are becoming the dominant paradigm for real-world AI applications. Despite this widespread adopt...
- rSIM: Incentivizing Reasoning Capabilities of LLMs via Reinforced Strategy Injection : Abstract: Large language models (LLMs) are post-trained through reinforcement learning (RL) to evolve into Reasoning Language Models (RLMs), where the hallmark of this advanced reasoning is "aha" mo...
- Predicting California Bearing Ratio with Ensemble and Neural Network Models: A Case Study from Türkiye : Abstract: The California Bearing Ratio (CBR) is a key geotechnical indicator used to assess the load-bearing capacity of subgrade soils, especially in transportation infrastructure and foundation desi...
- Soil Compaction Parameters Prediction Based on Automated Machine Learning Approach : Abstract: Soil compaction is critical in construction engineering to ensure the stability of structures like road embankments and earth dams. Traditional methods for determining optimum moisture conte...
- Enhancing Explainability of Graph Neural Networks Through Conceptual and Structural Analyses and Their Extensions : Abstract: Graph Neural Networks (GNNs) have become a powerful tool for modeling and analyzing data with graph structures. The wide adoption in numerous applications underscores the value of these mode...
- The High Cost of Incivility: Quantifying Interaction Inefficiency via Multi-Agent Monte Carlo Simulations : Abstract: Workplace toxicity is widely recognized as detrimental to organizational culture, yet quantifying its direct impact on operational efficiency remains methodologically challenging due to the ...
- Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making : Abstract: Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning, leading to brittleness, poor generalization, and high computational overhead. Inspire...
- DeepFeature: Iterative Context-aware Feature Generation for Wearable Biosignals : Abstract: Biosignals collected from wearable devices are widely utilized in healthcare applications. Machine learning models used in these applications often rely on features extracted from biosignals...
- Prismatic World Model: Learning Compositional Dynamics for Planning in Hybrid Systems : Abstract: Model-based planning in robotic domains is fundamentally challenged by the hybrid nature of physical dynamics, where continuous motion is punctuated by discrete events such as contacts and i...
- From Accuracy to Impact: The Impact-Driven AI Framework (IDAIF) for Aligning Engineering Architecture with Theory of Change : Abstract: This paper introduces the Impact-Driven AI Framework (IDAIF), a novel architectural methodology that integrates Theory of Change (ToC) principles with modern artificial intelligence system d...
- Using reinforcement learning to probe the role of feedback in skill acquisition : Abstract: Many high-performance human activities are executed with little or no external feedback: think of a figure skater landing a triple jump, a pitcher throwing a curveball for a strike, or a bar...
- Autonomous Issue Resolver: Towards Zero-Touch Code Maintenance : Abstract: Recent advances in Large Language Models have revolutionized function-level code generation; however, repository-scale Automated Program Repair (APR) remains a significant challenge. Current...
- A Lightweight Transfer Learning-Based State-of-Health Monitoring with Application to Lithium-ion Batteries in Unmanned Air Vehicles : Abstract: Accurate and rapid state-of-health (SOH) monitoring plays an important role in indicating energy information for lithium-ion battery-powered portable mobile devices. To confront their variab...
- Principles2Plan: LLM-Guided System for Operationalising Ethical Principles into Plans : Abstract: Ethical awareness is critical for robots operating in human environments, yet existing automated planning tools provide little support. Manually specifying ethical rules is labour-intensive ...
- The SMART+ Framework for AI Systems : Abstract: Artificial Intelligence (AI) systems are now an integral part of multiple industries. In clinical research, AI supports automated adverse event detection in clinical trials, patient eligibil...
- CogMCTS: A Novel Cognitive-Guided Monte Carlo Tree Search Framework for Iterative Heuristic Evolution with Large Language Models : Abstract: Automatic Heuristic Design (AHD) is an effective framework for solving complex optimization problems. The development of large language models (LLMs) enables the automated generation ...
- Protein Secondary Structure Prediction Using Transformers : Abstract: Predicting protein secondary structures such as alpha helices, beta sheets, and coils from amino acid sequences is essential for understanding protein function. This work presents a transfor...
- See-Control: A Multimodal Agent Framework for Smartphone Interaction with a Robotic Arm : Abstract: Recent advances in Multimodal Large Language Models (MLLMs) have enabled their use as intelligent agents for smartphone operation. However, existing methods depend on the Android Debug Bridg...
- Multi-Agent Intelligence for Multidisciplinary Decision-Making in Gastrointestinal Oncology : Abstract: Multimodal clinical reasoning in the field of gastrointestinal (GI) oncology necessitates the integrated interpretation of endoscopic imagery, radiological data, and biochemical markers. Des...
- Deconstructing the Dual Black Box: A Plug-and-Play Cognitive Framework for Human-AI Collaborative Enhancement and Its Implications for AI Governance : Abstract: Currently, there exists a fundamental divide between the "cognitive black box" (implicit intuition) of human experts and the "computational black box" (untrustworthy decision-making) of arti...
- Towards Foundation Models with Native Multi-Agent Intelligence : Abstract: Foundation models (FMs) are increasingly assuming the role of the "brain" of AI agents. While recent efforts have begun to equip FMs with native single-agent abilities -- such as GUI interac...
- Performance Comparison of Aerial RIS and STAR-RIS in 3D Wireless Environments : Abstract: Reconfigurable intelligent surface (RIS) and simultaneously transmitting and reflecting RIS (STAR-RIS) have emerged as key enablers for enhancing wireless coverage and capacity in next-gener...
- A Practical Guide for Designing, Developing, and Deploying Production-Grade Agentic AI Workflows : Abstract: Agentic AI marks a major shift in how autonomous systems reason, plan, and execute multi-step tasks. Unlike traditional single model prompting, agentic workflows integrate multiple specializ...
- CARLoS: Retrieval via Concise Assessment Representation of LoRAs at Scale : Abstract: The rapid proliferation of generative components, such as LoRAs, has created a vast but unstructured ecosystem. Existing discovery methods depend on unreliable user descriptions or biased po...
- Interpolation in Knowledge Representation : Abstract: Craig interpolation and uniform interpolation have many applications in knowledge representation, including explainability, forgetting, modularization and reuse, and even learning. At the sa...
- EcomBench: Towards Holistic Evaluation of Foundation Agents in E-commerce : Abstract: Foundation agents have rapidly advanced in their ability to reason and interact with real environments, making the evaluation of their core capabilities increasingly important. While many be...
- Same Content, Different Answers: Cross-Modal Inconsistency in MLLMs : Abstract: We introduce two new benchmarks REST and REST+ (Render-Equivalence Stress Tests) to enable systematic evaluation of cross-modal inconsistency in multimodal large language models (MLLMs). MLLM...
- Automating High Energy Physics Data Analysis with LLM-Powered Agents : Abstract: We present a proof-of-principle study demonstrating the use of large language model (LLM) agents to automate a representative high energy physics (HEP) analysis. Using the Higgs boson diphot...
- ThreadWeaver: Adaptive Threading for Efficient Parallel Reasoning in Language Models : Abstract: Scaling inference-time computation has enabled Large Language Models (LLMs) to achieve strong reasoning performance, but inherently sequential decoding leads to substantial latency, especial...
- Space Alignment Matters: The Missing Piece for Inducing Neural Collapse in Long-Tailed Learning : Abstract: Recent studies on Neural Collapse (NC) reveal that, under class-balanced conditions, the class feature means and classifier weights spontaneously align into a simplex equiangular tight frame...
- AudioScene: Integrating Object-Event Audio into 3D Scenes : Abstract: The rapid advances in audio analysis underscore its vast potential for human-computer interaction, environmental monitoring, and public safety; yet, existing audio-only datasets often lack spa...
- MixLM: High-Throughput and Effective LLM Ranking via Text-Embedding Mix-Interaction : Abstract: Large language models (LLMs) excel at capturing semantic nuances and therefore show impressive relevance ranking performance in modern recommendation and search systems. However, they suffer...
- SABER: Small Actions, Big Errors - Safeguarding Mutating Steps in LLM Agents : Abstract: Despite rapid progress in LLM agents, performance on long-horizon, tool-using tasks remains fragile. To better understand this fragility, we ask a simple question: do all actions contr...
- GPU Memory Prediction for Multimodal Model Training : Abstract: As deep learning models in agentic AI systems grow in scale and complexity, GPU memory requirements increase and often exceed the available GPU memory capacity, so that out-of-memory (OoM) e...
Research Sources: 363 | Generated: 12/10/2025
