AI Research News Feeds for November 24th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

ChainV: Atomic Visual Hints Make Multimodal Reasoning Shorter and Better : Abstract: Recent advances in multimodal reasoning models have demonstrated impressive capabilities across text and vision. However, even leading models exhibit redundant self-reflection when generatin...
PEGS: Physics-Event Enhanced Large Spatiotemporal Motion Reconstruction via 3D Gaussian Splatting : Abstract: Reconstruction of rigid motion over large spatiotemporal scales remains a challenging task due to limitations in modeling paradigms, severe motion blur, and insufficient physical consistency...
Off the Planckian Locus: Using 2D Chromaticity to Improve In-Camera Color : Abstract: Traditional in-camera colorimetric mapping relies on correlated color temperature (CCT)-based interpolation between pre-calibrated transforms optimized for Planckian illuminants such as CIE ...
A Multi-Stage Optimization Framework for Deploying Learned Image Compression on FPGAs : Abstract: Deep learning-based image compression (LIC) has achieved state-of-the-art rate-distortion (RD) performance, yet deploying these models on resource-constrained FPGAs remains a major challenge...
One-Step Diffusion Transformer for Controllable Real-World Image Super-Resolution : Abstract: Recent advances in diffusion-based real-world image super-resolution (Real-ISR) have demonstrated remarkable perceptual quality, yet the balance between fidelity and controllability remains ...
Learning to Look Closer: A New Instance-Wise Loss for Small Cerebral Lesion Segmentation : Abstract: Traditional loss functions in medical image segmentation, such as Dice, often under-segment small lesions because their small relative volume contributes negligibly to the overall loss. To a...
DiffRefiner: Coarse to Fine Trajectory Planning via Diffusion Refinement with Semantic Interaction for End to End Autonomous Driving : Abstract: Unlike discriminative approaches in autonomous driving that predict a fixed set of candidate trajectories of the ego vehicle, generative methods, such as diffusion models, learn the underlyi...
UI-Styler: Ultrasound Image Style Transfer with Class-Aware Prompts for Cross-Device Diagnosis Using a Frozen Black-Box Inference Network : Abstract: The appearance of ultrasound images varies across acquisition devices, causing domain shifts that degrade the performance of fixed black-box downstream inference models when reused. To mitig...
Navigating in the Dark: A Multimodal Framework and Dataset for Nighttime Traffic Sign Recognition : Abstract: Traffic signboards are vital for road safety and intelligent transportation systems, enabling navigation and autonomous driving. Yet, recognizing traffic signs at night remains challenging d...
PostCam: Camera-Controllable Novel-View Video Generation with Query-Shared Cross-Attention : Abstract: We propose PostCam, a framework for novel-view video generation that enables post-capture editing of camera trajectories in dynamic scenes. We find that existing video recapture methods suff...
Real Noise Decoupling for Hyperspectral Image Denoising : Abstract: Hyperspectral image (HSI) denoising is a crucial step in enhancing the quality of HSIs. Noise modeling methods can fit noise distributions to generate synthetic HSIs to train denoising netwo...
VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation : Abstract: Vision-language-action (VLA) models show potential for general robotic tasks, but remain challenging in spatiotemporally coherent manipulation, which requires fine-grained representations. T...
Continual Alignment for SAM: Rethinking Foundation Models for Medical Image Segmentation in Continual Learning : Abstract: In medical image segmentation, heterogeneous privacy policies across institutions often make joint training on pooled datasets infeasible, motivating continual image segmentation-learning fr...
SING3R-SLAM: Submap-based Indoor Monocular Gaussian SLAM with 3D Reconstruction Priors : Abstract: Recent advances in dense 3D reconstruction enable the accurate capture of local geometry; however, integrating them into SLAM is challenging due to drift and redundant point maps, which limi...
Scaling Self-Supervised and Cross-Modal Pretraining for Volumetric CT Transformers : Abstract: We introduce SPECTRE, a fully transformer-based foundation model for volumetric computed tomography (CT). Our Self-Supervised & Cross-Modal Pretraining for CT Representation Extraction (SPEC...
FisheyeGaussianLift: BEV Feature Lifting for Surround-View Fisheye Camera Perception : Abstract: Accurate BEV semantic segmentation from fisheye imagery remains challenging due to extreme non-linear distortion, occlusion, and depth ambiguity inherent to wide-angle projections. We presen...
Dual-domain Adaptation Networks for Realistic Image Super-resolution : Abstract: Realistic image super-resolution (SR) focuses on transforming real-world low-resolution (LR) images into high-resolution (HR) ones, handling more complex degradation patterns than synthetic ...
QueryOcc: Query-based Self-Supervision for 3D Semantic Occupancy : Abstract: Learning 3D scene geometry and semantics from images is a core challenge in computer vision and a key capability for autonomous driving. Since large-scale 3D annotation is prohibitively expe...
Blind Deconvolution for Color Images Using Normalized Quaternion Kernels : Abstract: In this work, we address the challenging problem of blind deconvolution for color images. Existing methods often convert color images to grayscale or process each color channel separately, w...
A Little More Like This: Text-to-Image Retrieval with Vision-Language Models Using Relevance Feedback : Abstract: Large vision-language models (VLMs) enable intuitive visual search using natural language queries. However, improving their performance often requires fine-tuning and scaling to larger model...
MolSight: Optical Chemical Structure Recognition with SMILES Pretraining, Multi-Granularity Learning and Reinforcement Learning : Abstract: Optical Chemical Structure Recognition (OCSR) plays a pivotal role in modern chemical informatics, enabling the automated conversion of chemical structure images from scientific literature, ...
BiFingerPose: Bimodal Finger Pose Estimation for Touch Devices : Abstract: Finger pose offers promising opportunities to expand human computer interaction capability of touchscreen devices. Existing finger pose estimation algorithms that can be implemented in porta...
SpatialGeo: Boosting Spatial Reasoning in Multimodal LLMs via Geometry-Semantics Fusion : Abstract: Multimodal large language models (MLLMs) have achieved significant progress in image and language tasks due to the strong reasoning capability of large language models (LLMs). Nevertheless, ...
NoPe-NeRF++: Local-to-Global Optimization of NeRF with No Pose Prior : Abstract: In this paper, we introduce NoPe-NeRF++, a novel local-to-global optimization algorithm for training Neural Radiance Fields (NeRF) without requiring pose priors. Existing methods, particular...
Refracting Reality: Generating Images with Realistic Transparent Objects : Abstract: Generative image models can produce convincingly real images, with plausible shapes, textures, layouts and lighting. However, one domain in which they perform notably poorly is in the synthe...
Loomis Painter: Reconstructing the Painting Process : Abstract: Step-by-step painting tutorials are vital for learning artistic techniques, but existing video resources (e.g., YouTube) lack interactivity and personalization. While recent generative model...
Label-Efficient Skeleton-based Recognition with Stable-Invertible Graph Convolutional Networks : Abstract: Skeleton-based action recognition is a hotspot in image processing. A key challenge of this task lies in its dependence on large, manually labeled datasets whose acquisition is costly and ti...
DSeq-JEPA: Discriminative Sequential Joint-Embedding Predictive Architecture : Abstract: Image-based Joint-Embedding Predictive Architecture (I-JEPA) learns visual representations by predicting latent embeddings of masked regions from visible context. However, it treats all regi...
UAM: A Unified Attention-Mamba Backbone of Multimodal Framework for Tumor Cell Classification : Abstract: Cell-level radiomics features provide fine-grained insights into tumor phenotypes and have the potential to significantly enhance diagnostic accuracy on hematoxylin and eosin (H&E) images. B...
SuperQuadricOcc: Multi-Layer Gaussian Approximation of Superquadrics for Real-Time Self-Supervised Occupancy Estimation : Abstract: Semantic occupancy estimation enables comprehensive scene understanding for automated driving, providing dense spatial and semantic information essential for perception and planning. While G...
ATAC: Augmentation-Based Test-Time Adversarial Correction for CLIP : Abstract: Despite its remarkable success in zero-shot image-text matching, CLIP remains highly vulnerable to adversarial perturbations on images. As adversarial fine-tuning is prohibitively costly, re...
SVRecon: Sparse Voxel Rasterization for Surface Reconstruction : Abstract: We extend the recently proposed sparse voxel rasterization paradigm to the task of high-fidelity surface reconstruction by integrating Signed Distance Function (SDF), named SVRecon. Unlike 3...
MorphSeek: Fine-grained Latent Representation-Level Policy Optimization for Deformable Image Registration : Abstract: Deformable image registration (DIR) remains a fundamental yet challenging problem in medical image analysis, largely due to the prohibitively high-dimensional deformation space of dense disp...
MCMoE: Completing Missing Modalities with Mixture of Experts for Incomplete Multimodal Action Quality Assessment : Abstract: Multimodal Action Quality Assessment (AQA) has recently emerged as a promising paradigm. By leveraging complementary information across shared contextual cues, it enhances the discriminative...
MMT-ARD: Multimodal Multi-Teacher Adversarial Distillation for Robust Vision-Language Models : Abstract: Vision-Language Models (VLMs) are increasingly deployed in safety-critical applications, making their adversarial robustness a crucial concern. While adversarial knowledge distillation has s...
Illustrator's Depth: Monocular Layer Index Prediction for Image Decomposition : Abstract: We introduce Illustrator's Depth, a novel definition of depth that addresses a key challenge in digital content creation: decomposing flat images into editable, ordered layers. Inspired by a...
Improving Multimodal Distillation for 3D Semantic Segmentation under Domain Shift : Abstract: Semantic segmentation networks trained under full supervision for one type of lidar fail to generalize to unseen lidars without intervention. To reduce the performance gap under domain shift...
GPR-OdomNet: Difference and Similarity-Driven Odometry Estimation Network for Ground Penetrating Radar-Based Localization : Abstract: When performing robot/vehicle localization using ground penetrating radar (GPR) to handle adverse weather and environmental conditions, existing techniques often struggle to accurately estim...
Counterfactual World Models via Digital Twin-conditioned Video Diffusion : Abstract: World models learn to predict the temporal evolution of visual observations given a control signal, potentially enabling agents to reason about environments through forward simulation. Becau...
Radar2Shape: 3D Shape Reconstruction from High-Frequency Radar using Multiresolution Signed Distance Functions : Abstract: Determining the shape of 3D objects from high-frequency radar signals is analytically complex but critical for commercial and aerospace applications. Previous deep learning methods have been...
An Artificial Intelligence Framework for Measuring Human Spine Aging Using MRI : Abstract: The human spine is a complex structure composed of 33 vertebrae. It holds the body and is important for leading a healthy life. The spine is vulnerable to age-related degenerations that can ...
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models : Abstract: Scaling up multimodal models has enabled remarkable advances in visual understanding and reasoning, but practical demands call for smaller, efficient systems. In this work, we conduct a prin...
Video-R4: Reinforcing Text-Rich Video Reasoning with Visual Rumination : Abstract: Understanding text-rich videos requires reading small, transient textual cues that often demand repeated inspection. Yet most video QA models rely on single-pass perception over fixed frames...
EvDiff: High Quality Video with an Event Camera : Abstract: As neuromorphic sensors, event cameras asynchronously record changes in brightness as streams of sparse events with the advantages of high temporal resolution and high dynamic range. Reconst...
Native 3D Editing with Full Attention : Abstract: Instruction-guided 3D editing is a rapidly emerging field with the potential to broaden access to 3D content creation. However, existing methods face critical limitations: optimization-based...
MobileOcc: A Human-Aware Semantic Occupancy Dataset for Mobile Robots : Abstract: Dense 3D semantic occupancy perception is critical for mobile robots operating in pedestrian-rich environments, yet it remains underexplored compared to its application in autonomous driving...
Exploring the added value of pretherapeutic MR descriptors in predicting breast cancer pathologic complete response to neoadjuvant chemotherapy : Abstract: Objectives: To evaluate the association between pretreatment MRI descriptors and breast cancer (BC) pathological complete response (pCR) to neoadjuvant chemotherapy (NAC). Materials & Metho...
Learning Latent Transmission and Glare Maps for Lens Veiling Glare Removal : Abstract: Beyond the commonly recognized optical aberrations, the imaging performance of compact optical systems, including single-lens and metalens designs, is often further degraded by veiling glare c...
METIS: Multi-Source Egocentric Training for Integrated Dexterous Vision-Language-Action Model : Abstract: Building a generalist robot that can perceive, reason, and act across diverse tasks remains an open challenge, especially for dexterous manipulation. A major bottleneck lies in the scarcity ...
IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation : Abstract: While Visual Large Language Models (VLLMs) show great promise as embodied agents, they continue to face substantial challenges in spatial reasoning. Existing embodied benchmarks largely focu...
Colo-ReID: Discriminative Representation Embedding with Meta-learning for Colonoscopic Polyp Re-Identification : Abstract: Colonoscopic Polyp Re-Identification aims to match the same polyp from a large gallery with images from different views taken using different cameras and plays an important role in the preve...
A statistical method for crack pre-detection in 3D concrete images : Abstract: In practical applications, effectively segmenting cracks in large-scale computed tomography (CT) images holds significant importance for understanding the structural integrity of materials. ...
MindShot: A Few-Shot Brain Decoding Framework via Transferring Cross-Subject Prior and Distilling Frequency Domain Knowledge : Abstract: Aiming to reconstruct visual stimuli from brain signals, brain decoding has recently made significant progress using functional magnetic resonance imaging (fMRI). However, it still has chall...
SpotFormer: Multi-Scale Spatio-Temporal Transformer for Facial Expression Spotting : Abstract: Facial expression spotting, identifying periods where facial expressions occur in a video, is a significant yet challenging task in facial expression analysis. The issues of irrelevant facia...
Minimax Statistical Estimation under Wasserstein Contamination : Abstract: Contaminations are a key concern in modern statistical learning, as small but systematic perturbations of all datapoints can substantially alter estimation results. Here, we study Wasserstei...
(De)-regularized Maximum Mean Discrepancy Gradient Flow : Abstract: We introduce a (de)-regularization of the Maximum Mean Discrepancy (DrMMD) and its Wasserstein gradient flow. Existing gradient flows that transport samples from source distribution to targe...
Reproducibility Report: Test-Time Training on Nearest Neighbors for Large Language Models : Abstract: We reproduce the central claims of Test-Time Training on Nearest Neighbors for Large Language Models (Hardt and Sun, 2024), which proposes adapting a language model at inference time by fine...
How Language Directions Align with Token Geometry in Multilingual LLMs : Abstract: Multilingual LLMs demonstrate strong performance across diverse languages, yet there has been limited systematic analysis of how language information is structured within their internal repr...
NALA_MAINZ at BLP-2025 Task 2: A Multi-agent Approach for Bangla Instruction to Python Code Generation : Abstract: This paper presents JGU Mainz's winning system for the BLP-2025 Shared Task on Code Generation from Bangla Instructions. We propose a multi-agent-based pipeline. First, a code-generation age...
From Representation to Enactment: The ABC Framework of the Translating Mind : Abstract: Building on the Extended Mind (EM) theory and radical enactivism, this article suggests an alternative to representation-based models of the mind. We lay out a novel ABC framework of the tra...
Interpretable dimensions support an effect of agentivity and telicity on split intransitivity : Abstract: Intransitive verbs fall into two different syntactic classes, unergatives and unaccusatives. It has long been argued that verbs describing an agentive action are more likely to appear in an ...
PEPPER: Perception-Guided Perturbation for Robust Backdoor Defense in Text-to-Image Diffusion Models : Abstract: Recent studies show that text to image (T2I) diffusion models are vulnerable to backdoor attacks, where a trigger in the input prompt can steer generation toward harmful or unintended conten...
Improving Latent Reasoning in LLMs via Soft Concept Mixing : Abstract: Unlike human reasoning in abstract conceptual spaces, large language models (LLMs) typically reason by generating discrete tokens, which potentially limit their expressive power. The recent ...
Predicting the Formation of Induction Heads : Abstract: Arguably, specialized attention heads dubbed induction heads (IHs) underlie the remarkable in-context learning (ICL) capabilities of modern language models (LMs); yet, a precise characteriza...
ARQUSUMM: Argument-aware Quantitative Summarization of Online Conversations : Abstract: Online conversations have become more prevalent on public discussion platforms (e.g. Reddit). With growing controversial topics, it is desirable to summarize not only diverse arguments, but ...
Do Vision-Language Models Understand Visual Persuasiveness? : Abstract: Recent advances in vision-language models (VLMs) have enabled impressive multi-modal reasoning and understanding. Yet, whether these models truly grasp visual persuasion: how visual cues shap...
Principled Design of Interpretable Automated Scoring for Large-Scale Educational Assessments : Abstract: AI-driven automated scoring systems offer scalable and efficient means of evaluating complex student-generated responses. Yet, despite increasing demand for transparency and interpretability...
MUCH: A Multilingual Claim Hallucination Benchmark : Abstract: Claim-level Uncertainty Quantification (UQ) is a promising approach to mitigate the lack of reliability in Large Language Models (LLMs). We introduce MUCH, the first claim-level UQ benchmark...
LangMark: A Multilingual Dataset for Automatic Post-Editing : Abstract: Automatic post-editing (APE) aims to correct errors in machine-translated text, enhancing translation quality, while reducing the need for human intervention. Despite advances in neural mach...
AutoLink: Autonomous Schema Exploration and Expansion for Scalable Schema Linking in Text-to-SQL at Scale : Abstract: For industrial-scale text-to-SQL, supplying the entire database schema to Large Language Models (LLMs) is impractical due to context window limits and irrelevant noise. Schema linking, which...
E³-Pruner: Towards Efficient, Economical, and Effective Layer Pruning for Large Language Models : Abstract: With the increasing size of large language models, layer pruning has gained increased attention as a hardware-friendly approach for model compression. However, existing layer pruning methods...
A Simple Yet Strong Baseline for Long-Term Conversational Memory of LLM Agents : Abstract: LLM-based conversational agents still struggle to maintain coherent, personalized interaction over many sessions: fixed context windows limit how much history can be kept in view, and most e...
Social-Media Based Personas Challenge: Hybrid Prediction of Common and Rare User Actions on Bluesky : Abstract: Understanding and predicting user behavior on social media platforms is crucial for content recommendation and platform design. While existing approaches focus primarily on common actions li...
Estonian WinoGrande Dataset: Comparative Analysis of LLM Performance on Human and Machine Translation : Abstract: In this paper, we present a localized and culturally adapted Estonian translation of the test set from the widely used commonsense reasoning benchmark, WinoGrande. We detail the translation ...
Humanlike Multi-user Agent (HUMA): Designing a Deceptively Human AI Facilitator for Group Chats : Abstract: Conversational agents built on large language models (LLMs) are becoming increasingly prevalent, yet most systems are designed for one-on-one, turn-based exchanges rather than natural, async...
A new kid on the block: Distributional semantics predicts the word-specific tone signatures of monosyllabic words in conversational Taiwan Mandarin : Abstract: We present a corpus-based investigation of how the pitch contours of monosyllabic words are realized in spontaneous conversational Mandarin, focusing on the effects of words' meanings. We us...
Don't Learn, Ground: A Case for Natural Language Inference with Visual Grounding : Abstract: We propose a zero-shot method for Natural Language Inference (NLI) that leverages multimodal representations by grounding language in visual contexts. Our approach generates visual represent...
PUCP-Metrix: A Comprehensive Open-Source Repository of Linguistic Metrics for Spanish : Abstract: Linguistic features remain essential for interpretability and tasks involving style, structure, and readability, but existing Spanish tools offer limited coverage. We present PUCP-Metrix, an...
The Shifting Landscape of Vaccine Discourse: Insights From a Decade of Pre- to Post-COVID-19 Vaccine Posts on Social Media : Abstract: In this work, we study English-language vaccine discourse in social media posts, specifically posts on X (formerly Twitter), in seven years before the COVID-19 outbreak (2013 to 2019) and th...
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists : Abstract: With the rapid development of Large Language Models (LLMs), AI agents have demonstrated increasing proficiency in scientific tasks, ranging from hypothesis generation and experimental design...
Vision Language Models are Confused Tourists : Abstract: Although the cultural dimension has been one of the key aspects in evaluating Vision-Language Models (VLMs), their ability to remain stable across diverse cultural inputs remains largely unt...
Cross-cultural value alignment frameworks for responsible AI governance: Evidence from China-West comparative analysis : Abstract: As Large Language Models (LLMs) increasingly influence high-stakes decision-making across global contexts, ensuring their alignment with diverse cultural values has become a critical governa...
Robot Confirmation Generation and Action Planning Using Long-context Q-Former Integrated with Multimodal LLM : Abstract: Human-robot collaboration towards a shared goal requires robots to understand human action and interaction with the surrounding environment. This paper focuses on human-robot interaction (HR...
Resolving Sentiment Discrepancy for Multimodal Sentiment Detection via Semantics Completion and Decomposition : Abstract: With the proliferation of social media posts in recent years, the need to detect sentiments in multimodal (image-text) content has grown rapidly. Since posts are user-generated, the image an...
The persistence of painting styles : Abstract: Art is a deeply personal and expressive medium, where each artist brings their own style, technique, and cultural background into their work. Traditionally, identifying artistic styles has b...
Motion Transfer-Enhanced StyleGAN for Generating Diverse Macaque Facial Expressions : Abstract: Generating animal faces using generative AI techniques is challenging because the available training images are limited both in quantity and variation, particularly for facial expressions ac...
SVG360: Multi-View SVG Generation with Geometric and Color Consistency from a Single SVG : Abstract: Scalable Vector Graphics (SVGs) are central to modern design workflows, offering scaling without distortion and precise editability. However, for single object SVGs, generating multi-view co...
Towards Unified Vision Language Models for Forest Ecological Analysis in Earth Observation : Abstract: Recent progress in vision language models (VLMs) has enabled remarkable perception and reasoning capabilities, yet their potential for scientific regression in Earth Observation (EO) remains...
BOP-ASK: Object-Interaction Reasoning for Vision-Language Models : Abstract: Vision Language Models (VLMs) have achieved impressive performance on spatial reasoning benchmarks, yet these evaluations mask critical weaknesses in understanding object interactions. Curre...
Parts-Mamba: Augmenting Joint Context with Part-Level Scanning for Occluded Human Skeleton : Abstract: Skeleton action recognition involves recognizing human action from human skeletons. The use of graph convolutional networks (GCNs) has driven major advances in this recognition task. In real...
The Joint Gromov Wasserstein Objective for Multiple Object Matching : Abstract: The Gromov-Wasserstein (GW) distance serves as a powerful tool for matching objects in metric spaces. However, its traditional formulation is constrained to pairwise matching between single ...
Glass Surface Detection: Leveraging Reflection Dynamics in Flash/No-flash Imagery : Abstract: Glass surfaces are ubiquitous in daily life, typically appearing colorless, transparent, and lacking distinctive features. These characteristics make glass surface detection a challenging co...
R-AVST: Empowering Video-LLMs with Fine-Grained Spatio-Temporal Reasoning in Complex Audio-Visual Scenarios : Abstract: Recently, rapid advancements have been made in multimodal large language models (MLLMs), especially in video understanding tasks. However, current research focuses on simple video scenarios,...
Warm Diffusion: Recipe for Blur-Noise Mixture Diffusion Models : Abstract: Diffusion probabilistic models have achieved remarkable success in generative tasks across diverse data types. While recent studies have explored alternative degradation processes beyond Gau...
Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content : Abstract: Quality assessment of AI-generated content is crucial for evaluating model capability and guiding model optimization. However, most existing quality assessment datasets and models provide on...
UniModel: A Visual-Only Framework for Unified Multimodal Understanding and Generation : Abstract: We present UniModel, a unified generative model that jointly supports visual understanding and visual generation within a single pixel-to-pixel diffusion framework. Our goal is to achieve un...
DeltaDeno: Zero-Shot Anomaly Generation via Delta-Denoising Attribution : Abstract: Anomaly generation is often framed as few-shot fine-tuning with anomalous samples, which contradicts the scarcity that motivates generation and tends to overfit category priors. We tackle th...
Rethinking Diffusion Model-Based Video Super-Resolution: Leveraging Dense Guidance from Aligned Features : Abstract: Diffusion model (DM) based Video Super-Resolution (VSR) approaches achieve impressive perceptual quality. However, they suffer from error accumulation, spatial artifacts, and a trade-off bet...
Shape-preserving Tooth Segmentation from CBCT Images Using Deep Learning with Semantic and Shape Awareness : Abstract: Background: Accurate tooth segmentation from cone beam computed tomography (CBCT) images is crucial for digital dentistry but remains challenging in cases of interdental adhesions, which caus...
MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models : Abstract: Modern Vision-Language Models (VLMs) demonstrate sophisticated reasoning, escalating privacy risks beyond simple attribute perception to individual-level linkage. Current privacy benchmarks ...
Flow-Guided Implicit Neural Representation for Motion-Aware Dynamic MRI Reconstruction : Abstract: Dynamic magnetic resonance imaging (dMRI) captures temporally-resolved anatomy but is often challenged by limited sampling and motion-induced artifacts. Conventional motion-compensated recon...
FingerCap: Fine-grained Finger-level Hand Motion Captioning : Abstract: Understanding fine-grained human hand motion is fundamental to visual perception, embodied intelligence, and multimodal communication. In this work, we propose Fine-grained Finger-level Hand...
Point-Supervised Facial Expression Spotting with Gaussian-Based Instance-Adaptive Intensity Modeling : Abstract: Automatic facial expression spotting, which aims to identify facial expression instances in untrimmed videos, is crucial for facial expression analysis. Existing methods primarily focus on f...
MatPedia: A Universal Generative Foundation for High-Fidelity Material Synthesis : Abstract: Physically-based rendering (PBR) materials are fundamental to photorealistic graphics, yet their creation remains labor-intensive and requires specialized expertise. While generative models ...
Two Heads Better than One: Dual Degradation Representation for Blind Super-Resolution : Abstract: Previous methods have demonstrated remarkable performance in single image super-resolution (SISR) tasks with known and fixed degradation (e.g., bicubic downsampling). However, when the actua...
Gradient-Driven Natural Selection for Compact 3D Gaussian Splatting : Abstract: 3DGS employs a large number of Gaussian primitives to fit scenes, resulting in substantial storage and computational overhead. Existing pruning methods rely on manually designed criteria or ...
RadioKMoE: Knowledge-Guided Radiomap Estimation with Kolmogorov-Arnold Networks and Mixture-of-Experts : Abstract: Radiomap serves as a vital tool for wireless network management and deployment by providing powerful spatial knowledge of signal propagation and coverage. However, increasingly complex radio...
DReX: Pure Vision Fusion of Self-Supervised and Convolutional Representations for Image Complexity Prediction : Abstract: Visual complexity prediction is a fundamental problem in computer vision with applications in image compression, retrieval, and classification. Understanding what makes humans perceive an im...
DepthFocus: Controllable Depth Estimation for See-Through Scenes : Abstract: Depth in the real world is rarely singular. Transmissive materials create layered ambiguities that confound conventional perception systems. Existing models remain passive, attempting to est...
VLM-Augmented Degradation Modeling for Image Restoration Under Adverse Weather Conditions : Abstract: Reliable visual perception under adverse weather conditions, such as rain, haze, snow, or a mixture of them, is desirable yet challenging for autonomous driving and outdoor robots. In this p...
RoomPlanner: Explicit Layout Planner for Easier LLM-Driven 3D Room Generation : Abstract: In this paper, we propose RoomPlanner, the first fully automatic 3D room generation framework for painlessly creating realistic indoor scenes with only short text as input. Without any manua...
PathAgent: Toward Interpretable Analysis of Whole-slide Pathology Images via Large Language Model-based Agentic Reasoning : Abstract: Analyzing whole-slide images (WSIs) requires an iterative, evidence-driven reasoning process that parallels how pathologists dynamically zoom, refocus, and self-correct while collecting the ...
RL-AD-Net: Reinforcement Learning Guided Adaptive Displacement in Latent Space for Refined Point Cloud Completion : Abstract: Recent point cloud completion models, including transformer-based, denoising-based, and other state-of-the-art approaches, generate globally plausible shapes from partial inputs but often le...
REArtGS++: Generalizable Articulation Reconstruction with Temporal Geometry Constraint via Planar Gaussian Splatting : Abstract: Articulated objects are pervasive in daily environments, such as drawers and refrigerators. Towards their part-level surface reconstruction and joint parameter estimation, REArtGS~\cite{wu20...
Diversity Has Always Been There in Your Visual Autoregressive Models : Abstract: Visual Autoregressive (VAR) models have recently garnered significant attention for their innovative next-scale prediction paradigm, offering notable advantages in both inference efficiency ...
SPAGS: Sparse-View Articulated Object Reconstruction from Single State via Planar Gaussian Splatting : Abstract: Articulated objects are ubiquitous in daily environments, and their 3D reconstruction holds great significance across various fields. However, existing articulated object reconstruction meth...
Sparse Reasoning is Enough: Biological-Inspired Framework for Video Anomaly Detection with Large Pre-trained Models : Abstract: Video anomaly detection (VAD) plays a vital role in real-world applications such as security surveillance, autonomous driving, and industrial monitoring. Recent advances in large pre-trained...
Bridging Visual Affective Gap: Borrowing Textual Knowledge by Learning from Noisy Image-Text Pairs : Abstract: Visual emotion recognition (VER) is a longstanding field that has garnered increasing attention with the advancement of deep neural networks. Although recent studies have achieved notable im...
The Loss of Control Playbook: Degrees, Dynamics, and Preparedness : Abstract: This research report addresses the absence of an actionable definition for Loss of Control (LoC) in AI systems by developing a novel taxonomy and preparedness framework. Despite increasing p...
Fast LLM Post-training via Decoupled and Best-of-N Speculation : Abstract: Rollout dominates the training time in large language model (LLM) post-training, where the trained model is used to generate tokens given a batch of prompts. SpecActor achieves fast rollout ...
When Structure Doesn't Help: LLMs Do Not Read Text-Attributed Graphs as Effectively as We Expected : Abstract: Graphs provide a unified representation of semantic content and relational structure, making them a natural fit for domains such as molecular modeling, citation networks, and social graphs. ...
GCL-OT: Graph Contrastive Learning with Optimal Transport for Heterophilic Text-Attributed Graphs : Abstract: Recently, structure-text contrastive learning has shown promising performance on text-attributed graphs by leveraging the complementary strengths of graph neural networks and language models...
A Vector Symbolic Approach to Multiple Instance Learning : Abstract: Multiple Instance Learning (MIL) tasks impose a strict logical constraint: a bag is labeled positive if and only if at least one instance within it is positive. While this iff constraint ali...
Provably Minimum-Length Conformal Prediction Sets for Ordinal Classification : Abstract: Ordinal classification has been widely applied in many high-stakes applications, e.g., medical imaging and diagnosis, where reliable uncertainty quantification (UQ) is essential for decision...
Better audio representations are more brain-like: linking model-brain alignment with performance in downstream auditory tasks : Abstract: Artificial neural networks (ANNs) are increasingly powerful models of brain computation, yet it remains unclear whether improving their task performance also makes their internal representat...
Topologic Attention Networks: Attending to Direct and Indirect Neighbors through Gaussian Belief Propagation : Abstract: Graph Neural Networks rely on local message passing, which limits their ability to model long-range dependencies in graphs. Existing approaches extend this range through continuous-time dyna...
PersonalizedRouter: Personalized LLM Routing via Graph-based User Preference Modeling : Abstract: The growing number of Large Language Models (LLMs) with diverse capabilities and response styles provides users with a wider range of choices, which presents challenges in selecting appropri...
Predicting Talent Breakout Rate using Twitter and TV data : Abstract: Early detection of rising talents is of paramount importance in the field of advertising. In this paper, we define a concept of talent breakout and propose a method to detect Japanese talent...
A Hybrid Computational Intelligence Framework for scRNA-seq Imputation: Integrating scRecover and Random Forests : Abstract: Single-cell RNA sequencing (scRNA-seq) enables transcriptomic profiling at cellular resolution but suffers from pervasive dropout events that obscure biological signals. We present SCR-MF, a...
CroTad: A Contrastive Reinforcement Learning Framework for Online Trajectory Anomaly Detection : Abstract: Detecting trajectory anomalies is a vital task in modern Intelligent Transportation Systems (ITS), enabling the identification of unsafe, inefficient, or irregular travel behaviours. While d...
A novel approach to classification of ECG arrhythmia types with latent ODEs : Abstract: 12-lead ECGs with high sampling frequency are the clinical gold standard for arrhythmia detection, but their short-term, spot-check nature often misses intermittent events. Wearable ECGs ena...
ToC: Tree-of-Claims Search with Multi-Agent Language Models : Abstract: Optimizing patent claims is a critical yet challenging task, demanding careful balance between maximizing novelty and preserving legal scope. Manual claim drafting is labor-intensive, costly...
Gradient flow for deep equilibrium single-index models : Abstract: Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state of the art performance across many mod...
FIRM: Federated In-client Regularized Multi-objective Alignment for Large Language Models : Abstract: Aligning Large Language Models (LLMs) with human values often involves balancing multiple, conflicting objectives such as helpfulness and harmlessness. Training these models is computational...
Mask the Redundancy: Evolving Masking Representation Learning for Multivariate Time-Series Clustering : Abstract: Multivariate Time-Series (MTS) clustering discovers intrinsic grouping patterns of temporal data samples. Although time-series provide rich discriminative information, they also contain subs...
Energy Scaling Laws for Diffusion Models: Quantifying Compute and Carbon Emissions in Image Generation : Abstract: The rapidly growing computational demands of diffusion models for image generation have raised significant concerns about energy consumption and environmental impact. While existing approach...
Step-E: A Differentiable Data Cleaning Framework for Robust Learning with Noisy Labels : Abstract: Training data collected in the wild often contain noisy labels and outliers that substantially degrade the performance and reliability of deep neural networks. While data cleaning is commonl...
Hash Collisions in Molecular Fingerprints: Effects on Property Prediction and Bayesian Optimization : Abstract: Molecular fingerprinting methods use hash functions to create fixed-length vector representations of molecules. However, hash collisions cause distinct substructures to be represented with t...
Four decades of circumpolar super-resolved satellite land surface temperature data : Abstract: Land surface temperature (LST) is an essential climate variable (ECV) crucial for understanding land-atmosphere energy exchange and monitoring climate change, especially in the rapidly warmi...
Reconstruction of Surface EMG Signal using IMU data for Upper Limb Actions : Abstract: Surface Electromyography (sEMG) provides vital insights into muscle function, but it can be noisy and challenging to acquire. Inertial Measurement Units (IMUs) provide a robust and wearable ...
DelTriC: A Novel Clustering Method with Accurate Outlier : Abstract: The paper introduces DelTriC (Delaunay Triangulation Clustering), a clustering algorithm which integrates PCA/UMAP-based projection, Delaunay triangulation, and a novel back-projection mecha...
Generating transition states of chemical reactions via distance-geometry-based flow matching : Abstract: Transition states (TSs) are crucial for understanding reaction mechanisms, yet their exploration is limited by the complexity of experimental and computational approaches. Here we propose TS...
FlexiFlow: decomposable flow matching for generation of flexible molecular ensemble : Abstract: Sampling useful three-dimensional molecular structures along with their most favorable conformations is a key challenge in drug discovery. Current state-of-the-art 3D de-novo design flow mat...
Enforcing governing equation constraints in neural PDE solvers via training-free projections : Abstract: Neural PDE solvers used for scientific simulation often violate governing equation constraints. While linear constraints can be projected cheaply, many constraints are nonlinear, complicatin...
Automobile demand forecasting: Spatiotemporal and hierarchical modeling, life cycle dynamics, and user-generated online information : Abstract: Premium automotive manufacturers face increasingly complex forecasting challenges due to high product variety, sparse variant-level data, and volatile market dynamics. This study addresses m...
SAVeD: Semantic Aware Version Discovery : Abstract: Our work introduces SAVeD (Semantically Aware Version Detection), a contrastive learning-based framework for identifying versions of structured datasets without relying on metadata, labels, ...
Self-supervised denoising of raw tomography detector data for improved image reconstruction : Abstract: Ultrafast electron beam X-ray computed tomography produces noisy data due to short measurement times, causing reconstruction artifacts and limiting overall image quality. To counteract these...
ReBaPL: Repulsive Bayesian Prompt Learning : Abstract: Prompt learning has emerged as an effective technique for fine-tuning large-scale foundation models for downstream tasks. However, conventional prompt tuning methods are prone to overfitting...
Convergence and stability of Q-learning in Hierarchical Reinforcement Learning : Abstract: Hierarchical Reinforcement Learning promises, among other benefits, to efficiently capture and utilize the temporal structure of a decision-making problem and to enhance continual learning c...
R2PS: Worst-Case Robust Real-Time Pursuit Strategies under Partial Observability : Abstract: Computing worst-case robust strategies in pursuit-evasion games (PEGs) is time-consuming, especially when real-world factors like partial observability are considered. While important for ge...
A Unified Stability Analysis of SAM vs SGD: Role of Data Coherence and Emergence of Simplicity Bias : Abstract: Understanding the dynamics of optimization in deep learning is increasingly important as models scale. While stochastic gradient descent (SGD) and its variants reliably find solutions that g...
Stable Coresets via Posterior Sampling: Aligning Induced and Full Loss Landscapes : Abstract: As deep learning models continue to scale, the growing computational demands have amplified the need for effective coreset selection techniques. Coreset selection aims to accelerate training...
Self-Supervised Learning by Curvature Alignment : Abstract: Self-supervised learning (SSL) has recently advanced through non-contrastive methods that couple an invariance term with variance, covariance, or redundancy-reduction penalties. While such o...
Towards fully differentiable neural ocean model with Veros : Abstract: We present a differentiable extension of the VEROS ocean model, enabling automatic differentiation through its dynamical core. We describe the key modifications required to make the model fu...
Multi-Agent Pointer Transformer: Seq-to-Seq Reinforcement Learning for Multi-Vehicle Dynamic Pickup-Delivery Problems : Abstract: This paper addresses the cooperative Multi-Vehicle Dynamic Pickup and Delivery Problem with Stochastic Requests (MVDPDPSR) and proposes an end-to-end centralized decision-making framework ba...
Unmasking Airborne Threats: Guided-Transformers for Portable Aerosol Mass Spectrometry : Abstract: Matrix Assisted Laser Desorption/Ionization Mass Spectrometry (MALDI-MS) is a cornerstone in biomolecular analysis, offering precise identification of pathogens through unique mass spectral ...
Harnessing Data from Clustered LQR Systems: Personalized and Collaborative Policy Optimization : Abstract: It is known that reinforcement learning (RL) is data-hungry. To improve sample-efficiency of RL, it has been proposed that the learning algorithm utilize data from 'approximately similar' pr...
Fermions and Supersymmetry in Neural Network Field Theories : Abstract: We introduce fermionic neural network field theories via Grassmann-valued neural networks. Free theories are obtained by a generalization of the Central Limit Theorem to Grassmann variables....
Membership Inference Attacks Beyond Overfitting : Abstract: Membership inference attacks (MIAs) against machine learning (ML) models aim to determine whether a given data point was part of the model training data. These attacks may pose significant p...
Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition : Abstract: Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to ...
BITS for GAPS: Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates : Abstract: We introduce the Bayesian Information-Theoretic Sampling for hierarchical GAussian Process Surrogates (BITS for GAPS) framework to emulate latent components in hybrid physical systems. BITS ...
Is the Cure Still Worse Than the Disease? Test Overfitting by LLMs in Automated Program Repair : Abstract: Automated program repair has been shown to be susceptible to generating repaired code that passes on seen tests but fails on a hold-out set of hidden tests. This problem, dubbed test overfit...
Align & Invert: Solving Inverse Problems with Diffusion and Flow-based Models via Representational Alignment : Abstract: Enforcing alignment between the internal representations of diffusion or flow-based generative models and those of pretrained self-supervised encoders has recently been shown to provide a po...
Neighbor GRPO: Contrastive ODE Policy Optimization Aligns Flow Models : Abstract: Group Relative Policy Optimization (GRPO) has shown promise in aligning image and video generative models with human preferences. However, applying it to modern flow matching models is chall...
Real-Time Cooked Food Image Synthesis and Visual Cooking Progress Monitoring on Edge Devices : Abstract: Synthesizing realistic cooked food images from raw inputs on edge devices is a challenging generative task, requiring models to capture complex changes in texture, color and structure during...
A Diversity-optimized Deep Ensemble Approach for Accurate Plant Leaf Disease Detection : Abstract: Plant diseases pose a significant threat to global agriculture, causing over $220 billion in annual economic losses and jeopardizing food security. The timely and accurate detection of these...
Generative MIMO Beam Map Construction for Location Recovery and Beam Tracking : Abstract: Machine learning (ML) has greatly advanced data-driven channel modeling and resource optimization in wireless communication systems. However, most existing ML-based methods rely on large, ac...
An Efficient Computational Framework for Discrete Fuzzy Numbers Based on Total Orders : Abstract: Discrete fuzzy numbers, and in particular those defined over a finite chain $L_n = \{0, \ldots, n\}$, have been effectively employed to represent linguistic information within the framework ...
Dissecting Quantum Reinforcement Learning: A Systematic Evaluation of Key Components : Abstract: Parameterised quantum circuit (PQC) based Quantum Reinforcement Learning (QRL) has emerged as a promising paradigm at the intersection of quantum computing and reinforcement learning (RL). B...
Layer-wise Weight Selection for Power-Efficient Neural Network Acceleration : Abstract: Systolic array accelerators execute CNNs with energy dominated by the switching activity of multiply-accumulate (MAC) units. Although prior work exploits weight-dependent MAC power for compr...
OmniLens++: Blind Lens Aberration Correction via Large LensLib Pre-Training and Latent PSF Representation : Abstract: The emerging deep-learning-based lens library pre-training (LensLib-PT) pipeline offers a new avenue for blind lens aberration correction by training a universal neural network, demonstrating st...
FireScope: Wildfire Risk Prediction with a Chain-of-Thought Oracle : Abstract: Predicting wildfire risk is a reasoning-intensive spatial problem that requires the integration of visual, climatic, and geographic factors to infer continuous risk maps. Existing methods la...
On the Predictive Skill of Artificial Intelligence-based Weather Models for Extreme Events using Uncertainty Quantification : Abstract: Accurate prediction of extreme weather events remains a major challenge for artificial intelligence based weather prediction systems. While deterministic models such as FuXi, GraphCast, and ...
Investigating self-supervised representations for audio-visual deepfake detection : Abstract: Self-supervised representations excel at many vision and speech tasks, but their potential for audio-visual deepfake detection remains underexplored. Unlike prior work that uses these featur...
Intrinsic preservation of plasticity in continual quantum learning : Abstract: Artificial intelligence in dynamic, real-world environments requires the capacity for continual learning. However, standard deep learning suffers from a fundamental issue: loss of plasticity...
Fast Decoding for Non-Adaptive Learning of Erdős–Rényi Random Graphs : Abstract: We study the problem of learning an unknown graph via group queries on node subsets, where each query reports whether at least one edge is present among the queried nodes. In general, learni...
Equivariant-Aware Structured Pruning for Efficient Edge Deployment: A Comprehensive Framework with Adaptive Fine-Tuning : Abstract: This paper presents a novel framework combining group equivariant convolutional neural networks (G-CNNs) with equivariant-aware structured pruning to produce compact, transformation-invarian...
A First Full Physics Benchmark for Highly Granular Calorimeter Surrogates : Abstract: The physics programs of current and future collider experiments necessitate the development of surrogate simulators for calorimeter showers. While much progress has been made in the developm...
Non-Parametric Probabilistic Robustness: A Conservative Metric with Optimized Perturbation Distributions : Abstract: Deep learning (DL) models, despite their remarkable success, remain vulnerable to small input perturbations that can cause erroneous outputs, motivating the recent proposal of probabilistic ...
Selective Rotary Position Embedding : Abstract: Position information is essential for language modeling. In softmax transformers, Rotary Position Embeddings (RoPE) encode positions through fixed-angle rotations, while in...
SPEAR-1: Scaling Beyond Robot Demonstrations via 3D Understanding : Abstract: Robotic Foundation Models (RFMs) hold great promise as generalist, end-to-end systems for robot control. Yet their ability to generalize across new environments, tasks, and embodiments remai...
CREST: Improving Interpretability and Effectiveness of Troubleshooting at Ericsson through Criterion-Specific Trouble Report Retrieval : Abstract: The rapid evolution of the telecommunication industry necessitates efficient troubleshooting processes to maintain network reliability, software maintainability, and service quality. Trouble...
A Framework for Adaptive Stabilisation of Nonlinear Stochastic Systems : Abstract: We consider the adaptive control problem for discrete-time, nonlinear stochastic systems with linearly parameterised uncertainty. Assuming access to a parameterised family of controllers tha...
Addressing A Posteriori Performance Degradation in Neural Network Subgrid Stress Models : Abstract: Neural network subgrid stress models often have a priori performance that is far better than the a posteriori performance, leading to neural network models that look very promising a priori ...
"Normalized Stress" is Not Normalized: How to Interpret Stress Correctly : Abstract: Stress is among the most commonly employed quality metrics and optimization criteria for dimension reduction projections of high dimensional data. Complex, high dimensional data is ubiquitou...
UplinkNet: Practical Commercial 5G Standalone (SA) Uplink Throughput Prediction : Abstract: While 5G New Radio (NR) networks offer significant uplink throughput improvements, these gains are primarily realized when User Equipment (UE) connects to high-frequency millimeter wave (mmW...
FLUID: Training-Free Face De-identification via Latent Identity Substitution : Abstract: We present FLUID (Face de-identification in the Latent space via Utility-preserving Identity Displacement), a training-free framework that directly substitutes identity in the latent space o...
Supervised Fine Tuning of Large Language Models for Domain Specific Knowledge Graph Construction: A Case Study on Hunan's Historical Celebrities : Abstract: Large language models and knowledge graphs offer strong potential for advancing research on historical culture by supporting the extraction, analysis, and interpretation of cultural heritage...
Parameter-Free Neural Lens Blur Rendering for High-Fidelity Composites : Abstract: Consistent and natural camera lens blur is important for seamlessly blending 3D virtual objects into photographed real-scenes. Since lens blur typically varies with scene depth, the placemen...
CLLMRec: LLM-powered Cognitive-Aware Concept Recommendation via Semantic Alignment and Prerequisite Knowledge Distillation : Abstract: The growth of Massive Open Online Courses (MOOCs) presents significant challenges for personalized learning, where concept recommendation is crucial. Existing approaches typically rely on he...
MedImageInsight for Thoracic Cavity Health Classification from Chest X-rays : Abstract: Chest radiography remains one of the most widely used imaging modalities for thoracic diagnosis, yet increasing imaging volumes and radiologist workload continue to challenge timely interpre...
RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis : Abstract: We introduce RacketVision, a novel dataset and benchmark for advancing computer vision in sports analytics, covering table tennis, tennis, and badminton. The dataset is the first to provide ...
OmniPT: Unleashing the Potential of Large Vision Language Models for Pedestrian Tracking and Understanding : Abstract: LVLMs have been shown to perform excellently in image-level tasks such as VQA and captioning. However, in many instance-level tasks, such as visual grounding and object detection, LVLMs still s...
ReBrain: Brain MRI Reconstruction from Sparse CT Slice via Retrieval-Augmented Diffusion : Abstract: Magnetic Resonance Imaging (MRI) plays a crucial role in brain disease diagnosis, but it is not always feasible for certain patients due to physical or clinical constraints. Recent studies a...
Why Do Language Model Agents Whistleblow? : Abstract: The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that...
Spanning Tree Autoregressive Visual Generation : Abstract: We present Spanning Tree Autoregressive (STAR) modeling, which can incorporate prior knowledge of images, such as center bias and locality, to maintain sampling performance while also provid...
Geometric-Disentangelment Unlearning : Abstract: Machine unlearning, the removal of a training subset's influence from a deployed model, is critical for privacy preservation and model reliability, yet gradient ascent on forget samples ofte...
AutoGraphAD: A novel approach using Variational Graph Autoencoders for anomalous network flow detection : Abstract: Network Intrusion Detection Systems (NIDS) are essential tools for detecting network attacks and intrusions. While extensive research has explored the use of supervised Machine Learning for ...
Training Foundation Models on a Full-Stack AMD Platform: Compute, Networking, and System Design : Abstract: We report on the first large-scale mixture-of-experts (MoE) pretraining study on pure AMD hardware, utilizing MI300X GPUs with Pollara interconnect. We distill practical guidance for bo...
Learning to Compress: Unlocking the Potential of Large Language Models for Text Representation : Abstract: Text representation plays a critical role in tasks like clustering, retrieval, and other downstream applications. With the emergence of large language models (LLMs), there is increasing inte...
UI-CUBE: Enterprise-Grade Computer Use Agent Benchmarking Beyond Task Accuracy to Operational Reliability : Abstract: While current Computer Use Agent (CUA) benchmarks measure task completion effectively, they provide limited assessment of enterprise deployment readiness, emphasizing functional correctness ...
Device-Guided Music Transfer : Abstract: Device-guided music transfer adapts playback across unseen devices for users who lack them. Existing methods mainly focus on modifying the timbre, rhythm, harmony, or instrumentation to mimi...
A lightweight detector for real-time detection of remote sensing images : Abstract: Remote sensing imagery is widely used across various fields, yet real-time detection remains challenging due to the prevalence of small objects and the need to balance accuracy with efficien...
The PLLuM Instruction Corpus : Abstract: This paper describes the instruction dataset used to fine-tune a set of transformer-based large language models (LLMs) developed in the PLLuM (Polish Large Language Model) project. We presen...
Hallucinate Less by Thinking More: Aspect-Based Causal Abstention for Large Language Models : Abstract: Large Language Models (LLMs) often produce fluent but factually incorrect responses, a phenomenon known as hallucination. Abstention, where the model chooses not to answer and instead output...
Attention-Guided Feature Fusion (AGFF) Model for Integrating Statistical and Semantic Features in News Text Classification : Abstract: News text classification is a crucial task in natural language processing, essential for organizing and filtering the massive volume of digital content. Traditional methods typically rely on...
Parrot: Persuasion and Agreement Robustness Rating of Output Truth -- A Sycophancy Robustness Benchmark for LLMs : Abstract: This study presents PARROT (Persuasion and Agreement Robustness Rating of Output Truth), a robustness-focused framework designed to measure the degradation in accuracy that occurs under soci...
TP-MDDN: Task-Preferenced Multi-Demand-Driven Navigation with Autonomous Decision-Making : Abstract: In daily life, people often move through spaces to find objects that meet their needs, posing a key challenge in embodied AI. Traditional Demand-Driven Navigation (DDN) handles one need at a...
Algorithmic design and implementation considerations of deep MPC : Abstract: Deep Model Predictive Control (Deep MPC) is an evolving field that integrates model predictive control and deep learning. This manuscript is focused on a particular approach, which employs d...
Lost in Translation and Noise: A Deep Dive into the Failure Modes of VLMs on Real-World Tables : Abstract: The impressive performance of VLMs is largely measured on benchmarks that fail to capture the complexities of real-world scenarios. Existing datasets for tabular QA, such as WikiTableQuestio...
Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats : Abstract: Despite their impressive performance across a wide range of tasks, Large Vision-Language Models (LVLMs) remain prone to hallucination. In this study, we propose a comprehensive intervention ...
DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format : Abstract: Nowadays, we are witnessing an Artificial Intelligence revolution that dominates the technology landscape in various application domains, such as healthcare, robotics, automotive, security, ...
Range-Edit: Semantic Mask Guided Outdoor LiDAR Scene Editing : Abstract: Training autonomous driving and navigation systems requires large and diverse point cloud datasets that capture complex edge case scenarios from various dynamic urban settings. Acquiring suc...
Leveraging CVAE for Joint Configuration Estimation of Multifingered Grippers from Point Cloud Data : Abstract: This paper presents an efficient approach for determining the joint configuration of a multifingered gripper solely from the point cloud data of its poly-articulated chain, as generated by v...
Where Culture Fades: Revealing the Cultural Gap in Text-to-Image Generation : Abstract: Multilingual text-to-image (T2I) models have advanced rapidly in terms of visual realism and semantic alignment, and are now widely utilized. Yet outputs vary across cultural contexts: becau...
Large Language Models for Sentiment Analysis to Detect Social Challenges: A Use Case with South African Languages : Abstract: Sentiment analysis can aid in understanding people's opinions and emotions on social issues. In multilingual communities sentiment analysis systems can be used to quickly identify social cha...
MuM: Multi-View Masked Image Modeling for 3D Vision : Abstract: Self-supervised learning on images seeks to extract meaningful visual representations from unlabeled data. When scaled to large datasets, this paradigm has achieved state-of-the-art performa...
FORWARD: Dataset of a forwarder operating in rough terrain : Abstract: We present FORWARD, a high-resolution multimodal dataset of a cut-to-length forwarder operating in rough terrain on two harvest sites in the middle part of Sweden. The forwarder is a large K...
MusicAIR: A Multimodal AI Music Generation Framework Powered by an Algorithm-Driven Core : Abstract: Recent advances in generative AI have made music generation a prominent research focus. However, many neural-based models rely on large datasets, raising concerns about copyright infringemen...
AI Workers, Geopolitics, and Algorithmic Collective Action : Abstract: According to the theory of International Political Economy (IPE), states are often incentivized to rely on rather than constrain powerful corporations. For this reason, IPE provides a useful...
Is Phase Really Needed for Weakly-Supervised Dereverberation? : Abstract: In unsupervised or weakly-supervised approaches for speech dereverberation, the target clean (dry) signals are considered to be unknown during training. In that context, evaluating to what e...
Quantum Masked Autoencoders for Vision Learning : Abstract: Classical autoencoders are widely used to learn features of input data. To improve the feature learning, classical masked autoencoders extend classical autoencoders to learn the features of ...
Designing and Generating Diverse, Equitable Face Image Datasets for Face Verification Tasks : Abstract: Face verification is a significant component of identity authentication in various applications including online banking and secure access to personal devices. The majority of the existing f...
Sparse Mixture-of-Experts for Multi-Channel Imaging: Are All Channel Interactions Required? : Abstract: Vision Transformers (ViTs) have become the backbone of vision foundation models, yet their optimization for multi-channel domains - such as cell painting or satellite imagery - rema...
Beyond Multiple Choice: A Hybrid Framework for Unifying Robust Evaluation and Verifiable Reasoning Training : Abstract: Multiple-choice question answering (MCQA) has been a popular format for evaluating and reinforcement fine-tuning (RFT) of modern multimodal language models. Its constrained output format all...
DS-Span: Single-Phase Discriminative Subgraph Mining for Efficient Graph Embeddings : Abstract: Graph representation learning seeks to transform complex, high-dimensional graph structures into compact vector spaces that preserve both topology and semantics. Among the various strategies...
Preventing Shortcut Learning in Medical Image Analysis through Intermediate Layer Knowledge Distillation from Specialist Teachers : Abstract: Deep learning models are prone to learning shortcut solutions to problems using spuriously correlated yet irrelevant features of their training data. In high-risk applications such as medica...
SMILE: A Composite Lexical-Semantic Metric for Question-Answering Evaluation : Abstract: Traditional evaluation metrics for textual and visual question answering, like ROUGE, METEOR, and Exact Match (EM), focus heavily on n-gram based lexical similarity, often missing the deeper...
InTAct: Interval-based Task Activation Consolidation for Continual Learning : Abstract: Continual learning aims to enable neural networks to acquire new knowledge without forgetting previously learned information. While recent prompt-based methods perform strongly in class-incr...
REMSA: An LLM Agent for Foundation Model Selection in Remote Sensing : Abstract: Foundation Models (FMs) are increasingly used in remote sensing (RS) for tasks such as environmental monitoring, disaster assessment, and land-use mapping. These models include unimodal visi...
GRAPHIC--Guidelines for Reviewing Algorithmic Practices in Human-centred Design and Interaction for Creativity : Abstract: Artificial Intelligence (AI) has been increasingly applied to creative domains, leading to the development of systems that collaborate with humans in design processes. In Graphic Design, int...
Planning with Sketch-Guided Verification for Physics-Aware Video Generation : Abstract: Recent video generation approaches increasingly rely on planning intermediate control signals such as object trajectories to improve temporal coherence and motion fidelity. However, these me...
PersonaAgent with GraphRAG: Community-Aware Knowledge Graphs for Personalized LLM : Abstract: We propose a novel framework for a persona-based language model system, motivated by the need for personalized AI agents that adapt to individual user preferences. In our approach, the agent e...
Masked-and-Reordered Self-Supervision for Reinforcement Learning from Verifiable Rewards : Abstract: Test-time scaling has been shown to substantially improve large language models' (LLMs) mathematical reasoning. However, for a large portion of mathematical corpora, especially theorem provi...
Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition : Abstract: Recent advances in multimodal deep learning have greatly enhanced the capability of systems for speech analysis and pronunciation assessment. Accurate pronunciation detection remains a key c...
RTMol: Rethinking Molecule-text Alignment in a Round-trip View : Abstract: Aligning molecular sequence representations (e.g., SMILES notations) with textual descriptions is critical for applications spanning drug discovery, materials design, and automated chemical ...
KRAL: Knowledge and Reasoning Augmented Learning for LLM-assisted Clinical Antimicrobial Therapy : Abstract: Clinical antimicrobial therapy requires the dynamic integration of pathogen profiles, host factors, pharmacological properties of antimicrobials, and the severity of infection. This complexity...
Multi-Agent Collaborative Reward Design for Enhancing Reasoning in Reinforcement Learning : Abstract: We present CRM (Multi-Agent Collaborative Reward Model), a framework that replaces a single black-box reward model with a coordinated team of specialist evaluators to improve robustness and ...
You Only Forward Once: An Efficient Compositional Judging Paradigm : Abstract: Multimodal large language models (MLLMs) show strong potential as judges. However, existing approaches face a fundamental trade-off: adapting MLLMs to output a single score misaligns with th...
FaCells. Teaching Machines the Language of Lines: Per Point Attribute Scores for Face-Sketch Classification : Abstract: FaCells is a method, and an exhibition, that turns model internals into line based artworks. Aligned face photographs (CelebA, 260k images, 40 attributes) are translated into vector sketches...
MiniLLM: Knowledge Distillation of Large Language Models : Abstract: Knowledge Distillation (KD) is a promising technique for reducing the high computational demand of large language models (LLMs). However, previous KD methods are primarily applied to white-b...
AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Deepfake Detection of Frontal Face Videos : Abstract: Multimodal manipulations (also known as audio-visual deepfakes) make it difficult for unimodal deepfake detectors to detect forgeries in multimedia content. To avoid the spread of false prop...
Posts of Peril: Detecting Information About Hazards in Text : Abstract: Socio-linguistic indicators of affectively-relevant phenomena, such as emotion or sentiment, are often extracted from text to better understand features of human-computer interactions, inclu...
Generative AI and Power Imbalances in Global Education: Frameworks for Bias Mitigation : Abstract: This study examines how Generative Artificial Intelligence reproduces global power hierarchies in education and proposes a framework to address resulting inequities. Using a critical qualita...
CATCODER: Repository-Level Code Generation with Relevant Code and Type Context : Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, repository-level code generation presents unique challenges, particularly due to the...
AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models : Abstract: Aerospace embodied intelligence aims to empower unmanned aerial vehicles (UAVs) and other aerospace platforms to achieve autonomous perception, cognition, and action, as well as egocentric a...
MonoKAN: Certified Monotonic Kolmogorov-Arnold Network : Abstract: Artificial Neural Networks (ANNs) have significantly advanced various fields by effectively recognizing patterns and solving complex problems. Despite these advancements, their interpretabil...
LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Design of Multi Active/Passive Core-Agent Architectures : Abstract: In an era where vast amounts of data are collected and processed from diverse sources, there is a growing demand for sophisticated AI systems capable of intelligently fusing and analyzing th...
Task-Aligned Tool Recommendation for Large Language Models : Abstract: By augmenting Large Language Models (LLMs) with external tools, their capacity to solve complex problems has been significantly enhanced. However, despite ongoing advancements in the parsing...
Stable diffusion models reveal a persisting human and AI gap in visual creativity : Abstract: While recent research suggests Large Language Models match human creative performance in divergent thinking tasks, visual creativity remains underexplored. This study compared image generati...
Cognitive BASIC: An In-Model Interpreted Reasoning Language for LLMs : Abstract: Cognitive BASIC is a minimal, BASIC-style prompting language and in-model interpreter that structures large language model (LLM) reasoning into explicit, stepwise execution traces. Inspired ...
Fantastic Bugs and Where to Find Them in AI Benchmarks : Abstract: Benchmarks are pivotal in driving AI progress, and invalid benchmark questions frequently undermine their reliability. Manually identifying and correcting errors among thousands of benchmark...
Hybrid Differential Reward: Combining Temporal Difference and Action Gradients for Efficient Multi-Agent Reinforcement Learning in Cooperative Driving : Abstract: In multi-vehicle cooperative driving tasks involving high-frequency continuous control, traditional state-based reward functions suffer from the issue of vanishing reward differences. This p...
Comparing verbal, visual and combined explanations for Bayesian Network inferences : Abstract: Bayesian Networks (BNs) are an important tool for assisting probabilistic reasoning, but despite being considered transparent models, people have trouble understanding them. Further, current...
MirrorMind: Empowering OmniScientist with the Expert Perspectives and Collective Knowledge of Human Scientists : Abstract: The emergence of AI Scientists has demonstrated remarkable potential in automating scientific research. However, current approaches largely conceptualize scientific discovery as a solitary o...
Budget-Aware Tool-Use Enables Effective Agent Scaling : Abstract: Scaling test-time computation improves performance across different tasks on large language models (LLMs), which has also been extended to tool-augmented agents. For these agents, scaling in...
DAPS++: Rethinking Diffusion Inverse Problems with Decoupled Posterior Annealing : Abstract: From a Bayesian perspective, score-based diffusion solves inverse problems through joint inference, embedding the likelihood with the prior to guide the sampling process. However, this formu...
Patient-level Information Extraction by Consistent Integration of Textual and Tabular Evidence with Bayesian Networks : Abstract: Electronic health records (EHRs) form an invaluable resource for training clinical decision support systems. To leverage the potential of such systems in high-risk applications, we need larg...
The Belief-Desire-Intention Ontology for modelling mental reality and agency : Abstract: The Belief-Desire-Intention (BDI) model is a cornerstone for representing rational agency in artificial intelligence and cognitive sciences. Yet, its integration into structured, semanticall...
MIR: Efficient Exploration in Episodic Multi-Agent Reinforcement Learning via Mutual Intrinsic Reward : Abstract: Episodic rewards present a significant challenge in reinforcement learning. While intrinsic reward methods have demonstrated effectiveness in single-agent reinforcement learning scenarios, ...
Designing Domain-Specific Agents via Hierarchical Task Abstraction Mechanism : Abstract: LLM-driven agents, particularly those using general frameworks like ReAct or human-inspired role-playing, often struggle in specialized domains that necessitate rigorously structured workflo...
Agentifying Agentic AI : Abstract: Agentic AI seeks to endow systems with sustained autonomy, reasoning, and interaction capabilities. To realize this vision, its assumptions about agency must be complemented by explicit mode...
That's not natural: The Impact of Off-Policy Training Data on Probe Performance : Abstract: Probing has emerged as a promising method for monitoring Large Language Models (LLMs), enabling inference-time detection of concerning behaviours such as deception and sycophancy. However, n...
SRA-CP: Spontaneous Risk-Aware Selective Cooperative Perception : Abstract: Cooperative perception (CP) offers significant potential to overcome the limitations of single-vehicle sensing by enabling information sharing among connected vehicles (CVs). However, existi...
Instance Configuration for Sustainable Job Shop Scheduling : Abstract: The Job Shop Scheduling Problem (JSP) is a pivotal challenge in operations research and is essential for evaluating the effectiveness and performance of scheduling algorithms. Scheduling pro...
Joint Design of Protein Surface and Structure Using a Diffusion Bridge Model : Abstract: Protein-protein interactions (PPIs) are governed by surface complementarity and hydrophobic interactions at protein interfaces. However, designing diverse and physically realistic protein st...
Shona spaCy: A Morphological Analyzer for an Under-Resourced Bantu Language : Abstract: Despite rapid advances in multilingual natural language processing (NLP), the Bantu language Shona remains under-served in terms of morphological analysis and language-aware tools. This pape...
Towards Hyper-Efficient RAG Systems in VecDBs: Distributed Parallel Multi-Resolution Vector Search : Abstract: Retrieval-Augmented Generation (RAG) systems have become a dominant approach to augment large language models (LLMs) with external knowledge. However, existing vector database (VecDB) retrie...
Bench360: Benchmarking Local LLM Inference from 360° : Abstract: Running large language models (LLMs) locally is becoming increasingly common. While the growing availability of small open-source models and inference engines has lowered the entry barrier, ...
How Well Do LLMs Understand Tunisian Arabic? : Abstract: Large Language Models (LLMs) are the engines driving today's AI agents. The better these models understand human languages, the more natural and user-friendly the interaction with AI becomes...
Ellipsoid-Based Decision Boundaries for Open Intent Classification : Abstract: Textual open intent classification is crucial for real-world dialogue systems, enabling robust detection of unknown user intents without prior knowledge and contributing to the robustness of...
Prompt-Based Value Steering of Large Language Models : Abstract: Large language models are increasingly used in applications where alignment with human values is critical. While model fine-tuning is often employed to ensure safe responses, this technique ...
Concept-Based Interpretability for Toxicity Detection : Abstract: The rise of social networks has not only facilitated communication but also allowed the spread of harmful content. Although significant advances have been made in detecting toxic language in...
Falsely Accused: How AI Detectors Misjudge Slightly Polished Arabic Articles : Abstract: Many AI detection models have been developed to counter the presence of articles created by artificial intelligence (AI). However, if a human-authored article is slightly polished by AI, a s...
Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT : Abstract: SNOMED CT is a biomedical ontology with a hierarchical representation of large-scale concepts. Knowledge retrieval in SNOMED CT is critical for its application, but often proves challenging ...
Detecting and Steering LLMs' Empathy in Action : Abstract: We investigate empathy-in-action -- the willingness to sacrifice task efficiency to address human needs -- as a linear direction in LLM activation space. Using contrastive prompts grounded i...
RAG-Driven Data Quality Governance for Enterprise ERP Systems : Abstract: Enterprise ERP systems managing hundreds of thousands of employee records face critical data quality challenges when human resources departments perform decentralized manual entry across mul...
Large language models for automated PRISMA 2020 adherence checking : Abstract: Evaluating adherence to the PRISMA 2020 guideline remains a burden in the peer review process. To address the lack of shareable benchmarks, we constructed a copyright-aware benchmark of 108 Crea...
Multi-Agent Code Verification with Compound Vulnerability Detection : Abstract: LLMs generate buggy code: 29.6% of SWE-bench "solved" patches fail, 62% of BaxBench solutions have vulnerabilities, and existing tools only catch 65% of bugs with 35% false positives. We bui...
AutoBackdoor: Automating Backdoor Attacks via LLM Agents : Abstract: Backdoor attacks pose a serious threat to the secure deployment of large language models (LLMs), enabling adversaries to implant hidden behaviors triggered by specific inputs. However, exist...
PairHuman: A High-Fidelity Photographic Dataset for Customized Dual-Person Generation : Abstract: Personalized dual-person portrait customization has considerable potential applications, such as preserving emotional memories and facilitating wedding photography planning. However, the abs...
DDTime: Dataset Distillation with Spectral Alignment and Information Bottleneck for Time-Series Forecasting : Abstract: Time-series forecasting is fundamental across many domains, yet training accurate models often requires large-scale datasets and substantial computational resources. Dataset distillation off...
Password Strength Analysis Through Social Network Data Exposure: A Combined Approach Relying on Data Reconstruction and Generative Models : Abstract: Although passwords remain the primary defense against unauthorized access, users often tend to use passwords that are easy to remember. This behavior significantly increases security risks, ...
A Machine Learning-Driven Solution for Denoising Inertial Confinement Fusion Images : Abstract: Neutron imaging is important in optimizing analysis of inertial confinement fusion (ICF) events such as those at the National Ignition Facility (NIF) and improving current and future ICF pla...
SAM 3: Segment Anything with Concepts : Abstract: We present Segment Anything Model (SAM) 3, a unified model that detects, segments, and tracks objects in images and videos based on concept prompts, which we define as either short noun phra...
SafeR-CLIP: Mitigating NSFW Content in Vision-Language Models While Preserving Pre-Trained Knowledge : Abstract: Improving the safety of vision-language models like CLIP via fine-tuning often comes at a steep price, causing significant drops in their generalization performance. We find this trade-off s...
Revisiting Audio-language Pretraining for Learning General-purpose Audio Representation : Abstract: Audio-language pretraining holds promise for general-purpose audio understanding, yet remains underexplored compared to its vision counterpart. While vision-language models like CLIP serve a...
Generative Augmented Reality: Paradigms, Technologies, and Future Applications : Abstract: This paper introduces Generative Augmented Reality (GAR) as a next-generation paradigm that reframes augmentation as a process of world re-synthesis rather than world composition by a conven...
Revisiting Multimodal KV Cache Compression: A Frequency-Domain-Guided Outlier-KV-Aware Approach : Abstract: Multimodal large language models suffer from substantial inference overhead since multimodal KV Cache grows proportionally with the visual input length. Existing multimodal KV Cache compress...
Mesh RAG: Retrieval Augmentation for Autoregressive Mesh Generation : Abstract: 3D meshes are a critical building block for applications ranging from industrial design and gaming to simulation and robotics. Traditionally, meshes are crafted manually by artists, a proces...
A Robust Federated Learning Approach for Combating Attacks Against IoT Systems Under non-IID Challenges : Abstract: In the context of the growing proliferation of user devices and the concurrent surge in data volumes, the complexities arising from the substantial increase in data have posed formidable cha...
Monte Carlo Expected Threat (MOCET) Scoring : Abstract: Evaluating and measuring AI Safety Level (ASL) threats are crucial for guiding stakeholders to implement safeguards that keep risks within acceptable limits. ASL-3+ models present a unique r...
WorldGen: From Text to Traversable and Interactive 3D Worlds : Abstract: We introduce WorldGen, a system that enables the automatic creation of large-scale, interactive 3D worlds directly from text prompts. Our approach transforms natural language descriptions in...
ManifoldFormer: Geometric Deep Learning for Neural Dynamics on Riemannian Manifolds : Abstract: Existing EEG foundation models mainly treat neural signals as generic time series in Euclidean space, ignoring the intrinsic geometric structure of neural dynamics that constrains brain acti...
Analysis of heart failure patient trajectories using sequence modeling : Abstract: Transformers have defined the state-of-the-art for clinical prediction tasks involving electronic health records (EHRs). The recently introduced Mamba architecture outperformed an advanced T...
ConCISE: A Reference-Free Conciseness Evaluation Metric for LLM-Generated Answers : Abstract: Large language models (LLMs) frequently generate responses that are lengthy and verbose, filled with redundant or unnecessary details. This diminishes clarity and user satisfaction, and it i...
Sex and age determination in European lobsters using AI-Enhanced bioacoustics : Abstract: Monitoring aquatic species, especially elusive ones like lobsters, presents challenges. This study focuses on Homarus gammarus (European lobster), a key species for fisheries and aquaculture...
MRI Super-Resolution with Deep Learning: A Comprehensive Survey : Abstract: High-resolution (HR) magnetic resonance imaging (MRI) is crucial for many clinical and research applications. However, achieving it remains costly and constrained by technical trade-offs and...
The use of vocal biomarkers in the detection of Parkinson's disease: a robust statistical performance comparison of classic machine learning models : Abstract: Parkinson's disease (PD) is a progressive neurodegenerative disorder that, in addition to directly impairing functional mobility, is frequently associated with vocal impairments such as hypo...
Generative AI in Sociological Research: State of the Discipline : Abstract: Generative artificial intelligence (GenAI) has garnered considerable attention for its potential utility in research and scholarship, even among those who typically do not rely on computatio...
Deep Improvement Supervision : Abstract: Recently, it was shown that small, looped architectures, such as Tiny Recursive Models (TRMs), can outperform Large Language Models (LLMs) on complex reasoning tasks, including the Abstracti...
PepEVOLVE: Position-Aware Dynamic Peptide Optimization via Group-Relative Advantage : Abstract: Macrocyclic peptides are an emerging modality that combines biologics-like affinity with small-molecule-like developability, but their vast combinatorial space and multi-parameter objectives...
OmniGround: A Comprehensive Spatio-Temporal Grounding Benchmark for Real-World Complex Scenarios : Abstract: Spatio-Temporal Video Grounding (STVG) aims to localize target objects in videos based on natural language descriptions. Despite recent advances in Multimodal Large Language Models, a signif...
RASTP: Representation-Aware Semantic Token Pruning for Generative Recommendation with Semantic Identifiers : Abstract: Generative recommendation systems typically leverage Semantic Identifiers (SIDs), which represent each item as a sequence of tokens that encode semantic information. However, representing it...
Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems : Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized mode...
The Finer the Better: Towards Granular-aware Open-set Domain Generalization : Abstract: Open-Set Domain Generalization (OSDG) tackles the realistic scenario where deployed models encounter both domain shifts and novel object categories. Despite impressive progress with vision-l...

Research Sources: 306 | Generated: 11/24/2025