AI RESEARCH PAPERS & ACADEMIC SOURCES
- LN3DIFF++: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation : Abstract: The field of neural rendering has witnessed significant progress with advancements in generative models and differentiable rendering techniques. Though 2D diffusion has achieved success, a u...
- Tracing the Roots: Leveraging Temporal Dynamics in Diffusion Trajectories for Origin Attribution : Abstract: Diffusion models have transformed image synthesis through iterative denoising, by defining trajectories from noise to coherent data. While their capabilities are widely celebrated, a critica...
- MSDiff: Multi-Scale Diffusion Model for Ultra-Sparse View CT Reconstruction : Abstract: Computed Tomography (CT) technology reduces radiation hazards to the human body through sparse sampling, but fewer sampling angles pose challenges for image reconstruction. Score-based gene...
- Text-guided multi-stage cross-perception network for medical image segmentation : Abstract: Medical image segmentation plays a crucial role in clinical medicine, serving as a key tool for auxiliary diagnosis, treatment planning, and disease monitoring. However, traditional segmenta...
- AVM: Towards Structure-Preserving Neural Response Modeling in the Visual Cortex Across Stimuli and Individuals : Abstract: While deep learning models have shown strong performance in simulating neural responses, they often fail to clearly separate stable visual encoding from condition-specific adaptation, which ...
- Endo-SemiS: Towards Robust Semi-Supervised Image Segmentation for Endoscopic Video : Abstract: In this paper, we present Endo-SemiS, a semi-supervised segmentation framework for providing reliable segmentation of endoscopic video frames with limited annotation. Endo-SemiS uses 4 strate...
- A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos : Abstract: Long-form multimodal video understanding requires integrating vision, speech, and ambient audio with coherent long-range reasoning. Existing benchmarks emphasize either temporal length or mu...
- 4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation : Abstract: Despite advances in Multimodal LLMs (MLLMs), their ability to reason over 3D structures and temporal dynamics remains limited, constrained by weak 4D perception and temporal understanding. E...
- FORMSpoT: A Decade of Tree-Level, Country-Scale Forest Monitoring : Abstract: The recent decline of the European forest carbon sink highlights the need for spatially explicit and frequently updated forest monitoring tools. Yet, existing satellite-based disturbance pro...
- Infinite-Homography as Robust Conditioning for Camera-Controlled Video Generation : Abstract: Recent progress in video diffusion models has spurred growing interest in camera-controlled novel-view video generation for dynamic scenes, aiming to provide creators with cinematic camera c...
- Interpretable Similarity of Synthetic Image Utility : Abstract: Synthetic medical image data can unlock the potential of deep learning (DL)-based clinical decision support (CDS) systems through the creation of large scale, privacy-preserving, training se...
- DGH: Dynamic Gaussian Hair : Abstract: The creation of photorealistic dynamic hair remains a major challenge in digital human modeling because of the complex motions, occlusions, and light scattering. Existing methods often resor...
- Predictive Modeling of Maritime Radar Data Using Transformer Architecture : Abstract: Maritime autonomous systems require robust predictive capabilities to anticipate vessel motion and environmental dynamics. While transformer architectures have revolutionized AIS-based traje...
- Pro-Pose: Unpaired Full-Body Portrait Synthesis via Canonical UV Maps : Abstract: Photographs of people taken by professional photographers typically present the person in beautiful lighting, with an interesting pose, and flattering quality. This is unlike common photos p...
- Text-Conditioned Background Generation for Editable Multi-Layer Documents : Abstract: We present a framework for document-centric background generation with multi-page editing and thematic continuity. To ensure text regions remain readable, we employ a latent masking f...
- PhysFire-WM: A Physics-Informed World Model for Emulating Fire Spread Dynamics : Abstract: Fine-grained fire prediction plays a crucial role in emergency response. Infrared images and fire masks provide complementary thermal and boundary information, yet current methods are predom...
- Can Synthetic Images Serve as Effective and Efficient Class Prototypes? : Abstract: Vision-Language Models (VLMs) have shown strong performance in zero-shot image classification tasks. However, existing methods, including Contrastive Language-Image Pre-training (CLIP), all ...
- ABE-CLIP: Training-Free Attribute Binding Enhancement for Compositional Image-Text Matching : Abstract: Contrastive Language-Image Pretraining (CLIP) has achieved remarkable performance in various multimodal tasks. However, it still struggles with compositional image-text matching, particularl...
- It is not always greener on the other side: Greenery perception across demographics and personalities in multiple cities : Abstract: Quantifying and assessing urban greenery is consequential for planning and development, reflecting the everlasting importance of green spaces for multiple climate and well-being dimensions o...
- Globally Optimal Solution to the Generalized Relative Pose Estimation Problem using Affine Correspondences : Abstract: Mobile devices equipped with a multi-camera system and an inertial measurement unit (IMU), such as self-driving cars, are widely used nowadays. The task of relative pose estimation using visu...
- Anatomical Region-Guided Contrastive Decoding: A Plug-and-Play Strategy for Mitigating Hallucinations in Medical VLMs : Abstract: Medical Vision-Language Models (MedVLMs) show immense promise in clinical applicability. However, their reliability is hindered by hallucinations, where models often fail to derive answers f...
- Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs : Abstract: Exploration capacity shapes both inference-time performance and reinforcement learning (RL) training for large (vision-) language models, as stochastic sampling often yields redundant reason...
- DAVE: A VLM Vision Encoder for Document Understanding and Web Agents : Abstract: While Vision-language models (VLMs) have demonstrated remarkable performance across multi-modal tasks, their choice of vision encoders presents a fundamental weakness: their low-level featur...
- Any-Optical-Model: A Universal Foundation Model for Optical Remote Sensing : Abstract: Optical satellites, with their diverse band layouts and ground sampling distances, supply indispensable evidence for tasks ranging from ecosystem surveillance to emergency response. However,...
- Robust Scene Coordinate Regression via Geometrically-Consistent Global Descriptors : Abstract: Recent learning-based visual localization methods use global descriptors to disambiguate visually similar places, but existing approaches often derive these descriptors from geometric cues a...
- Learning When to Look: A Disentangled Curriculum for Strategic Perception in Multimodal Reasoning : Abstract: Multimodal Large Language Models (MLLMs) demonstrate significant potential but remain brittle in complex, long-chain visual reasoning tasks. A critical failure mode is "visual forgetting", w...
- Video Detective: Seek Critical Clues Recurrently to Answer Question from Long Videos : Abstract: Long Video Question-Answering (LVQA) presents a significant challenge for Multi-modal Large Language Models (MLLMs) due to immense context and overloaded information, which could also lead t...
- Mitty: Diffusion-based Human-to-Robot Video Generation : Abstract: Learning directly from human demonstration videos is a key milestone toward scalable and generalizable robot learning. Yet existing methods rely on intermediate representations such as keypo...
- AnyCXR: Human Anatomy Segmentation of Chest X-ray at Any Acquisition Position using Multi-stage Domain Randomized Synthetic Data with Imperfect Annotations and Conditional Joint Annotation Regularization Learning : Abstract: Robust anatomical segmentation of chest X-rays (CXRs) remains challenging due to the scarcity of comprehensive annotations and the substantial variability of real-world acquisition condition...
- Diagnostic Performance of Universal-Learning Ultrasound AI Across Multiple Organs and Tasks: the UUSIC25 Challenge : Abstract: IMPORTANCE: Current ultrasound AI remains fragmented into single-task tools, limiting clinical utility compared to versatile modern ultrasound systems. OBJECTIVE: To evaluate the diagnosti...
- Vision-Language Model Guided Image Restoration : Abstract: Many image restoration (IR) tasks require both pixel-level fidelity and high-level semantic understanding to recover realistic photos with fine-grained details. However, previous approaches ...
- Towards Pixel-Wise Anomaly Location for High-Resolution PCBA via Self-Supervised Image Reconstruction : Abstract: Automated defect inspection of assembled Printed Circuit Board Assemblies (PCBA) is quite challenging due to the insufficient labeled data, micro-defects with just a few pixels in visually-c...
- ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration : Abstract: Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers ...
- MatLat: Material Latent Space for PBR Texture Generation : Abstract: We propose a generative framework for producing high-quality PBR textures on a given 3D mesh. As large-scale PBR texture datasets are scarce, our approach focuses on effectively leveraging t...
- EMAG: Self-Rectifying Diffusion Sampling with Exponential Moving Average Guidance : Abstract: In diffusion and flow-matching generative models, guidance techniques are widely used to improve sample quality and consistency. Classifier-free guidance (CFG) is the de facto choice in mode...
- Deep But Reliable: Advancing Multi-turn Reasoning for Thinking with Images : Abstract: Recent advances in large Vision-Language Models (VLMs) have exhibited strong reasoning capabilities on complex visual tasks by thinking with images in their Chain-of-Thought (CoT), which is ...
- CodeDance: A Dynamic Tool-integrated MLLM for Executable Visual Reasoning : Abstract: Recent releases such as o3 highlight human-like "thinking with images" reasoning that combines structured tool use with stepwise verification, yet most open-source approaches still rely on t...
- Auxiliary Descriptive Knowledge for Few-Shot Adaptation of Vision-Language Model : Abstract: Despite the impressive zero-shot capabilities of Vision-Language Models (VLMs), they often struggle in downstream tasks with distribution shifts from the pre-training data. Few-Shot Adaptati...
- EMMA: Concept Erasure Benchmark with Comprehensive Semantic Metrics and Diverse Categories : Abstract: The widespread adoption of text-to-image (T2I) generation has raised concerns about privacy, bias, and copyright violations. Concept erasure techniques offer a promising solution by selectiv...
- Rotterdam artery-vein segmentation (RAV) dataset : Abstract: Purpose: To provide a diverse, high-quality dataset of color fundus images (CFIs) with detailed artery-vein (A/V) segmentation annotations, supporting the development and evaluation of machi...
- DESSERT: Diffusion-based Event-driven Single-frame Synthesis via Residual Training : Abstract: Video frame prediction extrapolates future frames from previous frames, but suffers from prediction errors in dynamic scenes due to the lack of information about the next frame. Event camera...
- Democratizing Pathology Co-Pilots: An Open Pipeline and Dataset for Whole-Slide Vision-Language Modelling : Abstract: Vision-language models (VLMs) have the potential to become co-pilots for pathologists. However, most VLMs either focus on small regions of interest within whole-slide images, provide only st...
- SynergyWarpNet: Attention-Guided Cooperative Warping for Neural Portrait Animation : Abstract: Recent advances in neural portrait animation have demonstrated remarkable potential for applications in virtual avatars, telepresence, and digital content creation. However, traditional explic...
- Multi-level distortion-aware deformable network for omnidirectional image super-resolution : Abstract: As augmented reality and virtual reality applications gain popularity, image processing for OmniDirectional Images (ODIs) has attracted increasing attention. OmniDirectional Image Super-Reso...
- Beyond Semantic Features: Pixel-level Mapping for Generalized AI-Generated Image Detection : Abstract: The rapid evolution of generative technologies necessitates reliable methods for detecting AI-generated images. A critical limitation of current detectors is their failure to generalize to i...
- Towards Deeper Emotional Reflection: Crafting Affective Image Filters with Generative Priors : Abstract: Social media platforms enable users to express emotions by posting text with accompanying images. In this paper, we propose the Affective Image Filter (AIF) task, which aims to reflect visua...
- Beyond Occlusion: In Search for Near Real-Time Explainability of CNN-Based Prostate Cancer Classification : Abstract: Deep neural networks are starting to show their worth in critical applications such as assisted cancer diagnosis. However, for their outputs to get accepted in practice, the results they pro...
- AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments : Abstract: Accurate flood detection from visual data is a critical step toward improving disaster response and risk assessment, yet datasets for flood segmentation remain scarce due to the challenges o...
- Xiaomi MiMo-VL-Miloco Technical Report : Abstract: We open-source MiMo-VL-Miloco-7B and its quantized variant MiMo-VL-Miloco-7B-GGUF, a pair of home-centric vision-language models that achieve strong performance on both hom...
- LangDriveCTRL: Natural Language Controllable Driving Scene Editing with Multi-modal Agents : Abstract: LangDriveCTRL is a natural-language-controllable framework for editing real-world driving videos to synthesize diverse traffic scenarios. It leverages explicit 3D scene decomposition to repr...
- 3D-RE-GEN: 3D Reconstruction of Indoor Scenes with a Generative Framework : Abstract: Recent advances in 3D scene generation produce visually appealing output, but current representations hinder artists' workflows that require modifiable 3D textured mesh scenes for visual eff...
- LumiCtrl : Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models : Abstract: Current text-to-image (T2I) models have demonstrated remarkable progress in creative image generation, yet they still lack precise control over scene illuminants, which is a crucial factor f...
- MMLANDMARKS: a Cross-View Instance-Level Benchmark for Geo-Spatial Understanding : Abstract: Geo-spatial analysis of our world benefits from a multimodal approach, as every single geographic location can be described in numerous ways (images from various viewpoints, textual descript...
- GroundingME: Exposing the Visual Grounding Gap in MLLMs through Multi-Dimensional Evaluation : Abstract: Visual grounding, localizing objects from natural language descriptions, represents a critical bridge between language and vision understanding. While multimodal large language models (MLLMs...
- Validation of Diagnostic Artificial Intelligence Models for Prostate Pathology in a Middle Eastern Cohort : Abstract: Background: Artificial intelligence (AI) is improving the efficiency and accuracy of cancer diagnostics. The performance of pathology AI systems has been almost exclusively evaluated on Euro...
- Foundation Model Priors Enhance Object Focus in Feature Space for Source-Free Object Detection : Abstract: Current state-of-the-art approaches in Source-Free Object Detection (SFOD) typically rely on Mean-Teacher self-labeling. However, domain shift often reduces the detector's ability to maintai...
- FLEG: Feed-Forward Language Embedded Gaussian Splatting from Any Views : Abstract: We present FLEG, a feed-forward network that reconstructs language-embedded 3D Gaussians from any views. Previous straightforward solutions combine feed-forward reconstruction with Gaussian ...
- G3Splat: Geometrically Consistent Generalizable Gaussian Splatting : Abstract: 3D Gaussians have recently emerged as an effective scene representation for real-time splatting and accurate novel-view synthesis, motivating several works to adapt multi-view structure pred...
- RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis : Abstract: Virtual furniture synthesis, which seamlessly integrates reference objects into indoor scenes while maintaining geometric coherence and visual realism, holds substantial promise for home des...
- 3One2: One-step Regression Plus One-step Diffusion for One-hot Modulation in Dual-path Video Snapshot Compressive Imaging : Abstract: Video snapshot compressive imaging (SCI) captures dynamic scene sequences through a two-dimensional (2D) snapshot, fundamentally relying on optical modulation for hardware compression and th...
- Medical Imaging AI Competitions Lack Fairness : Abstract: Benchmarking competitions are central to the development of artificial intelligence (AI) in medical imaging, defining performance standards and shaping methodological progress. However, it r...
- HeadHunt-VAD: Hunting Robust Anomaly-Sensitive Heads in MLLM for Tuning-Free Video Anomaly Detection : Abstract: Video Anomaly Detection (VAD) aims to locate events that deviate from normal patterns in videos. Traditional approaches often rely on extensive labeled data and incur high computational cost...
- Semi-Supervised 3D Segmentation for Type-B Aortic Dissection with Slim UNETR : Abstract: Convolutional neural networks (CNN) for multi-class segmentation of medical images are widely used today. Especially models with multiple outputs that can separately predict segmentation cla...
- Self-Supervised Weighted Image Guided Quantitative MRI Super-Resolution : Abstract: High-resolution (HR) quantitative MRI (qMRI) relaxometry provides objective tissue characterization but remains clinically underutilized due to lengthy acquisition times. We propose a physic...
- StereoMV2D: A Sparse Temporal Stereo-Enhanced Framework for Robust Multi-View 3D Object Detection : Abstract: Multi-view 3D object detection is a fundamental task in autonomous driving perception, where achieving a balance between detection accuracy and computational efficiency remains crucial. Spar...
- PathFLIP: Fine-grained Language-Image Pretraining for Versatile Computational Pathology : Abstract: While Vision-Language Models (VLMs) have achieved notable progress in computational pathology (CPath), the gigapixel scale and spatial heterogeneity of Whole Slide Images (WSIs) continue to ...
- Generative Human-Object Interaction Detection via Differentiable Cognitive Steering of Multi-modal LLMs : Abstract: Human-object interaction (HOI) detection aims to localize human-object pairs and the interactions between them. Existing methods operate under a closed-world assumption, treating the task as...
- Region-Constraint In-Context Generation for Instructional Video Editing : Abstract: The in-context generation paradigm has recently demonstrated strong power in instructional image editing with both data efficiency and synthesis quality. Nevertheless, shaping such in-contex...
- Bitbox: Behavioral Imaging Toolbox for Computational Analysis of Behavior from Videos : Abstract: Computational measurement of human behavior from video has recently become feasible due to major advances in AI. These advances now enable granular and precise quantification of facial expre...
- FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation : Abstract: We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses ...
- SAVeD: A First-Person Social Media Video Dataset for ADAS-equipped vehicle Near-Miss and Crash Event Analyses : Abstract: The advancement of safety-critical research in driving behavior in ADAS-equipped vehicles requires real-world datasets that not only include diverse traffic scenarios but also capture high-ri...
- MambaMIL+: Modeling Long-Term Contextual Patterns for Gigapixel Whole Slide Image : Abstract: Whole-slide images (WSIs) are an important data modality in computational pathology, yet their gigapixel resolution and lack of fine-grained annotations challenge conventional deep learning ...
- AdaptPrompt: Parameter-Efficient Adaptation of VLMs for Generalizable Deepfake Detection : Abstract: Recent advances in image generation have led to the widespread availability of highly realistic synthetic media, increasing the difficulty of reliable deepfake detection. A key challenge is ...
- LiteGE: Lightweight Geodesic Embedding for Efficient Geodesics Computation and Non-Isometric Shape Correspondence : Abstract: Computing geodesic distances on 3D surfaces is fundamental to many tasks in 3D vision and geometry processing, with deep connections to tasks such as shape correspondence. Recent learning-ba...
- UrbanDIFF: A Denoising Diffusion Model for Spatial Gap Filling of Urban Land Surface Temperature Under Dense Cloud Cover : Abstract: Satellite-derived Land Surface Temperature (LST) products are central to surface urban heat island (SUHI) monitoring due to their consistent grid-based coverage over large metropolitan regio...
- Long-Range depth estimation using learning based Hybrid Distortion Model for CCTV cameras : Abstract: Accurate camera models are essential for photogrammetry applications such as 3D mapping and object localization, particularly for long distances. Various stereo-camera based 3D localization ...
- Chorus: Multi-Teacher Pretraining for Holistic 3D Gaussian Scene Encoding : Abstract: While 3DGS has emerged as a high-fidelity scene representation, encoding rich, general-purpose features directly from its primitives remains under-explored. We address this gap by introducin...
- ReX-MLE: The Autonomous Agent Benchmark for Medical Imaging Challenges : Abstract: Autonomous coding agents built on large language models (LLMs) can now solve many general software and machine learning tasks, but they remain ineffective on complex, domain-specific scienti...
- Simulation-Driven Deep Learning Framework for Raman Spectral Denoising Under Fluorescence-Dominant Conditions : Abstract: Raman spectroscopy enables non-destructive, label-free molecular analysis with high specificity, making it a powerful tool for biomedical diagnostics. However, its application to biological ...
- InSPECT: Invariant Spectral Features Preservation of Diffusion Models : Abstract: Modern diffusion models (DMs) have achieved state-of-the-art image generation. However, the fundamental design choice of diffusing data all the way to white noise and then reconstructing it ...
- Keypoint Counting Classifiers: Turning Vision Transformers into Self-Explainable Models Without Training : Abstract: Current approaches for designing self-explainable models (SEMs) require complicated training procedures and specific architectures, which makes them impractical. With the advance of general p...
- Diffusion Forcing for Multi-Agent Interaction Sequence Modeling : Abstract: Understanding and generating multi-person interactions is a fundamental challenge with broad implications for robotics and social computing. While humans naturally coordinate in groups, mode...
- Dexterous World Models : Abstract: Recent progress in 3D reconstruction has made it easy to create realistic digital twins from everyday environments. However, current digital twins remain largely static and are limited to na...
- Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing : Abstract: Modern Latent Diffusion Models (LDMs) typically operate in low-level Variational Autoencoder (VAE) latent spaces that are primarily optimized for pixel-level reconstruction. To unify vision ...
- Adaptive Covariance and Quaternion-Focused Hybrid Error-State EKF/UKF for Visual-Inertial Odometry : Abstract: This study presents an innovative hybrid Visual-Inertial Odometry (VIO) method for Unmanned Aerial Vehicles (UAVs) that is resilient to environmental challenges and capable of dynamically as...
- A Certified Unlearning Approach without Access to Source Data : Abstract: With the growing adoption of data privacy regulations, the ability to erase private or copyrighted information from trained models has become a crucial requirement. Traditional unlearning me...
- Training Deep Physics-Informed Kolmogorov-Arnold Networks : Abstract: Since their introduction, Kolmogorov-Arnold Networks (KANs) have been successfully applied across several domains, with physics-informed machine learning (PIML) emerging as one of the areas ...
- Differentially private Bayesian tests : Abstract: Differential privacy has emerged as a significant cornerstone in the realm of scientific hypothesis testing utilizing confidential data. In reporting scientific discoveries, Bayesian tests ...
- SCAFFLSA: Taming Heterogeneity in Federated Linear Stochastic Approximation and TD Learning : Abstract: In this paper, we analyze the sample and communication complexity of the federated linear stochastic approximation (FedLSA) algorithm. We explicitly quantify the effects of local training wi...
- Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? : Abstract: Many machine learning models require setting a parameter that controls their size before training, e.g. number of neurons in DNNs, or inducing points in GPs. Increasing capacity typically im...
- Non-Perturbative Trivializing Flows for Lattice Gauge Theories : Abstract: Continuous normalizing flows are known to be highly expressive and flexible, which allows for easier incorporation of large symmetries and makes them a powerful computational tool for lattic...
- Dynamic PET Image Prediction Using a Network Combining Reversible and Irreversible Modules : Abstract: Dynamic positron emission tomography (PET) images can reveal the distribution of tracers in the organism and the dynamic processes involved in biochemical reactions, and it is widely used in...
- Targeted Learning for Variable Importance : Abstract: Variable importance is one of the most widely used measures for interpreting machine learning with significant interest from both statistics and machine learning communities. Recently, incre...
- Refined Analysis of Federated Averaging and Federated Richardson-Romberg : Abstract: In this paper, we present a novel analysis of FedAvg with constant step size, relying on the Markov property of the underlying process. We demonstrate that the global iterates of the algori...
- Embedding-Driven Data Distillation for 360-Degree IQA With Residual-Aware Refinement : Abstract: This article identifies and addresses a fundamental bottleneck in data-driven 360-degree image quality assessment (IQA): the lack of intelligent, sample-level data selection. Hence, we propo...
- 3D Cell Oversegmentation Correction via Geo-Wasserstein Divergence : Abstract: 3D cell segmentation methods are often hindered by oversegmentation, where a single cell is incorrectly split into multiple fragments. This degrades the final segmentation quality and...
- Unified Acoustic Representations for Screening Neurological and Respiratory Pathologies from Voice : Abstract: Voice-based health assessment offers unprecedented opportunities for scalable, non-invasive disease screening, yet existing approaches typically focus on single conditions and fail to levera...
- XLM: A Python package for non-autoregressive language models : Abstract: In recent years, there has been a resurgence of interest in non-autoregressive text generation in the context of general language modeling. Unlike the well-established autoregressive languag...
- Data Augmentation Supporting a Conversational Agent Designed for Smoking Cessation Support Groups : Abstract: Online support groups for smoking cessation are economical and accessible, yet they often face challenges with low user engagement and stigma. The use of an automatic conversational agent wo...
- Enhancing Long Document Long Form Summarisation with Self-Planning : Abstract: We introduce a novel approach for long context summarisation, highlight-guided generation, that leverages sentence-level information as a content plan to improve the traceability and faithfu...
- Mindscape-Aware Retrieval Augmented Generation for Improved Long Context Understanding : Abstract: Humans understand long and complex texts by relying on a holistic semantic representation of the content. This global view helps organize prior knowledge, interpret new information, and inte...
- Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience : Abstract: Large language models have recently made significant progress in generating rigorous mathematical proofs. In contrast, utilizing LLMs for theorem proving in formal languages (such as Lean) rem...
- Governance-Aware Hybrid Fine-Tuning for Multilingual Large Language Models : Abstract: We present a governance-aware hybrid fine-tuning framework for multilingual, low-resource adaptation of large language models. The core algorithm combines gradient-aligned low-rank updates w...
- Stakeholder Suite: A Unified AI Framework for Mapping Actors, Topics and Arguments in Public Debates : Abstract: Public debates surrounding infrastructure and energy projects involve complex networks of stakeholders, arguments, and evolving narratives. Understanding these dynamics is crucial for antici...
- Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers : Abstract: Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by...
- UCoder: Unsupervised Code Generation by Internal Probing of Large Language Models : Abstract: Large language models (LLMs) have demonstrated remarkable capabilities in code generation tasks. However, their effectiveness heavily relies on supervised training with extensive labeled (e....
- Are Vision Language Models Cross-Cultural Theory of Mind Reasoners? : Abstract: Theory of Mind (ToM) -- the ability to attribute beliefs, desires, and emotions to others -- is fundamental for human social intelligence, yet remains a major challenge for artificial agents...
- Linear Personality Probing and Steering in LLMs: A Big Five Study : Abstract: Large language models (LLMs) exhibit distinct and consistent personalities that greatly impact trust and engagement. While this means that personality frameworks would be highly valuable too...
- Simulstream: Open-Source Toolkit for Evaluation and Demonstration of Streaming Speech-to-Text Translation Systems : Abstract: Streaming Speech-to-Text Translation (StreamST) requires producing translations concurrently with incoming speech, imposing strict latency constraints and demanding models that balance parti...
- Peeking Into The Future For Contextual Biasing : Abstract: While end-to-end (E2E) automatic speech recognition (ASR) models excel at general transcription, they struggle to recognize rare or unseen named entities (e.g., contact names, locations), wh...
- Toward Ethical AI Through Bayesian Uncertainty in Neural Question Answering : Abstract: We explore Bayesian reasoning as a means to quantify uncertainty in neural networks for question answering. Starting with a multilayer perceptron on the Iris dataset, we show how posterior i...
- When the Gold Standard isn't Necessarily Standard: Challenges of Evaluating the Translation of User-Generated Content : Abstract: User-generated content (UGC) is characterised by frequent use of non-standard language, from spelling errors to expressive choices such as slang, character repetitions, and emojis. This make...
- Affect, Body, Cognition, Demographics, and Emotion: The ABCDE of Text Features for Computational Affective Science : Abstract: Work in Computational Affective Science and Computational Social Science explores a wide variety of research questions about people, emotions, behavior, and health. Such work often relies on...
- DEER: A Comprehensive and Reliable Benchmark for Deep-Research Expert Reports : Abstract: As large language models (LLMs) advance, deep research systems can generate expert-level reports via multi-step reasoning and evidence-based synthesis, but evaluating such reports remains ch...
- CIFE: Code Instruction-Following Evaluation : Abstract: Large Language Models (LLMs) are increasingly applied to real-world code generation, where functional correctness alone is insufficient for reliable deployment; developers also expect adhere...
- Computational analysis reveals historical trajectory of East-Polynesian lunar calendars : Abstract: We investigate a type of lunar calendar known as lists of the 'nights of the moon', found throughout East Polynesia, including Rapa Nui (Easter Island). Using computational methods, we analy...
- Mapping the Podcast Ecosystem with the Structured Podcast Research Corpus : Abstract: Podcasts provide highly diverse content to a massive listener base through a unique on-demand modality. However, limited data has prevented large-scale computational analysis of the podcast ...
- Generating Completions for Broca's Aphasic Sentences Using Large Language Models : Abstract: Broca's aphasia is a type of aphasia characterized by non-fluent, effortful and agrammatic speech production with relatively good comprehension. Since traditional aphasia treatment methods a...
- Clean Up the Mess: Addressing Data Pollution in Cryptocurrency Abuse Reporting Services : Abstract: Cryptocurrency abuse reporting services are a valuable data source about abusive blockchain addresses, prevalent types of cryptocurrency abuse, and their financial impact on victims. However...
- Comparison of deep learning models: CNN and VGG-16 in identifying pornographic content : Abstract: In 2020, a total of 59,741 websites were blocked by the Indonesian government due to containing negative content, including pornography, with 14,266 websites falling into this category. Howe...
- SHARP-QoS: Sparsely-gated Hierarchical Adaptive Routing for joint Prediction of QoS : Abstract: Dependable service-oriented computing relies on multiple Quality of Service (QoS) parameters that are essential to assess service optimality. However, real-world QoS data are extremely spars...
- A Theoretical Analysis of State Similarity Between Markov Decision Processes : Abstract: The bisimulation metric (BSM) is a powerful tool for analyzing state similarities within a Markov decision process (MDP), revealing that states closer in BSM have more similar optimal value ...
- MINPO: Memory-Informed Neural Pseudo-Operator to Resolve Nonlocal Spatiotemporal Dynamics : Abstract: Many physical systems exhibit nonlocal spatiotemporal behaviors described by integro-differential equations (IDEs). Classical methods for solving IDEs require repeatedly evaluating convoluti...
- Alzheimer's Disease Brain Network Mining : Abstract: Machine learning approaches for Alzheimer's disease (AD) diagnosis face a fundamental challenge. Clinical assessments are expensive and invasive, leaving ground truth labels available for o...
- Task Schema and Binding: A Double Dissociation Study of In-Context Learning : Abstract: We provide causal mechanistic validation that in-context learning (ICL) decomposes into two separable mechanisms: Task Schema (abstract task type recognition) and Binding (specific input-out...
- Adversarially Robust Detection of Harmful Online Content: A Computational Design Science Approach : Abstract: Social media platforms are plagued by harmful content such as hate speech, misinformation, and extremist rhetoric. Machine learning (ML) models are widely adopted to detect such content; how...
- AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens : Abstract: Reward models and LLM-as-a-Judge systems are central to modern post-training pipelines such as RLHF, DPO, and RLAIF, where they provide scalar feedback and binary decisions that guide model ...
- DeepShare: Sharing ReLU Across Channels and Layers for Efficient Private Inference : Abstract: Private Inference (PI) uses cryptographic primitives to perform privacy preserving machine learning. In this setting, the owner of the network runs inference on the data of the client withou...
- meval: A Statistical Toolbox for Fine-Grained Model Performance Analysis : Abstract: Analyzing machine learning model performance stratified by patient and recording properties is becoming the accepted norm and often yields crucial insights about important model failure mode...
- Deep Learning-Based Surrogate Creep Modelling in Inconel 625: A High-Temperature Alloy Study : Abstract: Time-dependent deformation, particularly creep, in high-temperature alloys such as Inconel 625 is a key factor in the long-term reliability of components used in aerospace and energy systems...
- NetworkFF: Unified Layer Optimization in Forward-Only Neural Networks : Abstract: The Forward-Forward algorithm eliminates backpropagation's memory constraints and biological implausibility through dual forward passes with positive and negative data. However, conventional...
- Bayesian Optimisation: Which Constraints Matter? : Abstract: Bayesian optimisation has proven to be a powerful tool for expensive global black-box optimisation problems. In this paper, we propose new Bayesian optimisation variants of the popular Knowl...
- Machine Learning for Static and Single-Event Dynamic Complex Network Analysis : Abstract: The primary objective of this thesis is to develop novel algorithmic approaches for Graph Representation Learning of static and single-event dynamic networks. In such a direction, we focus o...
- Learning Safe Autonomous Driving Policies Using Predictive Safety Representations : Abstract: Safe reinforcement learning (SafeRL) is a prominent paradigm for autonomous driving, where agents are required to optimize performance under strict safety requirements. This dual objective c...
- Sharing Knowledge without Sharing Data: Stitches can improve ensembles of disjointly trained models : Abstract: Deep learning has been shown to be very capable at performing many real-world tasks. However, this performance is often dependent on the presence of large and varied datasets. In some settin...
- A Unified Representation of Neural Networks Architectures : Abstract: In this paper we consider the limiting case of neural networks (NNs) architectures when the number of neurons in each hidden layer and the number of hidden layers tend to infinity thus formi...
- A Systems-Theoretic View on the Convergence of Algorithms under Disturbances : Abstract: Algorithms increasingly operate within complex physical, social, and engineering systems where they are exposed to disturbances, noise, and interconnections with other dynamical systems. Thi...
- Estimating Spatially Resolved Radiation Fields Using Neural Networks : Abstract: We present an in-depth analysis on how to build and train neural networks to estimate the spatial distribution of scattered radiation fields for radiation protection dosimetry in medical rad...
- Polyharmonic Cascade : Abstract: This paper presents a deep machine learning architecture, the "polyharmonic cascade" -- a sequence of packages of polyharmonic splines, where each layer is rigorously derived from the theory...
- Convergence Guarantees for Federated SARSA with Local Training and Heterogeneous Agents : Abstract: We present a novel theoretical analysis of Federated SARSA (FedSARSA) with linear function approximation and local training. We establish convergence guarantees for FedSARSA in the presence ...
- Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting : Abstract: The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity represe...
- Mitigating Forgetting in Low Rank Adaptation : Abstract: Parameter-efficient fine-tuning methods, such as Low-Rank Adaptation (LoRA), enable fast specialization of large pre-trained models to different downstream applications. However, this proces...
- Can You Hear Me Now? A Benchmark for Long-Range Graph Propagation : Abstract: Effectively capturing long-range interactions remains a fundamental yet unresolved challenge in graph neural network (GNN) research, critical for applications across diverse fields of scienc...
- Calibratable Disambiguation Loss for Multi-Instance Partial-Label Learning : Abstract: Multi-instance partial-label learning (MIPL) is a weakly supervised framework that extends the principles of multi-instance learning (MIL) and partial-label learning (PLL) to address the cha...
- Exploiting ID-Text Complementarity via Ensembling for Sequential Recommendation : Abstract: Modern Sequential Recommendation (SR) models commonly utilize modality features to represent items, motivated in large part by recent advancements in language and vision modeling. To do so, ...
- Regularized Random Fourier Features and Finite Element Reconstruction for Operator Learning in Sobolev Space : Abstract: Operator learning is a data-driven approximation of mappings between infinite-dimensional function spaces, such as the solution operators of partial differential equations. Kernel-based oper...
- SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization : Abstract: Retrieving code units (e.g., files, classes, functions) that are semantically relevant to a given user query, bug report, or feature request from large codebases is a fundamental challenge f...
- Colormap-Enhanced Vision Transformers for MRI-Based Multiclass (4-Class) Alzheimer's Disease Classification : Abstract: Magnetic Resonance Imaging (MRI) plays a pivotal role in the early diagnosis and monitoring of Alzheimer's disease (AD). However, the subtle structural variations in brain MRI scans often po...
- Perturb Your Data: Paraphrase-Guided Training Data Watermarking : Abstract: Training data detection is critical for enforcing copyright and data licensing, as Large Language Models (LLM) are trained on massive text corpora scraped from the internet. We present SPECT...
- Disentangled representations via score-based variational autoencoders : Abstract: We present the Score-based Autoencoder for Multiscale Inference (SAMI), a method for unsupervised representation learning that combines the theoretical frameworks of diffusion models and VAE...
- Biosecurity-Aware AI: Agentic Risk Auditing of Soft Prompt Attacks on ESM-Based Variant Predictors : Abstract: Genomic Foundation Models (GFMs), such as Evolutionary Scale Modeling (ESM), have demonstrated remarkable success in variant effect prediction. However, their security and robustness under a...
- Application of machine learning to predict food processing level using Open Food Facts : Abstract: Ultra-processed foods are increasingly linked to health issues like obesity, cardiovascular disease, type 2 diabetes, and mental health disorders due to poor nutritional quality. This first-...
- Do Foundational Audio Encoders Understand Music Structure? : Abstract: In music information retrieval (MIR) research, the use of pretrained foundational audio encoders (FAEs) has recently become a trend. FAEs pretrained on large amounts of music and audio data ...
- CheXPO-v2: Preference Optimization for Chest X-ray VLMs with Knowledge Graph Consistency : Abstract: Medical Vision-Language Models (VLMs) are prone to hallucinations, compromising clinical reliability. While reinforcement learning methods like Group Relative Policy Optimization (GRPO) offe...
- Machine Learning Assisted Parameter Tuning on Wavelet Transform Amorphous Radial Distribution Function : Abstract: Understanding atomic structures is crucial, yet amorphous materials remain challenging due to their irregular and non-periodic nature. The wavelet-transform radial distribution function (WT-...
- Practical Framework for Privacy-Preserving and Byzantine-robust Federated Learning : Abstract: Federated Learning (FL) allows multiple clients to collaboratively train a model without sharing their private data. However, FL is vulnerable to Byzantine attacks, where adversaries manipul...
- Warmer for Less: A Cost-Efficient Strategy for Cold-Start Recommendations at Pinterest : Abstract: Pinterest is a leading visual discovery platform where recommender systems (RecSys) are key to delivering relevant, engaging, and fresh content to our users. In this paper, we study the prob...
- LibriVAD: A Scalable Open Dataset with Deep Learning Benchmarks for Voice Activity Detection : Abstract: Robust Voice Activity Detection (VAD) remains a challenging task, especially under noisy, diverse, and unseen acoustic conditions. Beyond algorithmic development, a key limitation in advanci...
- Penalized Fair Regression for Multiple Groups in Chronic Kidney Disease : Abstract: Fair regression methods have the potential to mitigate societal bias concerns in health care, but there has been little work on penalized fair regression when multiple groups experience such...
- Sharp Structure-Agnostic Lower Bounds for General Functional Estimation : Abstract: The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong struc...
- Timely Information Updating for Mobile Devices Without and With ML Advice : Abstract: This paper investigates an information update system in which a mobile device monitors a physical process and sends status updates to an access point (AP). A fundamental trade-off arises bet...
- Perfect reconstruction of sparse signals using nonconvexity control and one-step RSB message passing : Abstract: We consider sparse signal reconstruction via minimization of the smoothly clipped absolute deviation (SCAD) penalty, and develop one-step replica-symmetry-breaking (1RSB) extensions of appro...
- MULTIAQUA: A multimodal maritime dataset and robust training strategies for multimodal semantic segmentation : Abstract: Unmanned surface vehicles can encounter a number of varied visual circumstances during operation, some of which can be very difficult to interpret. While most cases can be solved only using ...
- When Data Quality Issues Collide: A Large-Scale Empirical Study of Co-Occurring Data Quality Issues in Software Defect Prediction : Abstract: Software Defect Prediction (SDP) models are central to proactive software quality assurance, yet their effectiveness is often constrained by the quality of available datasets. Prior research...
- Linear Attention for Joint Power Optimization and User-Centric Clustering in Cell-Free Networks : Abstract: Optimal AP clustering and power allocation are critical in user-centric cell-free massive MIMO systems. Existing deep learning models lack flexibility to handle dynamic network configuration...
- Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions : Abstract: We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n...
- TwinSegNet: A Digital Twin-Enabled Federated Learning Framework for Brain Tumor Analysis : Abstract: Brain tumor segmentation is critical in diagnosis and treatment planning for the disease. Yet, current deep learning methods rely on centralized data collection, which raises privacy concern...
- Resource-efficient medical image classification for edge devices : Abstract: Medical image classification is a critical task in healthcare, enabling accurate and timely diagnosis. However, deploying deep learning models on resource-constrained edge devices presents s...
- PathBench-MIL: A Comprehensive AutoML and Benchmarking Framework for Multiple Instance Learning in Histopathology : Abstract: We introduce PathBench-MIL, an open-source AutoML and benchmarking framework for multiple instance learning (MIL) in histopathology. The system automates end-to-end MIL pipeline construction...
- Enabling Disaggregated Multi-Stage MLLM Inference via GPU-Internal Scheduling and Resource Sharing : Abstract: Multimodal large language models (MLLMs) extend LLMs with visual understanding through a three-stage pipeline: multimodal preprocessing, vision encoding, and LLM inference. While these stage...
- SkinGenBench: Generative Model and Preprocessing Effects for Synthetic Dermoscopic Augmentation in Melanoma Diagnosis : Abstract: This work introduces SkinGenBench, a systematic biomedical imaging benchmark that investigates how preprocessing complexity interacts with generative model choice for synthetic dermoscopic i...
- Confidence-Credibility Aware Weighted Ensembles of Small LLMs Outperform Large LLMs in Emotion Detection : Abstract: This paper introduces a confidence-weighted, credibility-aware ensemble framework for text-based emotion detection, inspired by Condorcet's Jury Theorem (CJT). Unlike conventional ensembles ...
- Generative Multi-Objective Bayesian Optimization with Scalable Batch Evaluations for Sample-Efficient De Novo Molecular Design : Abstract: Designing molecules that must satisfy multiple, often conflicting objectives is a central challenge in molecular discovery. The enormous size of chemical space and the cost of high-fidelity ...
- Fraud detection in credit card transactions using Quantum-Assisted Restricted Boltzmann Machines : Abstract: Use cases for emerging quantum computing platforms become economically relevant as the efficiency of processing and availability of quantum computers increase. We assess the performance of R...
- Vidarc: Embodied Video Diffusion Model for Closed-loop Control : Abstract: Robotic arm manipulation in data-scarce settings is a highly challenging task due to the complex embodiment dynamics and diverse contexts. Recent video-based approaches have shown great prom...
- Imputation Uncertainty in Interpretable Machine Learning Methods : Abstract: In real data, missing values occur frequently, which affects the interpretation with interpretable machine learning (IML) methods. Recent work considers bias and shows that model explanation...
- Revisiting the Broken Symmetry Phase of Solid Hydrogen: A Neural Network Variational Monte Carlo Study : Abstract: The crystal structure of high-pressure solid hydrogen remains a fundamental open problem. Although the research frontier has mostly shifted toward ultra-high pressure phases above 400 GPa, w...
- Breast Cancer Neoadjuvant Chemotherapy Treatment Response Prediction Using Aligned Longitudinal MRI and Clinical Data : Abstract: Aim: This study investigates treatment response prediction to neoadjuvant chemotherapy (NACT) in breast cancer patients, using longitudinal contrast-enhanced magnetic resonance images (CE-MR...
- Domain-Aware Quantum Circuit for QML : Abstract: Designing parameterized quantum circuits (PQCs) that are expressive, trainable, and robust to hardware noise is a central challenge for quantum machine learning (QML) on noisy intermediate-s...
- Visually Prompted Benchmarks Are Surprisingly Fragile : Abstract: A key challenge in evaluating VLMs is testing models' ability to analyze visual content independently from their textual priors. Recent benchmarks such as BLINK probe visual perception throu...
- Learning vertical coordinates via automatic differentiation of a dynamical core : Abstract: Terrain-following coordinates in atmospheric models often imprint their grid structure onto the solution, particularly over steep topography, where distorted coordinate layers can generate s...
- Distributionally Robust Imitation Learning: Layered Control Architecture for Certifiable Autonomy : Abstract: Imitation learning (IL) enables autonomous behavior by learning from expert demonstrations. While more sample-efficient than comparative alternatives like reinforcement learning, IL is sensi...
- Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection : Abstract: Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detectio...
- HGQ: High Granularity Quantization for Real-time Neural Networks on FPGAs : Abstract: Neural networks with sub-microsecond inference latency are required by many critical applications. Targeting such applications deployed on FPGAs, we present High Granularity Quantization (HG...
- On the Identification of Temporally Causal Representation with Instantaneous Dependence : Abstract: Temporally causal representation learning aims to identify the latent causal process from time series observations, but most methods require the assumption that the latent causal processes d...
- Low-Rank Filtering and Smoothing for Sequential Deep Learning : Abstract: Learning multiple tasks sequentially requires neural networks to balance retaining knowledge, yet being flexible enough to adapt to new tasks. Regularizing network parameters is a common app...
- Hierarchical Multimodal LLMs with Semantic Space Alignment for Enhanced Time Series Classification : Abstract: Time series classification plays a fundamental role in a wide range of real-world applications. Recently, large language models (LLMs) have demonstrated strong generalization and reasoning c...
- Fairness via Independence: A (Conditional) Distance Covariance Framework : Abstract: We explore fairness from a statistical perspective by selectively utilizing either conditional distance covariance or distance covariance statistics as measures to assess the independence be...
- Data for Mathematical Copilots: Better Ways of Presenting Proofs for Machine Learning : Abstract: The datasets and benchmarks commonly used to train and evaluate the mathematical capabilities of AI-based mathematical copilots (primarily large language models) exhibit several shortcomings...
- Regularized Langevin Dynamics for Combinatorial Optimization : Abstract: This work proposes a simple yet effective sampling framework for combinatorial optimization (CO). Our method builds on discrete Langevin dynamics (LD), an efficient gradient-guided generativ...
- ShareChat: A Dataset of Chatbot Conversations in the Wild : Abstract: While Large Language Models (LLMs) have evolved into distinct platforms with unique interface designs and capabilities, existing public datasets treat models as generic text generators, stri...
- Planning as Descent: Goal-Conditioned Latent Trajectory Synthesis in Learned Energy Landscapes : Abstract: We present Planning as Descent (PaD), a framework for offline goal-conditioned reinforcement learning that grounds trajectory synthesis in verification. Instead of learning a policy or expli...
- Integrating Computational Methods and AI into Qualitative Studies of Aging and Later Life : Abstract: This chapter demonstrates how computational social science (CSS) tools are extending and expanding research on aging. The depth and context from traditionally qualitative methods such as par...
- InfSplign: Inference-Time Spatial Alignment of Text-to-Image Diffusion Models : Abstract: Text-to-image (T2I) diffusion models generate high-quality images but often fail to capture the spatial relations specified in text prompts. This limitation can be traced to two factors: lac...
- AnyTask: an Automated Task and Data Generation Framework for Advancing Sim-to-Real Policy Learning : Abstract: Generalist robot learning remains constrained by data: large-scale, diverse, and high-quality interaction data are expensive to collect in the real world. While simulation has become a promi...
- Interpretable Plant Leaf Disease Detection Using Attention-Enhanced CNN : Abstract: Plant diseases pose a significant threat to global food security, necessitating accurate and interpretable disease detection methods. This study introduces an interpretable attention-guided ...
- Weighted Stochastic Differential Equation to Implement Wasserstein-Fisher-Rao Gradient Flow : Abstract: Score-based diffusion models currently constitute the state of the art in continuous generative modeling. These methods are typically formulated via overdamped or underdamped Ornstein--Uhlen...
- Exploring the Effect of Basis Rotation on NQS Performance : Abstract: Neural Quantum States (NQS) use neural networks to represent wavefunctions of quantum many-body systems, but their performance depends on the choice of basis, yet the underlying mechanism re...
- RadarGen: Automotive Radar Point Cloud Generation from Cameras : Abstract: We present RadarGen, a diffusion model for synthesizing realistic automotive radar point clouds from multi-view camera imagery. RadarGen adapts efficient image-latent diffusion to the radar ...
- Adversarial Robustness of Vision in Open Foundation Models : Abstract: With the increase in deep learning, it becomes increasingly difficult to understand the model in which AI systems can identify objects. Thus, an adversary could aim to modify an image by add...
- Re-Depth Anything: Test-Time Depth Refinement via Self-Supervised Re-lighting : Abstract: Monocular depth estimation remains challenging as recent foundation models, such as Depth Anything V2 (DA-V2), struggle with real-world images that are far from the training distribution. We...
- Agnosticism About Artificial Consciousness : Abstract: Could an AI have conscious experiences? Any answer to this question should conform to Evidentialism - that is, it should be based not on intuition, dogma or speculation but on solid scientif...
- Best Practices For Empirical Meta-Algorithmic Research: Guidelines from the COSEAL Research Network : Abstract: Empirical research on meta-algorithmics, such as algorithm selection, configuration, and scheduling, often relies on extensive and thus computationally expensive experiments. With the large ...
- ParamExplorer: A framework for exploring parameters in generative art : Abstract: Generative art systems often involve high-dimensional and complex parameter spaces in which aesthetically compelling outputs occupy only small, fragmented regions. Because of this combinator...
- An AI-driven Assessment of Bone Density as a Biomarker Leading to the Aging Law : Abstract: As global population aging intensifies, there is growing interest in the study of biological age. Bones have long been used to evaluate biological age, and the decline in bone density with a...
- The Diffusion Duality : Abstract: Uniform-state discrete diffusion models hold the promise of fast text generation due to their inherent ability to self-correct. However, they are typically outperformed by autoregressive mod...
- ResearchQA: Evaluating Scholarly Question Answering at Scale Across 75 Fields with Survey-Mined Questions and Rubrics : Abstract: Evaluating long-form responses to research queries heavily relies on expert annotators, restricting attention to areas like AI where researchers can conveniently enlist colleagues. Yet, rese...
- Dion2: A Simple Method to Shrink Matrix in Muon : Abstract: The Muon optimizer enjoys strong empirical performance and theoretical grounding. However, the super-linear cost of its orthonormalization step introduces increasing overhead with scale. To ...
- BIONIX: A Wireless, Low-Cost Prosthetic Arm with Dual-Signal EEG and EMG Control : Abstract: Affordable upper-limb prostheses often lack intuitive control systems, limiting functionality and accessibility for amputees in low-resource settings. This project presents a low-cost, dual-...
- QSMOTE-PGM/kPGM: QSMOTE Based PGM and kPGM for Imbalanced Dataset Classification : Abstract: Quantum-inspired machine learning (QiML) leverages mathematical frameworks from quantum theory to enhance classical algorithms, with particular emphasis on inner product structures in high-d...
- Compression is Routing: Reconstruction Error as an Intrinsic Signal for Modular Language Models : Abstract: Current Large Language Models (LLMs) face three major challenges: context length limitations, high inference costs, and catastrophic forgetting during continual learning. While Mixture-of-Ex...
- Physics-Informed Lightweight Machine Learning for Aviation Visibility Nowcasting Across Multiple Climatic Regimes : Abstract: Short-term prediction (nowcasting) of low-visibility and precipitation events is critical for aviation safety and operational efficiency. Current operational approaches rely on computational...
- Turn-PPO: Turn-Level Advantage Estimation with PPO for Improved Multi-Turn RL in Agentic LLMs : Abstract: Reinforcement learning (RL) has re-emerged as a natural approach for training interactive LLM agents in real-world environments. However, directly applying the widely used Group Relative Pol...
- GB-DQN: Gradient Boosted DQN Models for Non-stationary Reinforcement Learning : Abstract: Non-stationary environments pose a fundamental challenge for deep reinforcement learning, as changes in dynamics or rewards invalidate learned value functions and cause catastrophic forgetti...
- SFBD-OMNI: Bridge models for lossy measurement restoration with limited clean samples : Abstract: In many real-world scenarios, obtaining fully observed samples is prohibitively expensive or even infeasible, while partial and noisy observations are comparatively easy to collect. In this ...
- Dynamic Tool Dependency Retrieval for Efficient Function Calling : Abstract: Function calling agents powered by Large Language Models (LLMs) select external tools to automate complex tasks. On-device agents typically use a retrieval module to select relevant tools, i...
- Universal consistency of the $k$-NN rule in metric spaces and Nagata dimension. III : Abstract: We prove the last remaining implication allowing to claim the equivalence of the following conditions for a complete separable metric space $X$: (1) The $k$-nearest neighbour classifier is...
- Bandwidth-Efficient Adaptive Mixture-of-Experts via Low-Rank Compensation : Abstract: Mixture-of-Experts (MoE) models scale capacity via sparse activation but stress memory and bandwidth. Offloading alleviates GPU memory by fetching experts on demand, yet token-level routing ...
- Fault Diagnosis and Quantification for Photovoltaic Arrays based on Differentiable Physical Models : Abstract: Accurate fault diagnosis and quantification are essential for the reliable operation and intelligent maintenance of photovoltaic (PV) arrays. However, existing fault quantification methods o...
- Atom: Efficient On-Device Video-Language Pipelines Through Modular Reuse : Abstract: Recent advances in video-language models have enabled powerful applications like video retrieval, captioning, and assembly. However, executing such multi-stage pipelines efficiently on mobil...
- Bridging Training and Merging Through Momentum-Aware Optimization : Abstract: Training large neural networks and merging task-specific models both exploit low-rank structure and require parameter importance estimation, yet these challenges have been pursued in isolati...
- Digitizing Nepal's Written Heritage: A Comprehensive HTR Pipeline for Old Nepali Manuscripts : Abstract: This paper presents the first end-to-end pipeline for Handwritten Text Recognition (HTR) for Old Nepali, a historically significant but low-resource language. We adopt a line-level transcrip...
- The Effect of Negation on CLIP in Medical Imaging: Limitations of Contrastive Language-Image Pretraining : Abstract: Large vision-language models like CLIP are increasingly used in medical imaging tasks due to their ability to align images and text without the need for extensive labeled data. This makes th...
- DiffeoMorph: Learning to Morph 3D Shapes Using Differentiable Agent-Based Simulations : Abstract: Biological systems can form complex three-dimensional structures through the collective behavior of identical agents -- cells that follow the same internal rules and communicate without cent...
- Distributed Learning in Markovian Restless Bandits over Interference Graphs for Stable Spectrum Sharing : Abstract: We study distributed learning for spectrum access and sharing among multiple cognitive communication entities, such as cells, subnetworks, or cognitive radio users (collectively referred to ...
- BumpNet: A Sparse Neural Network Framework for Learning PDE Solutions : Abstract: We introduce BumpNet, a sparse neural network framework for PDE numerical solution and operator learning. BumpNet is based on meshless basis function expansion, in a similar fashion to radia...
- Learning solution operator of dynamical systems with diffusion maps kernel ridge regression : Abstract: Many scientific and engineering systems exhibit complex nonlinear dynamics that are difficult to predict accurately over long time horizons. Although data-driven models have shown promise, t...
- Electric Vehicle Charging Load Forecasting: An Experimental Comparison of Machine Learning Methods : Abstract: With the growing popularity of electric vehicles as a means of addressing climate change, concerns have emerged regarding their impact on electric grid management. As a result, predicting EV...
- A Women's Health Benchmark for Large Language Models : Abstract: As large language models (LLMs) become primary sources of health information for millions, their accuracy in women's health remains critically unexamined. We introduce the Women's Health Ben...
- Adversarial VR: An Open-Source Testbed for Evaluating Adversarial Robustness of VR Cybersickness Detection and Mitigation : Abstract: Deep learning (DL)-based automated cybersickness detection methods, along with adaptive mitigation techniques, can enhance user comfort and interaction. However, recent studies show that the...
- Another Fit Bites the Dust: Conformal Prediction as a Calibration Standard for Machine Learning in High-Energy Physics : Abstract: Machine-learning techniques are essential in modern collider research, yet their probabilistic outputs often lack calibrated uncertainty estimates and finite-sample guarantees, limiting thei...
- Knowledge Distillation with Structured Chain-of-Thought for Text-to-SQL : Abstract: Deploying accurate Text-to-SQL systems at the enterprise level faces a difficult trilemma involving cost, security and performance. Current solutions force enterprises to choose between expe...
- On the Role of Contextual Information and Ego States in LLM Agent Behavior for Transactional Analysis Dialogues : Abstract: LLM-powered agents are now used in many areas, from customer support to education, and there is increasing interest in their ability to act more like humans. This includes fields such as soc...
- Bots Don't Sit Still: A Longitudinal Study of Bot Behaviour Change, Temporal Drift, and Feature-Structure Evolution : Abstract: Social bots are now deeply embedded in online platforms for promotion, persuasion, and manipulation. Most bot-detection systems still treat behavioural features as static, implicitly assumin...
- Can Large Reasoning Models Improve Accuracy on Mathematical Tasks Using Flawed Thinking? : Abstract: Chain-of-thought (CoT) prompting has become central to mathematical reasoning in large language models, yet models remain brittle to early errors: a single arithmetic slip or unjustified inf...
- When F1 Fails: Granularity-Aware Evaluation for Dialogue Topic Segmentation : Abstract: Dialogue topic segmentation supports summarization, retrieval, memory management, and conversational continuity. Despite decades of prior work, evaluation practice in dialogue topic segmenta...
- How to Square Tensor Networks and Circuits Without Squaring Them : Abstract: Squared tensor networks (TNs) and their extension as computational graphs--squared circuits--have been used as expressive distribution estimators, yet supporting closed-form marginalization....
- Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making : Abstract: We propose a new approach for solving planning problems with a hierarchical structure, fusing reinforcement learning and MPC planning. Our formulation tightly and elegantly couples the two p...
- UniCoMTE: A Universal Counterfactual Framework for Explaining Time-Series Classifiers on ECG Data : Abstract: Machine learning models, particularly deep neural networks, have demonstrated strong performance in classifying complex time series data. However, their black-box nature limits trust and ado...
- Smoothing DiLoCo with Primal Averaging for Faster Training of LLMs : Abstract: We propose Generalized Primal Averaging (GPA), an extension of Nesterov's method in its primal averaging formulation that addresses key limitations of recent averaging-based optimizers such ...
- SDUM: A Scalable Deep Unrolled Model for Universal MRI Reconstruction : Abstract: Clinical MRI encompasses diverse imaging protocols--spanning anatomical targets (cardiac, brain, knee), contrasts (T1, T2, mapping), sampling patterns (Cartesian, radial, spiral, kt-space), ...
- PILAR: Personalizing Augmented Reality Interactions with LLM-based Human-Centric and Trustworthy Explanations for Daily Use Cases : Abstract: Artificial intelligence (AI)-driven augmented reality (AR) systems are becoming increasingly integrated into daily life, and with this growth comes a greater need for explainability in real-...
- Conservative Bias in Multi-Teacher Learning: Why Agents Prefer Low-Reward Advisors : Abstract: Interactive reinforcement learning (IRL) has shown promise in enabling autonomous agents and robots to learn complex behaviours from human teachers, yet the dynamics of teacher selection rem...
- Systemic Risk Radar: A Multi-Layer Graph Framework for Early Market Crash Warning : Abstract: Financial crises emerge when structural vulnerabilities accumulate across sectors, markets, and investor behavior. Predicting these systemic transitions is challenging because they arise fro...
- Fose: Fusion of One-Step Diffusion and End-to-End Network for Pansharpening : Abstract: Pansharpening is a significant image fusion task that fuses low-resolution multispectral images (LRMSI) and high-resolution panchromatic images (PAN) to obtain high-resolution multispectral ...
- Research on Dead Reckoning Algorithm for Self-Propelled Pipeline Robots in Three-Dimensional Complex Pipelines : Abstract: In the field of gas pipeline location, existing pipeline location methods mostly rely on pipeline location instruments. However, when faced with complex and curved pipeline scenarios, these ...
- The Role of Islamic Ethics in Preventing the Abuse of Artificial Intelligence (AI) Based Deepfakes : Abstract: The significant development of deepfake technology powered by artificial intelligence (AI) has sparked worldwide concerns about the alteration of false information, the usurpation of online ...
- Privacy-Preserving Synthetic Dataset of Individual Daily Trajectories for City-Scale Mobility Analytics : Abstract: Urban mobility data are indispensable for urban planning, transportation demand forecasting, pandemic modeling, and many other applications; however, individual mobile phone-derived Global P...
- Incorporating Error Level Noise Embedding for Improving LLM-Assisted Robustness in Persian Speech Recognition : Abstract: Automatic Speech Recognition (ASR) systems suffer significant performance degradation in noisy environments, a challenge that is especially severe for low-resource languages such as Persian....
- AlignDP: Hybrid Differential Privacy with Rarity-Aware Protection for LLMs : Abstract: Large language models are exposed to risks of extraction, distillation, and unauthorized fine-tuning. Existing defenses use watermarking or monitoring, but these act after leakage. We design...
- From Priors to Predictions: Explaining and Visualizing Human Reasoning in a Graph Neural Network Framework : Abstract: Humans excel at solving novel reasoning problems from minimal exposure, guided by inductive biases, assumptions about which entities and relationships matter. Yet the computational form of t...
- Verifiability-First Agents: Provable Observability and Lightweight Audit Agents for Controlling Autonomous LLM Systems : Abstract: As LLM-based agents grow more autonomous and multi-modal, ensuring they remain controllable, auditable, and faithful to deployer intent becomes critical. Prior benchmarks measured the propen...
- AutoMetrics: Approximate Human Judgements with Automatically Generated Evaluators : Abstract: Evaluating user-facing AI applications remains a central challenge, especially in open-ended domains such as travel planning, clinical note generation, or dialogue. The gold standard is user...
- Understanding Generalization in Role-Playing Models via Information Theory : Abstract: Role-playing models (RPMs) are widely used in real-world applications but underperform when deployed in the wild. This degradation can be attributed to distribution shifts, including user, c...
- WDFFU-Mamba: A Wavelet-guided Dual-attention Feature Fusion Mamba for Breast Tumor Segmentation in Ultrasound Images : Abstract: Breast ultrasound (BUS) image segmentation plays a vital role in assisting clinical diagnosis and early tumor screening. However, challenges such as speckle noise, imaging artifacts, irregul...
- Subjective Question Generation and Answer Evaluation using NLP : Abstract: Natural Language Processing (NLP) is one of the most revolutionary technologies today. It uses artificial intelligence to understand human text and spoken words. It is used for text summariz...
- Robust TTS Training via Self-Purifying Flow Matching for the WildSpoof 2026 TTS Track : Abstract: This paper presents a lightweight text-to-speech (TTS) system developed for the WildSpoof Challenge TTS Track. Our approach fine-tunes the recently released open-weight TTS model, Su...
- M2RU: Memristive Minion Recurrent Unit for Continual Learning at the Edge : Abstract: Continual learning on edge platforms remains challenging because recurrent networks depend on energy-intensive training procedures and frequent data movement that are impractical for embedde...
- Explanation Beyond Intuition: A Testable Criterion for Inherent Explainability : Abstract: Inherent explainability is the gold standard in Explainable Artificial Intelligence (XAI). However, there is not a consistent definition or test to demonstrate inherent explainability. Work ...
- A Benchmark for Ultra-High-Resolution Remote Sensing MLLMs : Abstract: Multimodal large language models (MLLMs) demonstrate strong perception and reasoning performance on existing remote sensing (RS) benchmarks. However, most prior benchmarks rely on low-resolu...
- Adaptive Graph Pruning with Sudden-Events Evaluation for Traffic Prediction using Online Semi-Decentralized ST-GNNs : Abstract: Spatio-Temporal Graph Neural Networks (ST-GNNs) are well-suited for processing high-frequency data streams from geographically distributed sensors in smart mobility systems. However, their d...
- TakeAD: Preference-based Post-optimization for End-to-end Autonomous Driving with Expert Takeover Data : Abstract: Existing end-to-end autonomous driving methods typically rely on imitation learning (IL) but face a key challenge: the misalignment between open-loop training and closed-loop deployment. Thi...
- RadImageNet-VQA: A Large-Scale CT and MRI Dataset for Radiologic Visual Question Answering : Abstract: In this work, we introduce RadImageNet-VQA, a large-scale dataset designed to advance radiologic visual question answering (VQA) on CT and MRI exams. Existing medical VQA datasets are limite...
- Detection and Analysis of Sensitive and Illegal Content on the Ethereum Blockchain Using Machine Learning Techniques : Abstract: Blockchain technology, lauded for its transparent and immutable nature, introduces a novel trust model. However, its decentralized structure raises concerns about potential inclusion of mali...
- Optimisation of Aircraft Maintenance Schedules : Abstract: We present an aircraft maintenance scheduling problem, which requires suitably qualified staff to be assigned to maintenance tasks on each aircraft. The tasks on each aircraft must be comple...
- SWE-Bench++: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories : Abstract: Benchmarks like SWE-bench have standardized the evaluation of Large Language Models (LLMs) on repository-level software engineering tasks. However, these efforts remain limited by manual cur...
- A Systematic Reproducibility Study of BSARec for Sequential Recommendation : Abstract: In sequential recommendation (SR), the self-attention mechanism of Transformer-based models acts as a low-pass filter, limiting their ability to capture high-frequency signals that reflect s...
- Assessing Long-Term Electricity Market Design for Ambitious Decarbonization Targets using Multi-Agent Reinforcement Learning : Abstract: Electricity systems are key to transforming today's society into a carbon-free economy. Long-term electricity market mechanisms, including auctions, support schemes, and other policy instrum...
- Learning What to Write: Write-Gated KV for Efficient Long-Context Inference : Abstract: Long-context LLM inference is bottlenecked by the quadratic attention complexity and linear KV cache growth. Prior approaches mitigate this via post-hoc selection or eviction but overlook th...
- A lightweight Spatial-Temporal Graph Neural Network for Long-term Time Series Forecasting : Abstract: We propose Lite-STGNN, a lightweight spatial-temporal graph neural network for long-term multivariate forecasting that integrates decomposition-based temporal modeling with learnable sparse ...
- Fair Voting Methods as a Catalyst for Democratic Resilience: A Trilogy on Legitimacy, Impact and AI Safeguarding : Abstract: This article shows how fair voting methods can be a catalyst for change in the way we make collective decisions, and how such change can promote long-awaited upgrades of democracy. Based on ...
- Behavioural Effects of Agentic Messaging: A Case Study on a Financial Service Application : Abstract: Marketing and product personalisation provide a prominent and visible use-case for the application of Information Retrieval methods across several business domains. Recently, agentic approac...
- InsertAnywhere: Bridging 4D Scene Geometry and Diffusion Models for Realistic Video Object Insertion : Abstract: Recent advances in diffusion-based video generation have opened new possibilities for controllable video editing, yet realistic video object insertion (VOI) remains challenging due to limite...
- Key-Conditioned Orthonormal Transform Gating (K-OTG): Multi-Key Access Control with Hidden-State Scrambling for LoRA-Tuned Models : Abstract: We present a simple, PEFT-compatible mechanism that enforces secret-key access control in instruction-tuned language models. K-OTG trains on a dual-path corpus: authorized examples (prefixed...
- SafeBench-Seq: A Homology-Clustered, CPU-Only Baseline for Protein Hazard Screening with Physicochemical/Composition Features and Cluster-Aware Confidence Intervals : Abstract: Foundation models for protein design raise concrete biosecurity risks, yet the community lacks a simple, reproducible baseline for sequence-level hazard screening that is explicitly evaluate...
- Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding : Abstract: Multimodal Large Language Models struggle to maintain reliable performance under extreme real-world visual degradations, which impede their practical robustness. Existing robust MLLMs predom...
- HydroGym: A Reinforcement Learning Platform for Fluid Dynamics : Abstract: Modeling and controlling fluid flows is critical for several fields of science and engineering, including transportation, energy, and medicine. Effective flow control can lead to, e.g., lift...
- ClothHMR: 3D Mesh Recovery of Humans in Diverse Clothing from Single Image : Abstract: With 3D data rapidly emerging as an important form of multimedia information, 3D human mesh recovery technology has also advanced accordingly. However, current methods mainly focus on handli...
- When De-noising Hurts: A Systematic Study of Speech Enhancement Effects on Modern Medical ASR Systems : Abstract: Speech enhancement methods are commonly believed to improve the performance of automatic speech recognition (ASR) in noisy environments. However, the effectiveness of these techniques cannot...
- A unified FLAIR hyperintensity segmentation model for various CNS tumor types and acquisition time points : Abstract: T2-weighted fluid-attenuated inversion recovery (FLAIR) magnetic resonance imaging (MRI) scans are important for diagnosis, treatment planning and monitoring of brain tumors. Depending on th...
- GreedySnake: Accelerating SSD-Offloaded LLM Training with Efficient Scheduling and Optimizer Step Overlapping : Abstract: SSD-offloaded training offers a practical and promising approach to making LLM training cost-effective. Building on gradient accumulation with micro-batches, this paper introduces GreedySnak...
- MAD-OOD: A Deep Learning Cluster-Driven Framework for an Out-of-Distribution Malware Detection and Classification : Abstract: Out of distribution (OOD) detection remains a critical challenge in malware classification due to the substantial intra family variability introduced by polymorphic and metamorphic malware v...
- MGRegBench: A Novel Benchmark Dataset with Anatomical Landmarks for Mammography Image Registration : Abstract: Robust mammography registration is essential for clinical applications like tracking disease progression and monitoring longitudinal changes in breast tissue. However, progress has been limi...
- More Consistent Accuracy PINN via Alternating Easy-Hard Training : Abstract: Physics-informed neural networks (PINNs) have recently emerged as a prominent paradigm for solving partial differential equations (PDEs), yet their training strategies remain underexplored. ...
- SCOPE: Sequential Causal Optimization of Process Interventions : Abstract: Prescriptive Process Monitoring (PresPM) recommends interventions during business processes to optimize key performance indicators (KPIs). In realistic settings, interventions are rarely iso...
- Trust-Region Adaptive Policy Optimization : Abstract: Post-training methods, especially Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), play an important role in improving large language models' (LLMs) complex reasoning abilities....
- STAR: Semantic-Traffic Alignment and Retrieval for Zero-Shot HTTPS Website Fingerprinting : Abstract: Modern HTTPS mechanisms such as Encrypted Client Hello (ECH) and encrypted DNS improve privacy but remain vulnerable to website fingerprinting (WF) attacks, where adversaries infer visited s...
- Learning Spatio-Temporal Feature Representations for Video-Based Gaze Estimation : Abstract: Video-based gaze estimation methods aim to capture the inherently temporal dynamics of human eye gaze from multiple image frames. However, since models must capture both spatial and temporal...
- An Empirical Study of Sampling Hyperparameters in Diffusion-Based Super-Resolution : Abstract: Diffusion models have shown strong potential for solving inverse problems such as single-image super-resolution, where a high-resolution image is recovered from a low-resolution observation ...
- You Only Train Once: Differentiable Subset Selection for Omics Data : Abstract: Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, m...
- Digital and Web Forensics Model Cards, V1 : Abstract: This paper introduces a standardized model card framework specifically designed for digital and web forensics. Building upon established model card methodologies and recent work on abstract ...
- Diversity Recommendation via Causal Deconfounding of Co-purchase Relations and Counterfactual Exposure : Abstract: Beyond user-item modeling, item-to-item relationships are increasingly used to enhance recommendation. However, common methods largely rely on co-occurrence, making them prone to item popula...
- AncientBench: Towards Comprehensive Evaluation on Excavated and Transmitted Chinese Corpora : Abstract: Comprehension of ancient texts plays an important role in archaeology and understanding of Chinese history and civilization. The rapid development of large language models needs benchmarks t...
- Bangla MedER: Multi-BERT Ensemble Approach for the Recognition of Bangla Medical Entity : Abstract: Medical Entity Recognition (MedER) is an essential NLP task for extracting meaningful entities from the medical corpus. Nowadays, MedER-based research outcomes can remarkably contribute to t...
- Easy Adaptation: An Efficient Task-Specific Knowledge Injection Method for Large Models in Resource-Constrained Environments : Abstract: While the enormous parameter scale endows Large Models (LMs) with unparalleled performance, it also limits their adaptability across specific tasks. Parameter-Efficient Fine-Tuning (PEFT) ha...
- Pix2NPHM: Learning to Regress NPHM Reconstructions From a Single Image : Abstract: Neural Parametric Head Models (NPHMs) are a recent advancement over mesh-based 3D morphable models (3DMMs) to facilitate high-fidelity geometric detail. However, fitting NPHMs to visual inpu...
- MedNeXt-v2: Scaling 3D ConvNeXts for Large-Scale Supervised Representation Learning in Medical Image Segmentation : Abstract: Large-scale supervised pretraining is rapidly reshaping 3D medical image segmentation. However, existing efforts focus primarily on increasing dataset size and overlook the question of wheth...
- Intelligent Knowledge Mining Framework: Bridging AI Analysis and Trustworthy Preservation : Abstract: The unprecedented proliferation of digital data presents significant challenges in access, integration, and value creation across all data-intensive sectors. Valuable information is frequent...
- Animate Any Character in Any World : Abstract: Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construc...
- LLM-based Behaviour Driven Development for Hardware Design : Abstract: Test and verification are essential activities in hardware and system design, but their complexity grows significantly with increasing system sizes. While Behavior Driven Development (BDD) h...
- Navigating Taxonomic Expansions of Entity Sets Driven by Knowledge Bases : Abstract: Recognizing similarities among entities is central to both human cognition and computational intelligence. Within this broader landscape, Entity Set Expansion is one prominent task aimed at ...
- Probing Scientific General Intelligence of LLMs with Scientist-Aligned Workflows : Abstract: Despite advances in scientific AI, a coherent framework for Scientific General Intelligence (SGI)-the ability to autonomously conceive, investigate, and reason across scientific domains-rema...
- PAACE: A Plan-Aware Automated Agent Context Engineering Framework : Abstract: Large Language Model (LLM) agents are increasingly deployed in complex, multi-step workflows involving planning, tool use, reflection, and interaction with external knowledge systems. These ...
- Security Risks of Agentic Vehicles: A Systematic Analysis of Cognitive and Cross-Layer Threats : Abstract: Agentic AI is increasingly being explored and introduced in both manually driven and autonomous vehicles, leading to the notion of Agentic Vehicles (AgVs), with capabilities such as memory-b...
- UniRel-R1: RL-tuned LLM Reasoning for Knowledge Graph Relational Question Answering : Abstract: Knowledge Graph Question Answering (KGQA) has traditionally focused on entity-centric queries that return a single answer entity. However, real-world queries are often relational, seeking to...
- Realistic threat perception drives intergroup conflict: A causal, dynamic analysis using generative-agent simulations : Abstract: Human conflict is often attributed to threats against material conditions and symbolic values, yet it remains unclear how they interact and which dominates. Progress is limited by weak causa...
- Value Under Ignorance in Universal Artificial Intelligence : Abstract: We generalize the AIXI reinforcement learning agent to admit a wider class of utility functions. Assigning a utility to each possible interaction history forces us to confront the ambiguity ...
- A Solver-in-the-Loop Framework for Improving LLMs on Answer Set Programming for Logic Puzzle Solving : Abstract: The rise of large language models (LLMs) has sparked interest in coding assistants. While general-purpose programming languages are well supported, generating code for domain-specific langua...
- Reinforcement Learning for Self-Improving Agent with Skill Library : Abstract: Large Language Model (LLM)-based agents have demonstrated remarkable capabilities in complex reasoning and multi-turn interactions but struggle to continuously improve and adapt when deploye...
- Solomonoff-Inspired Hypothesis Ranking with LLMs for Prediction Under Uncertainty : Abstract: Reasoning under uncertainty is a key challenge in AI, especially for real-world tasks, where problems with sparse data demand systematic generalisation. Existing approaches struggle to bala...
- MMRAG-RFT: Two-stage Reinforcement Fine-tuning for Explainable Multi-modal Retrieval-augmented Generation : Abstract: Multi-modal Retrieval-Augmented Generation (MMRAG) enables highly credible generation by integrating external multi-modal knowledge, thus demonstrating impressive performance in complex mult...
- UmniBench: Unified Understand and Generation Model Oriented Omni-dimensional Benchmark : Abstract: Unifying multimodal understanding and generation has shown impressive capabilities in cutting-edge proprietary systems. However, evaluations of unified multimodal models (UMMs) remain decoup...
- Accelerating Multi-modal LLM Gaming Performance via Input Prediction and Mishit Correction : Abstract: Real-time sequential control agents are often bottlenecked by inference latency. Even modest per-step planning delays can destabilize control and degrade overall performance. We propose a sp...
- ScoutGPT: Capturing Player Impact from Team Action Sequences Using GPT-Based Framework : Abstract: Transfers play a pivotal role in shaping a football club's success, yet forecasting whether a transfer will succeed remains difficult due to the strong context-dependence of on-field perform...
- Large Language Models as Pokémon Battle Agents: Strategic Play and Content Generation : Abstract: Strategic decision-making in Pokémon battles presents a unique testbed for evaluating large language models. Pokémon battles demand reasoning about type matchups, statistical trade-offs, and...
- Dialectics for Artificial Intelligence : Abstract: Can artificial intelligence discover, from raw experience and without human supervision, concepts that humans have discovered? One challenge is that human concepts themselves are fluid: conc...
- Translating the Rashomon Effect to Sequential Decision-Making Tasks : Abstract: The Rashomon effect describes the phenomenon where multiple models trained on the same data produce identical predictions while differing in which features they rely on internally. This effe...
- Towards Explainable Conversational AI for Early Diagnosis with Large Language Models : Abstract: Healthcare systems around the world are grappling with issues like inefficient diagnostics, rising costs, and limited access to specialists. These problems often lead to delays in treatment ...
- About Time: Model-free Reinforcement Learning with Timed Reward Machines : Abstract: Reward specification plays a central role in reinforcement learning (RL), guiding the agent's behavior. To express non-Markovian rewards, formalisms such as reward machines have been introdu...
- Humanlike AI Design Increases Anthropomorphism but Yields Divergent Outcomes on Engagement and Trust Globally : Abstract: Over a billion users across the globe interact with AI systems engineered with increasing sophistication to mimic human traits. This shift has triggered urgent debate regarding Anthropomorph...
- When Reasoning Meets Its Laws : Abstract: Despite the superior performance of Large Reasoning Models (LRMs), their reasoning behaviors are often counterintuitive, leading to suboptimal reasoning capabilities. To theoretically formal...
- V-Agent: An Interactive Video Search System Using Vision-Language Models : Abstract: We introduce V-Agent, a novel multi-agent platform designed for advanced video search and interactive user-system conversations. By fine-tuning a vision-language model (VLM) with a small vid...
- Optimizing Text Search: A Novel Pattern Matching Algorithm Based on Ukkonen's Approach : Abstract: In the realm of computer science, the efficiency of text-search algorithms is crucial for processing vast amounts of data in areas such as natural language processing and bioinformatics. Tra...
- Enhancing Tree Species Classification: Insights from YOLOv8 and Explainable AI Applied to TLS Point Cloud Projections : Abstract: Classifying tree species has been a core research area in forest remote sensing for decades. New sensors and classification approaches like TLS and deep learning achieve state-of-the-art acc...
- Lights, Camera, Consistency: A Multistage Pipeline for Character-Stable AI Video Stories : Abstract: Generating long, cohesive video stories with consistent characters is a significant challenge for current text-to-video AI. We introduce a method that approaches video generation in a filmma...
- MemoryGraft: Persistent Compromise of LLM Agents via Poisoned Experience Retrieval : Abstract: Large Language Model (LLM) agents increasingly rely on long-term memory and Retrieval-Augmented Generation (RAG) to persist experiences and refine future performance. While this experience l...
- InfoTok: Adaptive Discrete Video Tokenizer via Information-Theoretic Compression : Abstract: Accurate and efficient discrete video tokenization is essential for long video sequences processing. Yet, the inherent complexity and variable information density of videos present a signifi...
- Unexpected Knowledge: Auditing Wikipedia and Grokipedia Search Recommendations : Abstract: Encyclopedic knowledge platforms are key gateways through which users explore information online. The recent release of Grokipedia, a fully AI-generated encyclopedia, introduces a new altern...
Research Sources: 328 | Generated: 12/22/2025
