AI Research News Feeds for August 18th, 2025

AI RESEARCH PAPERS & ACADEMIC SOURCES

Efficient Image-to-Image Schrödinger Bridge for CT Field of View Extension
Fluid Dynamics and Domain Reconstruction from Noisy Flow Images Using Physics-Informed Neural Networks and Quasi-Conformal Mapping
Temporally-Similar Structure-Aware Spatiotemporal Fusion of Satellite Images
Allen: Rethinking MAS Design through Step-Level Policy Autonomy
Guiding WaveMamba with Frequency Maps for Image Debanding
AnatoMaskGAN: GNN-Driven Slice Feature Fusion and Noise Augmentation for Medical Semantic Image Synthesis
LKFMixer: Exploring Large Kernel Feature For Efficient Image Super-Resolution
Subcortical Masks Generation in CT Images via Ensemble-Based Cross-Domain Label Transfer
SPG: Style-Prompting Guidance for Style-Specific Content Creation
Relative Position Matters: Trajectory Prediction and Planning with Polar Representation
Scanpath Prediction in Panoramic Videos via Expected Code Length Minimization
Lightweight Attribute Localizing Models for Pedestrian Attribute Recognition
Compositional Zero-shot Learning via Progressive Language-based Observations
Wild2Avatar: Rendering Humans Behind Occlusions
Effective Message Hiding with Order-Preserving Mechanisms
Reconstructing Satellites in 3D from Amateur Telescope Images
Towards Physically Realizable Adversarial Attacks in Embodied Vision Navigation
Efficient High-Resolution Visual Representation Learning with State Space Model for Human Pose Estimation
MUNBa: Machine Unlearning via Nash Bargaining
GBR: Generative Bundle Refinement for High-fidelity Gaussian Splatting with Enhanced Mesh Reconstruction
Learning an Adaptive and View-Invariant Vision Transformer for Real-Time UAV Tracking
Towards Consumer-Grade Cybersickness Prediction: Multi-Model Alignment for Real-Time Vision-Only Inference
LVFace: Progressive Cluster Optimization for Large Vision Models in Face Recognition
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
Introducing Unbiased Depth into 2D Gaussian Splatting for High-accuracy Surface Reconstruction
Seeing and Seeing Through the Glass: Real and Synthetic Data for Multi-Layer Depth Estimation
AFR-CLIP: Enhancing Zero-Shot Industrial Anomaly Detection with Stateless-to-Stateful Anomaly Feature Rectification
Image-to-Text for Medical Reports Using Adaptive Co-Attention and Triple-LSTM Module
Med3DVLM: An Efficient Vision-Language Model for 3D Medical Image Analysis
Towards Generalizable Forgery Detection and Reasoning
Casual3DHDR: High Dynamic Range 3D Gaussian Splatting from Casually Captured Videos
Marmot: Object-Level Self-Correction via Multi-Agent Reasoning
Physics-Guided Image Dehazing Diffusion
LSVG: Language-Guided Scene Graphs with 2D-Assisted Multi-Modal Encoding for 3D Visual Grounding
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging
PhysLab: A Benchmark Dataset for Multi-Granularity Visual Parsing of Physics Experiments
Learning Camera-Agnostic White-Balance Preferences
HealthiVert-GAN: A Novel Framework of Pseudo-Healthy Vertebral Image Synthesis for Interpretable Compression Fracture Grading
Pathology-Guided AI System for Accurate Segmentation and Diagnosis of Cervical Spondylosis
HepatoGEN: Generating Hepatobiliary Phase MRI with Perceptual and Adversarial Models
Privacy Enhancement for Gaze Data Using a Noise-Infused Autoencoder
A Survey on Video Temporal Grounding with Multimodal Large Language Model
VSF: Simple, Efficient, and Effective Negative Guidance in Few-Step Image Generation Models By Value Sign Flip
Relative Pose Regression with Pose Auto-Encoders: Enhancing Accuracy and Data Efficiency for Retail Applications
ViPE: Video Pose Engine for 3D Geometric Perception
Vision-Only Gaussian Splatting for Collaborative Semantic Occupancy Prediction
Personalized Face Super-Resolution with Identity Decoupling and Fitting
Deep Learning for Automated Identification of Vietnamese Timber Species: A Tool for Ecological Monitoring and Conservation
NIRMAL Pooling: An Adaptive Max Pooling Approach with Non-linear Activation for Enhanced Image Classification
Topological Structure Description for Artcode Detection Using the Shape of Orientation Histogram
Analysis of the Compaction Behavior of Textile Reinforcements in Low-Resolution In-Situ CT Scans via Machine-Learning and Descriptor-Based Methods
IPG: Incremental Patch Generation for Generalized Adversarial Patch Training
MedAtlas: Evaluating LLMs for Multi-Round, Multi-Task Medical Reasoning Across Diverse Imaging Modalities and Clinical Text
From Promise to Practical Reality: Transforming Diffusion MRI Analysis with Fast Deep Learning Enhancement
CSNR and JMIM Based Spectral Band Selection for Reducing Metamerism in Urban Driving
EVCtrl: Efficient Control Adapter for Visual Generation
Are Large Pre-trained Vision Language Models Effective Construction Safety Inspectors?
MedSAMix: A Training-Free Model Merging Approach for Medical Image Segmentation
Advancing 3D Scene Understanding with MV-ScanQA Multi-View Reasoning Evaluation and TripAlign Pre-training Dataset
Data-Driven Abdominal Phenotypes of Type 2 Diabetes in Lean, Overweight, and Obese Cohorts
HierOctFusion: Multi-scale Octree-based 3D Shape Generation via Part-Whole-Hierarchy Message Passing
UWB-PostureGuard: A Privacy-Preserving RF Sensing System for Continuous Ergonomic Sitting Posture Monitoring
Residual-based Efficient Bidirectional Diffusion Model for Image Dehazing and Haze Generation
LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction
Semi-supervised Image Dehazing via Expectation-Maximization and Bidirectional Brownian Bridge Diffusion Models
VFM-Guided Semi-Supervised Detection Transformer for Source-Free Object Detection in Remote Sensing Images
Exploring the Tradeoff Between Diversity and Discrimination for Continuous Category Discovery
Fine-Grained VLM Fine-tuning via Latent Hierarchical Adapter Learning
Versatile Video Tokenization with Generative 2D Gaussian Splatting
Generating Dialogues from Egocentric Instructional Videos for Task Assistance: Dataset, Method and Benchmark
UAV-VL-R1: Generalizing Vision-Language Models via Supervised Fine-Tuning and Multi-Stage GRPO for UAV Visual Reasoning
A Coarse-to-Fine Human Pose Estimation Method based on Two-stage Distillation and Progressive Graph Neural Network
FantasyTalking2: Timestep-Layer Adaptive Preference Optimization for Audio-Driven Portrait Animation
Domain-aware Category-level Geometry Learning Segmentation for 3D Point Clouds
Unifying Scale-Aware Depth Prediction and Perceptual Priors for Monocular Endoscope Pose Estimation and Tissue Reconstruction
TimeMachine: Fine-Grained Facial Age Editing with Identity Preservation
Hyperspectral vs. RGB for Pedestrian Segmentation in Urban Driving Scenes: A Comparative Study
Denoise-then-Retrieve: Text-Conditioned Video Denoising for Video Moment Retrieval
Logic Unseen: Revealing the Logical Blindspots of Vision-Language Models
Delving into Dynamic Scene Cue-Consistency for Robust 3D Multi-Object Tracking
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
GANDiff FR: Hybrid GAN Diffusion Synthesis for Causal Bias Attribution in Face Recognition
Index-Aligned Query Distillation for Transformer-based Incremental Object Detection
Cost-Effective Active Labeling for Data-Efficient Cervical Cell Classification
HOID-R1: Reinforcement Learning for Open-World Human-Object Interaction Detection Reasoning with Multimodal Large Language Model
RMFAT: Recurrent Multi-scale Feature Atmospheric Turbulence Mitigator
Training-free Dimensionality Reduction via Feature Truncation: Enhancing Efficiency in Privacy-preserving Multi-Biometric Systems
ImagiDrive: A Unified Imagination-and-Planning Framework for Autonomous Driving
Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting
MM-R1: Unleashing the Power of Unified Multimodal Large Language Models for Personalized Image Generation
Data-Driven Deepfake Image Detection Method -- The 2024 Global Deepfake Image Detection Challenge
CoFi: A Fast Coarse-to-Fine Few-Shot Pipeline for Glomerular Basement Membrane Segmentation
TACR-YOLO: A Real-time Detection Framework for Abnormal Human Behaviors Enhanced with Coordinate and Task-Aware Representations
OpenConstruction: A Systematic Synthesis of Open Visual Datasets for Data-Centric Artificial Intelligence in Construction Monitoring
CineTrans: Learning to Generate Videos with Cinematic Transitions via Masked Diffusion Models
Automated Building Heritage Assessment Using Street-Level Imagery
Perception in Plan: Coupled Perception and Planning for End-to-End Autonomous Driving
Hierarchical Graph Feature Enhancement with Adaptive Frequency Modulation for Visual Recognition
AIM: Amending Inherent Interpretability via Self-Supervised Masking
A Real-time Concrete Crack Detection and Segmentation Model Based on YOLOv11
Multi-State Tracker: Enhancing Efficient Object Tracking via Multi-State Specialization and Interaction
Reinforcing Video Reasoning Segmentation to Think Before It Segments
Training-Free Anomaly Generation via Dual-Attention Enhancement in Diffusion Model
TrajSV: A Trajectory-based Model for Sports Video Representations and Applications
Causality Matters: How Temporal Information Emerges in Video Language Models
DashCam Video: A complementary low-cost data stream for on-demand forest-infrastructure system monitoring
CoreEditor: Consistent 3D Editing via Correspondence-constrained Diffusion
LoRAtorio: An intrinsic approach to LoRA Skill Composition
Thyme: Think Beyond Images
The Role of Radiographic Knee Alignment in Knee Replacement Outcomes and Opportunities for Artificial Intelligence-Driven Assessment
Failures to Surface Harmful Contents in Video Large Language Models
GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning
Visual-RAG: Benchmarking Text-to-Image Retrieval Augmented Generation for Visual Knowledge Intensive Queries
Inaccuracy of an E-Dictionary and Its Influence on Chinese Language Users
Investigating the Effect of Parallel Data in the Cross-Lingual Transfer for Vision-Language Encoders
Relationship Detection on Tabular Data Using Statistical Analysis and Large Language Models
Causal Language in Observational Studies: Sociocultural Backgrounds and Team Composition
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
A2HCoder: An LLM-Driven Coding Agent for Hierarchical Algorithm-to-HDL Translation
PersonaTwin: A Multi-Tier Prompt Conditioning Framework for Generating and Evaluating Personalized Digital Twins
Hell or High Water: Evaluating Agentic Recovery from External Failures
BIPOLAR: Polarization-based granular framework for LLM bias evaluation
Approaching the Source of Symbol Grounding with Confluent Reductions of Abstract Meaning Representation Directed Graphs
Towards Reliable Multi-Agent Systems for Marketing Applications via Reflection, Memory, and Planning
MobQA: A Benchmark Dataset for Semantic Understanding of Human Mobility Data through Question Answering
Overcoming Low-Resource Barriers in Tulu: Neural Models and Corpus Creation for Offensive Language Identification
Personalized Distractor Generation via MCTS-Guided Reasoning Reconstruction
Novel Parasitic Dual-Scale Modeling for Efficient and Accurate Multilingual Speech Translation
UNVEILING: What Makes Linguistics Olympiad Puzzles Tricky for LLMs?
AI in Mental Health: Emotional and Sentiment Analysis of Large Language Models' Responses to Depression, Anxiety, and Stress Queries
SafeConstellations: Steering LLM Safety to Reduce Over-Refusals Through Task-Specific Trajectory
LLM Compression: How Far Can We Go in Balancing Size and Performance?
SpecDetect: Simple, Fast, and Training-Free Detection of LLM-Generated Text via Spectral Analysis
Feedback Indicators: The Alignment between Llama and a Teacher in Language Learning
Survey-to-Behavior: Downstream Alignment of Human Values in LLMs via Survey Questions
HumorPlanSearch: Structured Planning and HuCoT for Contextual AI Humor
Online Anti-sexist Speech: Identifying Resistance to Gender Bias in Political Discourse
CoDiEmb: A Collaborative yet Distinct Framework for Unified Representation Learning in Information Retrieval and Semantic Textual Similarity
Speciesism in AI: Evaluating Discrimination Against Animals in Large Language Models
Language models align with brain regions that represent concepts across modalities
AgentMental: An Interactive Multi-Agent Framework for Explainable and Adaptive Mental Health Assessment
Representing Speech Through Autoregressive Prediction of Cochlear Tokens
Dataset Creation for Visual Entailment using Generative AI
TinyTim: A Family of Language Models for Divergent Generation
The Next Phase of Scientific Fact-Checking: Advanced Evidence Retrieval from Complex Structured Academic Papers
Empowering Multimodal LLMs with External Tools: A Comprehensive Survey
Can Multi-modal (reasoning) LLMs detect document manipulation?
PaperRegister: Boosting Flexible-grained Paper Search via Hierarchical Register Indexing
+VeriRel: Verification Feedback to Enhance Document Retrieval for Scientific Fact Checking
Expressive Speech Retrieval using Natural Language Descriptions of Speaking Style
Benchmarking Prosody Encoding in Discrete Speech Tokens
Emphasis Sensitivity in Speech Representations
RULEBREAKERS: Challenging LLMs at the Crossroads between Formal Logic and Human-like Reasoning
Personalized LLM for Generating Customized Responses to the Same Query from Different Users
A Dual-Perspective NLG Meta-Evaluation Framework with Automatic Benchmark and Better Interpretability
FAB-PPI: Frequentist, Assisted by Bayes, Prediction-Powered Inference
Generalizable speech deepfake detection via meta-learned LoRA
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four Habits of Highly Effective STaRs
Adaptive Bayesian Optimization for Robust Identification of Stochastic Dynamical Systems
Incorporating Coupling Knowledge into Echo State Networks for Learning Spatiotemporally Chaotic Dynamics
Prεεmpt: Sanitizing Sensitive Prompts for LLMs
ElasticMM: Efficient Multimodal LLMs Serving with Elastic Multimodal Parallelism
Semantically Guided Adversarial Testing of Vision Models Using Language Models
Unified Knowledge Distillation Framework: Fine-Grained Alignment and Geometric Relationship Preservation for Deep Face Recognition
Model Interpretability and Rationale Extraction by Input Mask Optimization
Rationalizing Transformer Predictions via End-To-End Differentiable Self-Training
SelfAdapt: Unsupervised Domain Adaptation of Cell Segmentation Models
Semi-Supervised Learning with Online Knowledge Distillation for Skin Lesion Classification
An Efficient Medical Image Classification Method Based on a Lightweight Improved ConvNeXt-Tiny Architecture
Investigating Sensors and Methods in Grasp State Classification in Agricultural Manipulation
Nonparametric learning of stochastic differential equations from sparse and noisy data
Enabling Differentially Private Federated Learning for Speech Recognition: Benchmarks, Adaptive Optimizers and Gradient Clipping
Discovering Invariant Neighborhood Patterns for Heterophilic Graphs
A Spectral Framework for Evaluating Geodesic Distances Between Graphs
Incorporating Arbitrary Matrix Group Equivariance into KANs
Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
Perfect Counterfactuals in Imperfect Worlds: Modelling Noisy Implementation of Actions in Sequential Algorithmic Recourse
Embedding Safety into RL: A New Take on Trust Region Methods
Learning-based Sketches for Frequency Estimation in Data Streams without Ground Truth
Vulnerability of Text-Matching in ML/AI Conference Reviewer Assignments to Collusions
A Survey on Pre-Trained Diffusion Model Distillations
An Analytical Theory of Spectral Bias in the Learning Dynamics of Diffusion Models
SAND: One-Shot Feature Selection with Additive Noise Distortion
Neighbour-Driven Gaussian Process Variational Autoencoders for Scalable Structured Latent Modelling
Central Path Proximal Policy Optimization
Theory of Decentralized Robust Kernel-Based Learning
LETS Forecast: Learning Embedology for Time Series Forecasting
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Structured Generative Modeling with the Thermodynamic Kolmogorov-Arnold Model
Synthetic Data for Robust Stroke Segmentation
An Efficient Deep Learning Approach for Approximating Parameter-to-Solution Maps of PDEs
Modeling Sampling Distributions of Test Statistics with Autograd
Beyond algorithm hyperparameters: on preprocessing hyperparameters and associated pitfalls in machine learning applications
Convergence of Statistical Estimators via Mutual Information Bounds
AlgoTune: Can Language Models Speed Up General-Purpose Numerical Programs?
A Cooperative Game-Based Multi-Criteria Weighted Ensemble Approach for Multi-Class Classification
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale Pretraining
Quantization vs Pruning: Insights from the Strong Lottery Ticket Hypothesis
Conditional Independence Estimates for the Generalized Nonparanormal
SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Abundance-Aware Set Transformer for Microbiome Sample Embedding
A Feasibility Experiment on the Application of Predictive Coding to Instant Messaging Corpora
Relative Advantage Debiasing for Watch-Time Prediction in Short-Video Recommendation
Predictive Multimodal Modeling of Diagnoses and Treatments in EHR
Hybrid-Hierarchical Fashion Graph Attention Network for Compatibility-Oriented and Personalized Outfit Recommendation
CTRL Your Shift: Clustered Transfer Residual Learning for Many Small Datasets
Towards the Next-generation Bayesian Network Classifiers
Mitigating Modality Quantity and Quality Imbalance in Multimodal Online Federated Learning
Meta-learning Structure-Preserving Dynamics
Borrowing From the Future: Enhancing Early Risk Assessment through Contrastive Learning
Air Quality PM2.5 Index Prediction Model Based on CNN-LSTM
Enhancing Interactive Voting-Based Map Matching: Improving Efficiency and Robustness for Heterogeneous GPS Trajectories
Group Fairness Meets the Black Box: Enabling Fair Algorithms on Closed LLMs via Post-Processing
Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble
Generalize across Homophily and Heterophily: Hybrid Spectral Graph Pre-Training and Prompt Tuning
Conformal Prediction Meets Long-tail Classification
A Global Dataset of Location Data Integrity-Assessed Reforestation Efforts
Harmonized Gradient Descent for Class Imbalanced Data Stream Online Learning
Fusing Rewards and Preferences in Reinforcement Learning
A Remedy for Over-Squashing in Graph Learning via Forman-Ricci Curvature based Graph-to-Hypergraph Structural Lifting
Generative Co-Design of Antibody Sequences and Structures via Black-Box Guidance in a Shared Latent Space
Robust Convolution Neural ODEs via Contractivity-promoting regularization
Multi-Sensory Cognitive Computing for Learning Population-level Brain Connectivity
Calibrated and uncertain? Evaluating uncertainty estimates in binary classification models
Predicting and Explaining Traffic Crash Severity Through Crash Feature Selection
DiCriTest: Testing Scenario Generation for Decision-Making Agents Considering Diversity and Criticality
Finite-Width Neural Tangent Kernels from Feynman Diagrams
Physics-Informed Diffusion Models for Unsupervised Anomaly Detection in Multivariate Time Series
DFed-SST: Building Semantic- and Structure-aware Topologies for Decentralized Federated Graph Learning
Nested Operator Inference for Adaptive Data-Driven Learning of Reduced-order Models
SeamlessFlow: A Trainer Agent Isolation RL Framework Achieving Bubble-Free Pipelines via Tag Scheduling
Optimal CO2 storage management considering safety constraints in multi-stakeholder multi-site CCS projects: a game theoretic perspective
Data-driven global ocean model resolving ocean-atmosphere coupling dynamics
Uncovering Latent Connections in Indigenous Heritage: Semantic Pipelines for Cultural Preservation in Brazil
Insect-Wing Structured Microfluidic System for Reservoir Computing
CleanCTG: A Deep Learning Model for Multi-Artefact Detection and Reconstruction in Cardiotocography
HQ-OV3D: A High Box Quality Open-World 3D Detection Framework based on Diffusion Model
Non-asymptotic convergence bound of conditional diffusion models
iWatchRoad: Scalable Detection and Geospatial Visualization of Potholes for Smart Cities
Improving Text Style Transfer using Masked Diffusion Language Models with Inference-time Scaling
Counterfactual Survival Q Learning for Longitudinal Randomized Trials via Buckley James Boosting
Human-in-the-Loop Systems for Adaptive Learning Using Generative AI
Functional Analysis of Variance for Association Studies
The Role of Entanglement in Quantum Reservoir Computing with Coupled Kerr Nonlinear Oscillators
HistoViT: Vision Transformer for Accurate and Scalable Histopathological Cancer Diagnosis
CHARM3R: Towards Unseen Camera Height Robust Monocular 3D Detector
A CLIP-based Uncertainty Modal Modeling (UMM) Framework for Pedestrian Re-Identification in Autonomous Driving
Uniform convergence for Gaussian kernel ridge regression
Probing the Representational Power of Sparse Autoencoders in Vision Models
Approximating the universal thermal climate index using sparse regression with orthogonal polynomials
Repetitive TMS-based Identification of Methamphetamine-Dependent Individuals Using EEG Spectra
Weighted First Order Model Counting for Two-variable Logic with Axioms on Two Relations
A Comprehensive Perspective on Explainable AI across the Machine Learning Workflow
ADMIRE-BayesOpt: Accelerated Data MIxture RE-weighting for Language Models with Bayesian Optimization
Aware First, Think Less: Dynamic Boundary Self-Awareness Drives Extreme Reasoning Efficiency in Large Language Models
Visual Perception Engine: Fast and Flexible Multi-Head Inference for Robotic Vision Tasks
CryptoScope: Utilizing Large Language Models for Automated Cryptographic Logic Vulnerability Detection
Pretrained Conformers for Audio Fingerprinting and Retrieval
Controlling Multimodal LLMs via Reward-guided Decoding
Is ChatGPT-5 Ready for Mammogram VQA?
Sophisticated Learning: A novel algorithm for active learning during model-based planning
MetaAgents: Large Language Model Based Agents for Decision-Making on Teaming
Tool-Planner: Task Planning with Clusters across Multiple Tools
Sketch Decompositions for Classical Planning via Deep Reinforcement Learning
Learning to Be A Doctor: Searching for Effective Medical Agent Architectures
CogDDN: A Cognitive Demand-Driven Navigation with Decision Optimization and Dual-Process Thinking
Recent Advances in Generative AI for Healthcare Applications
Large-Scale Multi-Robot Assembly Planning for Autonomous Manufacturing
JMA: a General Algorithm to Craft Nearly Optimal Targeted Adversarial Example
A Survey on Recent Advances in LLM-Based Multi-turn Dialogue Systems
TokenRec: Learning to Tokenize ID for LLM-based Generative Recommendation
Clean-Label Physical Backdoor Attacks with Data Distillation
Bridging Context Gaps: Leveraging Coreference Resolution for Long Contextual Understanding
SLiM: One-shot Quantization and Sparsity with Low-rank Approximation for LLM Weight Compression
Data Diversity as Implicit Regularization: How Does Diversity Shape the Weight Space of Deep Neural Networks?
Language-Based Bayesian Optimization Research Assistant (BORA)
Uncertainty-Aware Adaptation of Large Language Models for Protein-Protein Interaction Analysis
Human-AI Experience in Integrated Development Environments: A Systematic Literature Review
Feather-SQL: A Lightweight NL2SQL Framework with Dual-Model Collaboration Paradigm for Small Language Models
EllieSQL: Cost-Efficient Text-to-SQL with Complexity-Aware Routing
L3AC: Towards a Lightweight and Lossless Audio Codec
Once Upon an AI: Six Scaffolds for Child-AI Interaction Design, Inspired by Disney
EmbodiedAgent: A Scalable Hierarchical Approach to Overcome Practical Challenge in Multi-Robot Control
SORT3D: Spatial Object-centric Reasoning Toolbox for Zero-Shot 3D Grounding Using Large Language Models
Convolutional Autoencoders for Data Compression and Anomaly Detection in Small Satellite Technologies
Blending 3D Geometry and Machine Learning for Multi-View Stereopsis
TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation
A Closer Look at Multimodal Representation Collapse
Omni-DPO: A Dual-Perspective Paradigm for Dynamic Preference Learning of LLMs
ViFusionTST: Deep Fusion of Time-Series Image Representations from Load Signals for Early Bed-Exit Prediction
Text-to-Level Diffusion Models With Various Text Encoders for Super Mario Bros
What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
Actor-Critic for Continuous Action Chunks: A Reinforcement Learning Framework for Long-Horizon Robotic Manipulation with Sparse Reward
AlphaAgents: Large Language Model based Multi-Agents for Equity Portfolio Constructions
Role-Augmented Intent-Driven Generative Search Engine Optimization
Better Supervised Fine-tuning for VQA: Integer-Only Loss
A Semi-supervised Generative Model for Incomplete Multi-view Data Integration with Missing Labels
Quantum-Boosted High-Fidelity Deep Learning
E-CaTCH: Event-Centric Cross-Modal Attention with Temporal Consistency and Class-Imbalance Handling for Misinformation Detection
Visuomotor Grasping with World Models for Surgical Robots
StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation
Multi-Group Equivariant Augmentation for Reinforcement Learning in Robot Manipulation
How Causal Abstraction Underpins Computational Explanation
ORFuzz: Fuzzing the "Other Side" of LLM Safety -- Testing Over-Refusal
Cross-Granularity Hypergraph Retrieval-Augmented Generation for Multi-hop Question Answering
Graph Neural Diffusion via Generalized Opinion Dynamics
Generalized Decoupled Learning for Enhancing Open-Vocabulary Dense Perception
Hallucination in LLM-Based Code Generation: An Automotive Case Study
Vision-Language Models display a strong gender bias
Enhancing Supervised Composed Image Retrieval via Reasoning-Augmented Representation Engineering
Is General-Purpose AI Reasoning Sensitive to Data-Induced Cognitive Biases? Dynamic Benchmarking on Typical Software Engineering Dilemmas
LETToT: Label-Free Evaluation of Large Language Models On Tourism Using Expert Tree-of-Thought
ToxiFrench: Benchmarking and Enhancing Language Models via CoT Fine-Tuning for French Toxicity Detection
Scene Graph-Guided Proactive Replanning for Failure-Resilient Embodied Agent
CSGO: Generalized Optimization for Cold Start in Wireless Collaborative Edge LLM Systems
Dynamic Quality-Latency Aware Routing for LLM Inference in Wireless Edge-Device Networks
SGSimEval: A Comprehensive Multifaceted and Similarity-Enhanced Benchmark for Automatic Survey Generation Systems
RegimeNAS: Regime-Aware Differentiable Architecture Search With Theoretical Guarantees for Financial Trading
NeMo: A Neuron-Level Modularizing-While-Training Approach for Decomposing DNN Models
Leveraging the RETFound foundation model for optic disc segmentation in retinal images
ETTRL: Balancing Exploration and Exploitation in LLM Test-Time Reinforcement Learning Via Entropy Mechanism
PTSM: Physiology-aware and Task-invariant Spatio-temporal Modeling for Cross-Subject EEG Decoding
Minimizing Surrogate Losses for Decision-Focused Learning using Differentiable Optimization
Does the Skeleton-Recall Loss Really Work?
G-CUT3R: Guided 3D Reconstruction with Camera and Depth Prior Integration
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Retrieval-augmented reasoning with lean language models
Trustworthy AI Psychotherapy: Multi-Agent LLM Workflow for Counseling and Explainable Mental Disorder Diagnosis
An Exploratory Study on Crack Detection in Concrete through Human-Robot Collaboration
Open, Reproducible and Trustworthy Robot-Based Experiments with Virtual Labs and Digital-Twin-Based Execution Tracing
On-Policy RL Meets Off-Policy Experts: Harmonizing Supervised Fine-Tuning and Reinforcement Learning via Dynamic Weighting
Informative Post-Hoc Explanations Only Exist for Simple Functions
Inside Knowledge: Graph-based Path Generation with Explainable Data Augmentation and Curriculum Learning for Visual Indoor Navigation
Reference Points in LLM Sentiment Analysis: The Role of Structured Context
RMSL: Weakly-Supervised Insider Threat Detection with Robust Multi-sphere Learning
Handwritten Text Recognition of Historical Manuscripts Using Transformer-Based Models
Sim2Dust: Mastering Dynamic Waypoint Tracking on Granular Media
Towards Faithful Class-level Self-explainability in Graph Neural Networks by Subgraph Dependencies
Grounding Rule-Based Argumentation Using Datalog
From Individual to Multi-Agent Algorithmic Recourse: Minimizing the Welfare Gap via Capacitated Bipartite Matching
Learn to optimize for automatic proton PBS treatment planning for H&N cancers
On Strong and Weak Admissibility in Non-Flat Assumption-Based Argumentation
Beyond Solving Math Quiz: Evaluating the Ability of Large Reasoning Models to Ask for Information
SAGE: Scale-Aware Gradual Evolution for Continual Knowledge Graph Embedding
CRAFT-GUI: Curriculum-Reinforced Agent For GUI Tasks
AIM-Bench: Evaluating Decision-making Biases of Agentic LLM as Inventory Manager
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Landmark-Assisted Monte Carlo Planning
Inspire or Predict? Exploring New Paradigms in Assisting Classical Planners with Large Language Models
A weighted U statistic for association analysis considering genetic heterogeneity
A Generalized Similarity U Test for Multivariate Analysis of Sequencing Data
A Weighted U Statistic for Genetic Association Analyses of Sequencing Data
Trees Assembling Mann Whitney Approach for Detecting Genome-wide Joint Association among Low Marginal Effect loci
Generalized Similarity U: A Non-parametric Test of Association Based on Similarity
FLUID: Flow-Latent Unified Integration via Token Distillation for Expert Specialization in Multimodal Learning
SDSNN: A Single-Timestep Spiking Neural Network with Self-Dropping Neuron and Bayesian Optimization
Multimodal Quantitative Measures for Multiparty Behaviour Evaluation
Managing the unexpected: Operator behavioural data and its value in predicting correct alarm responses
Human-AI collaboration or obedient and often clueless AI in instruct, serve, repeat dynamics?
gpt-oss-120b & gpt-oss-20b Model Card
Modeling and Detecting Company Risks from News: A Case Study in Bloomberg News
Apriel-Nemotron-15B-Thinker
Towards Efficient Prompt-based Continual Learning in Distributed Medical AI
ORBIT: An Object Property Reasoning Benchmark for Visual Inference Tasks
Retro-Expert: Collaborative Reasoning for Interpretable Retrosynthesis
Rule2Text: A Framework for Generating and Evaluating Natural Language Explanations of Knowledge Graph Rules
Not There Yet: Evaluating Vision Language Models in Simulating the Visual Perception of People with Low Vision
MCP-Guard: A Defense Framework for Model Context Protocol Integrity in Large Language Model Applications
Match & Choose: Model Selection Framework for Fine-tuning Text-to-Image Diffusion Models
SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth
Deep Learning-Based Automated Segmentation of Uterine Myomas
CURE: Critical-Token-Guided Re-concatenation for Entropy-collapse Prevention
Beyond the Rosetta Stone: Unification Forces in Generalization Dynamics
Zono-Conformal Prediction: Zonotope-Based Uncertainty Quantification for Regression and Classification Tasks
Risk-Based Prognostics and Health Management
Note on Selection Bias in Observational Estimates of Algorithmic Progress
Learning with Confidence
AI That Helps Us Help Each Other: A Proactive System for Scaffolding Mentor-Novice Collaboration in Entrepreneurship Coaching
LD-LAudio-V1: Video-to-Long-Form-Audio Generation Extension with Dual Lightweight Adapters
Compressive Meta-Learning
Utilizing Vision-Language Models as Action Models for Intent Recognition and Assistance
Diffusion is a code repair operator and generator
Quantization through Piecewise-Affine Regularization: Optimization and Statistical Guarantees
Tabularis Formatus: Predictive Formatting for Tables
MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
A Cross-Modal Rumor Detection Scheme via Contrastive Learning by Exploring Text and Image internal Correlations

Research Sources: 386 | Generated: 8/25/2025