AI RESEARCH PAPERS & ACADEMIC SOURCES
- From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios
- Learning an Ensemble Token from Task-driven Priors in Facial Analysis
- Taming Diffusion Transformer for Efficient Mobile Video Generation in Seconds
- tCURLoRA: Tensor CUR Decomposition Based Low-Rank Parameter Adaptation and Its Application in Medical Image Segmentation
- Towards autonomous photogrammetric forest inventory using a lightweight under-canopy robotic drone
- PolSAM: Polarimetric Scattering Mechanism Informed Segment Anything Model
- J-NeuS: Joint field optimization for Neural Surface reconstruction in urban scenes with limited image overlap
- Binary Diffusion Probabilistic Model
- RobustMerge: Parameter-Efficient Model Merging for MLLMs with Direction Robustness
- UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface
- DecAlign: Hierarchical Cross-Modal Alignment for Decoupled Multimodal Representation Learning
- A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models
- BoundMatch: Boundary detection applied to semi-supervised segmentation
- SuperEvent: Cross-Modal Learning of Event-based Keypoint Detection for SLAM
- BoxDreamer: Dreaming Box Corners for Generalizable Object Pose Estimation
- Neural Catalog: Scaling Species Recognition with Catalog of Life-Augmented Generation
- KDC-Diff: A Latent-Aware Diffusion Model with Knowledge Retention for Memory-Efficient Image Generation
- StPR: Spatiotemporal Preservation and Routing for Exemplar-Free Video Class-Incremental Learning
- Multi-View Projection for Unsupervised Domain Adaptation in 3D Semantic Segmentation
- How Do Large Vision-Language Models See Text in Image? Unveiling the Distinctive Role of OCR Heads
- Photography Perspective Composition: Towards Aesthetic Perspective Recommendation
- AVCD: Mitigating Hallucinations in Audio-Visual Large Language Models through Contrastive Decoding
- Fork-Merge Decoding: Enhancing Multimodal Understanding in Audio-Visual Large Language Models
- Anatomy-DT: A Cross-Diffusion Digital Twin for Anatomical Evolution
- Online Mapping for Autonomous Driving: Addressing Sensor Generalization and Dynamic Map Updates in Campus Environments
- LTA-L2S: Lexical Tone-Aware Lip-to-Speech Synthesis for Mandarin with Cross-Lingual Transfer Learning
- dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought
- Multi-modal Liver Segmentation and Fibrosis Staging Using Real-world MRI Images
- MR$^2$-Bench: Going Beyond Matching to Reasoning in Multimodal Retrieval
- GastroViT: A Vision Transformer Based Ensemble Learning Approach for Gastrointestinal Disease Classification with Grad CAM & SHAP Visualization
- Automated and Scalable SEM Image Analysis of Perovskite Solar Cell Materials via a Deep Segmentation Framework
- M$^{2}$SNet: Multi-scale in Multi-scale Subtraction Network for Medical Image Segmentation
- RealLiFe: Real-Time Light Field Reconstruction via Hierarchical Sparse Gradient Descent
- Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography
- Adaptive Modality Balanced Online Knowledge Distillation for Brain-Eye-Computer based Dim Object Detection
- LoRA-PT: Low-Rank Adapting UNETR for Hippocampus Segmentation Using Principal Tensor Singular Values and Vectors
- Rethinking Weak-to-Strong Augmentation in Source-Free Domain Adaptive Object Detection
- Multi-temporal crack segmentation in concrete structures using deep learning approaches
- ResGS: Residual Densification of 3D Gaussian for Efficient Detail Recovery
- PRISM: Progressive Rain removal with Integrated State-space Modeling
- Post-Training Quantization via Residual Truncation and Zero Suppression for Diffusion Models
- Multi-View Camera System for Variant-Aware Autonomous Vehicle Inspection and Defect Detection
- Stylos: Multi-View 3D Stylization with Single-Forward Gaussian Splatting
- CBAM Integrated Attention Driven Model For Betel Leaf Diseases Classification With Explainable AI
- Revealing the Power of Post-Training for Small Language Models via Knowledge Distillation
- DEPTHOR++: Robust Depth Enhancement from a Real-World Lightweight dToF and RGB Guidance
- Stable Cinemetrics: Structured Taxonomy and Evaluation for Professional Video Generation
- Autoproof: Automated Segmentation Proofreading for Connectomics
- DiffCamera: Arbitrary Refocusing on Images
- Video Object Segmentation-Aware Audio Generation
- Hy-Facial: Hybrid Feature Extraction by Dimensionality Reduction Methods for Enhanced Facial Expression Classification
- DA$^2$: Depth Anything in Any Direction
- HART: Human Aligned Reconstruction Transformer
- Benchmarking Egocentric Visual-Inertial SLAM at City Scale
- Query-Kontext: A Unified Multimodal Model for Image Generation and Editing
- TTT3R: 3D Reconstruction as Test-Time Training
- Challenges and Solutions in Selecting Optimal Lossless Data Compression Algorithms
- Geometric Learning of Canonical Parameterizations of $2D$-curves
- EasyOcc: 3D Pseudo-Label Supervision for Fully Self-Supervised Semantic Occupancy Prediction Models
- Predicting Penalty Kick Direction Using Multi-Modal Deep Learning with Pose-Guided Attention
- EchoGen: Generating Visual Echoes in Any Scene via Feed-Forward Subject-Driven Auto-Regressive Model
- Human-MME: A Holistic Evaluation Benchmark for Human-Centric Multimodal Large Language Models
- Beyond Overall Accuracy: Pose- and Occlusion-driven Fairness Analysis in Pedestrian Detection for Autonomous Driving
- TSalV360: A Method and Dataset for Text-driven Saliency Detection in 360-Degrees Videos
- Generalized Fine-Grained Category Discovery with Multi-Granularity Conceptual Experts
- IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance
- Interpret, prune and distill Donut: towards lightweight VLMs for VQA on document
- Seeing Space and Motion: Enhancing Latent Actions with Spatial and Dynamic Awareness for VLA
- Cat: Post-training quantization error reduction via cluster-based affine transformation
- Continuous Space-Time Video Super-Resolution with 3D Fourier Fields
- SQUARE: Semantic Query-Augmented Fusion and Efficient Batch Reranking for Training-free Zero-Shot Composed Image Retrieval
- Go with Your Gut: Scaling Confidence for Autoregressive Image Generation
- PANDA: Towards Generalist Video Anomaly Detection via Agentic AI Engineer
- MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation
- Image-Difficulty-Aware Evaluation of Super-Resolution Models
- LiDAR Point Cloud Colourisation Using Multi-Camera Fusion and Low-Light Image Enhancement
- MAPLE: Multi-scale Attribute-enhanced Prompt Learning for Few-shot Whole Slide Image Classification
- DeepSketcher: Internalizing Visual Manipulation for Multimodal Reasoning
- LLaVAShield: Safeguarding Multimodal Multi-Turn Dialogues in Vision-Language Models
- UniMMAD: Unified Multi-Modal and Multi-Class Anomaly Detection via MoE-Driven Feature Decompression
- Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation
- A Multi-purpose Tracking Framework for Salmon Welfare Monitoring in Challenging Environments
- PinPoint3D: Fine-Grained 3D Part Segmentation from a Few Clicks
- Towards Reliable and Holistic Visual In-Context Learning Prompt Selection
- AgenticIQA: An Agentic Framework for Adaptive and Interpretable Image Quality Assessment
- New Fourth-Order Grayscale Indicator-Based Telegraph Diffusion Model for Image Despeckling
- SETR: A Two-Stage Semantic-Enhanced Framework for Zero-Shot Composed Image Retrieval
- GeoLink: Empowering Remote Sensing Foundation Model with OpenStreetMap Data
- PatchVSR: Breaking Video Diffusion Resolution Limits with Patch-wise Video Super-Resolution
- Causally Guided Gaussian Perturbations for Out-Of-Distribution Generalization in Medical Imaging
- SGS: Segmentation-Guided Scoring for Global Scene Inconsistencies
- DGM4+: Dataset Extension for Global Scene Inconsistency
- AIMCoT: Active Information-driven Multimodal Chain-of-Thought for Vision-Language Reasoning
- How Diffusion Models Memorize
- ProbMed: A Probabilistic Framework for Medical Multimodal Binding
- SAGE: Spatial-visual Adaptive Graph Exploration for Visual Place Recognition
- LaTo: Landmark-tokenized Diffusion Transformer for Fine-grained Human Face Editing
- The 1st Solution for MOSEv1 Challenge on LSVOS 2025: CGFSeg
- LieHMR: Autoregressive Human Mesh Recovery with $SO(3)$ Diffusion
- Dragging with Geometry: From Pixels to Geometry-Guided Image Editing
- IPDRecon: Image-Plane Geometric Decoding for View-Invariant Indoor Scene Reconstruction
- ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On
- PCPO: Proportionate Credit Policy Optimization for Aligning Image Generation Models
- Self-Evolving Vision-Language Models for Image Quality Assessment via Voting and Ranking
- EchoingECG: An Electrocardiogram Cross-Modal Model for Echocardiogram Tasks
- Adapting SAM with Dynamic Similarity Graphs for Few-Shot Parameter-Efficient Small Dense Object Detection: A Case Study of Chickpea Pods in Field Conditions
- Overview of GeoLifeCLEF 2023: Species Composition Prediction with High Spatial Resolution at Continental Scale Using Remote Sensing
- MuSLR: Multimodal Symbolic Logical Reasoning
- PatchEAD: Unifying Industrial Visual Prompting Frameworks for Patch-Exclusive Anomaly Detection
- Seeing Through Deception: Uncovering Misleading Creator Intent in Multimodal News with Vision-Language Models
- ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding
- Taming the Tri-Space Tension: ARC-Guided Hallucination Modeling and Control for Text-to-Image Generation
- LUMA: Low-Dimension Unified Motion Alignment with Dual-Path Anchoring for Text-to-Motion Diffusion Model
- Editing Physiological Signals in Videos Using Latent Representations
- DepthLM: Metric Depth From Vision Language Models
- Infrastructure Sensor-enabled Vehicle Data Generation using Multi-Sensor Fusion for Proactive Safety Applications at Work Zone
- Seeing Before Reasoning: A Unified Framework for Generalizable and Explainable Fake Image Detection
- Robust Visual Localization in Compute-Constrained Environments by Salient Edge Rendering and Weighted Hamming Similarity
- FishNet++: Analyzing the capabilities of Multimodal Large Language Models in marine biology
- GaussianLens: Localized High-Resolution Reconstruction via On-Demand Gaussian Densification
- LMOD+: A Comprehensive Multimodal Dataset and Benchmark for Developing and Evaluating Multimodal Large Language Models in Ophthalmology
- Anchor-free Cross-view Object Geo-localization with Gaussian Position Encoding and Cross-view Association
- DescribeEarth: Describe Anything for Remote Sensing Images
- OmniDFA: A Unified Framework for Open Set Synthesis Image Detection and Few-Shot Attribution
- Cut the Deadwood Out: Backdoor Purification via Guided Module Substitution
- SAFE-SQL: Self-Augmented In-Context Learning with Fine-grained Example Selection for Text-to-SQL
- Safety is Not Only About Refusal: Reasoning-Enhanced Fine-tuning for Interpretable LLM Safety
- A Position Paper on the Automatic Generation of Machine Learning Leaderboards
- Wolf Hidden in Sheep's Conversations: Toward Harmless Data-Based Backdoor Attacks for Jailbreaking Large Language Models
- Frankentext: Stitching random text fragments into long-form narratives
- v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
- ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
- TableRAG: A Retrieval Augmented Generation Framework for Heterogeneous Document Reasoning
- Medical Red Teaming Protocol of Language Models: On the Importance of User Perspectives in Healthcare Settings
- Text-Based Approaches to Item Alignment to Content Standards in Large-Scale Reading & Writing Tests
- CreAgentive: An Agent Workflow Driven Multi-Category Creative Generation Engine
- dParallel: Learnable Parallel Decoding for dLLMs
- BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
- Training Matryoshka Mixture-of-Experts for Elastic Inference-Time Expert Utilization
- Towards Reliable Benchmarking: A Contamination Free, Controllable Evaluation Framework for Multi-step LLM Function Calling
- Generating Difficult-to-Translate Texts
- Scaling Spoken Language Models with Syllabic Speech Tokenization
- ActorDB: A Unified Database Model Integrating Single-Writer Actors, Incremental View Maintenance, and Zero-Trust Messaging
- Fingerprinting LLMs via Prompt Injection
- FinCap: Topic-Aligned Captions for Short-Form Financial YouTube Videos
- A Multimodal LLM Approach for Visual Question Answering on Multiparametric 3D Brain MRI
- VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs
- ProfVLM: A Lightweight Video-Language Model for Multi-View Proficiency Estimation
- Exploring the Impact of Data Quantity on ASR in Extremely Low-resource Languages
- Aristotle: Mastering Logical Reasoning with A Logic-Complete Decompose-Search-Resolve Framework
- Vocabulary Customization for Efficient Domain-Specific LLM Deployment
- The Hunger Game Debate: On the Emergence of Over-Competition in Multi-Agent Systems
- CliniBench: A Clinical Outcome Prediction Benchmark for Generative and Encoder-Based Language Models
- MGen: Millions of Naturally Occurring Generics in Context
- Explaining novel senses using definition generation with open language models
- VietBinoculars: A Zero-Shot Approach for Detecting Vietnamese LLM-Generated Text
- Optimizing Speech Language Models for Acoustic Consistency
- One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
- Latent Thinking Optimization: Your Latent Reasoning Language Model Secretly Encodes Reward Signals in its Latent Thoughts
- Fast-dLLM v2: Efficient Block-Diffusion LLM
- An Annotation Scheme for Factuality and its Application to Parliamentary Proceedings
- Automatic Fact-checking in English and Telugu
- DyFlow: Dynamic Workflow Framework for Agentic Reasoning
- The Silent Judge: Unacknowledged Shortcut Bias in LLM-as-a-Judge
- Limited Preference Data? Learning Better Reward Model with Latent Space Synthesis
- IMProofBench: Benchmarking AI on Research-Level Mathematical Proof Generation
- Reinforced Strategy Optimization for Conversational Recommender Systems via Network-of-Experts
- The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks
- Mitigating Biases in Language Models via Bias Unlearning
- Atomic Thinking of LLMs: Decoupling and Exploring Mathematical Reasoning Abilities
- CATCH: A Novel Data Synthesis Framework for High Therapy Fidelity and Memory-Driven Planning Chain of Thought in AI Counseling
- Assessing Algorithmic Bias in Language-Based Depression Detection: A Comparison of DNN and LLM Approaches
- ReTAG: Retrieval-Enhanced, Topic-Augmented Graph-Based Global Sensemaking
- Personalized Scientific Figure Caption Generation: An Empirical Study on Author-Specific Writing Style Transfer
- Believing without Seeing: Quality Scores for Contextualizing Vision-Language Model Explanations
- ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
- ASR Under Noise: Exploring Robustness for Sundanese and Javanese
- Mem-$\alpha$: Learning Memory Construction via Reinforcement Learning
- Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel
- Bringing Emerging Architectures to Sequence Labeling in NLP
- Reliability Crisis of Reference-free Metrics for Grammatical Error Correction
- RAGferee: Building Contextual Reward Models for Retrieval-Augmented Generation
- RE$^2$: Improving Chinese Grammatical Error Correction via Retrieving Appropriate Examples with Explanation
- Unspoken Hints: Accuracy Without Acknowledgement in LLM Reasoning
- RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection
- The Rise of AfricaNLP: Contributions, Contributors, and Community Impact (2005-2025)
- Beyond WER: Probing Whisper's Sub-token Decoder Across Diverse Language Resource Levels
- Performance and competence intertwined: A computational model of the Null Subject stage in English-speaking children
- Don't Sweat the Small Stuff: Segment-Level Meta-Evaluation Based on Pairwise Difference Correlation
- Transformers through the lens of support-preserving maps between measures
- The Media Bias Detector: A Framework for Annotating and Analyzing the News at Scale
- QFrBLiMP: a Quebec-French Benchmark of Linguistic Minimal Pairs
- Apple: Toward General Active Perception via Reinforcement Learning
- A quantitative analysis of semantic information in deep representations of text and images
- GIM: Improved Interpretability for Large Language Models
- Discovering and Steering Interpretable Concepts in Large Generative Music Models
- tenSVD algorithm for compression
- A Copula Based Supervised Filter for Feature Selection in Diabetes Risk Prediction Using Machine Learning
- KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
- LoLA: Low-Rank Linear Attention With Sparse Caching
- Regularizing Learnable Feature Extraction for Automatic Speech Recognition
- TADA: Improved Diffusion Sampling with Training-free Augmented Dynamics
- Regret Analysis of Posterior Sampling-Based Expected Improvement for Bayesian Optimization
- BEDTime: A Unified Benchmark for Automatically Describing Time Series
- Asymptotic Classification Error for Heavy-Tailed Renewal Processes
- Efficient Fairness-Performance Pareto Front Computation
- Brain Tumor Classification on MRI in Light of Molecular Markers
- Information Design with Unknown Prior
- Unified Cross-Modal Image Synthesis with Hierarchical Mixture of Product-of-Experts
- Watermark under Fire: A Robustness Evaluation of LLM Watermarking
- LargeAD: Large-Scale Cross-Sensor Data Pretraining for Autonomous Driving
- Complexity Analysis of Normalizing Constant Estimation: from Jarzynski Equality to Annealed Importance Sampling and beyond
- Surrogate models for diffusion on graphs via sparse polynomials
- Scalable Fingerprinting of Large Language Models
- Controllable Motion Generation via Diffusion Modal Coupling
- A Review on Riemannian Metric Learning: Closer to You than You Imagine
- Approximation properties of neural ODEs
- ThinkEdit: Interpretable Weight Editing to Mitigate Overly Short Thinking in Reasoning Models
- Neural Kinematic Bases for Fluids
- AutoJudge: Judge Decoding Without Manual Annotation
- Fast Likelihood-Free Parameter Estimation for Lévy Processes
- Using Knowledge Graphs to harvest datasets for efficient CLIP model training
- Detecting Instruction Fine-tuning Attacks on Language Models using Influence Function
- Neural Multivariate Regression: Qualitative Insights from the Unconstrained Feature Model
- AutoRAN: Automated Hijacking of Safety Reasoning in Large Reasoning Models
- What Can RL Bring to VLA Generalization? An Empirical Study
- Feature-aware Hypergraph Generation via Next-Scale Prediction
- On Fitting Flow Models with Large Sinkhorn Couplings
- Flatness After All?
- IMPACT: Importance-Aware Activation Space Reconstruction
- Towards Robust Surrogate Models: Benchmarking Machine Learning Approaches to Expediting Phase Field Simulations of Brittle Fracture
- The Serial Scaling Hypothesis
- FlowCast-ODE: Continuous Hourly Weather Forecasting with Dynamic Flow Matching and ODE Solver
- Bengali Document Layout Analysis -- A YOLOV8 Based Ensembling Approach
- Minimum Description Feature Selection for Complexity Reduction in Machine Learning-based Wireless Positioning
- Fast training of accurate physics-informed neural networks without gradient descent
- RobustNeuralNetworks.jl: a Package for Machine Learning and Data-Driven Control with Certified Robustness
- Efficiently Escaping Saddle Points for Policy Optimization
- Model Extraction Attacks Revisited
- Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics
- Complexity Reduction in Machine Learning-Based Wireless Positioning: Minimum Description Features
- From Mean to Extreme: Formal Differential Privacy Bounds on the Success of Real-World Data Reconstruction Attacks
- FedGCS: A Generative Framework for Efficient Client Selection in Federated Learning via Gradient-based Optimization
- Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
- Amelia: A Large Dataset and Model for Airport Surface Movement Forecasting
- Large-Scale Targeted Cause Discovery via Learning from Simulated Data
- Towards Convexity in Anomaly Detection: A New Formulation of SSLM with Unique Optimal Solutions
- Information-Geometric Barycenters for Bayesian Federated Learning
- Learning Theory for Kernel Bilevel Optimization
- Teaching Metric Distance to Discrete Autoregressive Language Models
- InverseBench: Benchmarking Plug-and-Play Diffusion Priors for Inverse Problems in Physical Sciences
- Understanding Formal Reasoning Failures in LLMs as Abstract Interpreters
- Beyond Next Token Probabilities: Learnable, Fast Detection of Hallucinations and Data Contamination on LLM Output Distributions
- Identifying and Evaluating Inactive Heads in Pretrained LLMs
- TrackCore-F: Deploying Transformer-Based Subatomic Particle Tracking on FPGAs
- Are neural scaling laws leading quantum chemistry astray?
- TrackFormers Part 2: Enhanced Transformer-Based Models for High-Energy Physics Track Reconstruction
- An Orthogonal Learner for Individualized Outcomes in Markov Decision Processes
- Stabilization of nonlinear systems with unknown delays via delay-adaptive neural operator approximate predictors
- Contrastive Diffusion Guidance for Spatial Inverse Problems
- Signal-Aware Workload Shifting Algorithms with Uncertainty-Quantified Predictors
- Ferret-UI Lite: Lessons from Building Small On-Device GUI Agents
- Towards Verified Code Reasoning by LLMs
- Pretrain-Test Task Alignment Governs Generalization in In-Context Learning
- Estimating Dimensionality of Neural Representations from Finite Samples
- DeepProv: Behavioral Characterization and Repair of Neural Networks via Inference Provenance Graph Analysis
- Source Separation for A Cappella Music
- DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively
- Convergence and Divergence of Language Models under Different Random Seeds
- Max-Sliced Wasserstein Distance and its use for GANs
- CO3: Contrasting Concepts Compose Better
- Scaling Equilibrium Propagation to Deeper Neural Network Architectures
- BALLAST: Bayesian Active Learning with Look-ahead Amendment for Sea-drifter Trajectories under Spatio-Temporal Vector Fields
- GaussEdit: Adaptive 3D Scene Editing with Text and Image Prompts
- Text-to-Scene with Large Reasoning Models
- Efficient Distributed Training via Dual Batch Sizes and Cyclic Progressive Learning
- EVODiff: Entropy-aware Variance Optimized Diffusion Inference
- Ordinal Label-Distribution Learning with Constrained Asymmetric Priors for Imbalanced Retinal Grading
- Non-Vacuous Generalization Bounds: Can Rescaling Invariances Help?
- Benchmarking Diarization Models
- Self-supervised learning for phase retrieval
- The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
- Hybrid Quantum-Classical Optimisation of Traveling Salesperson Problem
- Why is topology hard to learn?
- PRPO: Paragraph-level Policy Optimization for Vision-Language Deepfake Detection
- FLOWER: A Flow-Matching Solver for Inverse Problems
- Ultra-Reliable Risk-Aggregated Sum Rate Maximization via Model-Aided Deep Learning
- TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
- Personalized Auto-Grading and Feedback System for Constructive Geometry Tasks Using Large Language Models on an Online Math Platform
- Enhancing Split Learning with Sharded and Blockchain-Enabled SplitFed Approaches
- Conservative Decisions with Risk Scores
- MetaChest: Generalized few-shot learning of pathologies from chest X-rays
- Coupling Generative Modeling and an Autoencoder with the Causal Bridge
- RFG: Test-Time Scaling for Diffusion Large Language Model Reasoning with Reward-Free Guidance
- When Langevin Monte Carlo Meets Randomization: Non-asymptotic Error Bounds beyond Log-Concavity and Gradient Lipschitzness
- Generalized Contrastive Learning for Universal Multimodal Retrieval
- Using Images from a Video Game to Improve the Detection of Truck Axles
- Importance Sampling for Multi-Negative Multimodal Direct Preference Optimization
- Transformer-Based Rate Prediction for Multi-Band Cellular Handsets
- Test time training enhances in-context learning of nonlinear functions
- Detecting Hope Across Languages: Multiclass Classification for Positive Online Discourse
- SAC Flow: Sample-Efficient Reinforcement Learning of Flow-Based Policies via Velocity-Reparameterized Sequential Modeling
- Sharpness of Minima in Deep Matrix Factorization: Exact Expressions
- Logo-VGR: Visual Grounded Reasoning for Open-world Logo Recognition
- RoBiologyDataChoiceQA: A Romanian Dataset for improving Biology understanding of Large Language Models
- Better Privilege Separation for Agents by Restricting Data Types
- AccidentBench: Benchmarking Multimodal Understanding and Reasoning in Vehicle Accidents and Beyond
- SPATA: Systematic Pattern Analysis for Detailed and Transparent Data Cards
- Understanding Practitioners' Perspectives on Monitoring Machine Learning Systems
- Cyclic Ablation: Testing Concept Localization against Functional Regeneration in AI
- RANGER -- Repository-Level Agent for Graph-Enhanced Retrieval
- Evaluating the Impact of Radiographic Noise on Chest X-ray Semantic Segmentation and Disease Classification Using a Scalable Noise Injection Framework
- Position-Blind Ptychography: Viability of image reconstruction via data-driven variational inference
- Mechanisms of Matter: Language Inferential Benchmark on Physicochemical Hypothesis in Materials Synthesis
- Aspects of holographic entanglement using physics-informed-neural-networks
- Bayesian Transformer for Pan-Arctic Sea Ice Concentration Mapping and Uncertainty Estimation using Sentinel-1, RCM, and AMSR2 Data
- Neural Optimal Transport Meets Multivariate Conformal Prediction
- SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA
- Fair Classification by Direct Intervention on Operating Characteristics
- Scalable Boltzmann Generators for equilibrium sampling of large-scale materials
- One-shot Conditional Sampling: MMD meets Nearest Neighbors
- AGNOMIN - Architecture Agnostic Multi-Label Function Name Prediction
- Defeating Cerberus: Concept-Guided Privacy-Leakage Mitigation in Multimodal Language Models
- LLM-Assisted Emergency Triage Benchmark: Bridging Hospital-Rich and MCI-Like Field Simulation
- Data-to-Energy Stochastic Dynamics
- Refine Drugs, Don't Complete Them: Uniform-Source Discrete Flows for Fragment-Based Drug Discovery
- Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning
- fev-bench: A Realistic Benchmark for Time Series Forecasting
- DiVeQ: Differentiable Vector Quantization Using the Reparameterization Trick
- Equivariance by Local Canonicalization: A Matter of Representation
- Entropy After $\langle \texttt{/Think} \rangle$ for reasoning model early exiting
- Machine-Learning Driven Load Shedding to Mitigate Instability Attacks in Power Grids
- The Loss Kernel: A Geometric Probe for Deep Learning Interpretability
- TASP: Topology-aware Sequence Parallelism
- Bayesian Influence Functions for Hessian-Free Data Attribution
- Importance of localized dilatation and distensibility in identifying determinants of thoracic aortic aneurysm with neural operators
- Linking Process to Outcome: Conditional Reward Modeling for LLM Reasoning
- Clarification as Supervision: Reinforcement Learning for Vision-Language Interfaces
- Uncertainty Quantification for Regression using Proper Scoring Rules
- Recursive Self-Aggregation Unlocks Deep Thinking in Large Language Models
- Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
- Alignment-Aware Decoding
- Neighbor-aware informal settlement mapping with graph convolutional networks
- PDE Solvers Should Be Local: Fast, Stable Rollouts with Learned Local Stencils
- Marginal Flow: a flexible and efficient framework for density estimation
- Thinking-Free Policy Initialization Makes Distilled Reasoning Models More Effective and Efficient Reasoners
- Machine Learning Detection of Lithium Plating in Lithium-ion Cells: A Gaussian Process Approach
- Beyond Linear Probes: Dynamic Safety Monitoring for Language Models
- From Fragile to Certified: Wasserstein Audits of Group Fairness Under Distribution Shift
- Wasserstein Distributionally Robust Optimization Through the Lens of Structural Causal Models and Individual Fairness
- Reframing Generative Models for Physical Systems using Stochastic Interpolants
- Tuning the Tuner: Introducing Hyperparameter Optimization for Auto-Tuning
- NeuroTTT: Bridging Pretraining-Downstream Task Misalignment in EEG Foundation Models via Test-Time Training
- Attribution-Guided Decoding
- A Review on Single-Problem Multi-Attempt Heuristic Optimization
- ACE: Adapting sampling for Counterfactual Explanations
- A Generalized Information Bottleneck Theory of Deep Learning
- FedMuon: Federated Learning with Bias-corrected LMO-based Optimization
- Memory-Driven Self-Improvement for Decision Making with Large Language Models
- Improving Sampling Efficiency in RLVR through Adaptive Rollout and Response Reuse
- Decentralized Asynchronous Multi-player Bandits
- Kairos: Towards Adaptive and Generalizable Time Series Foundation Models
- MIDAS: Misalignment-based Data Augmentation Strategy for Imbalanced Multimodal Learning
- RL-Guided Data Selection for Language Model Finetuning
- Federated Learning with Enhanced Privacy via Model Splitting and Random Client Participation
- ReNF: Rethinking the Design Space of Neural Long-Term Time Series Forecasters
- Reevaluating Convolutional Neural Networks for Spectral Analysis: A Focus on Raman Spectroscopy
- Exact Solutions to the Quantum Schrödinger Bridge Problem
- CAST: Continuous and Differentiable Semi-Structured Sparsity-Aware Training for Large Language Models
- Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access
- FITS: Towards an AI-Driven Fashion Information Tool for Sustainability
- Stealthy Yet Effective: Distribution-Preserving Backdoor Attacks on Graph Classification
- Scaling Up Temporal Domain Generalization via Temporal Experts Averaging
- Clip-Low Increases Entropy and Clip-High Decreases Entropy in Reinforcement Learning of Large Language Models
- UncertainGen: Uncertainty-Aware Representations of DNA Sequences for Metagenomic Binning
- Domain-Aware Hyperdimensional Computing for Edge Smart Manufacturing
- Accelerating Transformers in Online RL
- Guiding Mixture-of-Experts with Temporal Multimodal Interactions
- Minimalist Explanation Generation and Circuit Discovery
- A Unified Probabilistic Framework for Dictionary Learning with Parsimonious Activation
- Can VLM Pseudo-Labels Train a Time-Series QA Model That Outperforms the VLM?
- Physics-Informed Learning for Human Whole-Body Kinematics Prediction via Sparse IMUs
- Adaptive Graph Coarsening for Efficient GNN Training
- Expert Merging: Model Merging with Unsupervised Expert Alignment and Importance-Guided Layer Chunking
- Reweighted Flow Matching via Unbalanced OT for Label-free Long-tailed Generation
- MuPlon: Multi-Path Causal Optimization for Claim Verification through Controlling Confounding
- Beyond Point Estimates: Likelihood-Based Full-Posterior Wireless Localization
- A Physics-Guided Probabilistic Surrogate Modeling Framework for Digital Twins of Underwater Radiated Noise
- Less is More: Towards Simple Graph Contrastive Learning
- Rotation Control Unlearning: Quantifying and Controlling Continuous Unlearning for LLM with The Cognitive Rotation Space
- OPPO: Accelerating PPO-based RLHF via Pipeline Overlap
- Online Decision Making with Generative Action Sets
- A Hamiltonian driven Geometric Construction of Neural Networks on the Lognormal Statistical Manifold
- From Cheap Geometry to Expensive Physics: Elevating Neural Operators via Latent Shape Pretraining
- Characterization and Learning of Causal Graphs with Latent Confounders and Post-treatment Selection from Interventional Data
- Scalable Disk-Based Approximate Nearest Neighbor Search with Page-Aligned Graph
- Can Molecular Foundation Models Know What They Don't Know? A Simple Remedy with Preference Optimization
- EEsizer: LLM-Based AI Agent for Sizing of Analog and Mixed Signal Circuit
- World Model for AI Autonomous Navigation in Mechanical Thrombectomy
- Flow Matching with Semidiscrete Couplings
- Meta-Router: Bridging Gold-standard and Preference-based Evaluations in Large Language Model Routing
- Lightweight and Robust Federated Data Valuation
- Safe In-Context Reinforcement Learning
- Machine Learning Algorithms for Improving Black Box Optimization Solvers
- Binary Sparse Coding for Interpretability
- Effective Model Pruning
- Layer-wise dynamic rank for compressing large language models
- Swift: An Autoregressive Consistency Model for Efficient Weather Forecasting
- How Does Preconditioning Guide Feature Learning in Deep Neural Networks?
- Deep set based operator learning with uncertainty quantification
- Growing Winning Subnetworks, Not Pruning Them: A Paradigm for Density Discovery in Sparse Neural Networks
- Nudging the Boundaries of LLM Reasoning
- Norm-Q: Effective Compression Method for Hidden Markov Models in Neuro-Symbolic Applications
- Conformal Prediction for Signal Temporal Logic Inference
- MSCoD: An Enhanced Bayesian Updating Framework with Multi-Scale Information Bottleneck and Cooperative Attention for Structure-Based Drug Design
- Integrated Forecasting of Marine Renewable Power: An Adaptively Bayesian-Optimized MVMD-LSTM Framework for Wind-Solar-Wave Energy
- Simple, Fast and Efficient Injective Manifold Density Estimation with Random Projections
- WDformer: A Wavelet-based Differential Transformer Model for Time Series Forecasting
- Sampling via Gaussian Mixture Approximations
- Fine-tuning of Large Language Models for Domain-Specific Cybersecurity Knowledge
- Heterogeneous Multi-agent Collaboration in UAV-assisted Mobile Crowdsensing Networks
- MAESTRO: Adaptive Sparse Attention and Robust Learning for Multimodal Dynamic Time Series
- Optimisation of Resource Allocation in Heterogeneous Wireless Networks Using Deep Reinforcement Learning
- Gradient Descent with Large Step Sizes: Chaos and Fractal Convergence Region
- Deep Survival Analysis for Competing Risk Modeling with Functional Covariates and Missing Data Imputation
- On the Shape of Latent Variables in a Denoising VAE-MoG: A Posterior Sampling-Based Study
- Crowdsourcing Without People: Modelling Clustering Algorithms as Experts
- Multi-Task Equation Discovery
- Leveraging Vulnerabilities in Temporal Graph Neural Networks via Strategic High-Impact Assaults
- Feedback Control for Small Budget Pacing
- SOLD: SELFIES-based Objective-driven Latent Diffusion
- VLHSA: Vision-Language Hierarchical Semantic Alignment for Jigsaw Puzzle Solving with Eroded Gaps
- Polynomial Contrastive Learning for Privacy-Preserving Representation Learning on Graphs
- Hyperbolic Optimization
- DPSformer: A long-tail-aware model for improving heavy rainfall prediction
- LEMs: A Primer On Large Execution Models
- Anomaly detection by partitioning of multi-variate time series
- Evaluating Double Descent in Machine Learning: Insights from Tree-Based Models Applied to a Genomic Prediction Task
- On The Dynamic Ensemble Selection for TinyML-based Systems -- a Preliminary Study
- Sensor optimization for urban wind estimation with cluster-based probabilistic framework
- AMLA: MUL by ADD in FlashAttention Rescaling
- Mind the Gap: A Review of Arabic Post-Training Datasets and Their Limitations
- The Impact of Language Mixing on Bilingual LLM Reasoning
- Measuring the Measures: Discriminative Capacity of Representational Similarity Metrics Across Model Families
- Long-Horizon Visual Imitation Learning via Plan and Code Reflection
- Multi Layered Autonomy and AI Ecologies in Robotic Art Installations
- Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation
- Static Word Embeddings for Sentence Semantic Representation
- VITA: Zero-Shot Value Functions via Test-Time Adaptation of Vision-Language Models
- QGuard: Question-based Zero-shot Guard for Multi-modal LLM Safety
- FeDa4Fair: Client-Level Federated Datasets for Fairness Evaluation
- When Does Multimodality Lead to Better Time Series Forecasting?
- SoMi-ToM: Evaluating Multi-Perspective Theory of Mind in Embodied Social Interactions
- DNN-Based Precoding in RIS-Aided mmWave MIMO Systems With Practical Phase Shift
- Deep Graph Learning for Industrial Carbon Emission Analysis and Policy Impact
- LATTE: Latent Trajectory Embedding for Diffusion-Generated Image Detection
- HumanVideo-MME: Benchmarking MLLMs for Human-Centric Video Understanding
- On the Effectiveness of Methods and Metrics for Explainable AI in Remote Sensing Image Scene Classification
- CADmium: Fine-Tuning Code Language Models for Text-Driven Sequential CAD Design
- Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
- TensorRL-QAS: Reinforcement learning with tensor networks for improved quantum architecture search
- Modeling Saliency Dataset Bias
- DisCO: Reinforcing Large Reasoning Models with Discriminative Constrained Optimization
- scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data
- Structured Agent Distillation for Large Language Model
- ELEPHANT: Measuring and understanding social sycophancy in LLMs
- Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
- DEBATE, TRAIN, EVOLVE: Self Evolution of Language Model Reasoning
- Causal Interventions Reveal Shared Structure Across English Filler-Gap Constructions
- AudioTrust: Benchmarking the Multifaceted Trustworthiness of Audio Large Language Models
- Find the Fruit: Zero-Shot Sim2Real RL for Occlusion-Aware Plant Manipulation
- LLM Agents for Interactive Exploration of Historical Cadastre Data: Framework and Application to Venice
- Value-Guided Search for Efficient Chain-of-Thought Reasoning
- SelfReflect: Can LLMs Communicate Their Internal Answer Distribution?
- Finite Sample Analysis of Linear Temporal Difference Learning with Arbitrary Features
- ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
- Personalized Subgraph Federated Learning with Differentiable Auxiliary Projections
- Linear Attention for Efficient Bidirectional Sequence Modeling
- Adaptive Conformal Guidance for Learning under Uncertainty
- Recent Advances in Large Language Model Benchmarks against Data Contamination: From Static to Dynamic Evaluation
- Voting or Consensus? Decision-Making in Multi-Agent Debate
- FANformer: Improving Large Language Models Through Effective Periodicity Modeling
- Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models
- Hierarchical Balance Packing: Towards Efficient Supervised Fine-tuning for Long-Context LLM
- Rethinking Diffusion Model in High Dimension
- Revisiting semi-supervised learning in the era of foundation models
- A Survey on SAR ship classification using Deep Learning
- FW-Merging: Scaling Model Merging with Frank-Wolfe Optimization
- Value Profiles for Encoding Human Variation
- CODA: Repurposing Continuous VAEs for Discrete Tokenization
- Enabling Rapid Shared Human-AI Mental Model Alignment via the After-Action Review
- Lobster: A GPU-Accelerated Framework for Neurosymbolic Programming
- Adaptive Rectification Sampling for Test-Time Compute Scaling
- Fair Uncertainty Quantification for Depression Prediction
- Stochastic Layer-wise Learning: Scalable and Efficient Alternative to Backpropagation
- Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
- SSTP: Efficient Sample Selection for Trajectory Prediction
- Composing Global Solutions to Reasoning Tasks via Algebraic Objects in Neural Nets
- FAN: Fourier Analysis Networks
- pEBR: A Probabilistic Approach to Embedding Based Retrieval
- Unlocking Transfer Learning for Open-World Few-Shot Recognition
- BianCang: A Traditional Chinese Medicine Large Language Model
- Learning Semantic Association Rules from Internet of Things Data
- Dynamic Pricing in High-Speed Railways Using Multi-Agent Reinforcement Learning
- Dagger Behind Smile: Fool LLMs with a Happy Ending Story
- A space-decoupling framework for optimization on bounded-rank matrices with orthogonally invariant constraints
- CE-SDWV: Effective and Efficient Concept Erasure for Text-to-Image Diffusion Models via a Semantic-Driven Word Vocabulary
- LFTR: Learning-Free Token Reduction for Multimodal Large Language Models
- Should You Use Your Large Language Model to Explore or Exploit?
- Dual Alignment Maximin Optimization for Offline Model-based RL
- iVISPAR -- An Interactive Visual-Spatial Reasoning Benchmark for VLMs
- Towards Reasoning Ability of Small Language Models
- Solving the Cold Start Problem on One's Own as an End User via Preference Transfer
- A physical approach to qualia and the emergence of conscious observers in qualia space
- Medical Question Summarization with Entity-driven Contrastive Learning
- Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance
- Fast Exact Unlearning for In-Context Learning Data for LLMs
- scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding
- Bird Eye-View to Street-View: A Survey
- Pretrained Hybrids with MAD Skills
- Preemptive Detection and Correction of Misaligned Actions in LLM Agents
- Investigating Long-term Training for Remote Sensing Object Detection
- The Quest for the Right Mediator: Surveying Mechanistic Interpretability Through the Lens of Causal Mediation Analysis
- MENLO: From Preferences to Proficiency - Evaluating and Modeling Native-like Quality Across 47 Languages
- Searching for Difficult-to-Translate Test Examples at Scale
- Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training
- Learning Generalizable Shape Completion with SIM(3) Equivariance
- OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
- Stitch: Training-Free Position Control in Multimodal Diffusion Transformers
- Sparse View Tomographic Reconstruction of Elongated Objects using Learned Primal-Dual Networks
- Efficient Dynamic Ensembling for Multiple LLM Experts
- FastCoder: Accelerating Repository-level Code Generation via Efficient Retrieval and Verification
- Memorize or Generalize? Evaluating LLM Code Generation with Code Rewriting
- Teaching AI to Handle Exceptions: Supervised Fine-Tuning with Human-Aligned Judgment
- TAMO: Fine-Grained Root Cause Analysis via Tool-Assisted LLM Agent with Multi-Modality Observation Data in Cloud-Native Systems
- AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenges
- Survey: Multi-Armed Bandits Meet Large Language Models
- R1-Code-Interpreter: LLMs Reason with Code via Supervised and Multi-stage Reinforcement Learning
- VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments
- Ascent Fails to Forget
- AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
- ACT: Agentic Classification Tree
- Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search
- Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification
- On Deepfake Voice Detection - It's All in the Presentation
- Regression Language Models for Code
- VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
- Indoor/Outdoor Spectrum Sharing Enabled by GNSS-based Classifiers
- The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
- MUSE-Explainer: Counterfactual Explanations for Symbolic Music Graph Classification Models
- TAP: Two-Stage Adaptive Personalization of Multi-task and Multi-Modal Foundation Models in Federated Learning
- OceanGym: A Benchmark Environment for Underwater Embodied Agents
- The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models
- Parametric Neural Amp Modeling with Active Learning
- AI-assisted Advanced Propellant Development for Electric Propulsion
- Are Robust LLM Fingerprints Adversarially Robust?
- Deconstructing Self-Bias in LLM-generated Translation Benchmarks
- An Experimental Study on Generating Plausible Textual Explanations for Video Summarization
- 3DiFACE: Synthesizing and Editing Holistic 3D Facial Animation
- Sandbagging in a Simple Survival Bandit Problem
- Finetune Once: Decoupling General & Domain Learning with Dynamic Boosted Annealing
- Point2RBox-v3: Self-Bootstrapping from Point Annotations via Integrated Pseudo-Label Refinement and Utilization
- Representation-Based Data Quality Audits for Audio
- Noise-Guided Transport for Imitation Learning
- QUARTZ: QA-based Unsupervised Abstractive Refinement for Task-oriented Dialogue Summarization
- Feedback Forensics: A Toolkit to Measure AI Personality
- LLM-MCoX: Large Language Model-based Multi-robot Coordinated Exploration and Search
- EditReward: A Human-Aligned Reward Model for Instruction-Guided Image Editing
- SoK: Systematic analysis of adversarial threats against deep learning approaches for autonomous anomaly detection systems in SDN-IoT networks
- TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos
- Vector-Valued Reproducing Kernel Banach Spaces for Neural Networks and Operators
- SDA-PLANNER: State-Dependency Aware Adaptive Planner for Embodied Task Planning
- Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
- Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
- SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
- Real-time Noise Detection and Classification in Single-Channel EEG: A Lightweight Machine Learning Approach for EMG, White Noise, and EOG Artifacts
- On Computing Top-$k$ Simple Shortest Paths from a Single Source
- End-to-End Aspect-Guided Review Summarization at Scale
- Enhancing PINN Performance Through Lie Symmetry Group
- AGOCS -- Accurate Google Cloud Simulator Framework
- Leveraging AI modelling for FDS with Simvue: monitor and optimise for more sustainable simulations
- OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
- Bubble, Bubble, AI's Rumble: Why Global Financial Regulatory Incident Reporting is Our Shield Against Systemic Stumbles
- EntroPE: Entropy-Guided Dynamic Patch Encoder for Time Series Forecasting
- Towards Continual Expansion of Data Coverage: Automatic Text-guided Edge-case Synthesis
- Auto-ARGUE: LLM-Based Report Generation Evaluation
- AttriGen: Automated Multi-Attribute Annotation for Blood Cell Datasets
- Optimizing Indoor Environmental Quality in Smart Buildings Using Deep Learning
- Toward an Unbiased Collective Memory for Efficient LLM-Based Agentic 6G Cross-Domain Management
- Comparative Analysis of Ant Colony Optimization and Google OR-Tools for Solving the Open Capacitated Vehicle Routing Problem in Logistics
- Beyond Pixels: Efficient Dataset Distillation via Sparse Gaussian Representation
- Type-Less yet Type-Aware Inductive Link Prediction with Pretrained Language Models
- PerQ: Efficient Evaluation of Multilingual Text Personalization Quality
- User-Centric Communication Service Provision for Edge-Assisted Mobile Augmented Reality
- Accelerating LLM Inference with Precomputed Query Storage
- The Impact of Scaling Training Data on Adversarial Robustness
- From MNIST to ImageNet: Understanding the Scalability Boundaries of Differentiable Logic Gate Networks
- AIM: Adaptive Intervention for Deep Multi-task Learning of Molecular Properties
- Data-Free Continual Learning of Server Models in Model-Heterogeneous Federated learning
- Reconcile Certified Robustness and Accuracy for DNN-based Smoothed Majority Vote Classifier
- R-Log: Incentivizing Log Analysis Capability in LLMs via Reasoning-based Reinforcement Learning
- MHINDR - a DSM5 based mental health diagnosis and recommendation framework using LLM
- VRWKV-Editor: Reducing quadratic complexity in transformer-based video editing
- Learning Egocentric In-Hand Object Segmentation through Weak Supervision from Human Narrations
- PFDepth: Heterogeneous Pinhole-Fisheye Joint Depth Estimation via Distortion-aware Gaussian-Splatted Volumetric Fusion
- Indirect Attention: Turning Context Misalignment into a Feature
- Muon Outperforms Adam in Tail-End Associative Memory Learning
- SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
- CEAID: Benchmark of Multilingual Machine-Generated Text Detection Methods for Central European Languages
- Editable Noise Map Inversion: Encoding Target-image into Noise For High-Fidelity Image Manipulation
- Point-It-Out: Benchmarking Embodied Reasoning for Vision Language Models in Multi-Stage Visual Grounding
- Better with Less: Small Proprietary Models Surpass Large Language Models in Financial Transaction Understanding
- CardioForest: An Explainable Ensemble Learning Model for Automatic Wide QRS Complex Tachycardia Diagnosis from ECG
- Learning to Reason as Action Abstractions with Scalable Mid-Training RL
- VELA: An LLM-Hybrid-as-a-Judge Approach for Evaluating Long Image Captions
- Overthinking Reduction with Decoupled Rewards and Curriculum Data Scheduling
- Supporting Creative Ownership through Deep Learning-Based Music Variation
- Distillation of Large Language Models via Concrete Score Matching
- RAE: A Neural Network Dimensionality Reduction Method for Nearest Neighbors Preservation in Vector Search
- S$^2$FS: Spatially-Aware Separability-Driven Feature Selection in Fuzzy Decision Systems
- Training-Free Reward-Guided Image Editing via Trajectory Optimal Control
- More Thought, Less Accuracy? On the Dual Nature of Reasoning in Vision-Language Models
- Knapsack RL: Unlocking Exploration of LLMs via Optimizing Budget Allocation
- Vector sketch animation generation with differentiable motion trajectories
- Efficient On-Policy Reinforcement Learning via Exploration of Sparse Parameter Space
- scUnified: An AI-Ready Standardized Resource for Single-Cell RNA Sequencing Analysis
- RoleConflictBench: A Benchmark of Role Conflict Scenarios for Evaluating LLMs' Contextual Sensitivity
- Capacity-Net-Based RIS Precoding Design without Channel Estimation for mmWave MIMO System
- Deep Reinforcement Learning-Based Precoding for Multi-RIS-Aided Multiuser Downlink Systems with Practical Phase Shift
- EEG-based AI-BCI Wheelchair Advancement: Hybrid Deep Learning with Motor Imagery for Brain Computer Interface
- LD-MoLE: Learnable Dynamic Routing for Mixture of LoRA Experts
- Annotation-Efficient Active Test-Time Adaptation with Conformal Prediction
- HNote: Extending YNote with Hexadecimal Encoding for Fine-Tuning LLMs in Music Modeling
- DeepCodeSeek: Real-Time API Retrieval for Context-Aware Code Generation
- The AI Productivity Index (APEX)
- Towards A Universally Transferable Acceleration Method for Density Functional Theory
- Boundary-to-Region Supervision for Offline Safe Reinforcement Learning
- Controlled Generation for Private Synthetic Text
- Think Less, Label Better: Multi-Stage Domain-Grounded Synthetic Data Generation for Fine-Tuning Large Language Models in Telecommunications
- Dolphin v1.0 Technical Report
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
- Free Lunch Alignment of Text-to-Image Diffusion Models without Preference Image Pairs
- V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
- Autonomy-Aware Clustering: When Local Decisions Supersede Global Prescriptions
- LLM-RG: Referential Grounding in Outdoor Scenarios using Large Language Models
- MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources
- Calibrating Verbalized Confidence with Self-Generated Distractors
- VISOR++: Universal Visual Inputs based Steering for Large Vision Language Models
- Self-Rewarding Rubric-Based Reinforcement Learning for Open-Ended Reasoning
- Steering an Active Learning Workflow Towards Novel Materials Discovery via Queue Prioritization
- Toxicity in Online Platforms and AI Systems: A Survey of Needs, Challenges, Mitigations, and Future Directions
- Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play
- Aligning Multilingual Reasoning with Verifiable Semantics from a High-Resource Expert Model
- Hybrid Approach for Enhancing Lesion Segmentation in Fundus Images
- Probing the Limits of Stylistic Alignment in Vision-Language Models
- AttentionViG: Cross-Attention-Based Dynamic Neighbor Aggregation in Vision GNNs
- K-Prism: A Knowledge-Guided and Prompt Integrated Universal Medical Image Segmentation Model
- Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN
- Quadratic Programming Approach for Nash Equilibrium Computation in Multiplayer Imperfect-Information Games
- STAC: When Innocent Tools Form Dangerous Chains to Jailbreak LLM Agents
- BaB-prob: Branch and Bound with Preactivation Splitting for Probabilistic Verification of Neural Networks
- YOLO-Based Defect Detection for Metal Sheets
- Data-Efficient Multitask DAgger
- Discontinuous Epitope Fragments as Sufficient Target Templates for Efficient Binder Design
- Translation from Wearable PPG to 12-Lead ECG
- EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
- Not Wrong, But Untrue: LLM Overconfidence in Document-Based Queries
- DeepFake Detection in Dyadic Video Calls using Point of Gaze Tracking
- XR Blocks: Accelerating Human-centered AI + XR Innovation
- Economic Competition, EU Regulation, and Executive Orders: A Framework for Discussing AI Policy Implications in CS Courses
- FlashOmni: A Unified Sparse Attention Engine for Diffusion Transformers
- From Faithfulness to Correctness: Generative Reward Models that Think Critically
- Rethinking Parameter Sharing for LLM Fine-Tuning with Multiple LoRAs
- Emotion-Aligned Generation in Diffusion Text to Speech Models via Preference-Guided Optimization
- Polychromic Objectives for Reinforcement Learning
- Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
- Joint Embeddings Go Temporal
- Multi-patch isogeometric neural solver for partial differential equations on computer-aided design domains
- PIPer: On-Device Environment Setup via Online Reinforcement Learning
- Effectiveness of Large Language Models in Simulating Regional Psychological Structures: An Empirical Examination of Personality and Subjective Well-being
- Artificial Authority: From Machine Minds to Political Alignments. An Experimental Analysis of Democratic and Autocratic Biases in Large-Language Models
- ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation
- A Measurement Study of Model Context Protocol
- AI in Pakistani Schools: Adoption, Usage, and Perceived Impact among Educators
- Learning Relationships Between Separate Audio Tracks for Creative Applications
- Automatically Generating Web Applications from Requirements Via Multi-Agent Test-Driven Development
- Scaling Behaviors of LLM Reinforcement Learning Post-Training: An Empirical Study in Mathematical Reasoning
- Uncertainty-Aware Generative Oversampling Using an Entropy-Guided Conditional Variational Autoencoder
- VisualOverload: Probing Visual Understanding of VLMs in Really Dense Scenes
- From Internal Representations to Text Quality: A Geometric Approach to LLM Evaluation
- Generative Value Conflicts Reveal LLM Priorities
- Cold-Start Active Correlation Clustering
- Let Physics Guide Your Protein Flows: Topology-aware Unfolding and Generation
- Predicting Training Re-evaluation Curves Enables Effective Data Curriculums for LLMs
- SpinBench: Perspective and Rotation as a Lens on Spatial Reasoning in VLMs
- A Deep Learning Approach for Spatio-Temporal Forecasting of InSAR Ground Deformation in Eastern Ireland
- A Cartography of Open Collaboration in Open Source AI: Mapping Practices, Motivations, and Governance in 14 Open Large Language Model Projects
- Reinforcement Learning-Guided Chain-of-Draft for Token-Efficient Code Generation
- Comprehensive Analysis of VQC for Financial Fraud Detection: A Comparative Study of Quantum Encoding Techniques and Architectural Optimizations
- Protocode: Prototype-Driven Interpretability for Code Generation in LLMs
- BuildBench: Benchmarking LLM Agents on Compiling Real-World Open-Source Software
- BEV-VLM: Trajectory Planning via Unified BEV Abstraction
- Knowledge distillation through geometry-aware representational alignment
- The Sandbox Configurator: A Framework to Support Technical Assessment in AI Regulatory Sandboxes
- Artificial Intelligence-Powered Assessment Framework for Skill-Oriented Engineering Lab Education
- How Effective Are Time-Series Models for Rainfall Nowcasting? A Comprehensive Benchmark for Rainfall Nowcasting Incorporating PWV Data
- From NL2SQL to NL2GeoSQL: GeoSQL-Eval for automated evaluation of LLMs on PostGIS queries
- Cognifying Education: Mapping AI's transformative role in emotional, creative, and collaborative learning
- Dynamic Policy Induction for Adaptive Prompt Optimization: Bridging the Efficiency-Accuracy Gap via Lightweight Reinforcement Learning
- A Weather Foundation Model for the Power Grid
- InfMasking: Unleashing Synergistic Information by Contrastive Multimodal Interactions
- DNABERT-2: Fine-Tuning a Genomic Language Model for Colorectal Gene Enhancer Classification
- VoiceBridge: Designing Latent Bridge Models for General Speech Restoration at Scale
- Devstral: Fine-tuning Language Models for Coding Agent Applications
- APRIL: API Synthesis with Automatic Prompt Optimization and Reinforcement Learning
- Towards Repository-Level Program Verification with Large Language Models
- Generating High-Quality Datasets for Code Editing via Open-Source Language Models
- Spectral Logit Sculpting: Adaptive Low-Rank Logit Transformation for Controlled Text Generation
- Multi-level Diagnosis and Evaluation for Robust Tabular Feature Engineering with Large Language Models
- STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting
- Six Sigma For Neural Networks: Taguchi-based optimization
- On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
- Learning to Condition: A Neural Heuristic for Scalable MPE Inference
- Enhancing Linear Attention with Residual Learning
- Energy Guided Geometric Flow Matching
- FedCLF - Towards Efficient Participant Selection for Federated Learning in Heterogeneous IoV Networks
- Machine Learning for Pattern Detection in Printhead Nozzle Logging
- Quantum est in Libris: Navigating Archives with GenAI, Uncovering Tension Between Preservation and Innovation
- PALADIN: Self-Correcting Language Model Agents to Cure Tool-Failure Cases
- HAMMER: Hamiltonian Curiosity Augmented Large Language Model Reinforcement
- A Benchmark for Localizing Code and Non-Code Issues in Software Projects
- Zero-Shot Decentralized Federated Learning
- Extreme Self-Preference in Language Models
- STaR-Attack: A Spatio-Temporal and Narrative Reasoning Attack Framework for Unified Multimodal Understanding and Generation Models
- The Average Patient Fallacy
- TVS Sidekick: Challenges and Practical Insights from Deploying Large Language Models in the Enterprise
- Combining Knowledge Graphs and NLP to Analyze Instant Messaging Data in Criminal Investigations
- OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
- SCUBA: Salesforce Computer Use Benchmark
- Rearchitecting Datacenter Lifecycle for AI: A TCO-Driven Framework
- HilbertA: Hilbert Attention for Image Generation with Diffusion Models
- Probing the Critical Point (CritPt) of AI Reasoning: a Frontier Physics Research Benchmark
- Fairness Testing in Retrieval-Augmented Generation: How Small Perturbations Reveal Bias in Small Language Models
- Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning
- TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance
- Branching Out: Broadening AI Measurement and Evaluation with Measurement Trees
- Diagnosing and Addressing Pitfalls in KG-RAG Datasets: Toward More Reliable Benchmarking
- FMIP: Joint Continuous-Integer Flow For Mixed-Integer Linear Programming
- AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
- Beyond the Algorithm: A Field Guide to Deploying AI Agents in Clinical Practice
- 90% Faster, 100% Code-Free: MLLM-Driven Zero-Code 3D Game Development
- 'Too much alignment; not enough culture': Re-balancing cultural alignment practices in LLMs
- LLM Agents for Knowledge Discovery in Atomic Layer Processing
- Human-Centered Evaluation of RAG outputs: a framework and questionnaire for human-AI collaboration
- Diversity-Incentivized Exploration for Versatile Reasoning
- Benchmarking Deep Learning Convolutions on Energy-constrained CPUs
- SlimPack: Fine-Grained Asymmetric Packing for Balanced and Efficient Variable-Length LLM Training
- ExoPredicator: Learning Abstract Models of Dynamic Worlds for Robot Planning
- Interactive Learning for LLM Reasoning
- AI Playing Business Games: Benchmarking Large Language Models on Managerial Decision-Making in Dynamic Simulations
- SafeBehavior: Simulating Human-Like Multistage Reasoning to Mitigate Jailbreak Attacks in Large Language Models
- How Far Do Time Series Foundation Models Paint the Landscape of Real-World Benchmarks?
- Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents
- MC-GNNAS-Dock: Multi-criteria GNN-based Algorithm Selection for Molecular Docking
- Communication-Efficient and Accurate Approach for Aggregation in Federated Low-Rank Adaptation
- OntoAligner Meets Knowledge Graph Embedding Aligners
- Transformer Classification of Breast Lesions: The BreastDCEDL_AMBL Benchmark Dataset and 0.92 AUC Baseline
- CIMNAS: A Joint Framework for Compute-In-Memory-Aware Neural Architecture Search
- Lita: Light Agent Uncovers the Agentic Coding Capabilities of LLMs
- SafeMind: Benchmarking and Mitigating Safety Risks in Embodied LLM Agents
- DeepJSONEval: Benchmarking Complex Nested JSON Data Mining for Large Language Models
- KIRETT: Smart Integration of Vital Signs Data for Intelligent Decision Support in Rescue Scenarios
- Quantitative Evaluation of KIRETT Wearable Demonstrator for Rescue Operations
- Boosting Process-Correct CoT Reasoning by Modeling Solvability of Multiple-Choice QA
- NuRisk: A Visual Question Answering Dataset for Agent-Level Risk Assessment in Autonomous Driving
- Automated Model Discovery via Multi-modal & Multi-step Pipeline
- RoRecomp: Enhancing Reasoning Efficiency via Rollout Response Recomposition in Reinforcement Learning
- Scalable and Robust LLM Unlearning by Correcting Responses with Retrieved Exclusions
- Towards Unified Multimodal Misinformation Detection in Social Media: A Benchmark Dataset and Baseline
- Towards Human Engagement with Realistic AI Combat Pilots
- CoLLM-NAS: Collaborative Large Language Models for Efficient Knowledge-Guided Neural Architecture Search
- Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
- SafeEvalAgent: Toward Agentic and Self-Evolving Safety Evaluation of LLMs
- MEDAKA: Construction of Biomedical Knowledge Graphs Using Large Language Models
- LMILAtt: A Deep Learning Model for Depression Detection from Social Media Users Enhanced by Multi-Instance Learning Based on Attention Mechanism
- On Explaining Proxy Discrimination and Unfairness in Individual Decisions Made by AI Systems
- GroundSight: Augmenting Vision-Language Models with Grounding Information and De-hallucination
- SING-SQL: A Synthetic Data Generation Framework for In-Domain Text-to-SQL Translation
- Collaborative Compression for Large-Scale MoE Deployment on Edge
- ScheduleMe: Multi-Agent Calendar Assistant
- Cooperative Autonomous Driving in Diverse Behavioral Traffic: A Heterogeneous Graph Reinforcement Learning Approach
- NePTune: A Neuro-Pythonic Framework for Tunable Compositional Reasoning on Vision-Language
- Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
- Galton's Law of Mediocrity: Why Large Language Models Regress to the Mean and Fail at Creativity in Advertising
- Planner-R1: Reward Shaping Enables Efficient Agentic RL with Smaller LLMs
- Deontic Argumentation
- PUREVQ-GAN: Defending Data Poisoning Attacks through Vector-Quantized Bottlenecks
- Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search
- HiStyle: Hierarchical Style Embedding Predictor for Text-Prompt-Guided Controllable Speech Synthesis
- ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
- Aging Decline in Basketball Career Trend Prediction Based on Machine Learning and LSTM Model
- RadOnc-GPT: An Autonomous LLM Agent for Real-Time Patient Outcomes Labeling at Scale
- Learning to Interact in World Latent for Team Coordination
- Evaluating Foundation Models with Pathological Concept Learning for Kidney Cancer
- A(I)nimism: Re-enchanting the World Through AI-Mediated Object Interaction
- Radiology's Last Exam (RadLE): Benchmarking Frontier Multimodal AI Against Human Experts and a Taxonomy of Visual Reasoning Errors in Radiology
- IRIS: Intrinsic Reward Image Synthesis
- Skip-It? Theoretical Conditions for Layer Skipping in Vision-Language Models
- ATLAS: Constraints-Aware Multi-Agent Collaboration for Real-World Travel Planning
- Building the EHR Foundation Model via Next Event Prediction
- Causal Autoencoder-like Generation of Feedback Fuzzy Cognitive Maps with an LLM Agent
- Hybrid Reward Normalization for Process-supervised Non-verifiable Agentic Tasks
- Echoes of Humanity: Exploring the Perceived Humanness of AI Music
- A Framework for Studying AI Agent Behavior: Evidence from Consumer Choice Experiments
- SMS: Self-supervised Model Seeding for Verification of Machine Unlearning
- SOCK: A Benchmark for Measuring Self-Replication in Large Language Models
- AutoLabs: Cognitive Multi-Agent Systems with Self-Correction for Autonomous Chemical Experimentation
- Iterative Residual Cross-Attention Mechanism: An Integrated Approach for Audio-Visual Navigation Tasks
- Landmark-Guided Knowledge for Vision-and-Language Navigation
- Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents
- Spontaneous High-Order Generalization in Neural Theory-of-Mind Networks
- SynthPert: Enhancing LLM Biological Reasoning via Synthetic Reasoning Traces for Cellular Perturbation Prediction
- Structural Reward Model: Enhancing Interpretability, Efficiency, and Scalability in Reward Modeling
- Where LLM Agents Fail and How They can Learn From Failures
- From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
- Saliency Guided Longitudinal Medical Visual Question Answering
- Boolean Satisfiability via Imitation Learning
- Adaptive Test-Time Reasoning via Reward-Guided Dual-Phase Search
- RADAR: Reasoning-Ability and Difficulty-Aware Routing for Reasoning LLMs
- The Open Syndrome Definition
- GESA: Graph-Enhanced Semantic Allocation for Generalized, Fair, and Explainable Candidate-Role Matching
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
- Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition
- TDHook: A Lightweight Framework for Interpretability
- Message passing-based inference in an autoregressive active inference agent
- Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
- Beyond Static Retrieval: Opportunities and Pitfalls of Iterative Retrieval in GraphRAG
- Blueprint-Bench: Comparing spatial intelligence of LLMs, agents and image models
- The Causal Abstraction Network: Theory and Learning
- A Formal Comparison Between Chain-of-Thought and Latent Thought
- Neo-Grounded Theory: A Methodological Innovation Integrating High-Dimensional Vector Clustering and Multi-Agent Collaboration for Qualitative Research
- Memory Management and Contextual Consistency for Long-Running Low-Code Agents
- Fact Grounded Attention: Eliminating Hallucination in Large Language Models Through Attention Level Knowledge Integration
- Language Model Planning from an Information Theoretic Perspective
- RADAR: A Risk-Aware Dynamic Multi-Agent Framework for LLM Safety Evaluation via Role-Specialized Collaboration
- RL in the Wild: Characterizing RLVR Training in LLM Deployment
- Toward Causal-Visual Programming: Enhancing Agentic Reasoning in Low-Code Environments
- ID-RAG: Identity Retrieval-Augmented Generation for Long-Horizon Persona Coherence in Generative Agents
- Flash-Searcher: Fast and Effective Web Agents via DAG-Based Parallel Execution
Research Sources: 849 | Generated: 10/1/2025