AI RESEARCH PAPERS & ACADEMIC SOURCES
- Learning Association via Track-Detection Matching for Multi-Object Tracking
- ProEdit: Inversion-based Editing From Prompts Done Right
- See Less, See Right: Bi-directional Perceptual Shaping For Multimodal Reasoning
- A Graph-Augmented knowledge Distillation based Dual-Stream Vision Transformer with Region-Aware Attention for Gastrointestinal Disease Classification with Explainable AI
- Modified TSception for Analyzing Driver Drowsiness and Mental Workload from EEG
- RT-Focuser: A Real-Time Lightweight Model for Edge-side Image Deblurring
- The Color-Clinical Decoupling: Why Perceptual Calibration Fails Clinical Biomarkers in Smartphone Dermatology
- SketchPlay: Intuitive Creation of Physically Realistic VR Content with Gesture-Driven Sketching
- Co-Teaching for Unsupervised Domain Adaptation and Expansion
- Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond
- AlignFreeNet: Is Cross-Modal Pre-Alignment Necessary? An End-to-End Alignment-Free Lightweight Network for Visible-Infrared Object Detection
- Multi-Part Object Representations via Graph Structures and Co-Part Discovery
- Total Normal Curvature Regularization and its Minimization for Surface and Image Smoothing
- Non-Contrast CT Esophageal Varices Grading through Clinical Prior-Enhanced Multi-Organ Analysis
- D2Pruner: Debiased Importance and Structural Diversity for MLLM Token Pruning
- Efficient Vision Mamba for MRI Super-Resolution via Hybrid Selective Scanning
- AlignAR: Generative Sentence Alignment for Arabic-English Parallel Corpora of Legal and Literary Texts
- TimeBill: Time-Budgeted Inference for Large Language Models
- Explainable Statute Prediction via Attention-based Model and LLM Prompting
- Accelerate Speculative Decoding with Sparse Computation in Verification
- SWE-RM: Execution-free Feedback For Software Engineering Agents
- Broken Words, Broken Performance: Effect of Tokenization on Performance of LLMs
- Self-attention vector output similarities reveal how machines pay attention
- Context as a Tool: Context Management for Long-Horizon SWE-Agents
- Toward Secure and Compliant AI: Organizational Standards and Protocols for NLP Model Lifecycle Management
- MAD: Multi-Alignment MEG-to-Text Decoding
- Understanding Virality: A Rubric based Vision-Language Model Framework for Short-Form Edutainment Evaluation
- IMA++: ISIC Archive Multi-Annotator Dermoscopic Skin Lesion Segmentation Dataset
- Generative Multi-Focus Image Fusion
- SVBench: Evaluation of Video Generation Models on Social Reasoning
- Fixed-Budget Parameter-Efficient Training with Frozen Encoders Improves Multimodal Chest X-Ray Classification
- Fixed-Threshold Evaluation of a Hybrid CNN-ViT for AI-Generated Image Detection Across Photos and Art
- MuS-Polar3D: A Benchmark Dataset for Computational Polarimetric 3D Imaging under Multi-Scattering Conditions
- Vision Transformers are Circulant Attention Learners
- EraseLoRA: MLLM-Driven Foreground Exclusion and Background Subtype Aggregation for Dataset-Free Object Removal
- Toward Intelligent Scene Augmentation for Context-Aware Object Placement and Sponsor-Logo Integration
- LLM-Free Image Captioning Evaluation in Reference-Flexible Settings
- UltraLBM-UNet: Ultralight Bidirectional Mamba-based Model for Skin Lesion Segmentation
- From Shallow Humor to Metaphor: Towards Label-Free Harmful Meme Detection via LMM Agent Self-Improvement
- GaussianEM: Model compositional and conformational heterogeneity using 3D Gaussians
- TAMEing Long Contexts in Personalization: Towards Training-Free and State-Aware MLLM Personalized Assistant
- CausalFSFG: Rethinking Few-Shot Fine-Grained Visual Categorization from Causal Perspective
- SymDrive: Realistic and Controllable Driving Simulator via Symmetric Auto-regressive Online Restoration
- Training-Free Disentangled Text-Guided Image Editing via Sparse Latent Constraints
- Omni-Weather: Unified Multimodal Foundation Model for Weather Generation and Understanding
- UniPercept: Towards Unified Perceptual-Level Image Understanding across Aesthetics, Quality, Structure, and Texture
- Contrastive Graph Modeling for Cross-Domain Few-Shot Medical Image Segmentation
- SlideChain: Semantic Provenance for Lecture Understanding via Blockchain Registration
- Analyzing the Mechanism of Attention Collapse in VGGT from a Dynamics Perspective
- ShinyNeRF: Digitizing Anisotropic Appearance in Neural Radiance Fields
- Prior-AttUNet: Retinal OCT Fluid Segmentation Based on Normal Anatomical Priors and Attention Gating
- FUSE: Unifying Spectral and Semantic Cues for Robust AI-Generated Image Detection
- Spatiotemporal-Untrammelled Mixture of Experts for Multi-Person Motion Prediction
- RAPTOR: Real-Time High-Resolution UAV Video Prediction with Efficient Video Attention
- AstraNav-World: World Model for Foresight Control and Consistency
- Knot Forcing: Taming Autoregressive Video Diffusion Models for Real-time Infinite Interactive Portrait Animation
- SyncAnyone: Implicit Disentanglement via Progressive Self-Correction for Lip-Syncing in the wild
- Scene-VLM: Multimodal Video Scene Segmentation via Vision-Language Models
- AI for Mycetoma Diagnosis in Histopathological Images: The MICCAI 2024 Challenge
- Diffusion Posterior Sampling for Super-Resolution under Gaussian Measurement Noise
- End-to-End 3D Spatiotemporal Perception with Multimodal Fusion and V2X Collaboration
- Breaking Alignment Barriers: TPS-Driven Semantic Correlation Learning for Alignment-Free RGB-T Salient Object Detection
- Fast Inference of Visual Autoregressive Model with Adjacency-Adaptive Dynamical Draft Trees
- Training-free Conditional Image Embedding Framework Leveraging Large Vision Language Models
- EasyOmnimatte: Taming Pretrained Inpainting Diffusion Models for End-to-End Video Layered Decomposition
- DPAR: Dynamic Patchification for Efficient Autoregressive Visual Generation
- SLIM-Brain: A Data- and Training-Efficient Foundation Model for fMRI Data Analysis
- Reloc-VGGT: Visual Re-localization with Geometry Grounded Transformer
- CrownGen: Patient-customized Crown Generation via Point Diffusion Model
- High-Fidelity and Long-Duration Human Image Animation with Diffusion Transformer
- Patch as Node: Human-Centric Graph Representation Learning for Multimodal Action Recognition
- Automated Discovery of Parsimonious Spectral Indices via Normalized Difference Polynomials
- Perceive and Calibrate: Analyzing and Enhancing Robustness of Medical Multi-Modal Large Language Models
- A Lightweight Multi-Scale Attention Framework for Real-Time Spinal Endoscopic Instance Segmentation
- iSHIFT: Lightweight Slow-Fast GUI Agent with Adaptive Perception
- Patch-Discontinuity Mining for Generalized Deepfake Detection
- Backdoor Attacks on Prompt-Driven Video Segmentation Foundation Models
- MAI-UI Technical Report: Real-World Centric Foundation GUI Agents
- Yume-1.5: A Text-Controlled Interactive World Generation Model
- GQ-VAE: A gated quantized VAE for learning variable length tokens
- Exploring the Heterogeneity of Tabular Data: A Diversity-aware Data Generator via LLMs
- Hybrid Combinatorial Multi-armed Bandits with Probabilistically Triggered Arms
- DuaDeep-SeqAffinity: Dual-Stream Deep Learning Framework for Sequence-Only Antigen-Antibody Affinity Prediction
- HWL-HIN: A Hypergraph-Level Hypergraph Isomorphism Network as Powerful as the Hypergraph Weisfeiler-Lehman Test with Application to Higher-Order Network Robustness
- Direction Finding with Sparse Arrays Based on Variable Window Size Spatial Smoothing
- Why Smooth Stability Assumptions Fail for ReLU Learning
- Scaling Adversarial Training via Data Selection
- Explainable Multimodal Regression via Information Decomposition
- Harnessing Data Spaces to Build Intelligent Smart City Infrastructures Across the Cloud-Edge Continuum
- Sensitivity Analysis of the Consistency Assumption
- Deep learning-enhanced dual-mode multiplexed optical sensor for point-of-care diagnostics of cardiovascular diseases
- Learning to Reconfigure: Using Device Status to Select the Right Constrained Coding Scheme
- A Tool Bottleneck Framework for Clinically-Informed and Interpretable Medical Image Understanding
- Cerberus: Multi-Agent Reasoning and Coverage-Guided Exploration for Static Detection of Runtime Errors
- Scalable Deep Subspace Clustering Network
- Dynamic Attention (DynAttn): Interpretable High-Dimensional Spatio-Temporal Forecasting (with Application to Conflict Fatalities)
- Fuzzwise: Intelligent Initial Corpus Generation for Fuzzing
- An approach to Fisher-Rao metric for infinite dimensional non-parametric information geometry
- CCAD: Compressed Global Feature Conditioned Anomaly Detection
- Quantum Nondecimated Wavelet Transform: Theory, Circuits, and Applications
- nncase: An End-to-End Compiler for Efficient LLM Deployment on Heterogeneous Storage Architectures
- Incorporating rank-free coupling and external field via an amplitude-only modulated spatial photonic Ising machine
- Quantitative Verification of Omega-regular Properties in Probabilistic Programming
- Semantic Codebooks as Effective Priors for Neural Speech Compression
- The Deepfake Detective: Interpreting Neural Forensics Through Sparse Features and Manifolds
- Assessing the Effectiveness of Membership Inference on Generative Music
- BertsWin: Resolving Topological Sparsity in 3D Masked Autoencoders via Component-Balanced Structural Optimization
- Few Tokens Matter: Entropy Guided Attacks on Vision-Language Models
- Tilt Matching for Scalable Sampling and Fine-Tuning
- Scalable Class-Incremental Learning Based on Parametric Neural Collapse
- AutoPP: Towards Automated Product Poster Generation and Optimization
- Data relativistic uncertainty framework for low-illumination anime scenery image enhancement
- Modeling high dimensional point clouds with the spherical cluster model
- Look Closer! An Adversarial Parametric Editing Framework for Hallucination Mitigation in VLMs
- Prefill vs. Decode Bottlenecks: SRAM-Frequency Tradeoffs and the Memory-Bandwidth Ceiling
- A Frobenius-Optimal Projection for Enforcing Linear Conservation in Learned Dynamical Models
- Revisiting Bi-Encoder Neural Search: An Encoding–Searching Separation Perspective
- HopCast: Calibration of Autoregressive Dynamics Models
- Bias-variance decompositions: the exclusive privilege of Bregman divergences
- Surrogate Representation Inference for Text and Image Annotations
- Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
- Generative Language Models on Nucleotide Sequences of Human Genes
- Robust Federated Learning in Unreliable Wireless Networks: A Client Selection Approach
- Beyond Heuristics: A Decision-Theoretic Framework for Agent Memory Management
- Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM
- Rethinking Sample Polarity in Reinforcement Learning with Verifiable Rewards
- Heaven-Sent or Hell-Bent? Benchmarking the Intelligence and Defectiveness of LLM Hallucinations
- MoRAgent: Parameter Efficient Agent Tuning with Mixture-of-Roles
- Ara-HOPE: Human-Centric Post-Editing Evaluation for Dialectal Arabic to Modern Standard Arabic Translation
- On The Conceptualization and Societal Impact of Cross-Cultural Bias
- Method Decoration (DeMe): A Framework for LLM-Driven Adaptive Method Generation in Dynamic IoT Environments
- Knowledge Reasoning of Large Language Models Integrating Graph-Structured Information for Pest and Disease Control in Tobacco
- LibContinual: A Comprehensive Library towards Realistic Continual Learning
- From In Silico to In Vitro: Evaluating Molecule Generative Models for Hit Generation
- StreamAvatar: Streaming Diffusion Models for Real-Time Interactive Human Avatars
- Unifying Learning Dynamics and Generalization in Transformers Scaling Law
- Introducing TrGLUE and SentiTurca: A Comprehensive Benchmark for Turkish General Language Understanding and Sentiment Analysis
- A2P-Vis: an Analyzer-to-Presenter Agentic Pipeline for Visual Insights Generation and Reporting
- Agentic Structured Graph Traversal for Root Cause Analysis of Code-related Incidents in Cloud Applications
- Creative Agents: Empowering Agents with Imagination for Creative Tasks
- Pre-training Vision Transformers with Formula-driven Supervised Learning
- SCALA: Split Federated Learning with Concatenated Activations and Logit Adjustments
- GroupDebate: Enhancing the Efficiency of Multi-Agent Debate Using Group Discussion
- An Exploration of Higher Education Course Evaluation by Large Language Models
- A Causal Lens for Evaluating Faithfulness Metrics
- Physics-Informed Neural Solvers for Periodic Quantum Eigenproblems
- A Reinforcement Learning Approach to Synthetic Data Generation
- kooplearn: A Scikit-Learn Compatible Library of Algorithms for Evolution Operator Learning
- A Survey of Freshness-Aware Wireless Networking with Reinforcement Learning
- DeepCQ: General-Purpose Deep-Surrogate Framework for Lossy Compression Quality Prediction
- An Equivariance Toolbox for Learning Dynamics
- RLLaVA: An RL-central Framework for Language and Vision Assistants
- Statistical vs. Deep Learning Models for Estimating Substance Overdose Excess Mortality in the US
- When Bayesian Tensor Completion Meets Multioutput Gaussian Processes: Functional Universality and Rank Learning
- Missing Pattern Tree based Decision Grouping and Ensemble for Deep Incomplete Multi-View Clustering
- Perplexity-Aware Data Scaling Law: Perplexity Landscapes Predict Performance for Continual Pre-training
- Global-Graph Guided and Local-Graph Weighted Contrastive Learning for Unified Clustering on Incomplete and Noise Multi-View Data
- First Provable Guarantees for Practical Private FL: Beyond Restrictive Assumptions
- Generative Actor Critic
- AVP-Fusion: Adaptive Multi-Modal Fusion and Contrastive Learning for Two-Stage Antiviral Peptide Identification
- Discovering Sparse Recovery Algorithms Using Neural Architecture Search
- AnchorGK: Anchor-based Incremental and Stratified Graph Learning Framework for Inductive Spatio-Temporal Kriging
- RefineBridge: Generative Bridge Models Improve Financial Forecasting by Foundation Models
- Videos are Sample-Efficient Supervisions: Behavior Cloning from Videos via Latent Representations
- Robustness and Scalability Of Machine Learning for Imbalanced Clinical Data in Emergency and Critical Care
- A Data-Driven Multi-Objective Approach for Predicting Mechanical Performance, Flowability, and Porosity in Ultra-High-Performance Concrete (UHPC)
- MAD-NG: Meta-Auto-Decoder Neural Galerkin Method for Solving Parametric Partial Differential Equations
- Mechanical Strength Prediction of Steel-Polypropylene Fiber-based High-Performance Concrete Using Hybrid Machine Learning Algorithms
- Causal-HM: Restoring Physical Generative Logic in Multimodal Anomaly Detection via Hierarchical Modulation
- Rethinking Output Alignment For 1-bit Post-Training Quantization of Large Language Models
- Dictionary-Transform Generative Adversarial Networks
- Dynamic Feedback Engines: Layer-Wise Control for Self-Regulating Continual Learning
- Approximation Capabilities of Feedforward Neural Networks with GELU Activations
- VAMP-Net: An Interpretable Multi-Path Framework of Genomic Permutation-Invariant Set Attention and Quality-Aware 1D-CNN for MTB Drug Resistance
- Synthetic Financial Data Generation for Enhanced Financial Modelling
- Smart IoT-Based Leak Forecasting and Detection for Energy-Efficient Liquid Cooling in AI Data Centers
- SpatialBench: Can Agents Analyze Real-World Spatial Biology Data?
- Pruning as a Game: Equilibrium-Driven Sparsification of Neural Networks
- EcoNet: Multiagent Planning and Control Of Household Energy Resources Using Active Inference
- Atomistic Simulation Guided Convolutional Neural Networks for Thermal Modeling of Friction Stir Welding
- Query Carefully: Detecting the Unanswerables in Text-to-SQL Tasks
- Fairness Is Not Just Ethical: Performance Trade-Off via Data Correlation Tuning to Mitigate Bias in ML Software
- CosmoCore-Evo: Evolutionary Dream-Replay Reinforcement Learning for Adaptive Code Generation
- Multi-Agent LLM Committees for Autonomous Software Beta Testing
- Reflection-Driven Control for Trustworthy Code Agents
- AInsteinBench: Benchmarking Coding Agents on Scientific Repositories
- Safe Path Planning and Observation Quality Enhancement Strategy for Unmanned Aerial Vehicles in Water Quality Monitoring Tasks
- LLM-Driven Feature-Level Adversarial Attacks on Android Malware Detectors
- Teaching People LLM's Errors and Getting it Right
- Morality is Contextual: Learning Interpretable Moral Contexts from Human Data with Probabilistic Clustering and Large Language Models
- dUltra: Ultra-Fast Diffusion Language Models via Reinforcement Learning
- Intelligent recognition of GPR road hidden defect images based on feature fusion and attention mechanism
- GPF-Net: Gated Progressive Fusion Learning for Polyp Re-Identification
- Efficient MoE Inference with Fine-Grained Scheduling of Disaggregated Expert Parallelism
- Oogiri-Master: Benchmarking Humor Understanding via Oogiri
- MotionTeller: Multi-modal Integration of Wearable Time-Series with LLMs for Health and Behavioral Understanding
- DiverseGRPO: Mitigating Mode Collapse in Image Generation via Diversity-Aware GRPO
- Selective LLM-Guided Regularization for Enhancing Recommendation Models
- Hierarchy-Aware Fine-Tuning of Vision-Language Models
- Human-AI Interaction Alignment: Designing, Evaluating, and Evolving Value-Centered AI For Reciprocal Human-AI Futures
- Bidirectional Human-AI Alignment in Education for Trustworthy Learning Environments
- Exploration of Reproducible Generated Image Detection
- Towards Long-window Anchoring in Vision-Language Model Distillation
- A Unified Definition of Hallucination, Or: It's the World Model, Stupid
- Residual Prior Diffusion: A Probabilistic Framework Integrating Coarse Latent Priors with Diffusion Models
- LLM-I2I: Boost Your Small Item2Item Recommendation Model with Large Language Model
- TrackTeller: Temporal Multimodal 3D Grounding for Behavior-Dependent Object References
- Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search
- Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with a Generalist Foundation Model and Multimodal Database
- Structural Induced Exploration for Balanced and Scalable Multi-Robot Path Planning
- Near-Optimal Coalition Structures in Polynomial Time
- Comparative Analysis of Deep Learning Models for Perception in Autonomous Vehicles
- RIPCN: A Road Impedance Principal Component Network for Probabilistic Traffic Flow Forecasting
- BeHGAN: Bengali Handwritten Word Generation from Plain Text Using Generative Adversarial Networks
- Zero-Shot to Zero-Lies: Detecting Bengali Deepfake Audio through Transfer Learning
- Enabling Conversational Behavior Reasoning Capabilities in Full-Duplex Speech
- Detecting AI-Generated Paraphrases in Bengali: A Comparative Study of Zero-Shot and Fine-Tuned Transformers
- Do Latent Tokens Think? A Causal and Adversarial Analysis of Chain-of-Continuous-Thought
- CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation
- Multiconnectivity for SAGIN: Current Trends, Challenges, AI-driven Solutions, and Opportunities
- An Information Theoretic Perspective on Agentic System Design
- HELP: Hierarchical Embodied Language Planner for Household Tasks
- A Model of Causal Explanation on Neural Networks for Tabular Data
- How Do Agents Perform Code Optimization? An Empirical Study
- A-QCF-Net: An Adaptive Quaternion Cross-Fusion Network for Multimodal Liver Tumor Segmentation from Unpaired Datasets
- Inference-based GAN Video Generation
- InstructMoLE: Instruction-Guided Mixture of Low-rank Experts for Multi-Conditional Image Generation
- Five Years of SciCap: What We Learned and Future Directions for Scientific Figure Captioning
- Multi-agent Adaptive Mechanism Design
- Applications of synthetic financial data in portfolio and risk modeling
- CellMamba: Adaptive Mamba for Accurate and Efficient Cell Detection
- S&P 500 Stock's Movement Prediction using CNN
- HeartBench: Probing Core Dimensions of Anthropomorphic Intelligence in LLMs
- A Comedy of Estimators: On KL Regularization in RL Training of LLMs
- MoonBot: Modular and On-Demand Reconfigurable Robot Toward Moon Base Construction
- Balancing Accuracy and Efficiency: CNN Fusion Models for Diabetic Retinopathy Screening
- Secure and Explainable Fraud Detection in Finance via Hierarchical Multi-source Dataset Distillation
- Bridging the Copyright Gap: Do Large Vision-Language Models Recognize and Respect Copyrighted Content?
- CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics
- MASFIN: A Multi-Agent System for Decomposed Financial Reasoning and Forecasting
- Optimizing Resource Allocation for Geographically-Distributed Inference by Large Language Models
- Aerial World Model for Long-horizon Visual Generation and Navigation in 3D Space
- MMCTOP: A Multimodal Textualization and Mixture-of-Experts Framework for Clinical Trial Outcome Prediction
- Flexible Multitask Learning with Factorized Diffusion Policy
- Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model
- Unsupervised Anomaly Detection in Brain MRI via Disentangled Anatomy Learning
- LVLM-Aided Alignment of Task-Specific Vision Models
- LongFly: Long-Horizon UAV Vision-and-Language Navigation with Spatiotemporal Context Integration
- Meta-Learning-Based Handover Management in NextG O-RAN
- From Visual Perception to Deep Empathy: An Automated Assessment Framework for House-Tree-Person Drawings Using Multimodal LLMs and Multi-Agent Collaboration
- A Study of Solving Life-and-Death Problems in Go Using Relevance-Zone Based Solvers
- Three-way conflict analysis based on alliance and conflict functions
- Feasible strategies in three-way conflict analysis with three-valued ratings
- Three-way decision with incomplete information based on similarity and satisfiability
- LogicLens: Visual-Logical Co-Reasoning for Text-Centric Forgery Analysis
- Leash: Adaptive Length Penalty and Reward Shaping for Efficient Large Reasoning Model
- NEMO-4-PAYPAL: Leveraging NVIDIA's Nemo Framework for empowering PayPal's Commerce Agent
- A Medical Multimodal Diagnostic Framework Integrating Vision-Language Models and Logic Tree Reasoning
- AMS-IO-Bench and AMS-IO-Agent: Benchmarking and Structured Reasoning for Analog and Mixed-Signal Integrated Circuit Input/Output Design
- Democratizing Drug Discovery with an Orchestrated, Knowledge-Driven Multi-Agent Team for User-Guided Therapeutic Design
- Multiple-play Stochastic Bandits with Prioritized Arm Capacity Sharing
- Towards Responsible and Explainable AI Agents with Consensus-Driven Reasoning
- Compliance Rating Scheme: A Data Provenance Framework for Generative AI Datasets
- Accelerating Scientific Discovery with Autonomous Goal-evolving Agents
Research Sources: 264 | Generated: 12/29/2025
