AI RESEARCH PAPERS & ACADEMIC SOURCES
- 3D-ADAM: A Dataset for 3D Anomaly Detection in Additive Manufacturing
- DWTGS: Rethinking Frequency Regularization for Sparse-view 3D Gaussian Splatting
- Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable
- AvatarShield: Visual Reinforcement Learning for Human-Centric Synthetic Video Detection
- Do It Yourself: Learning Semantic Correspondence from Pseudo-Labels
- Image Segmentation and Classification of E-waste for Training Robots for Waste Segregation
- Gaussian Herding across Pens: An Optimal Transport Perspective on Global Gaussian Reduction for 3DGS
- WaveFormer: A Lightweight Transformer Model for sEMG-based Gesture Recognition
- Uncertainty-Aware Information Pursuit for Interpretable and Reliable Medical Image Analysis
- Exploring Image Generation via Mutually Exclusive Probability Spaces and Local Correlation Hypothesis
- JL1-CD: A New Benchmark for Remote Sensing Change Detection and a Robust Multi-Teacher Knowledge Distillation Framework
- HDM: Hybrid Diffusion Model for Unified Image Anomaly Detection
- Leveraging Large Models to Evaluate Novel Content: A Case Study on Advertisement Creativity
- Latent Beam Diffusion Models for Generating Visual Sequences
- Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images
- SpinMeRound: Consistent Multi-View Identity Generation Using Diffusion Models
- In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
- Split Matching for Inductive Zero-shot Semantic Segmentation
- InstanceBEV: Unifying Instance and BEV Representation for 3D Panoptic Segmentation
- REACT: Real-time Efficiency and Accuracy Compromise for Tradeoffs in Scene Graph Generation
- Deep Spherical Superpixels
- Your Turn: At Home Turning Angle Estimation for Parkinson's Disease Severity Assessment
- Variational Bayes Gaussian Splatting
- CrossEarth: Geospatial Vision Foundation Model for Domain Generalizable Remote Sensing Semantic Segmentation
- Superpixel Segmentation: A Long-Lasting Ill-Posed Problem
- SparseDiT: Token Sparsification for Efficient Diffusion Transformer
- Token Preference Optimization with Self-Calibrated Visual-Anchored Rewards for Hallucination Mitigation
- Injecting Explainability and Lightweight Design into Weakly Supervised Video Anomaly Detection Systems
- EventVL: Understand Event Streams via Multimodal Large Language Model
- Without Paired Labeled Data: End-to-End Self-Supervised Learning for Drone-view Geo-Localization
- COLT: Enhancing Video Large Language Models with Continual Tool Usage
- Evaluation Framework of Superpixel Methods with a Global Regularity Measure
- ZoDIAC: Zoneout Dropout Injection Attention Calculation
- Fix your downsampling ASAP! Be natively more robust via Aliasing and Spectral Artifact free Pooling
- Individualized Mapping of Aberrant Cortical Thickness via Stochastic Cortical Self-Reconstruction
- Quantum Annealing for Minimum Bisection Problem: A Machine Learning-based Approach for Penalty Parameter Tuning
- A Fast Initialization Method for Neural Network Controllers: A Case Study of Image-based Visual Servoing Control for the multicopter Interception
- LLM-based Vulnerability Discovery through the Lens of Code Metrics
- Circuit Complexity From Physical Constraints: Scaling Limitations of Attention
- VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks
- Large-Scale, Longitudinal Study of Large Language Models During the 2024 US Election Season
- Robotic Skill Diversification via Active Mutation of Reward Functions in Reinforcement Learning During a Liquid Pouring Task
- BRAID: Input-Driven Nonlinear Dynamical Modeling of Neural-Behavioral Data
- Scalable Bayesian shadow tomography for quantum property estimation with set transformers
- Query-Centric Diffusion Policy for Generalizable Robotic Assembly
- Confidential LLM Inference: Performance and Cost Across CPU and GPU TEEs
- Integrating Stacked Intelligent Metasurfaces and Power Control for Dynamic Edge Inference via Over-The-Air Neural Networks
- Accurate and Efficient Prediction of Wi-Fi Link Quality Based on Machine Learning
- A Cost-Benefit Analysis of On-Premise Large Language Model Deployment: Breaking Even with Commercial LLM Services
- Energy-convergence trade off for the training of neural networks on bio-inspired hardware
- SPADE: A Large Language Model Framework for Soil Moisture Pattern Recognition and Anomaly Detection in Precision Agriculture
- Learning Progression-Guided AI Evaluation of Scientific Models To Support Diverse Multi-Modal Understanding in NGSS Classroom
- Synthesizing Attitudes, Predicting Actions (SAPA): Behavioral Theory-Guided LLMs for Ridesourcing Mode Choice Modeling
- Multimodal Health Risk Prediction System for Chronic Diseases via Vision-Language Fusion and Large Language Models
- Towards General Computer Control with Hierarchical Agents and Multi-Level Action Spaces
- Gödel Test: Can Large Language Models Solve Easy Conjectures?
- A Validation Strategy for Deep Learning Models: Evaluating and Enhancing Robustness
- PPG-Distill: Efficient Photoplethysmography Signals Analysis via Foundation Model Distillation
- Study Design and Demystification of Physics Informed Neural Networks for Power Flow Simulation
- Stability and Generalization of Adversarial Diffusion Training
- The ICML 2023 Ranking Experiment: Examining Author Self-Assessment in ML/AI Peer Review
- EarthquakeNPP: A Benchmark for Earthquake Forecasting with Neural Point Processes
- Bayesian Multivariate Density-Density Regression
- Single-stream Policy Optimization
- Hierarchical Semi-Markov Models with Duration-Aware Dynamics for Activity Sequences
- Enhanced Survival Trees
- Clapping: Removing Per-sample Storage for Pipeline Parallel Distributed Optimization with Communication Compression
- Unveiling the Role of Learning Rate Schedules via Functional Scaling Laws
- Linear Regression under Missing or Corrupted Coordinates
- A Neural Difference-of-Entropies Estimator for Mutual Information
- Propagation of Chaos in One-hidden-layer Neural Networks beyond Logarithmic Time
- Demystifying Spectral Feature Learning for Instrumental Variable Regression
- Temporal Conformal Prediction (TCP): A Distribution-Free Statistical and Machine Learning Framework for Adaptive Risk Forecasting
- Packed-Ensembles for Efficient Uncertainty Estimation
- Sum-of-norms regularized Nonnegative Matrix Factorization
- Surrogate Modelling of Proton Dose with Monte Carlo Dropout Uncertainty Quantification
- Statistical Insight into Meta-Learning via Predictor Subspace Characterization and Quantification of Task Diversity
- End-Cut Preference in Survival Trees
- Estimating Heterogeneous Causal Effect on Networks via Orthogonal Learning
- Consistency of Selection Strategies for Fraud Detection
- Neighbor Embeddings Using Unbalanced Optimal Transport Metrics
- Recovering Wasserstein Distance Matrices from Few Measurements
- A Gradient Flow Approach to Solving Inverse Problems with Latent Diffusion Models
- Augmenting Limited and Biased RCTs through Pseudo-Sample Matching-Based Observational Data Fusion Method
- Tensor Train Completion from Fiberwise Observations Along a Single Mode
- Forest tree species classification and entropy-derived uncertainty mapping using extreme gradient boosting and Sentinel-1/2 data
- Reconstruction of Optical Coherence Tomography Images from Wavelength-space Using Deep-learning
- Human-Interpretable Uncertainty Explanations for Point Cloud Registration
- DexSkin: High-Coverage Conformable Robotic Skin for Learning Contact-Rich Manipulation
- Text Slider: Efficient and Plug-and-Play Continuous Concept Control for Image/Video Synthesis via LoRA Adapters
- Quantum Random Synthetic Skyrmion Texture Generation, a Qiskit Simulation
- One-shot Embroidery Customization via Contrastive LoRA Modulation
- Towards Robust LiDAR Localization: Deep Learning-based Uncertainty Estimation
- Category-Level Object Shape and Pose Estimation in Less Than a Millisecond
- FUNCanon: Learning Pose-Aware Action Primitives via Functional Object Canonicalization for Generalizable Robotic Manipulation
- MOIS-SAM2: Exemplar-based Segment Anything Model 2 for multilesion interactive segmentation of neurofibromas in whole-body MRI
- CAR-Flow: Condition-Aware Reparameterization Aligns Source and Target for Better Flow Matching
- Semantic-Aware Particle Filter for Reliable Vineyard Robot Localisation
- Neural Network-Driven Direct CBCT-Based Dose Calculation for Head-and-Neck Proton Treatment Planning
- Does Embodiment Matter to Biomechanics and Function? A Comparative Analysis of Head-Mounted and Hand-Held Assistive Devices for Individuals with Blindness and Low Vision
- Latent Action Pretraining Through World Modeling
- Zero-Shot Visual Deepfake Detection: Can AI Predict and Prevent Fake Content Before It's Created?
- Machine learning approach to single-shot multiparameter estimation for the non-linear Schrödinger equation
- Differentiable Light Transport with Gaussian Surfels via Adapted Radiosity for Efficient Relighting and Geometry Reconstruction
- Dynamical Modeling of Behaviorally Relevant Spatiotemporal Patterns in Neural Imaging Data
- Efficient Breast and Ovarian Cancer Classification via ViT-Based Preprocessing and Transfer Learning
- VLN-Zero: Rapid Exploration and Cache-Enabled Neurosymbolic Vision-Language Planning for Zero-Shot Transfer in Robot Navigation
- Enabling Plant Phenotyping in Weedy Environments using Multi-Modal Imagery via Synthetic and Generated Training Data
- HyKid: An Open MRI Dataset with Expert-Annotated Multi-Structure and Choroid Plexus in Pediatric Hydrocephalus
- MsFIN: Multi-scale Feature Interaction Network for Traffic Accident Anticipation
- DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces
- Lavida-O: Elastic Masked Diffusion Models for Unified Multimodal Understanding and Generation
- ConViS-Bench: Estimating Video Similarity Through Semantic Concepts
- Adversarially-Refined VQ-GAN with Dense Motion Tokenization for Spatio-Temporal Heatmaps
- Graph-Radiomic Learning (GrRAiL) Descriptor to Characterize Imaging Heterogeneity in Confounding Tumor Pathologies
- Moving by Looking: Towards Vision-Driven Avatar Motion Generation
- OverLayBench: A Benchmark for Layout-to-Image Generation with Dense Overlaps
- Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
- VolSplat: Rethinking Feed-Forward 3D Gaussian Splatting with Voxel-Aligned Prediction
- Zero-Shot Multi-Spectral Learning: Reimagining a Generalist Multimodal Gemini 2.5 Model for Remote Sensing Applications
- Investigating Traffic Accident Detection Using Multimodal Large Language Models
- Track-On2: Enhancing Online Point Tracking with Memory
- KAMERA: Enhancing Aerial Surveys of Ice-associated Seals in Arctic Environments
- NeuCODEX: Edge-Cloud Co-Inference with Spike-Driven Compression and Dynamic Early-Exit
- RoSe: Robust Self-supervised Stereo Matching under Adverse Weather Conditions
- YOLO-LAN: Precise Polyp Detection via Optimized Loss, Augmentations and Negatives
- The 1st Solution for MOSEv2 Challenge 2025: Long-term and Concept-aware Video Segmentation via SeC
- Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models
- Vision-Free Retrieval: Rethinking Multimodal Search with Textual Scene Descriptions
- Long Story Short: Disentangling Compositionality and Long-Caption Understanding in VLMs
- Audio-Driven Universal Gaussian Head Avatars
- SynapFlow: A Modular Framework Towards Large-Scale Analysis of Dendritic Spines
- No Labels Needed: Zero-Shot Image Classification with Collaborative Self-Learning
- Seeing Through Reflections: Advancing 3D Scene Reconstruction in Mirror-Containing Environments with Gaussian Splatting
- Generative data augmentation for biliary tract detection on intraoperative images
- Prompt-DAS: Annotation-Efficient Prompt Learning for Domain Adaptive Semantic Segmentation of Electron Microscopy Images
- Unveiling Chain of Step Reasoning for Vision-Language Models with Fine-grained Rewards
- Weakly Supervised Food Image Segmentation using Vision Transformers and Segment Anything Model
- A DyL-Unet framework based on dynamic learning for Temporally Consistent Echocardiographic Segmentation
- WaveletGaussian: Wavelet-domain Diffusion for Sparse-view 3D Gaussian Object Reconstruction
- 3rd Place Report of LSVOS 2025 MeViS Track: Sa2VA-i: Improving Sa2VA Results with Consistent Training and Inference
- Benchmarking Vision-Language and Multimodal Large Language Models in Zero-shot and Few-shot Scenarios: A study on Christian Iconography
- ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph Construction
- Attack for Defense: Adversarial Agents for Point Prompt Optimization Empowering Segment Anything Model
- SmartWilds: Multimodal Wildlife Monitoring Dataset
- RS3DBench: A Comprehensive Benchmark for 3D Spatial Perception in Remote Sensing
- DeblurSplat: SfM-free 3D Gaussian Splatting with Event Camera for Robust Deblurring
- MoiréNet: A Compact Dual-Domain Network for Image Demoiréing
- Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation
- xAI-CV: An Overview of Explainable Artificial Intelligence in Computer Vision
- LiDAR Point Cloud Image-based Generation Using Denoising Diffusion Probabilistic Models
- Advancing Metallic Surface Defect Detection via Anomaly-Guided Pretraining on a Large Industrial Dataset
- Knowledge Transfer from Interaction Learning
- HyPSAM: Hybrid Prompt-driven Segment Anything Model for RGB-Thermal Salient Object Detection
- TriFusion-AE: Language-Guided Depth and LiDAR Fusion for Robust Point Cloud Processing
- FixingGS: Enhancing 3D Gaussian Splatting via Training-Free Score Distillation
- Bi-VLM: Pushing Ultra-Low Precision Post-Training Quantization Boundaries in Vision-Language Models
- DiSSECT: Structuring Transfer-Ready Medical Image Representations through Discrete Self-Supervision
- Real-time Deer Detection and Warning in Connected Vehicles via Thermal Sensing and Deep Learning
- Towards Application Aligned Synthetic Surgical Image Synthesis
- A Kernel Space-based Multidimensional Sparse Model for Dynamic PET Image Denoising
- Surgical Video Understanding with Label Interpolation
- Hyper-Bagel: A Unified Acceleration Framework for Multimodal Understanding and Generation
- Understanding-in-Generation: Reinforcing Generative Capability of Unified Model via Infusing Understanding into Generation
- Zero-shot Monocular Metric Depth for Endoscopic Images
- LEAF-Mamba: Local Emphatic and Adaptive Fusion State Space Model for RGB-D Salient Object Detection
- Lightweight Vision Transformer with Window and Spatial Attention for Food Image Classification
- OSDA: A Framework for Open-Set Discovery and Automatic Interpretation of Land-cover in Remote Sensing Imagery
- Overview of PlantCLEF 2021: cross-domain plant identification
- AGSwap: Overcoming Category Boundaries in Object Fusion via Adaptive Group Swapping
- Overview of LifeCLEF Plant Identification task 2019: diving into data deficient tropical countries
- RSVG-ZeroOV: Exploring a Training-Free Framework for Zero-Shot Open-Vocabulary Visual Grounding in Remote Sensing Images
- What Makes You Unique? Attribute Prompt Composition for Object Re-Identification
- Pre-training CLIP against Data Poisoning with Optimal Transport-based Matching and Alignment
- GeoRemover: Removing Objects and Their Causal Visual Artifacts
- SEGA: A Transferable Signed Ensemble Gaussian Black-Box Attack against No-Reference Image Quality Assessment Models
- HadaSmileNet: Hadamard fusion of handcrafted and deep-learning features for enhancing facial emotion recognition of genuine smiles
- Event-guided 3D Gaussian Splatting for Dynamic Human and Scene Reconstruction
- Live-E2T: Real-time Threat Monitoring in Video via Deduplicated Event Reasoning and Chain-of-Thought
- The Photographer Eye: Teaching Multimodal Large Language Models to See and Critique like Photographers
- Enhancing Video Object Segmentation in TrackRAD Using XMem Memory Network
- SSCM: A Spatial-Semantic Consistent Model for Multi-Contrast MRI Super-Resolution
- Training-Free Multi-Style Fusion Through Reference-Based Adaptive Modulation
- MLF-4DRCNet: Multi-Level Fusion with 4D Radar and Camera for 3D Object Detection in Autonomous Driving
- Prompt-Guided Dual Latent Steering for Inversion Problems
- Learning neuroimaging models from health system-scale data
- Improving the color accuracy of lighting estimation models
- Check Field Detection Agent (CFD-Agent) using Multimodal Large Language and Vision Language Models
- Losing the Plot: How VLM responses degrade on imperfect charts
- CPT-4DMR: Continuous sPatial-Temporal Representation for 4D-MRI Reconstruction
- An Analysis of Kalman Filter based Object Tracking Methods for Fast-Moving Tiny Objects
- MoCrop: Training Free Motion Guided Cropping for Efficient Video Action Recognition
- Codebook-Based Adaptive Feature Compression With Semantic Enhancement for Edge-Cloud Systems
- MK-UNet: Multi-kernel Lightweight CNN for Medical Image Segmentation
- BridgeSplat: Bidirectionally Coupled CT and Non-Rigid Gaussian Splatting for Deformable Intraoperative Surgical Navigation
- Source-Free Domain Adaptive Semantic Segmentation of Remote Sensing Images with Diffusion-Guided Label Enrichment
- Hyperbolic Coarse-to-Fine Few-Shot Class-Incremental Learning
- HazeFlow: Revisit Haze Physical Model as ODE and Non-Homogeneous Haze Generation for Real-World Dehazing
- TinyEcoWeedNet: Edge Efficient Real-Time Aerial Agricultural Weed Detection
- Learning Contrastive Multimodal Fusion with Improved Modality Dropout for Disease Detection and Prediction
- Rethinking Pulmonary Embolism Segmentation: A Study of Current Approaches and Challenges with an Open Weight Model
- Improving Handshape Representations for Sign Language Processing: A Graph Neural Network Approach
- Influence of Classification Task and Distribution Shift Type on OOD Detection in Fetal Ultrasound
- OrthoLoC: UAV 6-DoF Localization and Calibration Using Orthographic Geodata
- A Single Image Is All You Need: Zero-Shot Anomaly Localization Without Training Data
- Align Where the Words Look: Cross-Attention-Guided Patch Alignment with Contrastive and Transport Regularization for Bengali Captioning
- TinyBEV: Cross Modal Knowledge Distillation for Efficient Multi Task Bird's Eye View Perception and Planning
- BlurBall: Joint Ball and Motion Blur Estimation for Table Tennis Ball Tracking
- MVP: Motion Vector Propagation for Zero-Shot Video Object Detection
- Self Identity Mapping
- MAGIA: Sensing Per-Image Signals from Single-Round Averaged Gradients for Label-Inference-Free Gradient Inversion
- A Deep Learning Approach for Spatio-Temporal Forecasting of InSAR Ground Deformation in Eastern Ireland
- A Framework for Generating Artificial Datasets to Validate Absolute and Relative Position Concepts
- The Describe-Then-Generate Bottleneck: How VLM Descriptions Alter Image Generation Outcomes
- AI-Derived Structural Building Intelligence for Urban Resilience: An Application in Saint Vincent and the Grenadines
- VLA-LPAF: Lightweight Perspective-Adaptive Fusion for Vision-Language-Action to Enable More Unconstrained Robotic Manipulation
- URNet: Uncertainty-aware Refinement Network for Event-based Stereo Depth Estimation
- Visionerves: Automatic and Reproducible Hybrid AI for Peripheral Nervous System Recognition Applied to Endometriosis Cases
- V-SenseDrive: A Privacy-Preserving Road Video and In-Vehicle Sensor Fusion Framework for Road Safety & Driver Behaviour Modelling
- Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
- Meta-Semantics Augmented Few-Shot Relational Learning
- Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders
- MiCRo: Mixture Modeling and Context-aware Routing for Personalized Preference Learning
- Athena: Enhancing Multimodal Reasoning with Data-efficient Process Reward Models
- When and How Long Did Therapy Happen? Soft-Supervising Temporal Localization Using Audio-Language Models
- RAG+: Enhancing Retrieval-Augmented Generation with Application-Aware Reasoning
- LogicGuard: Improving Embodied LLM agents through Temporal Logic based Critics
- PolypSeg-GradCAM: Towards Explainable Computer-Aided Gastrointestinal Disease Detection Using U-Net Based Segmentation and Grad-CAM Visualization on the Kvasir Dataset
- PerceptronCARE: A Deep Learning-Based Intelligent Teleopthalmology Application for Diabetic Retinopathy Diagnosis
- Improving Image Captioning Descriptiveness by Ranking and LLM-based Fusion
- Is Pre-training Truly Better Than Meta-Learning?
- MediSyn: A Generalist Text-Guided Latent Diffusion Model For Diverse Medical Image Synthesis
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models
- EMMA: End-to-End Multimodal Model for Autonomous Driving
- Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
- Fine-Tuning is Subgraph Search: A New Lens on Learning Dynamics
- DeepResonance: Enhancing Multimodal Music Understanding via Music-centric Multi-way Instruction Tuning
- Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer
- A Survey on Sparse Autoencoders: Interpreting the Internal Mechanisms of Large Language Models
- ReSearch: Learning to Reason with Search for LLMs via Reinforcement Learning
- Automating Steering for Safe Multimodal Large Language Models
- Benchmarking Critical Questions Generation: A Challenging Reasoning Task for Large Language Models
- JOLT-SQL: Joint Loss Tuning of Text-to-SQL with Confusion-aware Noisy Schema Sampling
- Memorization or Reasoning? Exploring the Idiom Understanding of LLMs
- Large Language Models Implicitly Learn to See and Hear Just By Reading
- Large Language Models Do Multi-Label Classification Differently
- Unraveling Misinformation Propagation in LLM Reasoning
- LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference
- Please Translate Again: Two Simple Experiments on Whether Human-Like Reasoning Helps Translation
- Combining Constrained and Unconstrained Decoding via Boosting: BoostCD and Its Application to Information Extraction
- A suite of allotaxonometric tools for the comparison of complex systems using rank-turbulence divergence
- Think in Safety: Unveiling and Mitigating Safety Alignment Collapse in Multimodal Large Reasoning Model
- Language Models Can Predict Their Own Behavior
- Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests
- LightThinker: Thinking Step-by-Step Compression
- Can LLMs Explain Themselves Counterfactually?
- Anything Goes? A Crosslinguistic Study of (Im)possible Language Learning in LMs
- Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
- CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation
- CaKE: Circuit-aware Editing Enables Generalizable Knowledge Learners
- LookAhead Tuning: Safer Language Models via Partial Answer Previews
- Pandora: A Code-Driven Large Language Model Agent for Unified Reasoning Across Diverse Structured Knowledge
- Citrus-V: Advancing Medical Foundation Models with Unified Medical Image Grounding for Clinical Reasoning
- Finding My Voice: Generative Reconstruction of Disordered Speech for Automated Clinical Evaluation
- Cross-Cultural Transfer of Commonsense Reasoning in LLMs: Evidence from the Arab World
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via a Hybrid Architecture
- Enhancing Unsupervised Sentence Embeddings via Knowledge-Driven Data Augmentation and Gaussian-Decayed Contrastive Learning
- Post-hoc Study of Climate Microtargeting on Social Media Ads with LLMs: Thematic Insights and Fairness Evaluation
- Exploring Model Kinship for Merging Large Language Models
- Language Models as Causal Effect Generators
- Compositional Phoneme Approximation for L1-Grounded L2 Pronunciation Training
- Improving Low-Resource Sequence Labeling with Knowledge Fusion and Contextual Label Explanations
- VLDBench: Evaluating Multimodal Disinformation with Regulatory Alignment
- The Illusion of Readiness: Stress Testing Large Frontier Models on Multimodal Medical Benchmarks
- Memory-QA: Answering Recall Questions Based on Multimodal Memories
- No Verifiable Reward for Prosody: Toward Preference-Guided Prosody Learning in TTS
- HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling
- Teaching Audio Models to Reason: A Unified Framework for Source- and Layer-wise Distillation
- OraPO: Oracle-educated Reinforcement Learning for Data-efficient and Factual Radiology Report Generation
- Agentic AutoSurvey: Let LLMs Survey LLMs
- Pay More Attention To Audio: Mitigating Imbalance of Cross-Modal Attention in Large Audio Language Models
- Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
- VIR-Bench: Evaluating Geospatial and Temporal Understanding of MLLMs via Travel Video Itinerary Reconstruction
- ColorBlindnessEval: Can Vision-Language Models Pass Color Blindness Tests?
- Measuring AI "Slop" in Text
- Soft Tokens, Hard Truths
- Online Process Reward Learning for Agentic Reinforcement Learning
- Steering Multimodal Large Language Models Decoding for Context-Aware Safety
- Systematic Comparative Analysis of Large Pretrained Language Models on Contextualized Medication Event Extraction
- CompLLM: Compression for Long Context Q&A
- Reinforcement Learning on Pre-Training Data
- Extracting Conceptual Spaces from LLMs Using Prototype Embeddings
- SloPalSpeech: A 2,800-Hour Slovak Speech Corpus from Parliamentary Data
- WolBanking77: Wolof Banking Speech Intent Classification Dataset
- DRISHTIKON: A Multimodal Multilingual Benchmark for Testing Language Models' Understanding on Indian Culture
- Baseer: A Vision-Language Model for Arabic Document-to-Markdown OCR
- Multi-Hierarchical Feature Detection for Large Language Model Generated Text
- Diversity Boosts AI-Generated Text Detection
- Extractive Fact Decomposition for Interpretable Natural Language Inference in one Forward Pass
- DTW-Align: Bridging the Modality Gap in End-to-End Speech Translation with Dynamic Time Warping Alignment
- Investigating Test-Time Scaling with Reranking for Machine Translation
- Charting a Decade of Computational Linguistics in Italy: The CLiC-it Corpus
- Pathways of Thoughts: Multi-Directional Thinking for Long-form Personalized Question Answering
- Are most sentences unique? An empirical examination of Chomskyan claims
- Human-Annotated NER Dataset for the Kyrgyz Language
- Context-Aware Hierarchical Taxonomy Generation for Scientific Papers via LLM-Guided Multi-Aspect Clustering
- Anecdoctoring: Automated Red-Teaming Across Language and Place
- Consistency-Aware Parameter-Preserving Knowledge Editing Framework for Multi-Hop Question Answering
- Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction
- MemOrb: A Plug-and-Play Verbal-Reinforcement Memory Layer for E-Commerce Customer Service
- LOTUSDIS: A Thai far-field meeting corpus for robust conversational ASR
- Global-Recent Semantic Reasoning on Dynamic Text-Attributed Graphs with Large Language Models
- False Friends Are Not Foes: Investigating Vocabulary Overlap in Multilingual Language Models
- When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
- Financial Risk Relation Identification through Dual-view Adaptation
- AECBench: A Hierarchical Benchmark for Knowledge Evaluation of Large Language Models in the AEC Field
- Beyond the Leaderboard: Understanding Performance Disparities in Large Language Models via Model Diffing
- MAPEX: A Multi-Agent Pipeline for Keyphrase Extraction
- Are Smaller Open-Weight LLMs Closing the Gap to Proprietary Models for Biomedical Question Answering?
- Developing an AI framework to automatically detect shared decision-making in patient-doctor conversations
- CogniLoad: A Synthetic Natural Language Reasoning Benchmark With Tunable Length, Intrinsic Difficulty, and Distractor Density
- LAWCAT: Efficient Distillation from Quadratic to Linear Attention with Convolution across Tokens for Long Context Modeling
- Actions Speak Louder than Prompts: A Large-Scale Study of LLMs for Graph Inference
- A Rhythm-Aware Phrase Insertion for Classical Arabic Poetry Composition
- Trace Is In Sentences: Unbiased Lightweight ChatGPT-Generated Text Detector
- CCQA: Generating Question from Solution Can Improve Inference-Time Reasoning in SLMs
- Prior-based Noisy Text Data Filtering: Fast and Strong Alternative For Perplexity
- TsqLoRA: Towards Sensitivity and Quality Low-Rank Adaptation for Efficient Fine-Tuning
- UniECG: Understanding and Generating ECG in One Unified Model
- A Good Plan is Hard to Find: Aligning Models with Preferences is Misaligned with What Helps Users
- Thinking in a Crowd: How Auxiliary Information Shapes LLM Reasoning
- SIRAG: Towards Stable and Interpretable RAG with A Process-Supervised Multi-Agent Framework
- ERFC: Happy Customers with Emotion Recognition and Forecasting in Conversation in Call Centers
- Evaluating Large Language Models for Detecting Antisemitism
- Exploiting Tree Structure for Credit Assignment in RL Training of LLMs
- Brittleness and Promise: Knowledge Graph Based Reward Modeling for Diagnostic Reasoning
- Speculate Deep and Accurate: Lossless and Training-Free Acceleration for Offloaded LLMs via Substitute Speculative Decoding
- Speech Vecalign: an Embedding-based Method for Aligning Parallel Speech Documents
- Interactive Real-Time Speaker Diarization Correction with Human Feedback
- NormGenesis: Multicultural Dialogue Generation via Exemplar-Guided Social Norm Modeling and Violation Recovery
- Evaluating the Creativity of LLMs in Persian Literary Text Generation
- Towards Practical Multi-label Causal Discovery in High-Dimensional Event Sequences via One-Shot Graph Aggregation
- FedFiTS: Fitness-Selected, Slotted Client Scheduling for Trustworthy Federated Learning in Healthcare AI
- Analysis on distribution and clustering of weight
- PipelineRL: Faster On-policy Reinforcement Learning for Long Sequence Generation
- GSTM-HMU: Generative Spatio-Temporal Modeling for Human Mobility Understanding
- Efficient Reinforcement Learning by Reducing Forgetting with Elephant Activation Functions
- Dynamic Prompt Fusion for Multi-Task and Cross-Domain Adaptation in LLMs
- GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models
- Event Causality Identification with Synthetic Control
- ZERA: Zero-init Instruction Evolving Refinement Agent - From Zero Instructions to Structured Prompts via Principle-based Optimization
- Theoretical Foundations of Representation Learning using Unlabeled Data: Statistics and Optimization
- Fully Learnable Neural Reward Machines
- OmniBridge: Unified Multimodal Understanding, Generation, and Retrieval via Latent Space Alignment
- Improving Credit Card Fraud Detection through Transformer-Enhanced GAN Oversampling
- Latent Danger Zone: Distilling Unified Attention for Cross-Architecture Black-box Attacks
- Beyond Backpropagation: Exploring Innovative Algorithms for Energy-Efficient Deep Neural Network Training
- Diffusion Bridge Variational Inference for Deep Gaussian Processes
- Graph Neural Networks with Similarity-Navigated Probabilistic Feature Copying
- Asymptotically Optimal Problem-Dependent Bandit Policies for Transfer Learning
- Algorithms for Adversarially Robust Deep Learning
- DRO-REBEL: Distributionally Robust Relative-Reward Regression for Fast and Efficient LLM Alignment
- Shared-Weights Extender and Gradient Voting for Neural Network Expansion
- NGRPO: Negative-enhanced Group Relative Policy Optimization
- Exploring Heterophily in Graph-level Tasks
- Enhancing the Effectiveness and Durability of Backdoor Attacks in Federated Learning through Maximizing Task Distinction
- Tackling GNARLy Problems: Graph Neural Algorithmic Reasoning Reimagined through Reinforcement Learning
- Towards Privacy-Aware Bayesian Networks: A Credal Approach
- Lift What You Can: Green Online Learning with Heterogeneous Ensembles
- Central Limit Theorems for Asynchronous Averaged Q-Learning
- Otters: An Energy-Efficient SpikingTransformer via Optical Time-to-First-Spike Encoding
- Learning From Simulators: A Theory of Simulation-Grounded Learning
- CR-Net: Scaling Parameter-Efficient Training with Cross-Layer Low-Rank Structure
- Flow marching for a generative PDE foundation model
- HyperAdapt: Simple High-Rank Adaptation
- Subspace Clustering of Subspaces: Unifying Canonical Correlation Analysis and Subspace Clustering
- Towards Rational Pesticide Design with Graph Machine Learning Models for Ecotoxicology
- A Generalized Bisimulation Metric of State Similarity between Markov Decision Processes: From Theoretical Propositions to Applications
- LLM-Enhanced Self-Evolving Reinforcement Learning for Multi-Step E-Commerce Payment Fraud Risk Detection
- Theory of periodic convolutional neural network
- MOMEMTO: Patch-based Memory Gate Model in Time Series Foundation Model
- Diagonal Linear Networks and the Lasso Regularization Path
- Probabilistic Machine Learning for Uncertainty-Aware Diagnosis of Industrial Systems
- Training-Free Data Assimilation with GenCast
- Graph-based Clustering Revisited: A Relaxation of Kernel $k$-Means Perspective
- SimpleFold: Folding Proteins is Simpler than You Think
- Physics-informed time series analysis with Kolmogorov-Arnold Networks under Ehrenfest constraints
- Hybrid Data can Enhance the Utility of Synthetic Data for Training Anti-Money Laundering Models
- APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation
- Reverse-Complement Consistency for DNA Language Models
- Symphony-MoE: Harmonizing Disparate Pre-trained Models into a Coherent Mixture-of-Experts
- Global Minimizers of Sigmoid Contrastive Loss
- Explainable Graph Neural Networks: Understanding Brain Connectivity and Biomarkers in Dementia
- Interaction Topological Transformer for Multiscale Learning in Porous Materials
- DS-Diffusion: Data Style-Guided Diffusion Model for Time-Series Generation
- Reflect before Act: Proactive Error Correction in Language Models
- Graph Enhanced Trajectory Anomaly Detection
- Towards Provable Emergence of In-Context Reinforcement Learning
- Development of Deep Learning Optimizers: Approaches, Concepts, and Update Rules
- Explicit Path CGR: Maintaining Sequence Fidelity in Geometric Representations
- Diffusion Policies with Offline and Inverse Reinforcement Learning for Promoting Physical Activity in Older Adults Using Wearable Sensors
- MeshODENet: A Graph-Informed Neural Ordinary Differential Equation Neural Network for Simulating Mesh-Based Physical Systems
- Fast Linear Solvers via AI-Tuned Markov Chain Monte Carlo-based Matrix Inversion
- GluMind: Multimodal Parallel Attention and Knowledge Retention for Robust Cross-Population Blood Glucose Forecasting
- Probabilistic Geometric Principal Component Analysis with application to neural data
- Discrete-time diffusion-like models for speech synthesis
- Individualized non-uniform quantization for vector search
- DSFT: Inspiring Diffusion Large Language Models to Comprehend Mathematical and Logical Patterns
- MobiGPT: A Foundation Model for Mobile Wireless Networks
- PiMoE: Token-Level Routing for Integrating High-Precision Computation and Reasoning
- FedIA: A Plug-and-Play Importance-Aware Gradient Pruning Aggregation Method for Domain-Robust Federated Graph Learning on Node Classification
- SBVR: Summation of BitVector Representation for Efficient LLM Quantization
- TurnBack: A Geospatial Route Cognition Benchmark for Large Language Models through Reverse Route
- Conversational Orientation Reasoning: Egocentric-to-Allocentric Navigation with Multimodal Chain-of-Thought
- Variational Task Vector Composition
- MolPILE - large-scale, diverse dataset for molecular representation learning
- FastMTP: Accelerating LLM Inference with Enhanced Multi-Token Prediction
- Multi-Worker Selection based Distributed Swarm Learning for Edge IoT with Non-i.i.d. Data
- GnnXemplar: Exemplars to Explanations - Natural Language Rules for Global GNN Interpretability
- KM-GPT: An Automated Pipeline for Reconstructing Individual Patient Data from Kaplan-Meier Plots
- AdaSTI: Conditional Diffusion Models with Adaptive Dependency Modeling for Spatio-Temporal Imputation
- Early Prediction of Multi-Label Care Escalation Triggers in the Intensive Care Unit Using Electronic Health Records
- ConceptFlow: Hierarchical and Fine-grained Concept-Based Explanation for Convolutional Neural Networks
- Sparse Training Scheme for Multimodal LLM
- HyperNAS: Enhancing Architecture Representation for NAS Predictor via Hypernetwork
- WLFM: A Well-Logs Foundation Model for Multi-Task and Cross-Well Geological Interpretation
- A deep reinforcement learning platform for antibiotic discovery
- MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
- Developing Training Procedures for Piecewise-linear Spline Activation Functions in Neural Networks
- A Simple and Reproducible Hybrid Solver for a Truck-Drone VRP with Recharge
- Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework
- Accounting for Uncertainty in Machine Learning Surrogates: A Gauss-Hermite Quadrature Approach to Reliability Analysis
- Research on Metro Transportation Flow Prediction Based on the STL-GRU Combined Model
- Two ways to knowledge?
- Self-Evolving LLMs via Continual Instruction Tuning
- A Weighted Gradient Tracking Privacy-Preserving Method for Distributed Optimization
- SDGF: Fusing Static and Multi-Scale Dynamic Correlations for Multivariate Time Series Forecasting
- From Parameters to Performance: A Data-Driven Study on LLM Structure and Development
- LoRALib: A Standardized Benchmark for Evaluating LoRA-MoE Methods
- Rank-Induced PL Mirror Descent: A Rank-Faithful Second-Order Algorithm for Sleeping Experts
- Comparative Analysis of FOLD-SE vs. FOLD-R++ in Binary Classification and XGBoost in Multi-Category Classification
- A Machine Learning Framework for Pathway-Driven Therapeutic Target Discovery in Metabolic Disorders
- A Study of Skews, Imbalances, and Pathological Conditions in LLM Inference Deployment on GPU Clusters detectable from DPU
- Towards Scalable and Structured Spatiotemporal Forecasting
- Amortized Latent Steering: Low-Cost Alternative to Test-Time Optimization
- Robust and continuous machine learning of usage habits to adapt digital interfaces to user needs
- Decentor-V: Lightweight ML Training on Low-Power RISC-V Edge Devices
- MobileRL: Online Agentic Reinforcement Learning for Mobile GUI Agents
- A Coopetitive-Compatible Data Generation Framework for Cross-silo Federated Learning
- Prediction of Coffee Ratings Based On Influential Attributes Using SelectKBest and Optimal Hyperparameters
- NurseSchedRL: Attention-Guided Reinforcement Learning for Nurse-Patient Assignment
- Anomaly Detection in Electric Vehicle Charging Stations Using Federated Learning
- Machine Learnability as a Measure of Order in Aperiodic Sequences
- Data Valuation and Selection in a Federated Model Marketplace
- BULL-ODE: Bullwhip Learning with Neural ODEs and Universal Differential Equations under Stochastic Demand
- Model-Based Transfer Learning for Real-Time Damage Assessment of Bridge Networks
- AdaMixT: Adaptive Weighted Mixture of Multi-Scale Expert Transformers for Time Series Forecasting
- Solve it with EASE
- Machine Learning-Based Classification of Vessel Types in Straits Using AIS Tracks
- Localized PCA-Net Neural Operators for Scalable Solution Reconstruction of Elliptic PDEs
- Prompt Optimization Meets Subspace Representation Learning for Few-shot Out-of-Distribution Detection
- Large language models surpass domain-specific architectures for antepartum electronic fetal monitoring analysis
Research Sources: 462 | Generated: 9/24/2025