AI RESEARCH PAPERS & ACADEMIC SOURCES
- InternSVG: Towards Unified SVG Tasks with Multimodal Large Language Models
- MMAP: A Multi-Magnification and Prototype-Aware Architecture for Predicting Spatial Gene Expression
- Reasoning as Representation: Rethinking Visual Reinforcement Learning in Image Quality Assessment
- MaterialRefGS: Reflective Gaussian Splatting with Multi-view Consistent Material Inference
- Robust Ego-Exo Correspondence with Long-Term Memory
- Enhancing Maritime Domain Awareness on Inland Waterways: A YOLO-Based Fusion of Satellite and AIS for Vessel Characterization
- Coupled Degradation Modeling and Fusion: A VLM-Guided Degradation-Coupled Network for Degradation-Aware Infrared and Visible Image Fusion
- VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment
- Towards Fast and Scalable Normal Integration using Continuous Components
- Situat3DChange: Situated 3D Change Understanding Dataset for Multimodal Large Language Model
- mmWalk: Towards Multi-modal Multi-view Walking Assistance
- Massive Activations are the Key to Local Detail Synthesis in Diffusion Transformers
- ODI-Bench: Can MLLMs Understand Immersive Omnidirectional Environments?
- How many samples to label for an application given a foundation model? Chest X-ray classification study
- SNAP: Towards Segmenting Anything in Any Point Cloud
- A Framework for Low-Effort Training Data Generation for Urban Semantic Segmentation
- Benchmarking foundation models for hyperspectral image classification: Application to cereal crop type mapping
- MS-Mix: Unveiling the Power of Mixup for Multimodal Sentiment Analysis
- ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training
- ExpVid: A Benchmark for Experiment Video Understanding & Reasoning
- High-resolution Photo Enhancement in Real-time: A Laplacian Pyramid Network
- IVEBench: Modern Benchmark Suite for Instruction-Guided Video Editing Assessment
- PhySIC: Physically Plausible 3D Human-Scene Interaction and Contact from a Single Image
- InfiniHuman: Infinite 3D Human Creation with Precise Control
- Beyond 'Templates': Category-Agnostic Object Pose, Size, and Shape Estimation from a Single View
- Diffusion Transformers with Representation Autoencoders
- Bayesian Topological Convolutional Neural Nets
- DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training
- Point Prompting: Counterfactual Tracking with Video Diffusion Models
- Ev4DGS: Novel-view Rendering of Non-Rigid Objects from Monocular Event Streams
- Semantic-Cohesive Knowledge Distillation for Deep Cross-modal Hashing
- Reliable Active Learning from Unreliable Labels via Neural Collapse Geometry
- Causality $\neq$ Decodability, and Vice Versa: Lessons from Interpreting Counting ViTs
- Cross-Sensor Touch Generation
- Decomposer Networks: Deep Component Analysis and Synthesis
- MTMD: A Multi-Task Multi-Domain Framework for Unified Ad Lightweight Ranking at Pinterest
- Generative Latent Video Compression
- CLoD-GS: Continuous Level-of-Detail via 3D Gaussian Splatting
- SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
- Enabling High-Quality In-the-Wild Imaging from Severely Aberrated Metalens Bursts
- INR-Bench: A Unified Benchmark for Implicit Neural Representations in Multi-Domain Regression and Reconstruction
- Towards Efficient 3D Gaussian Human Avatar Compression: A Prior-Guided Framework
- SuperEx: Enhancing Indoor Mapping and Exploration using Non-Line-of-Sight Perception
- SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams
- UltraScatter: Ray-Based Simulation of Ultrasound Scattering
- ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios
- JND-Guided Light-Weight Neural Pre-Filter for Perceptual Image Coding
- VLM-Guided Adaptive Negative Prompting for Creative Generation
- Comparative Evaluation of Neural Network Architectures for Generalizable Human Spatial Preference Prediction in Unseen Built Environments
- On the Optimal Representation Efficiency of Barlow Twins: An Information-Geometric Interpretation
- The Easy Path to Robustness: Coreset Selection using Sample Hardness
- Lightweight Facial Landmark Detection in Thermal Images via Multi-Level Cross-Modal Knowledge Transfer
- SCOOP'D: Learning Mixed-Liquid-Solid Scooping via Sim2Real Generative Policy
- Invariant Feature Learning for Generalized Long-Tailed Classification
- Class Is Invariant to Context and Vice Versa: On Learning Invariance for Out-Of-Distribution Generalization
- Information Topology
- Hyper-STTN: Hypergraph Augmented Spatial-Temporal Transformer Network for Trajectory Prediction
- MarkPlugger: Generalizable Watermark Framework for Latent Diffusion Models without Retraining
- Improving Hierarchical Representations of Vectorized HD Maps with Perspective Clues
- UniRGB-IR: A Unified Framework for Visible-Infrared Semantic Tasks via Adapter Tuning
- Streamlining Image Editing with Layered Diffusion Brushes
- RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations
- A Unified Approach Towards Active Learning and Out-of-Distribution Detection
- SMC++: Masked Learning of Unsupervised Video Semantic Compression
- Contrastive Local Manifold Learning for No-Reference Image Quality Assessment
- Open Vocabulary Multi-Label Video Classification
- LiDAR-GS: Real-time LiDAR Re-Simulation using Gaussian Splatting
- Tokenizing Motion: A Generative Approach for Scene Dynamics Compression
- OVS Meets Continual Learning: Towards Sustainable Open-Vocabulary Segmentation
- Baking Gaussian Splatting into Diffusion Denoiser for Fast and Scalable Single-stage Image-to-3D Generation and Reconstruction
- Multimodal Alignment and Fusion: A Survey
- Learning Visual Hierarchies in Hyperbolic Space for Image Retrieval
- Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
- Real-Time Position-Aware View Synthesis from Single-View Input
- CULTURE3D: A Large-Scale and Diverse Dataset of Cultural Landmarks and Terrains for Gaussian-Based Scene Rendering
- Concept Steerers: Leveraging K-Sparse Autoencoders for Test-Time Controllable Generations
- Generating Multi-Image Synthetic Data for Text-to-Image Customization
- FCVSR: A Frequency-aware Method for Compressed Video Super-Resolution
- MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification
- OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
- Blind Video Super-Resolution based on Implicit Kernels
- Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
- A Comprehensive Survey on Knowledge Distillation
- Free-Lunch Color-Texture Disentanglement for Stylized Image Generation
- Surface-Aware Distilled 3D Semantic Features
- SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance
- VideoAds for Fast-Paced Video Understanding
- DDFusion: Degradation-Decoupled Fusion Framework for Robust Infrared and Visible Images Fusion
- LSP-ST: Ladder Shape-Biased Side-Tuning for Robust Infrared Small Target Detection
- Motion-Enhanced Nonlocal Similarity Implicit Neural Representation for Infrared Dim and Small Target Detection
- xTrace: A Facial Expressive Behaviour Analysis Tool for Continuous Affect Recognition
- CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models
- InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition
- VORTA: Efficient Video Diffusion via Routing Sparse Attention
- SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams
- Learning Shared Representations from Unpaired Data
- Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
- SatDreamer360: Multiview-Consistent Generation of Ground-Level Scenes from Satellite Imagery
- Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
- Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
- A PDE-Based Image Dehazing Method via Atmospheric Scattering Theory
- OmniVCus: Feedforward Subject-driven Video Customization with Multimodal Control Conditions
- LH2Face: Loss function for Hard High-quality Face
- DeRIS: Decoupling Perception and Cognition for Enhanced Referring Image Segmentation through Loopback Synergy
- SenseShift6D: Multimodal RGB-D Benchmarking for Robust 6D Pose Estimation across Environment and Sensor Variations
- Multi-Scale Attention and Gated Shifting for Fine-Grained Event Spotting in Videos
- STAR: A Benchmark for Astronomical Star Fields Super-Resolution
- IONext: Unlocking the Next Era of Inertial Odometry
- Rethinking Medical Anomaly Detection in Brain MRI: An Image Quality Assessment Perspective
- Cell as Point: One-Stage Framework for Efficient Cell Tracking
- LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement
- Editable-DeepSC: Reliable Cross-Modal Semantic Communications for Facial Editing
- Demand Estimation with Text and Image Data
- MedVKAN: Efficient Feature Extraction with Mamba and KAN for Medical Image Segmentation
- OSCAR: One-Step Diffusion Codec Across Multiple Bit-rates
- Large-Area Fabrication-Aware Computational Diffractive Optics
- Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
- TC-GS: A Faster Gaussian Splatting Module Utilizing Tensor Cores
- The Illusion of Progress? A Critical Look at Test-Time Adaptation for Vision-Language Models
- Multi Camera Connected Vision System with Multi View Analytics: A Comprehensive Survey
- Constructive Distortion: Improving MLLMs with Attention-Guided Image Warping
- Post Processing of image segmentation using Conditional Random Fields
- Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection
- Cell Instance Segmentation: The Devil Is in the Boundaries
- Cluster-Aware Prompt Ensemble Learning for Few-Shot Vision-Language Model Adaptation
- Fast Self-Supervised depth and mask aware Association for Multi-Object Tracking
- Geometry-Aware Scene Configurations for Novel View Synthesis
- LTGS: Long-Term Gaussian Scene Chronology From Sparse View Updates
- An uncertainty-aware framework for data-efficient multi-view animal pose estimation
- HeadsUp! High-Fidelity Portrait Image Super-Resolution
- Semi-disentangled spatiotemporal implicit neural representations of longitudinal neuroimaging data for trajectory classification
- A Multi-Strategy Framework for Enhancing Shatian Pomelo Detection in Real-World Orchards
- J-RAS: Enhancing Medical Image Segmentation via Retrieval-Augmented Joint Training
- Scaling Traffic Insights with AI and Language Model-Powered Camera Systems for Data-Driven Transportation Decision Making
- FlareX: A Physics-Informed Dataset for Lens Flare Removal via 2D Synthesis and 3D Rendering
- BurstDeflicker: A Benchmark Dataset for Flicker Removal in Dynamic Scenes
- MIMO: A medical vision language model with visual referring multimodal input and pixel grounding multimodal output
- Q-Adapter: Visual Query Adapter for Extracting Textually-related Features in Video Captioning
- P-4DGS: Predictive 4D Gaussian Splatting with 90$\times$ Compression
- Complementary and Contrastive Learning for Audio-Visual Segmentation
- DREAM: A Benchmark Study for Deepfake REalism AssessMent
- Collaborative Learning of Semantic-Aware Feature Learning and Label Recovery for Multi-Label Image Recognition with Incomplete Labels
- Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning
- Tracking the Spatiotemporal Evolution of Landslide Scars Using a Vision Foundation Model: A Novel and Universal Framework
- Gesplat: Robust Pose-Free 3D Reconstruction via Geometry-Guided Gaussian Splatting
- Cooperative Pseudo Labeling for Unsupervised Federated Classification
- Answer-Consistent Chain-of-thought Reinforcement Learning For Multi-modal Large Language Models
- ImmerIris: A Large-Scale Dataset and Benchmark for Immersive Iris Recognition in Open Scenes
- Multi Class Parkinsons Disease Detection Based on Finger Tapping Using Attention-Enhanced CNN BiLSTM
- YOLOv11-Litchi: Efficient Litchi Fruit Detection based on UAV-Captured Agricultural Imagery in Complex Orchard Environments
- Color3D: Controllable and Consistent 3D Colorization with Personalized Colorizer
- Stroke Locus Net: Occluded Vessel Localization from MRI Modalities
- ReMix: Towards a Unified View of Consistent Character Generation and Editing
- SparseUWSeg: Active Sparse Point-Label Augmentation for Underwater Semantic Segmentation
- ViConEx-Med: Visual Concept Explainability via Multi-Concept Token Transformer for Medical Image Analysis
- TCMA: Text-Conditioned Multi-granularity Alignment for Drone Cross-Modal Text-Video Retrieval
- Fairness Without Labels: Pseudo-Balancing for Bias Mitigation in Face Gender Classification
- B2N3D: Progressive Learning from Binary to N-ary Relationships for 3D Object Grounding
- From Generic to Specialized: A Subspecialty Diagnostic System Powered by Self-Supervised Learning for Cervical Histopathology
- A Style-Based Metric for Quantifying the Synthetic-to-Real Gap in Autonomous Driving Image Datasets
- Semantic Visual Anomaly Detection and Reasoning in AI-Generated Images
- Are Video Models Emerging as Zero-Shot Learners and Reasoners in Medical Imaging?
- Opacity-Gradient Driven Density Control for Compact and Efficient Few-Shot 3D Gaussian Splatting
- VividAnimator: An End-to-End Audio and Pose-driven Half-Body Human Animation Framework
- Bridging Perspectives: Foundation Model Guided BEV Maps for 3D Object Detection and Tracking
- SAM2LoRA: Composite Loss-Guided, Parameter-Efficient Finetuning of SAM2 for Retinal Fundus Segmentation
- Ordinal Scale Traffic Congestion Classification with Multi-Modal Vision-Language and Motion Analysis
- PointMAC: Meta-Learned Adaptation for Robust Test-Time Point Cloud Completion
- Vision4PPG: Emergent PPG Analysis Capability of Vision Foundation Models for Vital Signs like Blood Pressure
- Self-Supervised Multi-Scale Transformer with Attention-Guided Fusion for Efficient Crack Detection
- AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
- Guided Image Feature Matching using Feature Spatial Order
- Towards Cybersickness Severity Classification from VR Gameplay Videos Using Transfer Learning and Temporal Modeling
- MonoSE(3)-Diffusion: A Monocular SE(3) Diffusion Framework for Robust Camera-to-Robot Pose Estimation
- On the Problem of Consistent Anomalies in Zero-Shot Industrial Anomaly Detection
- Post-TIPS Prediction via Multimodal Interaction: A Multi-Center Dataset and Framework for Survival, Complication, and Portal Pressure Assessment
- When Images Speak Louder: Mitigating Language Bias-induced Hallucinations in VLMs through Cross-Modal Guidance
- DAGLFNet: Deep Attention-Guided Global-Local Feature Fusion for Pseudo-Image Point Cloud Segmentation
- MSF-Mamba: Motion-aware State Fusion Mamba for Efficient Micro-Gesture Recognition
- Head-wise Adaptive Rotary Positional Encoding for Fine-Grained Image Generation
- Jigsaw3D: Disentangled 3D Style Transfer via Patch Shuffling and Masking
- VR-Thinker: Boosting Video Reward Models through Thinking-with-Image Reasoning
- Receptive Field Expanded Look-Up Tables for Vision Inference: Advancing from Low-level to High-level Tasks
- Unified Open-World Segmentation with Multi-Modal Prompts
- Layout-Independent License Plate Recognition via Integrated Vision and Language Models
- MCE: Towards a General Framework for Handling Missing Modalities under Imbalanced Missing Rates
- MRS-YOLO: Railroad Transmission Line Foreign Object Detection Based on Improved YOLO11 and Channel Pruning
- Deep semi-supervised approach based on consistency regularization and similarity learning for weeds classification
- UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation
- Injecting Frame-Event Complementary Fusion into Diffusion for Optical Flow in Challenging Scenes
- Equipping Vision Foundation Model with Mixture of Experts for Out-of-Distribution Detection
- A Simple and Better Baseline for Visual Grounding
- ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
- OmniQuality-R: Advancing Reward Models Through All-Encompassing Quality Assessment
- GraphTARIF: Linear Graph Transformer with Augmented Rank and Improved Focus
- A Machine Learning Perspective on Automated Driving Corner Cases
- Stability Under Scrutiny: Benchmarking Representation Paradigms for Online HD Mapping
- AdaViewPlanner: Adapting Video Diffusion Models for Viewpoint Planning in 4D Scenes
- MSM-Seg: A Modality-and-Slice Memory Framework with Category-Agnostic Prompting for Multi-Modal Brain Tumor Segmentation
- Action-Dynamics Modeling and Cross-Temporal Interaction for Online Action Understanding
- Dynamic Gaussian Splatting from Defocused and Motion-blurred Monocular Videos
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting
- Seeing My Future: Predicting Situated Interaction Behavior in Virtual Reality
- Uncovering Anomalous Events for Marine Environmental Monitoring via Visual Anomaly Detection
- Restricted Receptive Fields for Face Verification
- EGD-YOLO: A Lightweight Multimodal Framework for Robust Drone-Bird Discrimination via Ghost-Enhanced YOLOv8n and EMA Attention under Adverse Condition
- Structured Spectral Graph Learning for Multi-label Abnormality Classification in 3D Chest CT Scans
- ImHead: A Large-scale Implicit Morphable Model for Localized Head Modeling
- Full segmentation annotations of 3D time-lapse microscopy images of MDA231 cells
- FastHMR: Accelerating Human Mesh Recovery via Token and Layer Merging with Diffusion Decoding
- rareboost3d: a synthetic lidar dataset with enhanced rare classes
- Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales
- SceneTextStylizer: A Training-Free Scene Text Style Transfer Framework with Diffusion Model
- DKPMV: Dense Keypoints Fusion from Multi-View RGB Frames for 6D Pose Estimation of Textureless Objects
- Towards Distribution-Shift Uncertainty Estimation for Inverse Problems with Generative Priors
- IUT-Plug: A Plug-in tool for Interleaved Image-Text Generation
- Chart-RVR: Reinforcement Learning with Verifiable Rewards for Explainable Chart Reasoning
- Mixup Helps Understanding Multimodal Video Better
- Perspective-aware 3D Gaussian Inpainting with Multi-view Consistency
- ContextGen: Contextual Layout Anchoring for Identity-Consistent Multi-Instance Generation
- Frequency Domain Unlocks New Perspectives for Abdominal Medical Image Segmentation
- COCO-Tree: Compositional Hierarchical Concept Trees for Enhanced Reasoning in Vision Language Models
- High-Resolution Spatiotemporal Modeling with Global-Local State Space Models for Video-Based Human Pose Estimation
- GIR-Bench: Versatile Benchmark for Generating Images with Reasoning
- Vlaser: Vision-Language-Action Model with Synergistic Embodied Reasoning
- Enhancing Zero-Shot Anomaly Detection: CLIP-SAM Collaboration with Cascaded Prompts
- Benchmarking Deep Learning Models for Laryngeal Cancer Staging Using the LaryngealCT Dataset
- Zero-shot Face Editing via ID-Attribute Decoupled Inversion
- LSVOS 2025 Challenge Report: Recent Advances in Complex Video Object Segmentation
- ROFI: A Deep Learning-Based Ophthalmic Sign-Preserving and Reversible Patient Face Anonymizer
- Future-Aware End-to-End Driving: Bidirectional Modeling of Trajectory Planning and Scene Evolution
- CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization
- Compositional Zero-Shot Learning: A Survey
- MoMaps: Semantics-Aware Scene Motion Generation with Motion Maps
- Multimodal Disease Progression Modeling via Spatiotemporal Disentanglement and Multiscale Alignment
- Connecting Giants: Synergistic Knowledge Transfer of Large Multimodal Models for Few-Shot Learning
- Demystifying Numerosity in Diffusion Models -- Limitations and Remedies
- Validation of an Artificial Intelligence Tool for the Detection of Sperm DNA Fragmentation Using the TUNEL In Situ Hybridization Assay
- Multiview Manifold Evidential Fusion for PolSAR Image Classification
- CoPRS: Learning Positional Prior from Chain-of-Thought for Reasoning Segmentation
- Reliable Cross-modal Alignment via Prototype Iterative Construction
- BLEnD-Vis: Benchmarking Multimodal Cultural Understanding in Vision Language Models
- Saudi Sign Language Translation Using T5
- FlexAC: Towards Flexible Control of Associative Reasoning in Multimodal Large Language Models
- Class Prototypes based Contrastive Learning for Classifying Multi-Label and Fine-Grained Educational Videos
- Investigating Identity Signals in Conversational Facial Dynamics via Disentangled Expression Features
- DTEA: Dynamic Topology Weaving and Instability-Driven Entropic Attenuation for Medical Image Segmentation
- Exploring and Leveraging Class Vectors for Classifier Editing
- EEMS: Edge-Prompt Enhanced Medical Image Segmentation Based on Learnable Gating Mechanism
- Human Uncertainty-Aware Data Selection and Automatic Labeling in Visual Question Answering
- $\Delta \mathrm{Energy}$: Optimizing Energy Change During Vision-Language Alignment Improves both OOD Detection and OOD Generalization
- sketch2symm: Symmetry-aware sketch-to-shape generation via semantic bridging
- Evaluating the effects of preprocessing, method selection, and hyperparameter tuning on SAR-based flood mapping and water depth estimation
- REACT3D: Recovering Articulations for Interactive Physical 3D Scenes
- Steering Over-refusals Towards Safety in Retrieval Augmented Generation
- End-to-end Speech Recognition with similar length speech and text
- Rethinking LLM Evaluation: Can We Evaluate LLMs with 200x Less Data?
- When or What? Understanding Consumer Engagement on Digital Platforms
- VOLTAGE: A Versatile Contrastive Learning based OCR Methodology for ultra low-resource scripts through Auto Glyph Feature Extraction
- Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting
- Detecting Hallucinations in Authentic LLM-Human Interactions
- Preserving LLM Capabilities through Calibration Data Curation: From Analysis to Optimization
- FactAppeal: Identifying Epistemic Factual Appeals in News Media
- You're Not Gonna Believe This: A Computational Analysis of Factual Appeals and Sourcing in Partisan News
- Unlocking LLM Safeguards for Low-Resource Languages via Reasoning and Alignment with Minimal Training Data
- RePro: Training Language Models to Faithfully Recycle the Web for Pretraining
- Sarcasm Detection Using Deep Convolutional Neural Networks: A Modular Deep Learning Framework
- Large Language Models for Full-Text Methods Assessment: A Case Study on Mediation Analysis
- HiligayNER: A Baseline Named Entity Recognition Model for Hiligaynon
- Review of Inference-Time Scaling Strategies: Reasoning, Search and RAG
- DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
- Rethinking Agentic Workflows: Evaluating Inference-Based Test-Time Scaling Strategies in Text2SQL Tasks
- LLM$\times$MapReduce-V3: Enabling Interactive In-Depth Survey Generation through a MCP-Driven Hierarchically Modular Agent System
- ADVICE: Answer-Dependent Verbalized Confidence Estimation
- GapDNER: A Gap-Aware Grid Tagging Model for Discontinuous Named Entity Recognition
- End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF: A Reproducibility Study
- Punctuation-aware treebank tree binarization
- Enhancing Large Language Model Reasoning via Selective Critical Token Fine-Tuning
- LogiNumSynth: Synthesizing Joint Logical-Numerical Reasoning Problems for Language Models
- Enabling Doctor-Centric Medical AI with LLMs through Workflow-Aligned Tasks and Benchmarks
- Latent Refinement Decoding: Enhancing Diffusion-Based Language Models by Refining Belief States
- TypePilot: Leveraging the Scala Type System for Secure LLM-generated Code
- Bridging Gaps in Hate Speech Detection: Meta-Collections and Benchmarks for Low-Resource Iberian Languages
- Evaluating Reasoning Faithfulness in Medical Vision-Language Models using Multimodal Perturbations
- Discursive Circuits: How Do Language Models Understand Discourse Relations?
- WebRouter: Query-specific Router via Variational Information Bottleneck for Cost-sensitive Web Agent
- A Theorem-Proving-Based Evaluation of Neural Semantic Parsing
- CNSocialDepress: A Chinese Social Media Dataset for Depression Risk Detection and Structured Analysis
- XQuant: Achieving Ultra-Low Bit KV Cache Quantization with Cross-Layer Compression
- Do Psychometric Tests Work for Large Language Models? Evaluation of Tests on Sexism, Racism, and Morality
- Emergent Misalignment via In-Context Learning: Narrow in-context examples can produce broadly misaligned LLMs
- Are Large Language Models Effective Knowledge Graph Constructors?
- Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications
- Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies
- Valid Survey Simulations with Limited Human Data: The Roles of Prompting, Fine-Tuning, and Rectification
- Who are you, ChatGPT? Personality and Demographic Style in LLM-Generated Content
- GenCNER: A Generative Framework for Continual Named Entity Recognition
- Hallucination Detection via Internal States and Structured Reasoning Consistency in Large Language Models
- An Encoder-Integrated PhoBERT with Graph Attention for Vietnamese Token-Level Classification
- Information-Preserving Reformulation of Reasoning Traces for Antidistillation
- Invisible Languages of the LLM Universe
- Culturally-Aware Conversations: A Framework & Benchmark for LLMs
- LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings
- Survey Response Generation: Generating Closed-Ended Survey Responses In-Silico with Large Language Models
- MeTA-LoRA: Data-Efficient Multi-Task Fine-Tuning for Large Language Models
- Deconstructing Attention: Investigating Design Principles for Effective Language Modeling
- StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
- Enhancing Long Chain-of-Thought Reasoning through Multi-Path Plan Aggregation
- ACADREASON: Exploring the Limits of Reasoning Models with Academic Research Problems
- When Agents Trade: Live Multi-Market Trading Benchmark for LLM Agents
- Demystifying Reinforcement Learning in Agentic Reasoning
- Are Large Reasoning Models Interruptible?
- A Comprehensive Survey on Benchmarks and Solutions in Software Engineering of LLM-Empowered Agentic System
- Task-Aware Resolution Optimization for Visual Large Language Models
- CardRewriter: Leveraging Knowledge Cards for Long-Tail Query Rewriting on Short-Video Platforms
- Bhasha-Rupantarika: Algorithm-Hardware Co-design approach for Multilingual Neural Machine Translation
- Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
- The Social Cost of Intelligence: Emergence, Propagation, and Amplification of Stereotypical Bias in Multi-Agent Systems
- Secret-Protected Evolution for Differentially Private Synthetic Text Generation
- VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
- ELMO: Efficiency via Low-precision and Peak Memory Optimization in Large Output Spaces
- Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains?
- Beyond the Crowd: LLM-Augmented Community Notes for Governing Health Misinformation
- ReLook: Vision-Grounded RL with a Multimodal LLM Critic for Agentic Web Coding
- Bag of Tricks for Subverting Reasoning-based Safety Guardrails
- QDER: Query-Specific Document and Entity Representations for Multi-Vector Document Re-Ranking
- REGENT: Relevance-Guided Attention for Entity-Aware Multi-Vector Neural Re-Ranking
- QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
- Native Language Identification in Turkish: L1 Influence of Arabic, Persian, and Albanian
- Debate, Deliberate, Decide (D3): A Cost-Aware Adversarial Framework for Reliable and Interpretable LLM Evaluation
- A Survey on Automatic Credibility Assessment Using Textual Credibility Signals in the Era of Large Language Models
- SubData: Bridging Heterogeneous Datasets to Enable Theory-Driven Evaluation of Political and Demographic Perspectives in LLMs
- Rethinking the Residual Distribution of Locate-then-Editing Methods in Model Editing
- Dynamic Optimizations of LLM Ensembles with Two-Stage Reinforcement Learning Agents
- Beyond Sample-Level Feedback: Using Reference-Level Feedback to Guide Data Synthesis
- Evolving LLMs' Self-Refinement Capability via Synergistic Training-Inference Optimization
- Hope vs. Hate: Understanding User Interactions with LGBTQ+ News Content in Mainstream US News Media through the Lens of Hope Speech
- Personality Editing for Language Models through Adjusting Self-Referential Queries
- Exploring the Generalizability of Factual Hallucination Mitigation via Enhancing Precise Knowledge Utilization
- Test-Time Alignment for Large Language Models via Textual Model Predictive Control
- Layered Insights: Generalizable Analysis of Authorial Style by Leveraging All Transformer Layers
- Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas
- Add-One-In: Incremental Sample Selection for Large Language Models via a Choice-Based Greedy Paradigm
- LADM: Long-context Training Data Selection with Attention-based Dependency Measurement for LLMs
- Exploring Compositional Generalization (in COGS/ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
- DMDTEval: An Evaluation and Analysis of LLMs on Disambiguation in Multi-domain Translation
- Transparent and Robust RAG: Adaptive-Reward Reinforcement Learning for Decision Traceability
- Saten: Sparse Augmented Tensor Networks for Post-Training Compression of Large Language Models
- Hallucinate at the Last in Long Response Generation: A Case Study on Long Document Summarization
- TemplateRL: Structured Template-Guided Reinforcement Learning for LLM Reasoning
- CrosGrpsABS: Cross-Attention over Syntactic and Semantic Graphs for Aspect-Based Sentiment Analysis in a Low-Resource Language
- ARM: Adaptive Reasoning Model
- Are Language Models Consequentialist or Deontological Moral Reasoners?
- Self-ensemble: Mitigating Confidence Mis-calibration for Large Language Models
- Safety-Aligned Weights Are Not Enough: Refusal-Teacher-Guided Finetuning Enhances Safety and Downstream Performance under Harmful Finetuning Attacks
- Effectiveness of Counter-Speech against Abusive Content: A Multidimensional Annotation and Classification Study
- MultiFinBen: Benchmarking Large Language Models for Multilingual and Multimodal Financial Application
- When Machine Unlearning Meets Retrieval-Augmented Generation (RAG): Keep Secret or Forget Knowledge?
- Training and Evaluating with Human Label Variation: An Empirical Study
- DUMP: Automated Distribution-Level Curriculum Learning for RL-based LLM Post-training
- VLMGuard-R1: Proactive Safety Alignment for VLMs via Reasoning-Driven Prompt Optimization
- Accurate and Diverse LLM Mathematical Reasoning via Automated PRM-Guided GFlowNets
- SkewRoute: Training-Free LLM Routing for Knowledge Graph Retrieval-Augmented Generation via Score Skewness of Retrieved Context
- Complexity-aware fine-tuning
- Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations
- TreeNet: Layered Decision Ensembles
- OmniSAT: Compact Action Token, Faster Auto Regression
- Knowledge-Aware Mamba for Joint Change Detection and Classification from MODIS Times Series
- NNDM: NN_UNet Diffusion Model for Brain Tumor Segmentation
- Adaptive Fusion Network with Temporal-Ranked and Motion-Intensity Dynamic Images for Micro-expression Recognition
- Table Question Answering in the Era of Large Language Models: A Comprehensive Survey of Tasks, Methods, and Evaluation
- Emotionally Charged, Logically Blurred: AI-driven Emotional Framing Impairs Human Fallacy Detection
- VisRAG 2.0: Evidence-Guided Multi-Image Reasoning in Visual Retrieval-Augmented Generation
- Gold Panning: Turning Positional Bias into Signal for Multi-Document LLM Reasoning
- Steering Embedding Models with Geometric Rotation: Mapping Semantic Relationships Across Languages and Models
- Text Prompt Injection of Vision Language Models
- NG-Router: Graph-Supervised Multi-Agent Collaboration for Nutrition Question Answering
- CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
- iBERT: Interpretable Style Embeddings via Sense Decomposition
- DELTA: Dynamic Layer-Aware Token Attention for Efficient Long-Context Reasoning
- Abductive Preference Learning
- HIPPD: Brain-Inspired Hierarchical Information Processing for Personality Detection
- Don't Throw Away Your Pretrained Model
- Enhancing Faithfulness in Abstractive Summarization via Span-Level Fine-Tuning
- Unifying Tree Search Algorithm and Reward Design for LLM Reasoning: A Survey
- Toward Machine Translation Literacy: How Lay Users Perceive and Rely on Imperfect Translations
- MTP-S2UT: Enhancing Speech-to-Speech Translation Quality with Multi-token Prediction
- Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety
- HUME: Measuring the Human-Model Performance Gap in Text Embedding Task
- Unilaw-R1: A Large Language Model for Legal Reasoning with Reinforcement Learning and Iterative Inference
- A-IPO: Adaptive Intent-driven Preference Optimization
- Diversity Augmentation of Dynamic User Preference Data for Boosting Personalized Text Summarizers
- Stop When Enough: Adaptive Early-Stopping for Chain-of-Thought Reasoning
- LinearRAG: Linear Graph Retrieval Augmented Generation on Large-scale Corpora
- BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data
- Weed Out, Then Harvest: Dual Low-Rank Adaptation is an Effective Noisy Label Detector for Noise-Robust Learning
- Text2Token: Unsupervised Text Representation Learning with Token Target Prediction
- ImCoref-CeS: An Improved Lightweight Pipeline for Coreference Resolution with LLM-based Checker-Splitter Refinement
- Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
- On the Entity-Level Alignment in Crosslingual Consistency
- Are LLMs Empathetic to All? Investigating the Influence of Multi-Demographic Personas on a Model's Empathy
- End-to-end Automatic Speech Recognition and Speech Translation: Integration of Speech Foundational Models and LLMs
- ASC analyzer: A Python package for measuring argument structure construction usage in English texts
- AssoMem: Scalable Memory QA with Multi-Signal Associative Retrieval
- RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation
- WebThinker: Empowering Large Reasoning Models with Deep Research Capability
- ViDRiP-LLaVA: A Dataset and Benchmark for Diagnostic Reasoning from Pathology Videos
- LiTransProQA: an LLM-based Literary Translation evaluation metric with Professional Question Answering
- Adaptive Stress Testing Black-Box LLM Planners
- CHD: Coupled Hierarchical Diffusion for Long-Horizon Tasks
- GenoArmory: A Unified Evaluation Framework for Adversarial Attacks on Genomic Foundation Models
- DexGarmentLab: Dexterous Garment Manipulation Environment with Generalizable Policy
- SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization
- A Set-Sequence Model for Time Series
- Simple and Effective Specialized Representations for Fair Classifiers
- A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone
- Lightweight and Interpretable Transformer via Mixed Graph Algorithm Unrolling for Traffic Forecast
- Noise Injection Systemically Degrades Large Language Model Safety Guardrails
- Breaking the Compression Ceiling: Data-Free Pipeline for Ultra-Efficient Delta Compression
- Communication-Efficient Diffusion Denoising Parallelization via Reuse-then-Predict Mechanism
- From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning
- Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator
- TRIM: Achieving Extreme Sparsity with Targeted Row-wise Iterative Metric-driven Pruning
- DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
- FRIREN: Beyond Trajectories -- A Spectral Lens on Time
- Task-Optimized Convolutional Recurrent Networks Align with Tactile Processing in the Rodent Brain
- Rolling Ball Optimizer: Learning by ironing out loss landscape wrinkles
- STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization
- MA-RAG: Multi-Agent Retrieval-Augmented Generation via Collaborative Chain-of-Thought Reasoning
- Multi-Scale Manifold Alignment for Interpreting Large Language Models: A Unified Information-Geometric Framework
- Empirical Investigation of Latent Representational Dynamics in Large Language Models: A Manifold Evolution Perspective
- TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data
- Inclusive, Differentially Private Federated Learning for Clinical Data
- Latent Reasoning via Sentence Embedding Prediction
- Does Machine Unlearning Truly Remove Knowledge?
- Jigsaw-R1: A Study of Rule-based Visual Reinforcement Learning with Jigsaw Puzzles
- AMSbench: A Comprehensive Benchmark for Evaluating MLLM Capabilities in AMS Circuits
- Equivalent Linear Mappings of Large Language Models
- AVROBUSTBENCH: Benchmarking the Robustness of Audio-Visual Recognition Models at Test-Time
- State-Covering Trajectory Stitching for Diffusion Planners
- Towards Unsupervised Training of Matching-based Graph Edit Distance Solver via Preference-aware GAN
- Bridging Neural ODE and ResNet: A Formal Error Bound for Safety Verification
- MoodAngels: A Retrieval-augmented Multi-agent Framework for Psychiatry Diagnosis
- AbBiBench: A Benchmark for Antibody Binding Affinity Maturation and Design
- Superior Molecular Representations from Intermediate Encoder Layers
- Monotone and Conservative Policy Iteration Beyond the Tabular Case
- Revisit What You See: Disclose Language Prior in Vision Tokens for LVLM Decoding
- Tversky Neural Networks: Psychologically Plausible Deep Learning with Differentiable Tversky Similarity
- Revisiting Chain-of-Thought Prompting: Zero-shot Can Be Stronger than Few-shot
- StreetLens: Enabling Human-Centered AI Agents for Neighborhood Assessment from Street View Imagery
- Ignition Phase: Standard Training for Fast Adversarial Robustness
- SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
- Structured Kolmogorov-Arnold Neural ODEs for Interpretable Learning and Symbolic Discovery of Nonlinear Dynamics
- On Convolutions, Intrinsic Dimension, and Diffusion Models
- Doc2SAR: A Synergistic Framework for High-Fidelity Extraction of Structure-Activity Relationships from Scientific Documents
- The Hidden Link Between RLHF and Contrastive Learning
- PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
- GradMetaNet: An Equivariant Architecture for Learning on Gradients
- MLLM-Fabric: Multimodal Large Language Model-Driven Robotic Framework for Fabric Sorting and Selection
- Train-before-Test Harmonizes Language Model Rankings
- Simulating Three-dimensional Turbulence with Physics-informed Neural Networks
- Learning Diffusion Models with Flexible Representation Guidance
- Prompt4Trust: A Reinforcement Learning Prompt Augmentation Framework for Clinically-Aligned Confidence Calibration in Multimodal Large Language Models
- RedOne: Revealing Domain-specific LLM Post-Training in Social Networking Services
- A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n
- Learning Representations of Event Time Series with Sparse Autoencoders for Anomaly Detection, Similarity Search, and Unsupervised Classification
- Beyond Rate Coding: Surrogate Gradients Enable Spike Timing Learning in Spiking Neural Networks
- Efficient Compositional Multi-tasking for On-device Large Language Models
- NSPDI-SNN: An efficient lightweight SNN based on nonlinear synaptic pruning and dendritic integration
- Long-Range Graph Wavelet Networks
- ASTREA: Introducing Agentic Intelligence for Orbital Thermal Autonomy
- Generalisation of automatic tumour segmentation in histopathological whole-slide images across multiple cancer types
- Protein as a Second Language for LLMs
- RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
- Domain-Specific Data Generation Framework for RAG Adaptation
- The Curious Case of Factual (Mis)Alignment between LLMs' Short- and Long-Form Answers
- Fairness Metric Design Exploration in Multi-Domain Moral Sentiment Classification using Transformer-Based Models
- LightPneumoNet: Lightweight Pneumonia Classifier
- Attacks by Content: Automated Fact-checking is an AI Security Issue
- Nepali Sign Language Characters Recognition: Dataset Development and Deep Learning Approaches
- Large Language Models Are Effective Code Watermarkers
- A Large-Language-Model Assisted Automated Scale Bar Detection and Extraction Framework for Scanning Electron Microscopic Images
- From Prompts to Packets: A View from the Network on ChatGPT, Copilot, and Gemini
- Towards Real-Time Fake News Detection under Evidence Scarcity
- ENIGMA: The Geometry of Reasoning and Alignment in Large-Language Models
- LouisKV: Efficient KV Cache Retrieval for Long Input-Output Sequences
- Beyond touch-based HMI: Control your machines in natural language by utilizing large language models and OPC UA
- When Does Supervised Training Pay Off? The Hidden Economics of Object Detection in the Era of Vision-Language Models
- FOSSIL: Harnessing Feedback on Suboptimal Samples for Data-Efficient Generalisation with Imitation Learning for Embodied Vision-and-Language Tasks
- Do LLMs "Feel"? Emotion Circuits Discovery and Control
- Diffusion-Link: Diffusion Probabilistic Model for Bridging the Audio-Text Modality Gap
- Event-Aware Prompt Learning for Dynamic Graphs
- Part II: ROLL Flash -- Accelerating RLVR and Agentic Training with Asynchrony
- Uncertainty-Aware ControlNet: Bridging Domain Gaps with Synthetic Image Generation
- Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity
- Understanding the Generalization of Stochastic Gradient Adam in Learning Neural Networks
- LLM-Specific Utility: A New Perspective for Retrieval-Augmented Generation
- Stabilizing MoE Reinforcement Learning by Aligning Training and Inference Routers
- Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
- Medical Interpretability and Knowledge Maps of Large Language Models
- DocReward: A Document Reward Model for Structuring and Stylizing
- Living Off the LLM: How LLMs Will Change Adversary Tactics
- KnowRL: Teaching Language Models to Know What They Know
- Reconstructing 12-Lead ECG from 3-Lead ECG using Variational Autoencoder to Improve Cardiac Disease Detection of Wearable ECG Devices
- Audio-Maestro: Enhancing Large Audio-Language Models with Tool-Augmented Reasoning
- Iterative Amortized Inference: Unifying In-Context Learning and Learned Optimizers
- Coordinated Strategies in Realistic Air Combat by Hierarchical Multi-Agent Reinforcement Learning
- Investigating Large Language Models' Linguistic Abilities for Text Preprocessing
- AndesVL Technical Report: An Efficient Mobile-side Multimodal Large Language Model
- Offline Reinforcement Learning with Generative Trajectory Policies
- People use fast, flat goal-directed simulation to reason about novel problems
- Automatic Music Sample Identification with Multi-Track Contrastive Learning
- LikePhys: Evaluating Intuitive Physics Understanding in Video Diffusion Models via Likelihood Preference
- Cracking CodeWhisperer: Analyzing Developers' Interactions and Patterns During Programming Tasks
- A Flexible Multi-Agent Deep Reinforcement Learning Framework for Dynamic Routing and Scheduling of Latency-Critical Services
- CodeWatcher: IDE Telemetry Data Extraction Tool for Understanding Coding Interactions with LLMs
- Query-Specific GNN: A Comprehensive Graph Representation Learning Method for Retrieval Augmented Generation
- Characterizing Web Search in The Age of Generative AI
- Hierarchical Qubit-Merging Transformer for Quantum Error Correction
- SemCSE-Multi: Multifaceted and Decodable Embeddings for Aspect-Specific and Interpretable Scientific Domain Mapping
- LLM-Oriented Token-Adaptive Knowledge Distillation
- Attention Factors for Statistical Arbitrage
- EvoCAD: Evolutionary CAD Code Generation with Vision Language Models
- NV3D: Leveraging Spatial Shape Through Normal Vector-based 3D Object Detection
- MATH-Beyond: A Benchmark for RL to Expand Beyond the Base Model
- FinVet: A Collaborative Framework of RAG and External Fact-Checking Agents for Financial Misinformation Detection
- ManiAgent: An Agentic Framework for General Robotic Manipulation
- FACE: Faithful Automatic Concept Extraction
- Accelerated stochastic first-order method for convex optimization under heavy-tailed noise
- Ego-Vision World Model for Humanoid Contact Planning
- Boundary-Guided Policy Optimization for Memory-efficient RL of Diffusion Large Language Models
- Representation-Based Exploration for Language Models: From Test-Time to Post-Training
- PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
- Phys2Real: Fusing VLM Priors with Interactive Online Adaptation for Uncertainty-Aware Sim-to-Real Manipulation
- Scaling Language-Centric Omnimodal Representation Learning
- Adversarial Attacks Leverage Interference Between Features in Superposition
- CodePlot-CoT: Mathematical Visual Reasoning by Thinking with Code-Driven Images
- Fake News in Social Networks
- Learning to Be Cautious
- ChipGPT: How far are we from natural language hardware design
- Leveraging Twitter Data for Sentiment Analysis of Transit User Feedback: An NLP Framework
- DeAL: Decoding-time Alignment for Large Language Models
- GI-NAS: Boosting Gradient Inversion Attacks Through Adaptive Neural Architecture Search
- A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
- Human-inspired Episodic Memory for Infinite Context LLMs
- OmniBal: Towards Fast Instruction-Tuning for Vision-Language Models via Omniverse Computation Balance
- R-GAT: Cancer Document Classification Leveraging Graph-Based Residual Network for Scenarios with Limited Data
- Draw with Thought: Unleashing Multimodal Reasoning for Scientific Diagram Generation
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
- CodeCrash: Exposing LLM Fragility to Misleading Natural Language in Code Reasoning
- Retrieval is Not Enough: Enhancing RAG Reasoning through Test-Time Critique and Optimization
- mCLM: A Modular Chemical Language Model that Generates Functional and Makeable Molecules
- How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
- MCP-RADAR: A Multi-Dimensional Benchmark for Evaluating Tool Use Capabilities in Large Language Models
- TCP: a Benchmark for Temporal Constraint-Based Planning
- SMELLNET: A Large-scale Dataset for Real-world Smell Recognition
- Agents of Change: Self-Evolving LLM Agents for Strategic Planning
- Wide-Horizon Thinking and Simulation-Based Evaluation for Real-World LLM Planning with Multifaceted Constraints
- Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
- Multi-Functional RIS-Enabled in SAGIN for IoT: A Hybrid Deep Reinforcement Learning Approach with Compressed Twin-Models
- Large Language Models and Operations Research: A Structured Survey
- DeepVARwT: Deep Learning for a VAR Model with Trend
- Meta-Learning Adaptive Loss Functions
- SANTA: Separate Strategies for Inaccurate and Incomplete Annotation Noise in Distantly-Supervised Named Entity Recognition
- Camouflaged Image Synthesis Is All You Need to Boost Camouflaged Detection
- Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization
- Codiscovering graphical structure and functional relationships within data: A Gaussian Process framework for connecting the dots
- Survey of Natural Language Processing for Education: Taxonomy, Systematic Review, and Future Trends
- Discovering and Reasoning of Causality in the Hidden World with Large Language Models
- Output Format Biases in the Evaluation of Large Language Models for Code Translation
- Does Biomedical Training Lead to Better Medical Performance?
- The Interpretable and Effective Graph Neural Additive Networks
- A Comprehensive Review of Recommender Systems: Transitioning from Theory to Practice
- Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights
- SWIFT: Semantic Watermarking for Image Forgery Thwarting
- MVIGER: Multi-View Variational Integration of Complementary Knowledge for Generative Recommender
- AI-powered skin spectral imaging enables instant sepsis diagnosis and outcome prediction in critically ill patients
- QUITO-X: A New Perspective on Context Compression from the Information Bottleneck Theory
- Automated detection of underdiagnosed medical conditions via opportunistic imaging
- Neuralink: Fast LLM Inference on Smartphones with Neuron Co-Activation Linking
- Who Speaks Matters: Analysing the Influence of the Speaker's Ethnicity on Hate Classification
- Retrieval-Retro: Retrieval-based Inorganic Retrosynthesis with Expert Knowledge
- FairDD: Fair Dataset Distillation
- Robot Learning with Super-Linear Scaling
- Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios
- Can a MISL Fly? Analysis and Ingredients for Mutual Information Skill Learning
- SEKE: Specialised Experts for Keyword Extraction
- TREAD: Token Routing for Efficient Architecture-agnostic Diffusion Training
- Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning
- On the Role of Transformer Feed-Forward Layers in Nonlinear In-Context Learning
- Robust Federated Finetuning of LLMs via Alternating Optimization of LoRA
- Contrastive Representation Distillation via Multi-Scale Feature Decoupling
- The Minimal Search Space for Conditional Causal Bandits
- Auction Design using Value Prediction with Hallucinations
- AB-UPT: Scaling Neural CFD Surrogates for High-Fidelity Automotive Aerodynamics Simulations via Anchored-Branched Universal Physics Transformers
- DemonAgent: Dynamically Encrypted Multi-Backdoor Implantation Attack on LLM-based Agent
- Precise Mobile Manipulation of Small Everyday Objects
- Steering LLMs for Formal Theorem Proving
- Unveiling Downstream Performance Scaling of LLMs: A Clustering-Based Perspective
- Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
- MathTutorBench: A Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors
- Disentangling Feature Structure: A Mathematically Provable Two-Stage Training Dynamics in Transformers
- Agents in the Sandbox: End-to-End Crash Bug Reproduction for Minecraft
- On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions
- Learning to Instruct for Visual Instruction Tuning
- Post-Incorporating Code Structural Knowledge into Pretrained Models via ICL for Code Translation
- PartialLoading: User Scheduling and Bandwidth Allocation for Parameter-sharing Edge Inference
- VNJPTranslate: A comprehensive pipeline for Vietnamese-Japanese translation
- Frontier AI's Impact on the Cybersecurity Landscape
- Deep Learning-based Intrusion Detection Systems: A Survey
- BabyVLM: Data-Efficient Pretraining of VLMs Inspired by Infant Learning
- Learning from Reference Answers: Versatile Language Model Alignment without Binary Human Preference Data
- Forecasting Clinical Risk from Textual Time Series: Structuring Narratives for Temporal AI in Healthcare
- QAMA: Scalable Quantum Annealing Multi-Head Attention Operator for Deep Learning
- You only need 4 extra tokens: Synergistic Test-time Adaptation for LLMs
- SGM: A Statistical Godel Machine for Risk-Controlled Recursive Self-Modification
- Reasoning-Enhanced Large Language Models for Molecular Property Prediction
- MRI Brain Tumor Detection with Computer Vision
- Audit-of-Understanding: Posterior-Constrained Inference for Mathematical Reasoning in Language Models
- Unveiling Gamer Archetypes through Multi modal feature Correlations and Unsupervised Learning
- MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation
- X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model
- Simulating Viva Voce Examinations to Evaluate Clinical Reasoning in Large Language Models
- ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
- From Programs to Poses: Factored Real-World Scene Generation via Learned Program Libraries
- MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
- The algorithmic regulator
- Sample-Efficient Online Learning in LM Agents via Hindsight Trajectory Rewriting
- Prepared for the Unknown: Adapting AIOps Capacity Forecasting Models to Data Changes
- Bridging Semantics & Structure for Software Vulnerability Detection using Hybrid Network Models
- KG-MAS: Knowledge Graph-Enhanced Multi-Agent Infrastructure for coupling physical and digital robotic environments
- Mapping the Urban Mobility Intelligence Frontier: A Scientometric Analysis of Data-Driven Pedestrian Trajectory Prediction and Simulation
- Towards Safe Maneuvering of Double-Ackermann-Steering Robots with a Soft Actor-Critic Framework
- Measuring What Matters: Connecting AI Ethics Evaluations to System Attributes, Hazards, and Harms
- Ortho-Fuse: Orthomosaic Generation for Sparse High-Resolution Crop Health Maps Through Intermediate Optical Flow Estimation
- RobotFleet: An Open-Source Framework for Centralized Multi-Robot Task Planning
- Identifying bias in CNN image classification using image scrambling and transforms
- RefusalBench: Generative Evaluation of Selective Refusal in Grounded Language Models
- STEAM: A Semantic-Level Knowledge Editing Framework for Large Language Models
- Controllable Graph Generation with Diffusion Models via Inference-Time Tree Search Guidance
- Mesh-Gait: A Unified Framework for Gait Recognition Through Multi-Modal Representation Learning from 2D Silhouettes
- LONGQAEVAL: Designing Reliable Evaluations of Long-Form Clinical QA under Resource Constraints
- Combo-Gait: Unified Transformer Framework for Multi-Modal Gait Recognition and Attribute Analysis
- Taming a Retrieval Framework to Read Images in Humanlike Manner for Augmenting Generation of MLLMs
- Hierarchical LoRA MoE for Efficient CTR Model Scaling
- Multi-Task Learning with Feature-Similarity Laplacian Graphs for Predicting Alzheimer's Disease Progression
- Do Audio LLMs Really LISTEN, or Just Transcribe? Measuring Lexical vs. Acoustic Emotion Cues Reliance
- Reverse Supervision at Scale: Exponential Search Meets the Economics of Annotation
- Data-driven simulator of multi-animal behavior with unknown dynamics via offline and online reinforcement learning
- NIM: Neuro-symbolic Ideographic Metalanguage for Inclusive Communication
- Testing and Enhancing Multi-Agent Systems for Robust Code Generation
- Learning from Disagreement: A Group Decision Simulation Framework for Robust Medical Image Segmentation
- LightSAE: Parameter-Efficient and Heterogeneity-Aware Embedding for IoT Multivariate Time Series Forecasting
- AnyBCQ: Hardware Efficient Flexible Binary-Coded Quantization for Multi-Precision LLMs
- FML-bench: A Benchmark for Automatic ML Research Agents Highlighting the Importance of Exploration Breadth
- Assessing Large Language Models for Structured Medical Order Extraction
- Latent Retrieval Augmented Generation of Cross-Domain Protein Binders
- UltraLLaDA: Scaling the Context Length to 128K for Diffusion Large Language Models
- SASER: Stego attacks on open-source LLMs
- Towards Self-Refinement of Vision-Language Models with Triangular Consistency
- Personalized Motion Guidance Framework for Athlete-Centric Coaching
- Align2Act: Instruction-Tuned Models for Human-Aligned Autonomous Driving
- MARS-Sep: Multimodal-Aligned Reinforced Sound Separation
- f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
- Population-Coded Spiking Neural Networks for High-Dimensional Robotic Control
- ECO: Enhanced Code Optimization via Performance-Aware Prompting for Code-LLMs
- Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?
- PAC-Bayesian Reinforcement Learning Trains Generalizable Policies
- GLOFNet -- A Multimodal Dataset for GLOF Monitoring and Prediction
- BitMar: Low-Bit Multimodal Fusion with Episodic Memory for Edge Devices
- Compositional Symmetry as Compression: Lie Pseudogroup Structure in Algorithmic Agents
- Dynamic Topic Evolution with Temporal Decay and Attention in Large Language Models
- A Machine Learning Approach for MIDI to Guitar Tablature Conversion
- UniCoD: Enhancing Robot Policy via Unified Continuous and Discrete Representation Learning
- Trustworthy Retrosynthesis: Eliminating Hallucinations with a Diverse Ensemble of Reaction Scorers
- DEMO: Disentangled Motion Latent Flow Matching for Fine-Grained Controllable Talking Portrait Synthesis
- AGENTIQL: An Agent-Inspired Multi-Expert Framework for Text-to-SQL Generation
- Scalable Face Security Vision Foundation Model for Deepfake, Diffusion, and Spoofing Detection
- BrowserAgent: Building Web Agents with Human-Inspired Web Browsing Actions
- Image-to-Video Transfer Learning based on Image-Language Foundation Models: A Comprehensive Survey
- LSZone: A Lightweight Spatial Information Modeling Architecture for Real-time In-car Multi-zone Speech Separation
- High-Dimensional Learning Dynamics of Quantized Models with Straight-Through Estimator
- Attention-Enhanced LSTM Modeling for Improved Temperature and Rainfall Forecasting in Bangladesh
- Missing Data Multiple Imputation for Tabular Q-Learning in Online RL
- Deep Learning in Astrophysics
- HYPERDOA: Robust and Efficient DoA Estimation using Hyperdimensional Computing
- SS-DPPN: A self-supervised dual-path foundation model for the generalizable cardiac audio representation
- Provable Anytime Ensemble Sampling Algorithms in Nonlinear Contextual Bandits
- Proficiency-Aware Adaptation and Data Augmentation for Robust L2 ASR
- A Stochastic Differential Equation Framework for Multi-Objective LLM Interactions: Dynamical Systems Analysis with Code Generation Applications
- Optimally Deep Networks -- Adapting Model Depth to Datasets for Superior Efficiency
- GPS Spoofing Attack Detection in Autonomous Vehicles Using Adaptive DBSCAN
- Understanding Sampler Stochasticity in Training Diffusion Models for RLHF
- ParsVoice: A Large-Scale Multi-Speaker Persian Speech Corpus for Text-to-Speech Synthesis
- DISC-GAN: Disentangling Style and Content for Cluster-Specific Synthetic Underwater Image Generation
- BioOSS: A Bio-Inspired Oscillatory State System with Spatio-Temporal Dynamics
- Toward Human-Centered Readability Evaluation
- MSCloudCAM: Cross-Attention with Multi-Scale Context for Multispectral Cloud Segmentation
- PruneGCRN: Minimizing and explaining spatio-temporal problems through node pruning
- Therapeutic AI and the Hidden Risks of Over-Disclosure: An Embedded AI-Literacy Framework for Mental Health Privacy
- Is Implicit Knowledge Enough for LLMs? A RAG Approach for Tree-based Structures
- Generative AI and the Transformation of Software Development Practices
- From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis
- Agentic RAG for Software Testing with Hybrid Vector-Graph and Multi-Agent Orchestration
- Happiness is Sharing a Vocabulary: A Study of Transliteration Methods
- VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering
- Software Defect Prediction using Autoencoder Transformer Model
- Discrete State Diffusion Models: A Sample Complexity Perspective
- HeroFilter: Adaptive Spectral Graph Filter for Varying Heterophilic Relations
- GRIP: A Unified Framework for Grid-Based Relay and Co-Occurrence-Aware Planning in Dynamic Environments
- Generative AI for Software Project Management: Insights from a Review of Software Practitioner Literature
- Topological Alignment of Shared Vision-Language Embedding Space
- LPCVAE: A Conditional VAE with Long-Term Dependency and Probabilistic Time-Frequency Fusion for Time Series Anomaly Detection
- DreamMakeup: Face Makeup Customization using Latent Diffusion Models
- Comparative Explanations via Counterfactual Reasoning in Recommendations
- FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model
- Evaluating Language Models' Evaluations of Games
- TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
- Redundancy as a Structural Information Principle for Learning and Generalization
- Unify Variables in Neural Scaling Laws for General Audio Representations via Embedding Effective Rank
- Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models
- Rediscovering Entropy Regularization: Adaptive Coefficient Unlocks Its Potential for LLM Reinforcement Learning
- KOTOX: A Korean Toxic Dataset for Deobfuscation and Detoxification
- MC#: Mixture Compressor for Mixture-of-Experts Large Models
- APLOT: Robust Reward Modeling via Adaptive Preference Learning with Optimal Transport
- Judge Before Answer: Can MLLM Discern the False Premise in Question?
- RV-HATE: Reinforced Multi-Module Voting for Implicit Hate Speech Detection
- Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
- DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
- A Survey on Agentic Multimodal Large Language Models
- DeepResearchGuard: Deep Research with Open-Domain Evaluation and Multi-Stage Guardrails for Safety
- ABLEIST: Intersectional Disability Bias in LLM-Generated Hiring Scenarios
- DND: Boosting Large Language Models with Dynamic Nested Depth
- Automating Structural Engineering Workflows with Large Language Model Agents
- Into the Unknown: Towards using Generative Models for Sampling Priors of Environment Uncertainty for Planning in Configuration Spaces
- GeoVLMath: Enhancing Geometry Reasoning in Vision-Language Models via Cross-Modal Reward for Auxiliary Line Creation
- XGrasp: Gripper-Aware Grasp Detection with Multi-Gripper Data Generation
- From Reasoning LLMs to BERT: A Two-Stage Distillation Framework for Search Relevance
- Temporal Alignment Guidance: On-Manifold Sampling in Diffusion Models
- PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
- Flow Matching-Based Autonomous Driving Planning with Advanced Interactive Behavior Modeling
- Causal Disentanglement Learning for Accurate Anomaly Detection in Multivariate Time Series
- Source-Free Object Detection with Detection Transformer
- Text-Enhanced Panoptic Symbol Spotting in CAD Drawings
- HoMer: Addressing Heterogeneities by Modeling Sequential and Set-wise Contexts for CTR Prediction
- A Primer on SO(3) Action Representations in Deep Reinforcement Learning
- Enhancing LLM Reasoning via Non-Human-Like Reasoning Path Preference Optimization
- A Vision for Access Control in LLM-based Agent Systems
- PhysioME: A Robust Multimodal Self-Supervised Framework for Physiological Signals with Missing Modalities
- video-SALMONN S: Streaming Audio-Visual LLMs Beyond Length Limits via Memory
- One Size Does Not Fit All: Exploring Variable Thresholds for Distance-Based Multi-Label Text Classification
- EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling
- G2L: From Giga-Scale to Cancer-Specific Large-Scale Pathology Foundation Models via Knowledge Distillation
- Responsible AI Adoption in the Public Sector: A Data-Centric Taxonomy of AI Adoption Challenges
- Bias-Aware AI Chatbot for Engineering Advising at the University of Maryland A. James Clark School of Engineering
- Direct Routing Gradient (DRGrad): A Personalized Information Surgery for Multi-Task Learning (MTL) Recommendations
- Enhanced Urban Traffic Management Using CCTV Surveillance Videos and Multi-Source Data Current State Prediction and Frequent Episode Mining
- Real-Time Health Analytics Using Ontology-Driven Complex Event Processing and LLM Reasoning: A Tuberculosis Case Study
- Rounding-Guided Backdoor Injection in Deep Learning Model Quantization
- TinyViT-Batten: Few-Shot Vision Transformer with Explainable Attention for Early Batten-Disease Detection on Pediatric MRI
- Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 Object Detectors for Computer Vision and Pattern Recognition
- Data Provenance Auditing of Fine-Tuned Large Language Models with a Text-Preserving Technique
- Generative Models for Helmholtz Equation Solutions: A Dataset of Acoustic Materials
- Gradient-Sign Masking for Task Vector Transport Across Pre-Trained Models
- Learning What Matters: Steering Diffusion via Spectrally Anisotropic Forward Noise
- Adversarial-Resilient RF Fingerprinting: A CNN-GAN Framework for Rogue Transmitter Detection
- A Hybrid Computational Intelligence Framework with Metaheuristic Optimization for Drug-Drug Interaction Prediction
- Leveraging LLMs to Streamline the Review of Public Funding Applications
- Coupled Data and Measurement Space Dynamics for Enhanced Diffusion Posterior Sampling
- AI in Computational Thinking Education in Higher Education: A Systematic Literature Review
- Fortifying LLM-Based Code Generation with Graph-Based Reasoning on Secure Coding Practices
- Deep Neural Networks Inspired by Differential Equations
- Stop DDoS Attacking the Research Community with AI-Generated Survey Papers
- On the Occurrence of Critical Learning Periods in Neural Networks
- CREST-Search: Comprehensive Red-teaming for Evaluating Safety Threats in Large Language Models Powered by Web Search
- Evaluation of Differential Privacy Mechanisms on Federated Learning
- Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection
- Vanishing Contributions: A Unified Approach to Smoothly Transition Neural Models into Compressed Form
- VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
- A Demonstration of Self-Adaptive Jamming Attack Detection in AI/ML Integrated O-RAN
- The Idola Tribus of AI: Large Language Models tend to perceive order where none exists
- SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
- ReaLM: Residual Quantization Bridging Knowledge Graph Embeddings and Large Language Models
- Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments
- All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language
- High-Power Training Data Identification with Provable Statistical Guarantees
- ICL-Router: In-Context Learned Model Representations for LLM Routing
- Preference-Aware Memory Update for Long-Term LLM Agents
- Layout-Aware Parsing Meets Efficient LLMs: A Unified, Scalable Framework for Resume Information Extraction and Evaluation
- It's 2025 -- Narrative Learning is the new baseline to beat for explainable machine learning
- InteractScience: Programmatic and Visually-Grounded Evaluation of Interactive Scientific Demonstration Code Generation
- Herb.jl: A Unifying Program Synthesis Library
- Evaluating LLM-Based Process Explanations under Progressive Behavioral-Input Reduction
- ARROW: An Adaptive Rollout and Routing Method for Global Weather Forecasting
- InterCorpRel-LLM: Enhancing Financial Relational Understanding with Graph-Language Models
- Chlorophyll-a Mapping and Prediction in the Mar Menor Lagoon Using C2RCC-Processed Sentinel 2 Imagery
- Judge's Verdict: A Comprehensive Analysis of LLM Judge Capability Through Human Agreement
- Machine learning methods fail to provide cohesive atheoretical construction of personality traits from semantic embeddings
- Patentformer: A demonstration of AI-assisted automated patent drafting
- PatentVision: A multimodal method for drafting patent applications
- Scaling Laws and Symmetry, Evidence from Neural Force Fields
- PromptGuard at BLP-2025 Task 1: A Few-Shot Classification Framework Using Majority Voting and Keyword Similarity for Bengali Hate Speech Detection
- Why Do Transformers Fail to Forecast Time Series In-Context?
- SVTime: Small Time Series Forecasting Models Informed by "Physics" of Large Vision Model Forecasters
- Building a Foundational Guardrail for General Agentic Systems via Synthetic Data
- Large Language Models for Imbalanced Classification: Diversity makes the difference
- Temporal Lifting as Latent-Space Regularization for Continuous-Time Flow Models in AI Systems
- Towards Understanding Ambiguity Resolution in Multimodal Inference of Meaning
- Harnessing Self-Supervised Deep Learning and Geostationary Remote Sensing for Advancing Wildfire and Associated Air Quality Monitoring: Improved Smoke and Fire Front Masking using GOES and TEMPO Radiance Data
- CALM: A Causal Analysis Language Model for Tabular Data in Complex Systems with Local Scores, Conditional Independence Tests, and Relation Attributes
- ProxRouter: Proximity-Weighted LLM Query Routing for Improved Robustness to Outliers
- Token is All You Price
- NarraBench: A Comprehensive Framework for Narrative Benchmarking
- WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions
- ROBOPSY PL[AI]: Using Role-Play to Investigate how LLMs Present Collective Memory
- Myopic Bayesian Decision Theory for Batch Active Learning with Partial Batch Label Sampling
- CHUG: Crowdsourced User-Generated HDR Video Quality Dataset
- Closing the Data-Efficiency Gap Between Autoregressive and Masked Diffusion LLMs
- Probabilistic bias adjustment of seasonal predictions of Arctic Sea Ice Concentration
- Chain-of-Influence: Tracing Interdependencies Across Time and Features in Clinical Predictive Modelings
- Learning Bug Context for PyTorch-to-JAX Translation with LLMs
- Stability of Transformers under Layer Normalization
- Agentic Property-Based Testing: Finding Bugs Across the Python Ecosystem
- SpectralCA: Bi-Directional Cross-Attention for Next-Generation UAV Hyperspectral Vision
- Augmenting generative models with biomedical knowledge graphs improves targeted drug discovery
- Phase-Aware Deep Learning with Complex-Valued CNNs for Audio Signal Applications
- MemPromptTSS: Persistent Prompt Memory for Iterative Multi-Granularity Time Series State Segmentation
- Denoising Diffusion as a New Framework for Underwater Images
- Unpacking Hateful Memes: Presupposed Context and False Claims
- Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding
- Explainable Human-in-the-Loop Segmentation via Critic Feedback Signals
- Beyond Fertility: Analyzing STRR as a Metric for Multilingual Tokenization Evaluation
- Homomorphic Mappings for Value-Preserving State Aggregation in Markov Decision Processes
- Operationalizing AI: Empirical Evidence on MLOps Practices, User Satisfaction, and Organizational Context
- Neuro-inspired automated lens design
- Beyond the limitation of a single query: Train your LLM for query expansion with Reinforcement Learning
- SLEAN: Simple Lightweight Ensemble Analysis Network for Multi-Provider LLM Coordination: Design, Implementation, and Vibe Coding Bug Investigation Case Study
- Skill-Targeted Adaptive Training
- Lightweight Baselines for Medical Abstract Classification: DistilBERT with Cross-Entropy as a Strong Default
- Efficient Onboard Vision-Language Inference in UAV-Enabled Low-Altitude Economy Networks via LLM-Enhanced Optimization
- FOSSIL: Regret-Minimizing Curriculum Learning for Metadata-Free and Low-Data Mpox Diagnosis
- ALLOY: Generating Reusable Agent Workflows from User Demonstration
- Think Twice to See More: Iterative Visual Reasoning in Medical VLMs
- Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling
- CLMN: Concept based Language Models via Neural Symbolic Reasoning
- OBsmith: Testing JavaScript Obfuscator using LLM-powered sketching
- Gradient-based Model Shortcut Detection for Time Series Classification
- How AI Companionship Develops: Evidence from a Longitudinal Study
- Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
- What Makes Looped Transformers Perform Better Than Non-Recursive Ones (Provably)
- Uncovering Singularities in Feynman Integrals via Machine Learning
- Uncertainty-Aware Post-Detection Framework for Enhanced Fire and Smoke Detection in Compact Deep Learning Models
- Training-Free In-Context Forensic Chain for Image Manipulation Detection and Localization
- DeepFusionNet: Autoencoder-Based Low-Light Image Enhancement and Super-Resolution
- Ctrl-World: A Controllable Generative World Model for Robot Manipulation
- CacheClip: Accelerating RAG with Effective KV Cache Reuse
- PermLLM: Learnable Channel Permutation for N:M Sparse Large Language Models
- Hybrid OCR-LLM Framework for Enterprise-Scale Document Information Extraction Under Copy-heavy Task
- DiffHeads: Differential Analysis and Inference-Time Masking of Bias Heads in Large Language Models
- A Unified Frequency Domain Decomposition Framework for Interpretable and Robust Time Series Forecasting
- Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective
- BILLY: Steering Large Language Models via Merging Persona Vectors for Creative Generation
- Multi-Scale Diffusion Transformer for Jointly Simulating User Mobility and Mobile Traffic Pattern
- SaFiRe: Saccade-Fixation Reiteration with Mamba for Referring Image Segmentation
- Large Language Model Sourcing: A Survey
- HccePose(BF): Predicting Front & Back Surfaces to Construct Ultra-Dense 2D-3D Correspondences for Pose Estimation
- LLMs are All You Need? Improving Fuzz Testing for MOJO with Large Language Models
- Dejavu: Post-Deployment Learning for Embodied Agents via Experience Feedback
- A Survey of Inductive Reasoning for Large Language Models
- MedAgentAudit: Diagnosing and Quantifying Collaborative Failure Modes in Medical Multi-Agent Systems
- Formally Verified Certification of Unsolvability of Temporal Planning Problems
- CauchyNet: Compact and Data-Efficient Learning using Holomorphic Activation Functions
- Revisiting Trust in the Era of Generative AI: Factorial Structure and Latent Profiles
- RLFR: Extending Reinforcement Learning for LLMs with Flow Environment
- Distributionally Robust Control with End-to-End Statistically Guaranteed Metric Learning
- Learning to Guarantee Type Correctness in Code Generation through Type-Guided Program Synthesis
- UF-RNN: Real-Time Adaptive Motion Generation Using Uncertainty-Driven Foresight Prediction
- A3RNN: Bi-directional Fusion of Bottom-up and Top-down Process for Developmental Visual Attention in Robots
- The Geometry of Reasoning: Flowing Logics in Representation Space
- How can we assess human-agent interactions? Case studies in software agent design
- AI and Consciousness
- Beyond AlphaEarth: Toward Human-Centered Spatial Representation via POI-Guided Contrastive Learning
- Autonomous Agents for Scientific Discovery: Orchestrating Scientists, Language, Code, and Physics
- The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs
- Follow My Lead: Logical Fallacy Classification with Knowledge-Augmented LLMs
- Deliberative Dynamics and Value Alignment in LLM Debates
- RIPRAG: Hack a Black-box Retrieval-Augmented Generation Question-Answering System with Reinforcement Learning
- Failure-Driven Workflow Refinement
- Belief Graphs with Reasoning Zones: Structure, Dynamics, and Epistemic Activation
- SwarmSys: Decentralized Swarm-Inspired Agents for Scalable and Adaptive Reasoning
- SyncLipMAE: Contrastive Masked Pretraining for Audio-Visual Talking-Face Representation
- Agentic Troubleshooting Guide Automation for Incident Management
- DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
- CharCom: Composable Identity Control for Multi-Character Story Illustration
- Concise Reasoning in the Lens of Lagrangian Optimization
- SAFER: Risk-Constrained Sample-then-Filter in Large Language Models
- Don't Just Fine-tune the Agent, Tune the Environment
- PIXEL: Adaptive Steering Via Position-wise Injection with eXact Estimated Levels under Subspace Calibration
- Adaptive Dual Reasoner: Large Reasoning Models Can Think Efficiently by Hybrid Reasoning
- The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
- Mitigating Hallucination in Multimodal Reasoning via Functional Attention Control
- LLM-Friendly Knowledge Representation for Customer Support
- Beyond Ethics: How Inclusive Innovation Drives Economic Returns in Medical AI
- Trace Length is a Simple Uncertainty Signal in Reasoning Models
- Traj-CoA: Patient Trajectory Modeling via Chain-of-Agents for Lung Cancer Risk Prediction
- MedCoAct: Confidence-Aware Multi-Agent Collaboration for Complete Clinical Decision
- Tracing the Traces: Latent Temporal Signals for Efficient and Accurate Reasoning
- ELAIPBench: A Benchmark for Expert-Level Artificial Intelligence Paper Understanding
- A Layered Intuition -- Method Model with Scope Extension for LLM Reasoning
- A Distance Measure for Random Permutation Set: From the Layer-2 Belief Structure Perspective
- EA4LLM: A Gradient-Free Approach to Large Language Model Optimization via Evolutionary Algorithms
- Collaborative Text-to-Image Generation via Multi-Agent Reinforcement Learning and Semantic Fusion
- Automatic Piecewise Linear Regression for Predicting Student Learning Satisfaction
- Equity-Aware Geospatial AI for Forecasting Demand-Driven Hospital Locations in Germany
- Hierarchical Optimization via LLM-Guided Objective Evolution for Mobility-on-Demand Systems
- Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning
- Simpliflow: A Lightweight Open-Source Framework for Rapid Creation and Deployment of Generative Agentic AI Workflows
- OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
- Extended Triangular Method: A Generalized Algorithm for Contradiction Separation Based Automated Deduction
- Adaptive Selection of Symbolic Languages for Improving LLM Logical Reasoning
- LLMs as Strategic Agents: Beliefs, Best Response Behavior, and Emergent Heuristics
- DRIFT: Decompose, Retrieve, Illustrate, then Formalize Theorems
- The Irrational Machine: Neurosis and the Limits of Algorithmic Safety
- LLM-Empowered Agentic MAC Protocols: A Dynamic Stackelberg Game Approach
- PaperArena: An Evaluation Benchmark for Tool-Augmented Agentic Reasoning on Scientific Literature
- PoU: Proof-of-Use to Counter Tool-Call Hacking in DeepResearch Agents
- Scalable and Explainable Enterprise Knowledge Discovery Using Graph-Centric Hybrid Retrieval
- Video-STR: Reinforcing MLLMs in Video Spatio-Temporal Reasoning with Relation Graph
- Revisiting Model Interpolation for Efficient Reasoning
- FBS Model-based Maintenance Record Accumulation for Failure-Cause Inference in Manufacturing Systems
- Argumentation-Based Explainability for Legal AI: Comparative and Regulatory Perspectives
- Modeling AI-Driven Production and Competitiveness: A Multi-Agent Economic Simulation of China and the United States
- Improving AI Efficiency in Data Centres by Power Dynamic Response
- Spec-Driven AI for Science: The ARIA Framework for Automated and Reproducible Data Analysis
- How²: How to learn from procedural How-to questions
- Aligning Deep Implicit Preferences by Learning to Reason Defensively
- AI Alignment Strategies from a Risk Perspective: Independent Safety Mechanisms or Shared Failures?
- PADME: Procedure Aware DynaMic Execution
- Evolution in Simulation: AI-Agent School with Dual Memory for High-Fidelity Educational Dynamics
- Automated Skill Decomposition Meets Expert Ontologies: Bridging the Granularity Gap with LLMs
- AI-Driven anemia diagnosis: A review of advanced models and techniques
- From to : Multidimensional Supervision of Reasoning Process for LLM Optimization
- Unifying Deductive and Abductive Reasoning in Knowledge Graphs with Masked Diffusion Model
- Zero Data Retention in LLM-based Enterprise AI Assistants: A Comparative Study of Market Leading Agentic AI Products
- Analyzing and Internalizing Complex Policy Documents for LLM Agents
- Reproducibility: The New Frontier in AI Governance
- Explainability, risk modeling, and segmentation based customer churn analytics for personalized retention in e-commerce
- ParaCook: On Time-Efficient Planning for Multi-Agent Systems
- SR-Scientist: Scientific Equation Discovery With Agentic AI
- Operand Quant: A Single-Agent Architecture for Autonomous Machine Learning Engineering
- Detecting Conspiracy Theory Against COVID-19 Vaccines
- Causal Digital Twins for Cyber-Physical Security: A Framework for Robust Anomaly Detection in Industrial Control Systems
- Toward a Unified Security Framework for AI Agents: Trust, Risk, and Liability
- Hound: Relation-First Knowledge Graphs for Complex-System Reasoning in Security Audits
Research Sources: 954 | Generated: 10/14/2025