AI RESEARCH PAPERS & ACADEMIC SOURCES
- MR-Align: Meta-Reasoning Informed Factuality Alignment for Large Reasoning Models : Abstract: Large reasoning models (LRMs) show strong capabilities in complex reasoning, yet their marginal gains on evidence-dependent factual questions are limited. We find this limitation is partiall...
- Parallel Loop Transformer for Efficient Test-Time Computation Scaling : Abstract: Large Language Models (LLMs) are powerful but often too slow and costly for real-world use during inference. Looped transformers save on parameters by reusing the same weights for multiple c...
- Do Large Language Models Grasp The Grammar? Evidence from Grammar-Book-Guided Probing in Luxembourgish : Abstract: Grammar refers to the system of rules that governs the structural organization and the semantic relations among linguistic units such as sentences, phrases, and words within a given language...
- Seeing Through the MiRAGE: Evaluating Multimodal Retrieval Augmented Generation : Abstract: We introduce MiRAGE, an evaluation framework for retrieval-augmented generation (RAG) from multimodal sources. As audiovisual media becomes a prevalent source of information online, it is es...
- RiddleBench: A New Generative Reasoning Benchmark for LLMs : Abstract: Large Language Models have demonstrated strong performance on many established reasoning benchmarks. However, these benchmarks primarily evaluate structured skills like quantitative problem-...
- Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction : Abstract: Language models generally produce grammatical text, but they are more likely to make errors in certain contexts. Drawing on paradigms from psycholinguistics, we carry out a fine-grained anal...
- SemCoT: Accelerating Chain-of-Thought Reasoning through Semantically-Aligned Implicit Tokens : Abstract: The verbosity of Chain-of-Thought (CoT) reasoning hinders its mass deployment in efficiency-critical applications. Recently, implicit CoT approaches have emerged, which encode reasoning step...
- Language Model Behavioral Phases are Consistent Across Architecture, Training Data, and Scale : Abstract: We show that across architecture (Transformer vs. Mamba vs. RWKV), training dataset (OpenWebText vs. The Pile), and scale (14 million parameters to 12 billion parameters), autoregressive lan...
- POWSM: A Phonetic Open Whisper-Style Speech Foundation Model : Abstract: Recent advances in spoken language processing have led to substantial progress in phonetic tasks such as automatic speech recognition (ASR), phone recognition (PR), grapheme-to-phoneme conve...
- Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech : Abstract: Advancements in spoken language processing have driven the development of spoken language models (SLMs), designed to achieve universal audio understanding by jointly learning text and audio ...
- Can LLMs Estimate Cognitive Complexity of Reading Comprehension Items? : Abstract: Estimating the cognitive complexity of reading comprehension (RC) items is crucial for assessing item difficulty before it is administered to learners. Unlike syntactic and semantic features...
- TOPol: Capturing and Explaining Multidimensional Semantic Polarity Fields and Vectors : Abstract: Traditional approaches to semantic polarity in computational linguistics treat sentiment as a unidimensional scale, overlooking the multidimensional structure of language. This work introduc...
- DEBATE: A Large-Scale Benchmark for Role-Playing LLM Agents in Multi-Agent, Long-Form Debates : Abstract: Accurately modeling opinion change through social interactions is crucial for addressing issues like misinformation and polarization. While role-playing large language models (LLMs) offer a ...
- Pretraining Strategies using Monolingual and Parallel Data for Low-Resource Machine Translation : Abstract: This research article examines the effectiveness of various pretraining strategies for developing machine translation models tailored to low-resource languages. Although this work considers ...
- A Survey on Unlearning in Large Language Models : Abstract: The advancement of Large Language Models (LLMs) has revolutionized natural language processing, yet their training on massive corpora poses significant risks, including the memorization of s...
- Explainable Disentanglement on Discrete Speech Representations for Noise-Robust ASR : Abstract: Discrete audio representations are gaining traction in speech modeling due to their interpretability and compatibility with large language models, but are not always optimized for noisy or r...
- Testing Cross-Lingual Text Comprehension In LLMs Using Next Sentence Prediction : Abstract: While large language models are trained on massive datasets, this data is heavily skewed towards English. Does their impressive performance reflect genuine ability or just this data advantag...
- ProMediate: A Socio-cognitive framework for evaluating proactive agents in multi-party negotiation : Abstract: While Large Language Models (LLMs) are increasingly used in agentic frameworks to assist individual users, there is a growing need for agents that can proactively manage complex, multi-party...
- Adapting Small Language Models to Low-Resource Domains: A Case Study in Hindi Tourism QA : Abstract: Domain-specific question answering in low-resource languages faces two key challenges: scarcity of annotated datasets and limited domain knowledge in general-purpose language models. In this...
- Teaching Sarcasm: Few-Shot Multimodal Sarcasm Detection via Distillation to a Parameter-Efficient Student : Abstract: Multimodal sarcasm detection is challenging, especially in low-resource settings where subtle image-text contradictions are hard to learn due to scarce annotated data, which hinders the mode...
- Parrot: A Training Pipeline Enhances Both Program CoT and Natural Language CoT for Reasoning : Abstract: Natural language chain-of-thought (N-CoT) and Program chain-of-thought (P-CoT) have emerged as two primary paradigms for large language models (LLMs) to solve mathematical reasoning problems...
- CRMWeaver: Building Powerful Business Agent via Agentic RL and Shared Memories : Abstract: Recent years have witnessed the rapid development of LLM-based agents, which shed light on using language agents to solve complex real-world problems. A prominent application lies in busines...
- Not ready for the bench: LLM legal interpretation is unstable and out of step with human judgments : Abstract: Legal interpretation frequently involves assessing how a legal text, as understood by an 'ordinary' speaker of the language, applies to the set of facts characterizing a legal dispute in the...
- CLASS-IT: Conversational and Lecture-Aligned Small-Scale Instruction Tuning for BabyLMs : Abstract: This work investigates whether small-scale LMs can benefit from instruction tuning. We compare conversational and question-answering instruction tuning datasets, applied either in a merged o...
- Monitoring Transformative Technological Convergence Through LLM-Extracted Semantic Entity Triple Graphs : Abstract: Forecasting transformative technologies remains a critical but challenging task, particularly in fast-evolving domains such as Information and Communication Technologies (ICTs). Traditional ...
- Roleplaying with Structure: Synthetic Therapist-Client Conversation Generation from Questionnaires : Abstract: The development of AI for mental health is hindered by a lack of authentic therapy dialogues, due to strict privacy regulations and the fact that clinical sessions were historically rarely r...
- Serve Programs, Not Prompts : Abstract: Current large language model (LLM) serving systems, primarily designed for text completion, are neither efficient nor adaptable for increasingly complex LLM applications due to their inflexi...
- Seeing, Signing, and Saying: A Vision-Language Model-Assisted Pipeline for Sign Language Data Acquisition and Curation from Social Media : Abstract: Most existing sign language translation (SLT) datasets are limited in scale, lack multilingual coverage, and are costly to curate due to their reliance on expert annotation and controlled re...
- Depth and Autonomy: A Framework for Evaluating LLM Applications in Social Science Research : Abstract: Large language models (LLMs) are increasingly utilized by researchers across a wide range of domains, and qualitative social science is no exception; however, this adoption faces persistent ...
- A Critical Study of Automatic Evaluation in Sign Language Translation : Abstract: Automatic evaluation metrics are crucial for advancing sign language translation (SLT). Current SLT evaluation metrics, such as BLEU and ROUGE, are only text-based, and it remains unclear to...
- TwinVoice: A Multi-dimensional Benchmark Towards Digital Twins via LLM Persona Simulation : Abstract: Large Language Models (LLMs) are exhibiting emergent human-like abilities and are increasingly envisioned as the foundation for simulating an individual's communication style, behavioral ten...
- Evaluating the Role of Verifiers in Test-Time Scaling for Legal Reasoning Tasks : Abstract: Test-time scaling (TTS) techniques can improve the performance of large language models (LLMs) at the expense of additional computation and latency. While TTS has proven effective in formal ...
- EHR-R1: A Reasoning-Enhanced Foundational Language Model for Electronic Health Record Analysis : Abstract: Electronic Health Records (EHRs) contain rich yet complex information, and their automated analysis is critical for clinical decision-making. Despite recent advances of large language models...
- PairUni: Pairwise Training for Unified Multimodal Language Models : Abstract: Unified vision-language models (UVLMs) must perform both understanding and generation within a single architecture, but these tasks rely on heterogeneous data and supervision, making it diff...
- Interpreting LLMs as Credit Risk Classifiers: Do Their Feature Explanations Align with Classical ML? : Abstract: Large Language Models (LLMs) are increasingly explored as flexible alternatives to classical machine learning models for classification tasks through zero-shot prompting. However, their suit...
- Scaling Latent Reasoning via Looped Language Models : Abstract: Modern LLMs are trained to "think" primarily via explicit text generation, such as chain-of-thought (CoT), which defers reasoning to post-training and under-leverages pre-training data. We p...
- DiagramEval: Evaluating LLM-Generated Diagrams via Graphs : Abstract: Diagrams play a central role in research papers for conveying ideas, yet they are often notoriously complex and labor-intensive to create. Although diagrams are presented as images, standard...
- Decomposition-Enhanced Training for Post-Hoc Attributions In Language Models : Abstract: Large language models (LLMs) are increasingly used for long-document question answering, where reliable attribution to sources is critical for trust. Existing post-hoc attribution methods wo...
- PANORAMA: A Dataset and Benchmarks Capturing Decision Trails and Rationales in Patent Examination : Abstract: Patent examination remains an ongoing challenge in the NLP literature even after the advent of large language models (LLMs), as it requires an extensive yet nuanced human judgment on whether...
- Conflict Adaptation in Vision-Language Models : Abstract: A signature of human cognitive control is conflict adaptation: improved performance on a high-conflict trial following another high-conflict trial. This phenomenon offers an account for how ...
- More than a Moment: Towards Coherent Sequences of Audio Descriptions : Abstract: Audio Descriptions (ADs) convey essential on-screen information, allowing visually impaired audiences to follow videos. To be effective, ADs must form a coherent sequence that helps listener...
- ZK-SenseLM: Verifiable Large-Model Wireless Sensing with Selective Abstention and Zero-Knowledge Attestation : Abstract: ZK-SenseLM is a secure and auditable wireless sensing framework that pairs a large-model encoder for Wi-Fi channel state information (and optionally mmWave radar or RFID) with a policy-groun...
- Large Language Models for Few-Shot Named Entity Recognition : Abstract: Named entity recognition (NER) is a fundamental task in numerous downstream applications. Recently, researchers have employed pre-trained language models (PLMs) and large language models (LL...
- OpenFactCheck: Building, Benchmarking Customized Fact-Checking Systems and Evaluating the Factuality of Claims and LLMs : Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie in assess...
- RLAIF-V: Open-Source AI Feedback Leads to Super GPT-4V Trustworthiness : Abstract: Traditional feedback learning for hallucination reduction relies on labor-intensive manual labeling or expensive proprietary models. This leaves the community without foundational knowledge ...
- UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking : Abstract: The rapid adoption of Large Language Models (LLMs) has raised important concerns about the factual reliability of their outputs, particularly in low-resource languages such as Urdu. Existing...
- NL-Debugging: Exploiting Natural Language as an Intermediate Representation for Code Debugging : Abstract: Debugging is a critical aspect of an LLM's coding ability. Early debugging efforts primarily focused on code-level analysis, which often falls short when addressing complex programming errors t...
- Precise In-Parameter Concept Erasure in Large Language Models : Abstract: Large language models (LLMs) often acquire knowledge during pretraining that is undesirable in downstream deployments, e.g., sensitive information or copyrighted content. Existing approaches...
- LLMs are Better Than You Think: Label-Guided In-Context Learning for Named Entity Recognition : Abstract: In-context learning (ICL) enables large language models (LLMs) to perform new tasks using only a few demonstrations. However, in Named Entity Recognition (NER), existing ICL methods typicall...
- Robust Preference Optimization via Dynamic Target Margins : Abstract: The alignment of Large Language Models (LLMs) is crucial for ensuring their safety and reliability in practical applications. Direct Preference Optimization (DPO) has emerged as an efficient...
- Think Twice Before You Judge: Mixture of Dual Reasoning Experts for Multimodal Sarcasm Detection : Abstract: Multimodal sarcasm detection has attracted growing interest due to the rise of multimedia posts on social media. Understanding sarcastic image-text posts often requires external contextual k...
- Adapter-state Sharing CLIP for Parameter-efficient Multimodal Sarcasm Detection : Abstract: The growing prevalence of multimodal image-text sarcasm on social media poses challenges for opinion mining systems. Existing approaches rely on full fine-tuning of large models, making them...
- Steering Information Utility in Key-Value Memory for Language Model Post-Training : Abstract: Recent advancements in language models (LMs) have marked a shift toward the growing importance of post-training. Yet, post-training approaches such as supervised fine-tuning (SFT) do not gua...
- Can LLMs Outshine Conventional Recommenders? A Comparative Evaluation : Abstract: In recent years, integrating large language models (LLMs) into recommender systems has created new opportunities for improving recommendation quality. However, a comprehensive benchmark is n...
- FPGA-based Lane Detection System incorporating Temperature and Light Control Units : Abstract: Intelligent vehicles (IVs) are among the most important outcomes of the worldwide trend toward automation. Applications of IVs, whether on urban roads or robot tracks, prioritize lane ...
- MCIHN: A Hybrid Network Model Based on Multi-path Cross-modal Interaction for Multimodal Emotion Recognition : Abstract: Multimodal emotion recognition is crucial for future human-computer interaction. However, accurate emotion recognition still faces significant challenges due to differences between different...
- FruitProm: Probabilistic Maturity Estimation and Detection of Fruits and Vegetables : Abstract: Maturity estimation of fruits and vegetables is a critical task for agricultural automation, directly impacting yield prediction and robotic harvesting. Current deep learning approaches pred...
- Proper Body Landmark Subset Enables More Accurate and 5X Faster Recognition of Isolated Signs in LIBRAS : Abstract: This paper investigates the feasibility of using lightweight body landmark detection for the recognition of isolated signs in Brazilian Sign Language (LIBRAS). Although the skeleton-based ap...
- Pixels to Signals: A Real-Time Framework for Traffic Demand Estimation : Abstract: Traffic congestion is a growing challenge in rapidly expanding cities, resulting in increasing delays and inefficiencies within urban transportation systems. To address this issue a...
- VividCam: Learning Unconventional Camera Motions from Virtual Synthetic Videos : Abstract: Although recent text-to-video generative models are getting more capable of following external camera controls, imposed by either text descriptions or camera trajectories, they still struggl...
- IBIS: A Powerful Hybrid Architecture for Human Activity Recognition : Abstract: The increasing interest in Wi-Fi sensing stems from its potential to capture environmental data in a low-cost, non-intrusive way, making it ideal for applications like healthcare, space occu...
- Auto3DSeg for Brain Tumor Segmentation from 3D MRI in BraTS 2023 Challenge : Abstract: In this work, we describe our solution to the BraTS 2023 cluster of challenges using Auto3DSeg from MONAI. We participated in all 5 segmentation challenges, and achieved the 1st place result...
- DRIP: Dynamic patch Reduction via Interpretable Pooling : Abstract: Recently, the advances in vision-language models, including contrastive pretraining and instruction tuning, have greatly pushed the frontier of multimodal AI. However, owing to the large-sca...
- Vision-Language Integration for Zero-Shot Scene Understanding in Real-World Environments : Abstract: Zero-shot scene understanding in real-world settings presents major challenges due to the complexity and variability of natural scenes, where models must recognize new objects, actions, and ...
- Neighborhood Feature Pooling for Remote Sensing Image Classification : Abstract: In this work, we propose neighborhood feature pooling (NFP) as a novel texture feature extraction method for remote sensing image classification. The NFP layer captures relationships between...
- PSTF-AttControl: Per-Subject-Tuning-Free Personalized Image Generation with Controllable Face Attributes : Abstract: Recent advancements in personalized image generation have significantly improved facial identity preservation, particularly in fields such as entertainment and social media. However, existin...
- Visual Diversity and Region-aware Prompt Learning for Zero-shot HOI Detection : Abstract: Zero-shot Human-Object Interaction detection aims to localize humans and objects in an image and recognize their interaction, even when specific verb-object pairs are unseen during training....
- AtlasGS: Atlanta-world Guided Surface Reconstruction with Implicit Structured Gaussians : Abstract: 3D reconstruction of indoor and urban environments is a prominent research topic with various downstream applications. However, existing geometric priors for addressing low-texture regions i...
- Region-CAM: Towards Accurate Object Regions in Class Activation Maps for Weakly Supervised Learning Tasks : Abstract: Class Activation Mapping (CAM) methods are widely applied in weakly supervised learning tasks due to their ability to highlight object regions. However, conventional CAM methods highlight on...
- DINO-YOLO: Self-Supervised Pre-training for Data-Efficient Object Detection in Civil Engineering Applications : Abstract: Object detection in civil engineering applications is constrained by limited annotated data in specialized domains. We introduce DINO-YOLO, a hybrid architecture combining YOLOv12 with DINOv...
- Revisiting Reconstruction-based AI-generated Image Detection: A Geometric Perspective : Abstract: The rise of generative Artificial Intelligence (AI) has made detecting AI-generated images a critical challenge for ensuring authenticity. Existing reconstruction-based methods lack theoreti...
- EA3D: Online Open-World 3D Object Extraction from Streaming Videos : Abstract: Current 3D scene understanding methods are limited by offline-collected multi-view data or pre-constructed 3D geometry. In this paper, we present ExtractAnything3D (EA3D), a unified online f...
- Towards Real-Time Inference of Thin Liquid Film Thickness Profiles from Interference Patterns Using Vision Transformers : Abstract: Thin film interferometry is a powerful technique for non-invasively measuring liquid film thickness with applications in ophthalmology, but its clinical translation is hindered by the challe...
- Target-Guided Bayesian Flow Networks for Quantitatively Constrained CAD Generation : Abstract: Deep generative models, such as diffusion models, have shown promising progress in image generation and audio generation via simplified continuity assumptions. However, the development of ge...
- $D^2GS$: Dense Depth Regularization for LiDAR-free Urban Scene Reconstruction : Abstract: Recently, Gaussian Splatting (GS) has shown great potential for urban scene reconstruction in the field of autonomous driving. However, current urban scene reconstruction methods often depen...
- Classifier Enhancement Using Extended Context and Domain Experts for Semantic Segmentation : Abstract: Prevalent semantic segmentation methods generally adopt a vanilla classifier to categorize each pixel into specific classes. Although such a classifier learns global information from the t...
- Test-Time Adaptive Object Detection with Foundation Model : Abstract: In recent years, test-time adaptive object detection has attracted increasing attention due to its unique advantages in online domain adaptation, which aligns more closely with real-world ap...
- Mask-Robust Face Verification for Online Learning via YOLOv5 and Residual Networks : Abstract: In the contemporary landscape, the fusion of information technology and the rapid advancement of artificial intelligence have ushered school education into a transformative phase characteriz...
- AI-Powered Early Detection of Critical Diseases using Image Processing and Audio Analysis : Abstract: Early diagnosis of critical diseases can significantly improve patient survival and reduce treatment costs. However, existing diagnostic techniques are often costly, invasive, and inaccessib...
- U-CAN: Unsupervised Point Cloud Denoising with Consistency-Aware Noise2Noise Matching : Abstract: Point clouds captured by scanning sensors are often perturbed by noise, which has a highly negative impact on downstream tasks (e.g., surface reconstruction and shape understanding). Previou...
- MSF-Net: Multi-Stage Feature Extraction and Fusion for Robust Photometric Stereo : Abstract: Photometric stereo is a technique aimed at determining surface normals through the utilization of shading cues derived from images taken under different lighting conditions. However, existin...
- Aligning What You Separate: Denoised Patch Mixing for Source-Free Domain Adaptation in Medical Image Segmentation : Abstract: Source-Free Domain Adaptation (SFDA) is emerging as a compelling solution for medical image segmentation under privacy constraints, yet current approaches often ignore sample difficulty and ...
- Balanced conic rectified flow : Abstract: Rectified flow is a generative model that learns smooth transport mappings between two distributions through an ordinary differential equation (ODE). Unlike diffusion-based generative models...
- DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis : Abstract: Recent advances in deep generative models have made it easier to manipulate face videos, raising significant concerns about their potential misuse for fraud and misinformation. Existing dete...
- VADB: A Large-Scale Video Aesthetic Database with Professional and Multi-Dimensional Annotations : Abstract: Video aesthetic assessment, a vital area in multimedia computing, integrates computer vision with human cognition. Its progress is limited by the lack of standardized datasets and robust mod...
- Mapping and Classification of Trees Outside Forests using Deep Learning : Abstract: Trees Outside Forests (TOF) play an important role in agricultural landscapes by supporting biodiversity, sequestering carbon, and regulating microclimates. Yet, most studies have treated TO...
- RT-DETRv4: Painlessly Furthering Real-Time Object Detection with Vision Foundation Models : Abstract: Real-time object detection has achieved substantial progress through meticulously designed architectures and optimization strategies. However, the pursuit of high-speed inference via lightwe...
- LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation : Abstract: We propose LangHOPS, the first Multimodal Large Language Model (MLLM) based framework for open-vocabulary object-part instance segmentation. Given an image, LangHOPS can jointly detect and s...
- Diffusion-Driven Progressive Target Manipulation for Source-Free Domain Adaptation : Abstract: Source-free domain adaptation (SFDA) is a challenging task that tackles domain shifts using only a pre-trained source model and unlabeled target data. Existing SFDA methods are restricted by...
- GaTector+: A Unified Head-free Framework for Gaze Object and Gaze Following Prediction : Abstract: Gaze object detection and gaze following are fundamental tasks for interpreting human gaze behavior or intent. However, most previous methods usually solve these two tasks separately, and th...
- Seeing Clearly and Deeply: An RGBD Imaging Approach with a Bio-inspired Monocentric Design : Abstract: Achieving high-fidelity, compact RGBD imaging presents a dual challenge: conventional compact optics struggle with RGB sharpness across the entire depth-of-field, while software-only Monocul...
- Prototype-Driven Adaptation for Few-Shot Object Detection : Abstract: Few-shot object detection (FSOD) often suffers from base-class bias and unstable calibration when only a few novel samples are available. We propose Prototype-Driven Alignment (PDA), a light...
- StreamingCoT: A Dataset for Temporal Dynamics and Multimodal Chain-of-Thought Reasoning in Streaming VideoQA : Abstract: The rapid growth of streaming video applications demands multimodal models with enhanced capabilities for temporal dynamics understanding and complex reasoning. However, current Video Questi...
- Informative Sample Selection Model for Skeleton-based Action Recognition with Limited Training Samples : Abstract: Skeleton-based human action recognition aims to classify human skeletal sequences, which are spatiotemporal representations of actions, into predefined categories. To reduce the reliance on ...
- Instance-Level Composed Image Retrieval : Abstract: The progress of composed image retrieval (CIR), a popular research direction in image retrieval, where a combined visual and textual query is used, is held back by the absence of high-qualit...
- SPADE: Sparsity Adaptive Depth Estimator for Zero-Shot, Real-Time, Monocular Depth Estimation in Underwater Environments : Abstract: Underwater infrastructure requires frequent inspection and maintenance due to harsh marine conditions. Current reliance on human divers or remotely operated vehicles is limited by perceptual...
- Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks : Abstract: Humans possess spatial reasoning abilities that enable them to understand spaces through multimodal observations, such as vision and sound. Large multimodal reasoning models extend these abi...
- FreeArt3D: Training-Free Articulated Object Generation using 3D Diffusion : Abstract: Articulated 3D objects are central to many applications in robotics, AR/VR, and animation. Recent approaches to modeling such objects either rely on optimization-based reconstruction pipelin...
- VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning : Abstract: Visual effects (VFX) are crucial to the expressive power of digital media, yet their creation remains a major challenge for generative AI. Prevailing methods often rely on the one-LoRA-per-e...
- Resi-VidTok: An Efficient and Decomposed Progressive Tokenization Framework for Ultra-Low-Rate and Lightweight Video Transmission : Abstract: Real-time transmission of video over wireless networks remains highly challenging, even with advanced deep models, particularly under severe channel conditions such as limited bandwidth and ...
- Functional correspondence by matrix completion : Abstract: In this paper, we consider the problem of finding dense intrinsic correspondence between manifolds using the recently introduced functional framework. We pose the functional correspondence p...
- Single Image Estimation of Cell Migration Direction by Deep Circular Regression : Abstract: In this paper, we address the problem of estimating the migration direction of cells based on a single image. A solution to this problem lays the foundation for a variety of applications tha...
- U-DECN: End-to-End Underwater Object Detection ConvNet with Improved DeNoising Training : Abstract: Underwater object detection has higher requirements of running speed and deployment efficiency for the detector due to its specific environmental challenges. NMS of two- or one-stage object ...
- ScribbleVS: Scribble-Supervised Medical Image Segmentation via Dynamic Competitive Pseudo Label Selection : Abstract: In clinical medicine, precise image segmentation can provide substantial support to clinicians. However, obtaining high-quality segmentation typically demands extensive pixel-level annotatio...
- Simulating Automotive Radar with Lidar and Camera Inputs : Abstract: Low-cost millimeter automotive radar has received more and more attention due to its ability to handle adverse weather and lighting conditions in autonomous driving. However, the lack of qua...
- Open3D-VQA: A Benchmark for Comprehensive Spatial Reasoning with Multimodal Large Language Model in Open Space : Abstract: Spatial reasoning is a fundamental capability of multimodal large language models (MLLMs), yet their performance in open aerial environments remains underexplored. In this work, we present O...
- XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark : Abstract: Document Reading Order Recovery is a fundamental task in document image understanding, playing a pivotal role in enhancing Retrieval-Augmented Generation (RAG) and serving as a critical prep...
- DPMambaIR: All-in-One Image Restoration via Degradation-Aware Prompt State Space Model : Abstract: All-in-One image restoration aims to address multiple image degradation problems using a single model, offering a more practical and versatile solution compared to designing dedicated models...
- MagicPortrait: Temporally Consistent Face Reenactment with 3D Geometric Guidance : Abstract: In this study, we propose a method for video face reenactment that integrates a 3D face parametric model into a latent diffusion framework, aiming to improve shape consistency and motion con...
- FutureSightDrive: Thinking Visually with Spatio-Temporal CoT for Autonomous Driving : Abstract: Vision-Language-Action (VLA) models are increasingly used for end-to-end driving due to their world knowledge and reasoning ability. Most prior work, however, inserts textual chains-of-thoug...
- HF-VTON: High-Fidelity Virtual Try-On via Consistent Geometric and Semantic Alignment : Abstract: Virtual try-on technology has become increasingly important in the fashion and retail industries, enabling the generation of high-fidelity garment images that adapt seamlessly to target huma...
- Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness : Abstract: Convolutional neural networks (CNNs) trained on object recognition achieve high task performance but continue to exhibit vulnerability under a range of visual perturbations and out-of-domain...
- HAIF-GS: Hierarchical and Induced Flow-Guided Gaussian Splatting for Dynamic Scene : Abstract: Reconstructing dynamic 3D scenes from monocular videos remains a fundamental challenge in 3D vision. While 3D Gaussian Splatting (3DGS) achieves real-time rendering in static settings, exten...
- FOCUS: Internal MLLM Representations for Efficient Fine-Grained Visual Question Answering : Abstract: While Multimodal Large Language Models (MLLMs) offer strong perception and reasoning capabilities for image-text input, Visual Question Answering (VQA) focusing on small image details still ...
- MILo: Mesh-In-the-Loop Gaussian Splatting for Detailed and Efficient Surface Reconstruction : Abstract: While recent advances in Gaussian Splatting have enabled fast reconstruction of high-quality 3D scenes from images, extracting accurate surface meshes remains a challenge. Current approaches...
- RoboCerebra: A Large-scale Benchmark for Long-horizon Robotic Manipulation Evaluation : Abstract: Recent advances in vision-language models (VLMs) have enabled instruction-conditioned robotic systems with improved generalization. However, most existing work focuses on reactive System 1 p...
- CANDI: Hybrid Discrete-Continuous Diffusion Models : Abstract: While continuous diffusion has shown remarkable success in continuous domains such as image generation, its direct application to discrete data has underperformed compared to purely discrete...
- Towards Scaling Deep Neural Networks with Predictive Coding: Theory and Practice : Abstract: Backpropagation (BP) is the standard algorithm for training the deep neural networks that power modern artificial intelligence including large language models. However, BP is energy ineffici...
- Differential Privacy as a Perk: Federated Learning over Multiple-Access Fading Channels with a Multi-Antenna Base Station : Abstract: Federated Learning (FL) is a distributed learning paradigm that preserves privacy by eliminating the need to exchange raw data during training. In its prototypical edge instantiation with un...
- Robust Fitted-Q-Evaluation and Iteration under Sequentially Exogenous Unobserved Confounders : Abstract: Offline reinforcement learning is important in domains such as medicine, economics, and e-commerce where online experimentation is costly, dangerous or unethical, and where the true model is...
- MP-FVM: Enhancing Finite Volume Method for Water Infiltration Modeling in Unsaturated Soils via Message-passing Encoder-decoder Network : Abstract: The spatiotemporal water flow dynamics in unsaturated soils can generally be modeled by the Richards equation. To overcome the computational challenges associated with solving this highly no...
- Tracking the Median of Gradients with a Stochastic Proximal Point Method : Abstract: There are several applications of stochastic optimization where one can benefit from a robust estimate of the gradient. For example, domains such as distributed learning with corrupted nodes...
- Exploring End-to-end Differentiable Neural Charged Particle Tracking -- A Loss Landscape Perspective : Abstract: Measurement and analysis of high energetic particles for scientific, medical or industrial applications is a complex procedure, requiring the design of sophisticated detector and data proces...
- S'MoRE: Structural Mixture of Residual Experts for Parameter-Efficient LLM Fine-tuning : Abstract: Fine-tuning pre-trained large language models (LLMs) presents a dual challenge of balancing parameter efficiency and model capacity. Existing methods like low-rank adaptations (LoRA) are eff...
- Efficient Adaptive Experimentation with Noncompliance : Abstract: We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged -- rather than directly assigned -- via a binary instrume...
- Symplectic Generative Networks (SGNs): A Hamiltonian Framework for Invertible Deep Generative Modeling : Abstract: We introduce the Symplectic Generative Network (SGN), a deep generative model that leverages Hamiltonian mechanics to construct an invertible, volume-preserving mapping between a late...
- OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents : Abstract: Computer use agents are LLM-based agents that can directly interact with a graphical user interface, by processing screenshots or accessibility trees. While these systems are gaining popular...
- Online Adaptation for Flying Quadrotors in Tight Formations : Abstract: The task of flying in tight formations is challenging for teams of quadrotors because the complex aerodynamic wake interactions can destabilize individual team members as well as the team. F...
- Thompson Sampling in Function Spaces via Neural Operators : Abstract: We propose an extension of Thompson sampling to optimization problems over function spaces where the objective is a known functional of an unknown operator's output. We assume that queries t...
- An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting : Abstract: Radio frequency (RF) fingerprinting, which extracts unique hardware imperfections of radio devices, has emerged as a promising physical-layer device identification mechanism in zero trust ar...
- Iti-Validator: A Guardrail Framework for Validating and Correcting LLM-Generated Itineraries : Abstract: The rapid advancement of Large Language Models (LLMs) has enabled them to generate complex, multi-step plans and itineraries. However, these generated plans often lack temporal and spatial c...
- Cross-Lingual Summarization as a Black-Box Watermark Removal Attack : Abstract: Watermarking has been proposed as a lightweight mechanism to identify AI-generated text, with schemes typically relying on perturbations to token distributions. While prior work shows that p...
- Who You Are Matters: Bridging Topics and Social Roles via LLM-Enhanced Logical Recommendation : Abstract: Recommender systems filter contents/items valuable to users by inferring preferences from user features and historical behaviors. Mainstream approaches follow the learning-to-rank paradigm, ...
- A method for the systematic generation of graph XAI benchmarks via Weisfeiler-Leman coloring : Abstract: Graph neural networks have become the de facto model for learning from structured data. However, the decision-making process of GNNs remains opaque to the end user, which undermines their us...
- Continuous Domain Generalization : Abstract: Real-world data distributions often shift continuously across multiple latent factors such as time, geography, and socioeconomic contexts. However, existing domain generalization approaches ...
- Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems : Abstract: Reinforcement Learning (RL) algorithms sample multiple n>1 solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strengt...
- Artificial Intelligence for Direct Prediction of Molecular Dynamics Across Chemical Space : Abstract: Molecular dynamics (MD) is a powerful tool for exploring the behavior of atomistic systems, but its reliance on sequential numerical integration limits simulation efficiency. We present a no...
- SATURN: SAT-based Reinforcement Learning to Unleash Language Model Reasoning : Abstract: How to design reinforcement learning (RL) tasks that effectively unleash the reasoning capability of large language models (LLMs) remains an open question. Existing RL tasks (e.g., math, pro...
- Dynamic Risk Assessments for Offensive Cybersecurity Agents : Abstract: Foundation models are increasingly becoming better autonomous programmers, raising the prospect that they could also automate dangerous offensive cyber-operations. Current frontier model aud...
- InfoChartQA: A Benchmark for Multimodal Question Answering on Infographic Charts : Abstract: Understanding infographic charts with design-driven visual elements (e.g., pictograms, icons) requires both visual recognition and reasoning, posing challenges for multimodal large language ...
- Learning-Augmented Online Bipartite Fractional Matching : Abstract: Online bipartite matching is a fundamental problem in online optimization, extensively studied both in its integral and fractional forms due to its theoretical significance and practical app...
- WXImpactBench: A Disruptive Weather Impact Understanding Benchmark for Evaluating Large Language Models : Abstract: Climate change adaptation requires the understanding of disruptive weather impacts on society, where large language models (LLMs) might be applicable. However, their effectiveness is under-e...
- Probabilistic Kernel Function for Fast Angle Testing : Abstract: In this paper, we study the angle testing problem in the context of similarity search in high-dimensional Euclidean spaces and propose two projection-based probabilistic kernel functions, on...
- Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling : Abstract: We present LrcSSM, a non-linear recurrent model that processes long sequences as fast as today's linear state-space layers. By forcing its Jacobian matrix to be diagonal, the full...
- Decom-Renorm-Merge: Model Merging on the Right Space Improves Multitasking : Abstract: In the era of large-scale training, model merging has evolved into a tool for creating multitasking models efficiently. It enables the knowledge of models to be fused, without the need for h...
- Learning with Calibration: Exploring Test-Time Computing of Spatio-Temporal Forecasting : Abstract: Spatio-temporal forecasting is crucial in many domains, such as transportation, meteorology, and energy. However, real-world scenarios frequently present challenges such as signal anomalies,...
- Doubly Robust Alignment for Large Language Models : Abstract: This paper studies reinforcement learning from human feedback (RLHF) for aligning large language models with human preferences. While RLHF has demonstrated promising results, many algorithms...
- Purifying Shampoo: Investigating Shampoo's Heuristics by Decomposing its Preconditioner : Abstract: The recent success of Shampoo in the AlgoPerf contest has sparked renewed interest in Kronecker-factorization-based optimization algorithms for training neural networks. Despite its success,...
- DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO : Abstract: Recent works have demonstrated the effectiveness of reinforcement learning (RL)-based post-training for enhancing the reasoning capabilities of large language models (LLMs). In particular, G...
- Reinforcement Learning Teachers of Test Time Scaling : Abstract: Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at ini...
- Intelligent Design 4.0: Paradigm Evolution Toward the Agentic AI Era : Abstract: Research and practice in Intelligent Design (ID) have significantly enhanced engineering innovation, efficiency, quality, and productivity over recent decades, fundamentally reshaping how en...
- Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization : Abstract: Language models can retain dangerous knowledge and skills even after extensive safety fine-tuning, posing both misuse and misalignment risks. Recent studies show that even specialized unlear...
- TabArena: A Living Benchmark for Machine Learning on Tabular Data : Abstract: With the growing popularity of deep learning and foundation models for tabular data, the need for standardized and reliable benchmarks is higher than ever. However, current benchmarks are st...
- Many LLMs Are More Utilitarian Than One : Abstract: Moral judgment is integral to large language models' (LLMs) social reasoning. As multi-agent systems gain prominence, it becomes crucial to understand how LLMs function when collaborating co...
- Differential Mamba : Abstract: Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hall...
- Flow matching for reaction pathway generation : Abstract: Elucidating reaction mechanisms hinges on efficiently generating transition states (TSs), products, and complete reaction networks. Recent generative models, such as diffusion models for TS ...
- Exploring the In-Context Learning Capabilities of LLMs for Money Laundering Detection in Financial Graphs : Abstract: The complexity and interconnectivity of entities involved in money laundering demand investigative reasoning over graph-structured data. This paper explores the use of large language models ...
- Privacy-Preserving Personalization in Education: A Federated Recommender System for Student Performance Prediction : Abstract: The increasing digitalization of education presents unprecedented opportunities for data-driven personalization, but it also introduces significant challenges to student data privacy. Conven...
- From Linear to Nonlinear: Provable Weak-to-Strong Generalization through Feature Learning : Abstract: Weak-to-strong generalization refers to the phenomenon where a stronger model trained under supervision from a weaker one can outperform its teacher. While prior studies aim to explain this ...
- Augmenting Biological Fitness Prediction Benchmarks with Landscapes Features from GraphFLA : Abstract: Machine learning models increasingly map biological sequence-fitness landscapes to predict mutational effects. Effective evaluation of these models requires benchmarks curated from empirical...
- Send Less, Save More: Energy-Efficiency Benchmark of Embedded CNN Inference vs. Data Transmission in IoT : Abstract: The integration of the Internet of Things (IoT) and Artificial Intelligence offers significant opportunities to enhance our ability to monitor and address ecological changes. As environmenta...
- Aggregation Hides Out-of-Distribution Generalization Failures from Spurious Correlations : Abstract: Benchmarks for out-of-distribution (OOD) generalization frequently show a strong positive correlation between in-distribution (ID) and OOD accuracy across models, termed "accuracy-on-the-lin...
- Adaptive EEG-based stroke diagnosis with a GRU-TCN classifier and deep Q-learning thresholding : Abstract: Rapid triage of suspected stroke needs accurate, bedside-deployable tools; EEG is promising but underused at first contact. We present an adaptive multitask EEG classifier that converts 32-c...
- Topic Analysis with Side Information: A Neural-Augmented LDA Approach : Abstract: Traditional topic models such as Latent Dirichlet Allocation (LDA) have been widely used to uncover latent structures in text corpora, but they often struggle to integrate auxiliary informat...
- WBT-BGRL: A Non-Contrastive Weighted Bipartite Link Prediction Model for Inductive Learning : Abstract: Link prediction in bipartite graphs is crucial for applications like recommendation systems and failure detection, yet it is less studied than in monopartite graphs. Contrastive methods stru...
- Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought : Abstract: Recent large language models (LLMs) can generate long Chain-of-Thought (CoT) at test time, enabling them to solve complex tasks. These reasoning steps in CoT are often assumed as a faithful ...
- Resource-Efficient and Robust Inference of Deep and Bayesian Neural Networks on Embedded and Analog Computing Platforms : Abstract: While modern machine learning has transformed numerous application domains, its growing computational demands increasingly constrain scalability and efficiency, particularly on embedded and ...
- Conformational Rank Conditioned Committees for Machine Learning-Assisted Directed Evolution : Abstract: Machine Learning-assisted directed evolution (MLDE) is a powerful tool for efficiently navigating antibody fitness landscapes. Many structure-aware MLDE pipelines rely on a single conformati...
- Strategic inputs: feature selection from game-theoretic perspective : Abstract: The exponential growth of data volumes has led to escalating computational costs in machine learning model training. However, many features fail to contribute positively to model performance...
- Enhancing Hierarchical Reinforcement Learning through Change Point Detection in Time Series : Abstract: Hierarchical Reinforcement Learning (HRL) enhances the scalability of decision-making in long-horizon tasks by introducing temporal abstraction through options: policies that span multiple ti...
- What Really Matters in Matrix-Whitening Optimizers? : Abstract: A range of recent optimizers have emerged that approximate the same "matrix-whitening" transformation in various ways. In this work, we systematically deconstruct such optimizers, aiming to ...
- Disentangling Shared and Private Neural Dynamics with SPIRE: A Latent Modeling Framework for Deep Brain Stimulation : Abstract: Disentangling shared network-level dynamics from region-specific activity is a central challenge in modeling multi-region neural data. We introduce SPIRE (Shared-Private Inter-Regional Encod...
- Machine Learning based Analysis for Radiomics Features Robustness in Real-World Deployment Scenarios : Abstract: Radiomics-based machine learning models show promise for clinical decision support but are vulnerable to distribution shifts caused by variations in imaging protocols, positioning, and segme...
- Graph Distance Based on Cause-Effect Estimands with Latents : Abstract: Causal discovery aims to recover graphs that represent causal relations among given variables from observations, and new methods are constantly being proposed. Increasingly, the community ra...
- Dynamically Weighted Momentum with Adaptive Step Sizes for Efficient Deep Network Training : Abstract: Within the current sphere of deep learning research, despite the extensive application of optimization algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (Ad...
- Training Across Reservoirs: Using Numerical Differentiation To Couple Trainable Networks With Black-Box Reservoirs : Abstract: We introduce Bounded Numerical Differentiation (BOND), a perturbative method for estimating partial derivatives across network structures with inaccessible computational graphs. BOND demonst...
- Continual Low-Rank Adapters for LLM-based Generative Recommender Systems : Abstract: While large language models (LLMs) achieve strong performance in recommendation, they face challenges in continual learning as users, items, and user preferences evolve over time. Existing L...
- Shift is Good: Mismatched Data Mixing Improves Test Performance : Abstract: We consider training and testing on mixture distributions with different training and test proportions. We show that in many settings, and in some sense generically, distribution shift can b...
- A Unified Bilevel Model for Adversarial Learning and A Case Study : Abstract: Adversarial learning has been attracting more and more attention thanks to the fast development of machine learning and artificial intelligence. However, due to the complicated structure of ...
- An Analysis of Causal Effect Estimation using Outcome Invariant Data Augmentation : Abstract: The technique of data augmentation (DA) is often used in machine learning for regularization purposes to better generalize under i.i.d. settings. In this work, we present a unifying framewor...
- Machine Learning Guided Optimal Transmission Switching to Mitigate Wildfire Ignition Risk : Abstract: To mitigate acute wildfire ignition risks, utilities de-energize power lines in high-risk areas. The Optimal Power Shutoff (OPS) problem optimizes line energization statuses to manage wildfi...
- Machine Learning and CPU (Central Processing Unit) Scheduling Co-Optimization over a Network of Computing Centers : Abstract: In the rapidly evolving research on artificial intelligence (AI) the demand for fast, computationally efficient, and scalable solutions has increased in recent years. The problem of optimizi...
- Selective Learning for Deep Time Series Forecasting : Abstract: Benefiting from high capacity for capturing complex temporal patterns, deep learning (DL) has significantly advanced time series forecasting (TSF). However, deep models tend to suffer from s...
- BSFA: Leveraging the Subspace Dichotomy to Accelerate Neural Network Training : Abstract: Recent studies (gur2018gradient, song2024does, wen2024understanding) highlight a fundamental dichotomy in deep learning optimization: Although parameter updates along the top eigendirec...
- On the Stability of Neural Networks in Deep Learning : Abstract: Deep learning has achieved remarkable success across a wide range of tasks, but its models often suffer from instability and vulnerability: small changes to the input may drastically affect ...
- Hierarchical Physics-Embedded Learning for Spatiotemporal Dynamical Systems : Abstract: Modeling complex spatiotemporal dynamics, particularly in far-from-equilibrium systems, remains a grand challenge in science. The governing partial differential equations (PDEs) for these sy...
- CDFlow: Building Invertible Layers with Circulant and Diagonal Matrices : Abstract: Normalizing flows are deep generative models that enable efficient likelihood estimation and sampling through invertible transformations. A key challenge is to design linear layers that enha...
- Beyond Leakage and Complexity: Towards Realistic and Efficient Information Cascade Prediction : Abstract: Information cascade popularity prediction is a key problem in analyzing content diffusion in social networks. However, current related works suffer from three critical limitations: (1) tempo...
- Analysis of Semi-Supervised Learning on Hypergraphs : Abstract: Hypergraphs provide a natural framework for modeling higher-order interactions, yet their theoretical underpinnings in semi-supervised learning remain limited. We provide an asymptotic consi...
- Parameter Averaging in Link Prediction : Abstract: Ensemble methods are widely employed to improve generalization in machine learning. This has also prompted the adoption of ensemble learning for the knowledge graph embedding (KGE) models in...
- A Deep Learning Framework for Multi-Operator Learning: Architectures and Approximation Theory : Abstract: While many problems in machine learning focus on learning mappings between finite-dimensional spaces, scientific applications require approximating mappings between function spaces, i.e., op...
- Gradient-Weight Alignment as a Train-Time Proxy for Generalization in Classification Tasks : Abstract: Robust validation metrics remain essential in contemporary deep learning, not only to detect overfitting and poor generalization, but also to monitor training dynamics. In the supervised cla...
- Right for the Right Reasons: Avoiding Reasoning Shortcuts via Prototypical Neurosymbolic AI : Abstract: Neurosymbolic AI is growing in popularity thanks to its ability to combine neural perception and symbolic reasoning in end-to-end trainable models. However, recent findings reveal these are ...
- Support Vector Machine-Based Burnout Risk Prediction with an Interactive Interface for Organizational Use : Abstract: Burnout is a psychological syndrome marked by emotional exhaustion, depersonalization, and reduced personal accomplishment, with a significant impact on individual well-being and organizatio...
- Transformers Provably Learn Directed Acyclic Graphs via Kernel-Guided Mutual Information : Abstract: Uncovering hidden graph structures underlying real-world data is a critical challenge with broad applications across scientific domains. Recently, transformer-based models leveraging the att...
- A Framework for Bounding Deterministic Risk with PAC-Bayes: Applications to Majority Votes : Abstract: PAC-Bayes is a popular and efficient framework for obtaining generalization guarantees in situations involving uncountable hypothesis spaces. Unfortunately, in its classical formulation, it ...
- Perturbation Bounds for Low-Rank Inverse Approximations under Noise : Abstract: Low-rank pseudoinverses are widely used to approximate matrix inverses in scalable machine learning, optimization, and scientific computing. However, real-world matrices are often observed w...
- Generalized Sobolev IPM for Graph-Based Measures : Abstract: We study the Sobolev IPM problem for measures supported on a graph metric space, where the critic function is constrained to lie within the unit ball defined by the Sobolev norm. While Le et al. (20...
- Feedback Alignment Meets Low-Rank Manifolds: A Structured Recipe for Local Learning : Abstract: Training deep neural networks (DNNs) with backpropagation (BP) achieves state-of-the-art accuracy but requires global error propagation and full parameterization, leading to substantial memo...
- Uncertainty Quantification for Regression: A Unified Framework based on kernel scores : Abstract: Regression tasks, notably in safety-critical domains, require proper uncertainty quantification, yet the literature remains largely classification-focused. In this light, we introduce a fami...
- Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy : Abstract: A central challenge in machine learning is to understand how noise or measurement errors affect low-rank approximations, particularly in the spectral norm. This question is especially import...
- Mechanistic Interpretability of RNNs emulating Hidden Markov Models : Abstract: Recurrent neural networks (RNNs) provide a powerful approach in neuroscience to infer latent dynamics in neural populations and to generate hypotheses about the neural computations underlyin...
- Convolutional Spiking-based GRU Cell for Spatio-temporal Data : Abstract: Spike-based temporal messaging enables SNNs to efficiently process both purely temporal and spatio-temporal time-series or event-driven data. Combining SNNs with Gated Recurrent Units (GRUs)...
- MLPrE -- A tool for preprocessing and exploratory data analysis prior to machine learning model construction : Abstract: With the recent growth of Deep Learning for AI, there is a need for tools to meet the demand of data flowing into those models. In some cases, source data may exist in multiple formats, and ...
- Synthetic Data Reveals Generalization Gaps in Correlated Multiple Instance Learning : Abstract: Multiple instance learning (MIL) is often used in medical imaging to classify high-resolution 2D images by processing patches or classify 3D volumes by processing slices. However, convention...
- Neural Stochastic Flows: Solver-Free Modelling and Inference for SDE Solutions : Abstract: Stochastic differential equations (SDEs) are well suited to modelling noisy and irregularly sampled time series found in finance, physics, and machine learning. Traditional approaches requir...
- Re-evaluating sample efficiency in de novo molecule generation : Abstract: De novo molecule generation can suffer from data inefficiency, requiring large amounts of training data or many sampled data points to conduct objective optimization. The latter is a particu...
- Stiff Circuit System Modeling via Transformer : Abstract: Accurate and efficient circuit behavior modeling is a cornerstone of modern electronic design automation. Among different types of circuits, stiff circuits are challenging to model using pre...
- Spectral functions in Minkowski quantum electrodynamics from neural reconstruction: Benchmarking against dispersive Dyson-Schwinger integral equations : Abstract: A Minkowskian physics-informed neural network approach (M-PINN) is formulated to solve the Dyson-Schwinger integral equations (DSE) of quantum electrodynamics (QED) directly in Minkowski s...
- Constructive Lyapunov Functions via Topology-Preserving Neural Networks : Abstract: We prove that ONN achieves order-optimal performance on convergence rate ($\mu \propto \lambda_2$), edge efficiency ($E = N$ for minimal connectivity $k = 2$), and computational complexity (...
- Decoding non-invasive brain activity with novel deep-learning approaches : Abstract: This thesis delves into the world of non-invasive electrophysiological brain signals like electroencephalography (EEG) and magnetoencephalography (MEG), focusing on modelling and decoding su...
- DrivingScene: A Multi-Task Online Feed-Forward 3D Gaussian Splatting Method for Dynamic Driving Scenes : Abstract: Real-time, high-fidelity reconstruction of dynamic driving scenes is challenged by complex dynamics and sparse views, with prior methods struggling to balance quality and efficiency. We prop...
- StrikeWatch: Wrist-worn Gait Recognition with Compact Time-series Models on Low-power FPGAs : Abstract: Running offers substantial health benefits, but improper gait patterns can lead to injuries, particularly without expert feedback. While prior gait analysis systems based on cameras, insoles...
- Comparative Analysis of Data Augmentation for Clinical ECG Classification with STAR : Abstract: Clinical 12-lead ECG classification remains difficult because diverse recording conditions, overlapping pathologies, and pronounced label imbalance hinder generalization, while unconstrai...
- Certainty in Uncertainty: Reasoning over Uncertain Knowledge Graphs with Statistical Guarantees : Abstract: Uncertain knowledge graph embedding (UnKGE) methods learn vector representations that capture both structural and uncertainty information to predict scores of unseen triples. However, existi...
- Point-level Uncertainty Evaluation of Mobile Laser Scanning Point Clouds : Abstract: Reliable quantification of uncertainty in Mobile Laser Scanning (MLS) point clouds is essential for ensuring the accuracy and credibility of downstream applications such as 3D mapping, model...
- CFL-SparseMed: Communication-Efficient Federated Learning for Medical Imaging with Top-k Sparse Updates : Abstract: Secure and reliable medical image classification is crucial for effective patient treatment, but centralized models face challenges due to data and privacy concerns. Federated Learning (FL) ...
- Sub-microsecond Transformers for Jet Tagging on FPGAs : Abstract: We present the first sub-microsecond transformer implementation on an FPGA achieving competitive performance for state-of-the-art high-energy physics benchmarks. Transformers have shown exce...
- A Re-node Self-training Approach for Deep Graph-based Semi-supervised Classification on Multi-view Image Data : Abstract: Recently, graph-based semi-supervised learning and pseudo-labeling have gained attention due to their effectiveness in reducing the need for extensive data annotations. Pseudo-labeling uses ...
- Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases : Abstract: Privacy concerns have become increasingly critical in modern AI and data science applications, where sensitive information is collected, analyzed, and shared across diverse domains such as h...
- Tree Ensemble Explainability through the Hoeffding Functional Decomposition and TreeHFD Algorithm : Abstract: Tree ensembles have demonstrated state-of-the-art predictive performance across a wide range of problems involving tabular data. Nevertheless, the black-box nature of tree ensembles is a str...
- Idea2Plan: Exploring AI-Powered Research Planning : Abstract: Large language models (LLMs) have demonstrated significant potential to accelerate scientific discovery as valuable tools for analyzing data, generating hypotheses, and supporting innovative...
- Modality-Aware SAM: Sharpness-Aware-Minimization Driven Gradient Modulation for Harmonized Multimodal Learning : Abstract: In multimodal learning, dominant modalities often overshadow others, limiting generalization. We propose Modality-Aware Sharpness-Aware Minimization (M-SAM), a model-agnostic framework that ...
- scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration : Abstract: Advances in single-cell sequencing have enabled high-resolution profiling of diverse molecular modalities, while integrating unpaired multi-omics single-cell data remains challenging. Existi...
- Bayesian Neural Networks vs. Mixture Density Networks: Theoretical and Empirical Insights for Uncertainty-Aware Nonlinear Modeling : Abstract: This paper investigates two prominent probabilistic neural modeling paradigms: Bayesian Neural Networks (BNNs) and Mixture Density Networks (MDNs) for uncertainty-aware nonlinear regression....
- Secure Retrieval-Augmented Generation against Poisoning Attacks : Abstract: Large language models (LLMs) have transformed natural language processing (NLP), enabling applications from content generation to decision support. Retrieval-Augmented Generation (RAG) impro...
- Automating Benchmark Design : Abstract: The rapid progress and widespread deployment of LLMs and LLM-powered agents has outpaced our ability to evaluate them. Hand-crafted, static benchmarks are the primary tool for assessing mode...
- Breast Cancer VLMs: Clinically Practical Vision-Language Train-Inference Models : Abstract: Breast cancer remains the most commonly diagnosed malignancy among women in the developed world. Early detection through mammography screening plays a pivotal role in reducing mortality rate...
- Nonlinear Dynamics In Optimization Landscape of Shallow Neural Networks with Tunable Leaky ReLU : Abstract: In this work, we study the nonlinear dynamics of a shallow neural network trained with mean-squared loss and leaky ReLU activation. Under Gaussian inputs and equal layer width k, (1) we esta...
- BioCoref: Benchmarking Biomedical Coreference Resolution with LLMs : Abstract: Coreference resolution in biomedical texts presents unique challenges due to complex domain-specific terminology, high ambiguity in mention forms, and long-distance dependencies between core...
- Energy Approach from $\varepsilon$-Graph to Continuum Diffusion Model with Connectivity Functional : Abstract: We derive an energy-based continuum limit for $\varepsilon$-graphs endowed with a general connectivity functional. We prove that the discrete energy and its continuum counterpart differ by a...
- EnzyControl: Adding Functional and Substrate-Specific Control for Enzyme Backbone Generation : Abstract: Designing enzyme backbones with substrate-specific functionality is a critical challenge in computational protein engineering. Current generative models excel in protein design but face limi...
- Conditional neural field for spatial dimension reduction of turbulence data: a comparison study : Abstract: We investigate conditional neural fields (CNFs), mesh-agnostic, coordinate-based decoders conditioned on a low-dimensional latent, for spatial dimensionality reduction of turbulent flows. CN...
- A Study on Inference Latency for Vision Transformers on Mobile Devices : Abstract: Given the significant advances in machine learning techniques on mobile devices, particularly in the domain of computer vision, in this work we quantitatively study the performance character...
- Sustainable NARMA-10 Benchmarking for Quantum Reservoir Computing : Abstract: This study compares Quantum Reservoir Computing (QRC) with classical models such as Echo State Networks (ESNs) and Long Short-Term Memory networks (LSTMs), as well as hybrid quantum-classica...
- Generative Bayesian Optimization: Generative Models as Acquisition Functions : Abstract: We present a general strategy for turning generative models into candidate solution samplers for batch Bayesian optimization (BO). The use of generative models for BO enables large batch sca...
- 3D CT-Based Coronary Calcium Assessment: A Feature-Driven Machine Learning Framework : Abstract: Coronary artery calcium (CAC) scoring plays a crucial role in the early detection and risk stratification of coronary artery disease (CAD). In this study, we focus on non-contrast coronary c...
- Prompt Estimation from Prototypes for Federated Prompt Tuning of Vision Transformers : Abstract: Visual Prompt Tuning (VPT) of pre-trained Vision Transformers (ViTs) has proven highly effective as a parameter-efficient fine-tuning technique for adapting large models to downstream tasks ...
- Convergence of off-policy TD(0) with linear function approximation for reversible Markov chains : Abstract: We study the convergence of off-policy TD(0) with linear function approximation when used to approximate the expected discounted reward in a Markov chain. It is well known that the combinati...
- Error Bounds and Optimal Schedules for Masked Diffusions with Factorized Approximations : Abstract: Recently proposed generative models for discrete data, such as Masked Diffusion Models (MDMs), exploit conditional independence approximations to reduce the computational cost of popular Aut...
- Robust variable selection for spatial point processes observed with noise : Abstract: We propose a method for variable selection in the intensity function of spatial point processes that combines sparsity-promoting estimation with noise-robust model selection. As high-resolut...
- PitchFlower: A flow-based neural audio codec with pitch controllability : Abstract: We present PitchFlower, a flow-based neural audio codec with explicit pitch controllability. Our approach enforces disentanglement through a simple perturbation: during training, F0 contours...
- Monitoring the calibration of probability forecasts with an application to concept drift detection involving image classification : Abstract: Machine learning approaches for image classification have led to impressive advances in that field. For example, convolutional neural networks are able to achieve remarkable image classifica...
- Learning-Augmented Online Bidding in Stochastic Settings : Abstract: Online bidding is a classic optimization problem, with several applications in online decision-making, the design of interruptible systems, and the analysis of approximation algorithms. In t...
- Continuous subsurface property retrieval from sparse radar observations using physics informed neural networks : Abstract: Estimating subsurface dielectric properties is essential for applications ranging from environmental surveys of soils to nondestructive evaluation of concrete in infrastructure. Conventional...
- Model Inversion Attacks Meet Cryptographic Fuzzy Extractors : Abstract: Model inversion attacks pose an open challenge to privacy-sensitive applications that use machine learning (ML) models. For example, face authentication systems use modern ML models to compu...
- A Configuration-First Framework for Reproducible, Low-Code Localization : Abstract: Machine learning is increasingly permeating radio-based localization services. To keep results credible and comparable, everyday workflows should make rigorous experiment specification and e...
- PyDPF: A Python Package for Differentiable Particle Filtering : Abstract: State-space models (SSMs) are a widely used tool in time series analysis. In the complex systems that arise from real-world data, it is common to employ particle filtering (PF), an efficient...
- Scaling flow-based approaches for topology sampling in $\mathrm{SU}(3)$ gauge theory : Abstract: We develop a methodology based on out-of-equilibrium simulations to mitigate topological freezing when approaching the continuum limit of lattice gauge theories. We reduce the autocorrelatio...
- Hawk: Leveraging Spatial Context for Faster Autoregressive Text-to-Image Generation : Abstract: Autoregressive (AR) image generation models are capable of producing high-fidelity images but often suffer from slow inference due to their inherently sequential, token-by-token decoding pro...
- Meshless solutions of PDE inverse problems on irregular geometries : Abstract: Solving inverse and optimization problems over solutions of nonlinear partial differential equations (PDEs) on complex spatial domains is a long-standing challenge. Here we introduce a metho...
- How Data Mixing Shapes In-Context Learning: Asymptotic Equivalence for Transformers with MLPs : Abstract: Pretrained Transformers demonstrate remarkable in-context learning (ICL) capabilities, enabling them to adapt to new tasks from demonstrations without parameter updates. However, theoretical...
- Partially Observable Multi-Agent Reinforcement Learning with Information Sharing : Abstract: We study provable multi-agent reinforcement learning (RL) in the general framework of partially observable stochastic games (POSGs). To circumvent the known hardness results and the use of c...
- Score-Aware Policy-Gradient and Performance Guarantees using Local Lyapunov Stability : Abstract: In this paper, we introduce a policy-gradient method for model-based reinforcement learning (RL) that exploits a type of stationary distributions commonly obtained from Markov decision proce...
- Hyperparameters in Continual Learning: A Reality Check : Abstract: Continual learning (CL) aims to train a model on a sequence of tasks (i.e., a CL scenario) while balancing the trade-off between plasticity (learning new tasks) and stability (retaining prio...
- How Many Ratings per Item are Necessary for Reliable Significance Testing? : Abstract: A cornerstone of machine learning evaluation is the (often hidden) assumption that model and human responses are reliable enough to evaluate models against unitary, authoritative, "gold sta...
- Hypergraph clustering using Ricci curvature: an edge transport perspective : Abstract: In this paper, we introduce a novel method for extending Ricci flow to hypergraphs by defining probability measures on the edges and transporting them on the line expansion. This approach yi...
- Exact Sequence Interpolation with Transformers : Abstract: We prove that transformers can exactly interpolate datasets of finite input sequences in $\mathbb{R}^d$, $d\geq 2$, with corresponding output sequences of smaller or equal length. Specifical...
- TuneNSearch: a hybrid transfer learning and local search approach for solving vehicle routing problems : Abstract: This paper introduces TuneNSearch, a hybrid transfer learning and local search approach for addressing diverse variants of the vehicle routing problem (VRP). Our method uses reinforcement le...
- ASGO: Adaptive Structured Gradient Optimization : Abstract: Training deep neural networks is a structured optimization problem, because the parameters are naturally represented by matrices and tensors rather than by vectors. Under this structural rep...
- Enlightenment Period Improving DNN Performance : Abstract: The start of deep neural network training is characterized by a brief yet critical phase that lasts from the beginning of the training until the accuracy reaches approximately 50%. During t...
- MDPs with a State Sensing Cost : Abstract: In many practical sequential decision-making problems, tracking the state of the environment incurs a sensing/communication/computation cost. In these settings, the agent's interaction with ...
- Why Knowledge Distillation Works in Generative Models: A Minimal Working Explanation : Abstract: Knowledge distillation (KD) is a core component in the training and deployment of modern generative models, particularly large language models (LLMs). While its empirical benefits are well d...
- Q-learning with Posterior Sampling : Abstract: Bayesian posterior sampling techniques have demonstrated superior empirical performance in many exploration-exploitation settings. However, their theoretical analysis remains a challenge, es...
- Stochastic Momentum Methods for Non-smooth Non-Convex Finite-Sum Coupled Compositional Optimization : Abstract: Finite-sum Coupled Compositional Optimization (FCCO), characterized by its coupled compositional objective structure, emerges as an important optimization paradigm for addressing a wide rang...
- Learning single-index models via harmonic decomposition : Abstract: We study the problem of learning single-index models, where the label $y \in \mathbb{R}$ depends on the input $\boldsymbol{x} \in \mathbb{R}^d$ only through an unknown one-dimensional projec...
- Path-specific effects for pulse-oximetry guided decisions in critical care : Abstract: Identifying and measuring biases associated with sensitive attributes is a crucial consideration in healthcare to prevent treatment disparities. One prominent issue is inaccurate pulse oxime...
- Jailbreak Transferability Emerges from Shared Representations : Abstract: Jailbreak transferability is the surprising phenomenon when an adversarial attack compromising one model also elicits harmful responses from other models. Despite widespread demonstrations, ...
- Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration : Abstract: Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promisi...
- Mesh-Informed Neural Operator : A Transformer Generative Approach : Abstract: Generative models in function spaces, situated at the intersection of generative modeling and operator learning, are attracting increasing attention due to their immense potential in diverse...
- SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning : Abstract: Multimodal in-context learning (ICL) remains underexplored despite significant potential for domains such as medicine. Clinicians routinely encounter diverse, specialized tasks requiring ada...
- Tensor Decomposition Networks for Fast Machine Learning Interatomic Potential Computations : Abstract: $\rm{SO}(3)$-equivariant networks are the dominant models for machine learning interatomic potentials (MLIPs). The key operation of such networks is the Clebsch-Gordan (CG) tensor product, w...
- Faster and Simpler Greedy Algorithm for $k$-Median and $k$-Means : Abstract: Clustering problems such as $k$-means and $k$-median are staples of unsupervised learning, and many algorithmic techniques have been developed to tackle their numerous aspects. In this pap...
- OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs : Abstract: The increased use of large language models (LLMs) across a variety of real-world applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs often hallucin...
- Expand and Compress: Exploring Tuning Principles for Continual Spatio-Temporal Graph Forecasting : Abstract: The widespread deployment of sensing devices leads to a surge in data for spatio-temporal forecasting applications such as traffic flow, air quality, and wind energy. Although spatio-tempora...
- Revisiting Service Level Objectives and System Level Metrics in Large Language Model Serving : Abstract: User experience is a critical factor Large Language Model (LLM) serving systems must consider, where service level objectives (SLOs) considering the experience of individual requests and sys...
- Meta-Learning Objectives for Preference Optimization : Abstract: Evaluating preference optimization (PO) algorithms on LLM alignment is a challenging task that presents prohibitive costs, noise, and several variables like model size and hyper-parameters. ...
- HyperMARL: Adaptive Hypernetworks for Multi-Agent RL : Abstract: Adaptive cooperation in multi-agent reinforcement learning (MARL) requires policies to express homogeneous, specialised, or mixed behaviours, yet achieving this adaptivity remains a critical...
- Physics Context Builders: A Modular Framework for Physical Reasoning in Vision-Language Models : Abstract: Physical reasoning remains a significant challenge for Vision-Language Models (VLMs). This limitation arises from an inability to translate learned knowledge into predictions about physical ...
- Bias in Decision-Making for AI's Ethical Dilemmas: A Comparative Study of ChatGPT and Claude : Abstract: Recent advances in Large Language Models (LLMs) have enabled human-like responses across various tasks, raising questions about their ethical decision-making capabilities and potential biase...
- Data-Juicer 2.0: Cloud-Scale Adaptive Data Processing for and with Foundation Models : Abstract: Foundation models demand advanced data processing for their vast, multimodal datasets. However, traditional frameworks struggle with the unique complexities of multimodal data. In response, ...
- Redistributing Rewards Across Time and Agents for Multi-Agent Reinforcement Learning : Abstract: Credit assignment, disentangling each agent's contribution to a shared reward, is a critical challenge in cooperative multi-agent reinforcement learning (MARL). To be effective, credit assign...
- Non-Markovian Discrete Diffusion with Causal Language Models : Abstract: Discrete diffusion models offer a flexible, controllable approach to structured sequence generation, yet they still lag behind causal language models in expressive power. A key limitation li...
- LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities : Abstract: Generative models are spearheading recent progress in deep learning, showcasing strong promise for trajectory sampling in dynamical systems as well. However, whereas latent space modeling pa...
- Spontaneous Giving and Calculated Greed in Language Models : Abstract: Large language models demonstrate strong problem-solving abilities through reasoning techniques such as chain-of-thought prompting and reflection. However, it remains unclear whether these r...
- DGTRSD & DGTRS-CLIP: A Dual-Granularity Remote Sensing Image-Text Dataset and Vision Language Foundation Model for Alignment : Abstract: Vision Language Foundation Models based on CLIP architecture for remote sensing primarily rely on short text captions, which often result in incomplete semantic representations. Although lon...
- Steiner Traveling Salesman Problem with Quantum Annealing : Abstract: The Steiner Traveling Salesman Problem (STSP) is a variant of the classical Traveling Salesman Problem. The STSP involves incorporating Steiner nodes, which are extra nodes not originally pa...
- OmegAMP: Targeted AMP Discovery through Biologically Informed Generation : Abstract: Deep learning-based antimicrobial peptide (AMP) discovery faces critical challenges such as limited controllability, lack of representations that efficiently model antimicrobial properties, ...
- Handling Label Noise via Instance-Level Difficulty Modeling and Dynamic Optimization : Abstract: Recent studies indicate that deep neural networks degrade in generalization performance under noisy supervision. Existing methods focus on isolating clean subsets or correcting noisy labels,...
- Plexus: Taming Billion-edge Graphs with 3D Parallel Full-graph GNN Training : Abstract: Graph neural networks (GNNs) leverage the connectivity and structure of real-world graphs to learn intricate properties and relationships between nodes. Many real-world graphs exceed the mem...
- Which Demographic Features Are Relevant for Individual Fairness Evaluation of U.S. Recidivism Risk Assessment Tools? : Abstract: Despite its constitutional relevance, the technical "individual fairness" criterion has not been operationalized in U.S. state or federal statutes/regulations. We conduct a human subjects ...
- Do Chatbots Walk the Talk of Responsible AI? : Abstract: This study examines whether leading AI chatbot companies implement the responsible AI principles they publicly advocate. The authors used a mixed-methods approach analyzing four major chatbo...
- The Generation Phases of Flow Matching: a Denoising Perspective : Abstract: Flow matching has achieved remarkable success, yet the factors influencing the quality of its generation process remain poorly understood. In this work, we adopt a denoising perspective and ...
- The Narrative Continuity Test: A Conceptual Framework for Evaluating Identity Persistence in AI Systems : Abstract: Artificial intelligence systems based on large language models (LLMs) can now generate coherent text, music, and images, yet they operate without a persistent state: each inference reconstru...
- Efficiency Without Cognitive Change: Evidence from Human Interaction with Narrow AI Systems : Abstract: The growing integration of artificial intelligence (AI) into human cognition raises a fundamental question: does AI merely improve efficiency, or does it alter how we think? This study exper...
- Fair Indivisible Payoffs through Shapley Value : Abstract: We consider the problem of payoff division in indivisible coalitional games, where the value of the grand coalition is a natural number. This number represents a certain quantity of indivisi...
- Understanding Multi-View Transformers : Abstract: Multi-view transformers such as DUSt3R are revolutionizing 3D vision by solving 3D tasks in a feed-forward manner. However, contrary to previous optimization-based pipelines, the inner mecha...
- Trust Dynamics in Strategic Coopetition: Computational Foundations for Requirements Engineering in Multi-Agent Systems : Abstract: Requirements engineering increasingly occurs in multi-stakeholder environments where organizations simultaneously cooperate and compete, creating coopetitive relationships in which trust evo...
- KAN-GCN: Combining Kolmogorov-Arnold Network with Graph Convolution Network for an Accurate Ice Sheet Emulator : Abstract: We introduce KAN-GCN, a fast and accurate emulator for ice sheet modeling that places a Kolmogorov-Arnold Network (KAN) as a feature-wise calibrator before graph convolution networks (GCNs)....
- Finding Culture-Sensitive Neurons in Vision-Language Models : Abstract: Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the...
- SCOUT: A Lightweight Framework for Scenario Coverage Assessment in Autonomous Driving : Abstract: Assessing scenario coverage is crucial for evaluating the robustness of autonomous agents, yet existing methods rely on expensive human annotations or computationally intensive Large Vision-...
- Sequences of Logits Reveal the Low Rank Structure of Language Models : Abstract: A major problem in the study of large language models is to understand their inherent low-dimensional structure. We introduce an approach to study the low-dimensional structure of language m...
- Hammering the Diagnosis: Rowhammer-Induced Stealthy Trojan Attacks on ViT-Based Medical Imaging : Abstract: Vision Transformers (ViTs) have emerged as powerful architectures in medical image analysis, excelling in tasks such as disease detection, segmentation, and classification. However, their re...
- FT-ARM: Fine-Tuned Agentic Reflection Multimodal Language Model for Pressure Ulcer Severity Classification with Reasoning : Abstract: Pressure ulcers (PUs) are a serious and prevalent healthcare concern. Accurate classification of PU severity (Stages I-IV) is essential for proper treatment but remains challenging due to su...
- LRT-Diffusion: Calibrated Risk-Aware Guidance for Diffusion Policies : Abstract: Diffusion policies are competitive for offline reinforcement learning (RL) but are typically guided at sampling time by heuristics that lack a statistical notion of risk. We introduce LRT-Di...
- FaRAccel: FPGA-Accelerated Defense Architecture for Efficient Bit-Flip Attack Resilience in Transformer Models : Abstract: Forget and Rewire (FaR) methodology has demonstrated strong resilience against Bit-Flip Attacks (BFAs) on Transformer-based models by obfuscating critical parameters through dynamic rewiring...
- Epileptic Seizure Detection and Prediction from EEG Data: A Machine Learning Approach with Clinical Validation : Abstract: In recent years, machine learning has become an increasingly powerful tool for supporting seizure detection and monitoring in epilepsy care. Traditional approaches focus on identifying seizu...
- Emergence of Minimal Circuits for Indirect Object Identification in Attention-Only Transformers : Abstract: Mechanistic interpretability aims to reverse-engineer large language models (LLMs) into human-understandable computational circuits. However, the complexity of pretrained models often obscur...
- Towards Human-AI Synergy in Requirements Engineering: A Framework and Preliminary Study : Abstract: The future of Requirements Engineering (RE) is increasingly driven by artificial intelligence (AI), reshaping how we elicit, analyze, and validate requirements. Traditional RE is based on la...
- StorageXTuner: An LLM Agent-Driven Automatic Tuning Framework for Heterogeneous Storage Systems : Abstract: Automatically configuring storage systems is hard: parameter spaces are large and conditions vary across workloads, deployments, and versions. Heuristic and ML tuners are often system specif...
- Efficient License Plate Recognition via Pseudo-Labeled Supervision with Grounding DINO and YOLOv8 : Abstract: Developing a highly accurate automatic license plate recognition system (ALPR) is challenging due to environmental factors such as lighting, rain, and dust. Additional difficulties include h...
- Scalable predictive processing framework for multitask caregiving robots : Abstract: The rapid aging of societies is intensifying demand for autonomous care robots; however, most existing systems are task-specific and rely on handcrafted preprocessing, limiting their ability...
- GAPMAP: Mapping Scientific Knowledge Gaps in Biomedical Literature Using Large Language Models : Abstract: Scientific progress is driven by the deliberate articulation of what remains unknown. This study investigates the ability of large language models (LLMs) to identify research knowledge gaps ...
- Monopoly Deal: A Benchmark Environment for Bounded One-Sided Response Games : Abstract: Card games are widely used to study sequential decision-making under uncertainty, with real-world analogues in negotiation, finance, and cybersecurity. Typically, these games fall into three...
- Learning Fair Graph Representations with Multi-view Information Bottleneck : Abstract: Graph neural networks (GNNs) excel on relational data by passing messages over node features and structure, but they can amplify training data biases, propagating discriminatory attributes a...
- The Neural Differential Manifold: An Architecture with Explicit Geometric Structure : Abstract: This paper introduces the Neural Differential Manifold (NDM), a novel neural network architecture that explicitly incorporates geometric structure into its fundamental design. Departing from...
- Learning Low Rank Neural Representations of Hyperbolic Wave Dynamics from Data : Abstract: We present a data-driven dimensionality reduction method that is well-suited for physics-based data representing hyperbolic wave propagation. The method utilizes a specialized neural network...
- Bridging the Divide: End-to-End Sequence-Graph Learning : Abstract: Many real-world datasets are both sequential and relational: each node carries an event sequence while edges encode interactions. Existing methods in sequence modeling and graph modeling oft...
- Lipschitz-aware Linearity Grafting for Certified Robustness : Abstract: Lipschitz constant is a fundamental property in certified robustness, as smaller values imply robustness to adversarial examples when a model is confident in its prediction. However, identif...
- Model-Document Protocol for AI Search : Abstract: AI search depends on linking large language models (LLMs) with vast external knowledge sources. Yet web pages, PDF files, and other raw documents are not inherently LLM-ready: they are long,...
- Transformers in Medicine: Improving Vision-Language Alignment for Medical Image Captioning : Abstract: We present a transformer-based multimodal framework for generating clinically relevant captions for MRI scans. Our system combines a DEiT-Small vision transformer as an image encoder, MediCa...
- SFMS-ALR: Script-First Multilingual Speech Synthesis with Adaptive Locale Resolution : Abstract: Intra-sentence multilingual speech synthesis (code-switching TTS) remains a major challenge due to abrupt language shifts, varied scripts, and mismatched prosody between languages. Conventio...
- Fed-PELAD: Communication-Efficient Federated Learning for Massive MIMO CSI Feedback with Personalized Encoders and a LoRA-Adapted Shared Decoder : Abstract: This paper addresses the critical challenges of communication overhead, data heterogeneity, and privacy in deep learning for channel state information (CSI) feedback in massive MIMO systems....
- Human Resilience in the AI Era -- What Machines Can't Replace : Abstract: AI is displacing tasks, mediating high-stakes decisions, and flooding communication with synthetic content, unsettling work, identity, and social trust. We argue that the decisive human coun...
- GReF: A Unified Generative Framework for Efficient Reranking via Ordered Multi-token Prediction : Abstract: In a multi-stage recommendation system, reranking plays a crucial role in modeling intra-list correlations among items. A key challenge lies in exploring optimal sequences within the combina...
- Cost-Sensitive Unbiased Risk Estimation for Multi-Class Positive-Unlabeled Learning : Abstract: Positive-Unlabeled (PU) learning considers settings in which only positive and unlabeled data are available, while negatives are missing or left unlabeled. This situation is common in real ...
- Studies for : A Human-AI Co-Creative Sound Artwork Using a Real-time Multi-channel Sound Generation Model : Abstract: This paper explores the integration of AI technologies into the artistic workflow through the creation of Studies for, a generative sound installation developed in collaboration with sound a...
- Learning Disentangled Speech- and Expression-Driven Blendshapes for 3D Talking Face Animation : Abstract: Expressions are fundamental to conveying human emotions. With the rapid advancement of AI-generated content (AIGC), realistic and expressive 3D facial animation has become increasingly cruci...
- One-shot Humanoid Whole-body Motion Learning : Abstract: Whole-body humanoid motion represents a cornerstone challenge in robotics, integrating balance, coordination, and adaptability to enable human-like behaviors. However, existing methods typic...
- Scaling Up Bayesian DAG Sampling : Abstract: Bayesian inference of Bayesian network structures is often performed by sampling directed acyclic graphs along an appropriately constructed Markov chain. We present two techniques to improve...
- TV-Rec: Time-Variant Convolutional Filter for Sequential Recommendation : Abstract: Recently, convolutional filters have been increasingly adopted in sequential recommendation for their ability to capture local sequential patterns. However, most of these models complement c...
- IBNorm: Information-Bottleneck Inspired Normalization for Representation Learning : Abstract: Normalization is fundamental to deep learning, but existing approaches such as BatchNorm, LayerNorm, and RMSNorm are variance-centric by enforcing zero mean and unit variance, stabilizing tr...
- SynHLMA: Synthesizing Hand Language Manipulation for Articulated Object with Discrete Human Object Interaction Representation : Abstract: Generating hand grasps with language instructions is a widely studied topic that benefits from embodied AI and VR/AR applications. While transferring into hand articulated object interactio...
- Dense and Diverse Goal Coverage in Multi Goal Reinforcement Learning : Abstract: Reinforcement Learning algorithms are primarily focused on learning a policy that maximizes expected return. As a result, the learned policy can exploit one or few reward sources. However, i...
- 4-Doodle: Text to 3D Sketches that Move! : Abstract: We present a novel task: text-to-3D sketch animation, which aims to bring freeform sketches to life in dynamic 3D space. Unlike prior works focused on photorealistic content generation, we t...
- MMEdge: Accelerating On-device Multimodal Inference via Pipelined Sensing and Encoding : Abstract: Real-time multimodal inference on resource-constrained edge devices is essential for applications such as autonomous driving, human-computer interaction, and mobile health. However, prior wo...
- Multi-party Agent Relation Sampling for Multi-party Ad Hoc Teamwork : Abstract: Multi-agent reinforcement learning (MARL) has achieved strong results in cooperative tasks but typically assumes fixed, fully controlled teams. Ad hoc teamwork (AHT) relaxes this by allowing...
- A Convexity-dependent Two-Phase Training Algorithm for Deep Neural Networks : Abstract: The key task of machine learning is to minimize the loss function that measures the model fit to the training data. The numerical methods to do this efficiently depend on the properties of t...
- Position: Biology is the Challenge Physics-Informed ML Needs to Evolve : Abstract: Physics-Informed Machine Learning (PIML) has successfully integrated mechanistic understanding into machine learning, particularly in domains governed by well-known physical laws. This succe...
- Hallucinations in Bibliographic Recommendation: Citation Frequency as a Proxy for Training Data Redundancy : Abstract: Large language models (LLMs) have been increasingly applied to a wide range of tasks, from natural language understanding to code generation. While they have also been used to assist in bibl...
- Integrating Legal and Logical Specifications in Perception, Prediction, and Planning for Automated Driving: A Survey of Methods : Abstract: This survey provides an analysis of current methodologies integrating legal and logical specifications into the perception, prediction, and planning modules of automated driving systems. We ...
- GPTOpt: Towards Efficient LLM-Based Black-Box Optimization : Abstract: Global optimization of expensive, derivative-free black-box functions demands extreme sample efficiency. Classical methods such as Bayesian Optimization (BO) can be effective, but they often...
- BhashaBench V1: A Comprehensive Benchmark for the Quadrant of Indic Domains : Abstract: The rapid advancement of large language models (LLMs) has intensified the need for domain- and culture-specific evaluation. Existing benchmarks are largely Anglocentric and domain-agnostic, li...
- Adaptive End-to-End Transceiver Design for NextG Pilot-Free and CP-Free Wireless Systems : Abstract: The advent of artificial intelligence (AI)-native wireless communication is fundamentally reshaping the design paradigm of next-generation (NextG) systems, where intelligent air interfaces a...
- Improving Temporal Consistency and Fidelity at Inference-time in Perceptual Video Restoration by Zero-shot Image-based Diffusion Models : Abstract: Diffusion models have emerged as powerful priors for single-image restoration, but their application to zero-shot video restoration suffers from temporal inconsistencies due to the stochasti...
- Implicature in Interaction: Understanding Implicature Improves Alignment in Human-LLM Interaction : Abstract: The rapid advancement of Large Language Models (LLMs) is positioning language at the core of human-computer interaction (HCI). We argue that advancing HCI requires attention to the linguisti...
- RLMEval: Evaluating Research-Level Neural Theorem Proving : Abstract: Despite impressive results on curated benchmarks, the practical impact of large language models (LLMs) on research-level neural theorem proving and proof autoformalization is still limited. ...
- Alibaba International E-commerce Product Search Competition DcuRAGONs Team Technical Report : Abstract: This report details our methodology and results developed for the Multilingual E-commerce Search Competition. The problem aims to recognize relevance between user queries versus product item...
- Grounded in Reality: Learning and Deploying Proactive LLM from Offline Logs : Abstract: Large Language Models (LLMs) excel as passive responders, but teaching them to be proactive, goal-oriented partners, a critical capability in high-stakes domains, remains a major challenge. ...
- Scalable Utility-Aware Multiclass Calibration : Abstract: Ensuring that classifiers are well-calibrated, i.e., their predictions align with observed frequencies, is a minimal and fundamental requirement for classifiers to be viewed as trustworthy. ...
- Fine-Tuned Language Models for Domain-Specific Summarization and Tagging : Abstract: This paper presents a pipeline integrating fine-tuned large language models (LLMs) with named entity recognition (NER) for efficient domain-specific text summarization and tagging. The autho...
- An In-Depth Analysis of Cyber Attacks in Secured Platforms : Abstract: There is an increase in global malware threats. To address this, an encryption-type ransomware has been introduced on the Android operating system. The challenges associated with malicious t...
- TempoPFN: Synthetic Pre-training of Linear RNNs for Zero-shot Time Series Forecasting : Abstract: Foundation models for zero-shot time series forecasting face challenges in efficient long-horizon prediction and reproducibility, with existing synthetic-only approaches underperforming on c...
- Reflections on the Reproducibility of Commercial LLM Performance in Empirical Software Engineering Studies : Abstract: Large Language Models have gained remarkable interest in industry and academia. The increasing interest in LLMs in academia is also reflected in the number of publications on this topic over...
- FaCT: Faithful Concept Traces for Explaining Neural Network Decisions : Abstract: Deep networks have shown remarkable performance across a wide range of tasks, yet getting a global concept-level understanding of how they function remains a key challenge. Many post-hoc con...
- Comparative Study of UNet-based Architectures for Liver Tumor Segmentation in Multi-Phase Contrast-Enhanced Computed Tomography : Abstract: Segmentation of liver structures in multi-phase contrast-enhanced computed tomography (CECT) plays a crucial role in computer-aided diagnosis and treatment planning for liver diseases, inclu...
- Using latent representations to link disjoint longitudinal data for mixed-effects regression : Abstract: Many rare diseases offer limited established treatment options, leading patients to switch therapies when new medications emerge. To analyze the impact of such treatment switches within the ...
- Hybrid Quantum-Classical Recurrent Neural Networks : Abstract: We present a hybrid quantum-classical recurrent neural network (QRNN) architecture in which the entire recurrent core is realized as a parametrized quantum circuit (PQC) controlled by a clas...
- Leveraging an Atmospheric Foundational Model for Subregional Sea Surface Temperature Forecasting : Abstract: The accurate prediction of oceanographic variables is crucial for understanding climate change, managing marine resources, and optimizing maritime activities. Traditional ocean forecasting r...
- Lost in Phonation: Voice Quality Variation as an Evaluation Dimension for Speech Foundation Models : Abstract: Recent advances in speech foundation models (SFMs) have enabled the direct processing of spoken language from raw audio, bypassing intermediate textual representations. This capability allow...
- RegionE: Adaptive Region-Aware Generation for Efficient Image Editing : Abstract: Recently, instruction-based image editing (IIE) has received widespread attention. In practice, IIE often modifies only specific regions of an image, while the remaining areas largely remain...
- Communication and Verification in LLM Agents towards Collaboration under Information Asymmetry : Abstract: While Large Language Model (LLM) agents are often approached from the angle of action planning/generation to accomplish a goal (e.g., given by language descriptions), their abilities to coll...
- INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats : Abstract: Modern AI hardware, such as Nvidia's Blackwell architecture, is increasingly embracing low-precision floating-point (FP) formats to handle the pervasive activation outliers in Large Language...
- BOLT-GAN: Bayes-Optimal Loss for Stable GAN Training : Abstract: We introduce BOLT-GAN, a simple yet effective modification of the WGAN framework inspired by the Bayes Optimal Learning Threshold (BOLT). We show that with a Lipschitz continuous discriminat...
- Don't Blind Your VLA: Aligning Visual Representations for OOD Generalization : Abstract: The growing success of Vision-Language-Action (VLA) models stems from the promise that pretrained Vision-Language Models (VLMs) can endow agents with transferable world knowledge and vision-...
- FARSIQA: Faithful and Advanced RAG System for Islamic Question Answering : Abstract: The advent of Large Language Models (LLMs) has revolutionized Natural Language Processing, yet their application in high-stakes, specialized domains like religious question answering is hind...
- Are Language Models Efficient Reasoners? A Perspective from Logic Programming : Abstract: Modern language models (LMs) exhibit strong deductive reasoning capabilities, yet standard evaluations emphasize correctness while overlooking a key aspect of human-like reasoning: efficienc...
- Learning to Plan & Schedule with Reinforcement-Learned Bimanual Robot Skills : Abstract: Long-horizon contact-rich bimanual manipulation presents a significant challenge, requiring complex coordination involving a mixture of parallel execution and sequential collaboration betwee...
- Subgraph Federated Learning via Spectral Methods : Abstract: We consider the problem of federated learning (FL) with graph-structured data distributed across multiple clients. In particular, we address the prevalent scenario of interconnected subgraph...
- User Misconceptions of LLM-Based Conversational Programming Assistants : Abstract: Programming assistants powered by large language models (LLMs) have become widely available, with conversational assistants like ChatGPT proving particularly accessible to less experienced p...
- Graph Network-based Structural Simulator: Graph Neural Networks for Structural Dynamics : Abstract: Graph Neural Networks (GNNs) have recently been explored as surrogate models for numerical simulations. While their applications in computational fluid dynamics have been investigated, littl...
- Process-Level Trajectory Evaluation for Environment Configuration in Software Engineering Agents : Abstract: Large language model-based agents show promise for software engineering, but environment configuration remains a bottleneck due to heavy manual effort and scarce large-scale, high-quality da...
- The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution : Abstract: Real-world language agents must handle complex, multi-step workflows across diverse Apps. For instance, an agent may manage emails by coordinating with calendars and file systems, or monitor...
- Physics-Guided Conditional Diffusion Networks for Microwave Image Reconstruction : Abstract: A conditional latent-diffusion based framework for solving the electromagnetic inverse scattering problem associated with microwave imaging is introduced. This generative machine-learning mo...
- LieSolver: A PDE-constrained solver for IBVPs using Lie symmetries : Abstract: We introduce a method for efficiently solving initial-boundary value problems (IBVPs) that uses Lie symmetries to enforce the associated partial differential equation (PDE) exactly by constr...
- The Limits of Obliviate: Evaluating Unlearning in LLMs via Stimulus-Knowledge Entanglement-Behavior Framework : Abstract: Unlearning in large language models (LLMs) is crucial for managing sensitive data and correcting misinformation, yet evaluating its effectiveness remains an open problem. We investigate whet...
- Task Completion Agents are Not Ideal Collaborators : Abstract: Current evaluations of agents remain centered around one-shot task completion, failing to account for the inherently iterative and collaborative nature of many real-world problems, where hum...
- E-Scores for (In)Correctness Assessment of Generative Model Outputs : Abstract: While generative models, especially large language models (LLMs), are ubiquitous in today's world, principled mechanisms to assess their (in)correctness are limited. Using the conformal pred...
- Gaperon: A Peppered English-French Generative Language Model Suite : Abstract: We release Gaperon, a fully open suite of French-English-coding language models designed to advance transparency and reproducibility in large-scale model training. The Gaperon family include...
- Brain-inspired Computational Intelligence via Predictive Coding : Abstract: Artificial intelligence (AI) is rapidly becoming one of the key technologies of this century. The majority of results in AI thus far have been achieved using deep neural networks trained wit...
- CURATRON: Complete and Robust Preference Data for Rigorous Alignment of Large Language Models : Abstract: This paper addresses the challenges of aligning large language models (LLMs) with human values via preference learning (PL), focusing on incomplete and corrupted data in preference datasets....
- TraveLLM: Could you plan my new public transit route in face of a network disruption? : Abstract: Existing navigation systems often fail during urban disruptions, struggling to incorporate real-time events and complex user constraints, such as avoiding specific areas. We address this gap...
- SNN-Based Online Learning of Concepts and Action Laws in an Open World : Abstract: We present the architecture of a fully autonomous, bio-inspired cognitive agent built around a spiking neural network (SNN) implementing the agent's semantic memory. This agent explores its ...
- Think or Not? Selective Reasoning via Reinforcement Learning for Vision-Language Models : Abstract: Reinforcement Learning (RL) has proven to be an effective post-training strategy for enhancing reasoning in vision-language models (VLMs). Group Relative Policy Optimization (GRPO) is a rece...
- Integrating Counterfactual Simulations with Language Models for Explaining Multi-Agent Behaviour : Abstract: Autonomous multi-agent systems (MAS) are useful for automating complex tasks but raise trust concerns due to risks such as miscoordination or goal misalignment. Explainability is vital for u...
- PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions : Abstract: Doctor-patient consultations require multi-turn, context-aware communication tailored to diverse patient personas. Training or evaluating doctor LLMs in such settings requires realistic pati...
- Cite Pretrain: Retrieval-Free Knowledge Attribution for Large Language Models : Abstract: Trustworthy language models should provide both correct and verifiable answers. However, citations generated directly by standalone LLMs are often unreliable. As a result, current systems in...
- HAMLET: Hyperadaptive Agent-based Modeling for Live Embodied Theatrics : Abstract: Creating an immersive and interactive theatrical experience is a long-term goal in the field of interactive narrative. The emergence of large language models (LLMs) is providing a new path to ...
- The Landscape of Agentic Reinforcement Learning for LLMs: A Survey : Abstract: The emergence of agentic reinforcement learning (Agentic RL) marks a paradigm shift from conventional reinforcement learning applied to large language models (LLM RL), reframing LLMs from pa...
- Towards a Common Framework for Autoformalization : Abstract: Autoformalization has emerged as a term referring to the automation of formalization - specifically, the formalization of mathematics using interactive theorem provers (proof assistants). It...
- Quantum Transformer: Accelerating model inference via quantum linear algebra : Abstract: Powerful generative artificial intelligence from large language models (LLMs) harnesses extensive computational resources for inference. In this work, we investigate the transformer architec...
- AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets : Abstract: Background: Development of artificial intelligence (AI) models for lung cancer screening requires large, well-annotated low-dose computed tomography (CT) datasets and rigorous performance be...
- Reliable Evaluation and Benchmarks for Statement Autoformalization : Abstract: Evaluating statement autoformalization, translating natural language mathematics into formal languages like Lean 4, remains a significant challenge, with few metrics, datasets, and standards...
- Scheduling Your LLM Reinforcement Learning with Reasoning Trees : Abstract: Using Reinforcement Learning with Verifiable Rewards (RLVR) to optimize Large Language Models (LLMs) can be conceptualized as progressively editing a query's 'Reasoning Tree'. This process i...
- Cyclic Counterfactuals under Shift-Scale Interventions : Abstract: Most counterfactual inference frameworks traditionally assume acyclic structural causal models (SCMs), i.e. directed acyclic graphs (DAGs). However, many real-world systems (e.g. biological ...
- Taming the Real-world Complexities in CPT E/M Coding with Large Language Models : Abstract: Evaluation and Management (E/M) coding, under the Current Procedural Terminology (CPT) taxonomy, documents medical services provided to patients by physicians. Used primarily for billing pur...
- Aligning Large Language Models with Procedural Rules: An Autoregressive State-Tracking Prompting for In-Game Trading : Abstract: Large Language Models (LLMs) enable dynamic game interactions but fail to follow essential procedural flows in rule-governed trading systems, eroding player trust. This work resolves the cor...
- Reasoning-Aware GRPO using Process Mining : Abstract: Reinforcement learning (RL)-based post-training has been crucial for enabling multi-step reasoning in large reasoning models (LRMs), yet current reward schemes are typically outcome-centric....
- H3M-SSMoEs: Hypergraph-based Multimodal Learning with LLM Reasoning and Style-Structured Mixture of Experts : Abstract: Stock movement prediction remains fundamentally challenging due to complex temporal dependencies, heterogeneous modalities, and dynamically evolving inter-stock relationships. Existing appro...
- KnowCoder-A1: Incentivizing Agentic Reasoning Capability with Outcome Supervision for KBQA : Abstract: Knowledge Base Question Answering (KBQA) aims to answer natural-language questions over a structured Knowledge Base (KB). Recent work improves KBQA by adopting an agentic reasoning paradigm,...
- Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models : Abstract: Agentic methods have emerged as a powerful and autonomous paradigm that enhances reasoning, collaboration, and adaptive control, enabling systems to coordinate and independently solve comple...
- Energy-Efficient Autonomous Driving with Adaptive Perception and Robust Decision : Abstract: Autonomous driving is an emerging technology that is expected to bring significant social, economic, and environmental benefits. However, these benefits come with rising energy consumption b...
- RAVR: Reference-Answer-guided Variational Reasoning for Large Language Models : Abstract: Reinforcement learning (RL) can refine the reasoning abilities of large language models (LLMs), but critically depends on a key prerequisite: the LLM can already generate high-utility reason...
- FELA: A Multi-Agent Evolutionary System for Feature Engineering of Industrial Event Log Data : Abstract: Event log data, recording fine-grained user actions and system events, represent one of the most valuable assets for modern digital services. However, the complexity and heterogeneity of ind...
- From Medical Records to Diagnostic Dialogues: A Clinical-Grounded Approach and Dataset for Psychiatric Comorbidity : Abstract: Psychiatric comorbidity is clinically significant yet challenging due to the complexity of multiple co-occurring disorders. To address this, we develop a novel approach integrating synthetic...
- GAP: Graph-Based Agent Planning with Parallel Tool Use and Reinforcement Learning : Abstract: Autonomous agents powered by large language models (LLMs) have shown impressive capabilities in tool manipulation for complex task-solving. However, existing paradigms such as ReAct rely on ...
- Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm : Abstract: A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node ...
- Agentic AI: A Comprehensive Survey of Architectures, Applications, and Future Directions : Abstract: Agentic AI represents a transformative shift in artificial intelligence, but its rapid advancement has led to a fragmented understanding, often conflating modern neural systems with outdated...
- Instrumental goals in advanced AI systems: Features to be managed and not failures to be eliminated? : Abstract: In artificial intelligence (AI) alignment research, instrumental goals, also called instrumental subgoals or instrumental convergent goals, are widely associated with advanced AI systems. Th...
- Multi-Objective Search: Algorithms, Applications, and Emerging Directions : Abstract: Multi-objective search (MOS) has emerged as a unifying framework for planning and decision-making problems where multiple, often conflicting, criteria must be balanced. While the problem has...
- MTIR-SQL: Multi-turn Tool-Integrated Reasoning Reinforcement Learning for Text-to-SQL : Abstract: As large language models (LLMs) are increasingly used in Text-to-SQL tasks, Reinforcement Learning (RL) has become a common method for improving performance. Existing methods primarily rely ...
- Predicate Renaming via Large Language Models : Abstract: In this paper, we address the problem of giving names to predicates in logic rules using Large Language Models (LLMs). In the context of Inductive Logic Programming, various rule generation ...
- Retrieval Augmented Generation (RAG) for Fintech: Agentic Design and Evaluation : Abstract: Retrieval-Augmented Generation (RAG) systems often face limitations in specialized domains such as fintech, where domain-specific ontologies, dense terminology, and acronyms complicate effec...
- Zero Reinforcement Learning Towards General Domains : Abstract: Zero Reinforcement Learning (Zero-RL) has proven to be an effective approach for enhancing the reasoning capabilities of large language models (LLMs) by directly applying reinforcement learn...
- Off-policy Reinforcement Learning with Model-based Exploration Augmentation : Abstract: Exploration is fundamental to reinforcement learning (RL), as it determines how effectively an agent discovers and exploits the underlying structure of its environment to achieve optimal per...
- Standardization of Psychiatric Diagnoses -- Role of Fine-tuned LLM Consortium and OpenAI-gpt-oss Reasoning LLM Enabled Decision Support System : Abstract: The diagnosis of most mental disorders, including psychiatric evaluations, primarily depends on dialogues between psychiatrists and patients. This subjective process can lead to variability ...
- Counterfactual-based Agent Influence Ranker for Agentic AI Workflows : Abstract: An Agentic AI Workflow (AAW), also known as an LLM-based multi-agent system, is an autonomous system that assembles several LLM-based agents to work collaboratively towards a shared goal. Th...
- ALDEN: Reinforcement Learning for Active Navigation and Evidence Gathering in Long Documents : Abstract: Vision-language models (VLMs) excel at interpreting text-rich images but struggle with long, visually complex documents that demand analysis and integration of information spread across mult...
- Navigation in a Three-Dimensional Urban Flow using Deep Reinforcement Learning : Abstract: Unmanned Aerial Vehicles (UAVs) are increasingly populating urban areas for delivery and surveillance purposes. In this work, we develop an optimal navigation strategy based on Deep Reinforc...
- BambooKG: A Neurobiologically-inspired Frequency-Weight Knowledge Graph : Abstract: Retrieval-Augmented Generation allows LLMs to access external knowledge, reducing hallucinations and ageing-data issues. However, it treats retrieved chunks independently and struggles with ...
- TheraMind: A Strategic and Adaptive Agent for Longitudinal Psychological Counseling : Abstract: Large language models (LLMs) in psychological counseling have attracted increasing attention. However, existing approaches often lack emotional understanding, adaptive strategies, and the us...
- Large-Scale Network Embedding in Apache Spark : Abstract: Network embedding has been widely used in social recommendation and network analysis, such as recommendation systems and anomaly detection with graphs. However, most previous approaches c...
- Modelling the Interplay of Eye-Tracking Temporal Dynamics and Personality for Emotion Detection in Face-to-Face Settings : Abstract: Accurate recognition of human emotions is critical for adaptive human-computer interaction, yet remains challenging in dynamic, conversation-like settings. This work presents a personality-a...
- The Epistemic Suite: A Post-Foundational Diagnostic Methodology for Assessing AI Knowledge Claims : Abstract: Large Language Models (LLMs) generate fluent, plausible text that can mislead users into mistaking simulated coherence for genuine understanding. This paper introduces the Epistemic Suite, a...
- AmarDoctor: An AI-Driven, Multilingual, Voice-Interactive Digital Health Application for Primary Care Triage and Patient Management to Bridge the Digital Health Divide for Bengali Speakers : Abstract: This study presents AmarDoctor, a multilingual voice-interactive digital health app designed to provide comprehensive patient triage and AI-driven clinical decision support for Bengali speak...
- Beyond Models: A Framework for Contextual and Cultural Intelligence in African AI Deployment : Abstract: While global AI development prioritizes model performance and computational scale, meaningful deployment in African markets requires fundamentally different architectural decisions. This pap...
- Flows, straight but not so fast: Exploring the design space of Rectified Flows in Protein Design : Abstract: Generative modeling techniques such as Diffusion and Flow Matching have achieved significant successes in generating designable and diverse protein backbones. However, many current models ar...
- Cardi-GPT: An Expert ECG-Record Processing Chatbot : Abstract: Interpreting and communicating electrocardiogram (ECG) findings are crucial yet challenging tasks in cardiovascular diagnosis, traditionally requiring significant expertise and precise clini...
- PulseFi: A Low Cost Robust Machine Learning System for Accurate Cardiopulmonary and Apnea Monitoring Using Channel State Information : Abstract: Non-intrusive monitoring of vital signs has become increasingly important in a variety of healthcare settings. In this paper, we present PulseFi, a novel low-cost non-intrusive system that u...
- EcoScaleNet: A Lightweight Multi Kernel Network for Long Sequence 12 lead ECG Classification : Abstract: Accurate interpretation of 12-lead electrocardiograms (ECGs) is critical for early detection of cardiac abnormalities, yet manual reading is error-prone and existing CNN-based classifiers st...
- Beyond Function-Level Search: Repository-Aware Dual-Encoder Code Retrieval with Adversarial Verification : Abstract: The escalating complexity of modern codebases has intensified the need for retrieval systems capable of interpreting cross-component change intents, a capability fundamentally absent in conv...
- Stable-by-Design Neural Network-Based LPV State-Space Models for System Identification : Abstract: Accurate modeling of nonlinear systems is essential for reliable control, yet conventional identification methods often struggle to capture latent dynamics while maintaining stability. We pr...
- Dingtalk DeepResearch: A Unified Multi Agent Framework for Adaptive Intelligence in Enterprise Environments : Abstract: We present Dingtalk DeepResearch, a unified multi agent intelligence framework for real world enterprise environments, delivering deep research, heterogeneous table reasoning, and multimodal...
- Falcon: A Comprehensive Chinese Text-to-SQL Benchmark for Enterprise-Grade Evaluation : Abstract: We introduce Falcon, a cross-domain Chinese text-to-SQL benchmark grounded in an enterprise-compatible dialect (MaxCompute/Hive). It contains 600 Chinese questions over 28 databases; 77% req...
- Dual-Domain Deep Learning-Assisted NOMA-CSK Systems for Secure and Efficient Vehicular Communications : Abstract: Ensuring secure and efficient multi-user (MU) transmission is critical for vehicular communication systems. Chaos-based modulation schemes have garnered considerable interest due to their be...
- Topic-aware Large Language Models for Summarizing the Lived Healthcare Experiences Described in Health Stories : Abstract: Storytelling is a powerful form of communication and may provide insights into factors contributing to gaps in healthcare outcomes. To determine whether Large Language Models (LLMs) can iden...
- Towards Fine-Grained Human Motion Video Captioning : Abstract: Generating accurate descriptions of human actions in videos remains a challenging task for video captioning models. Existing approaches often struggle to capture fine-grained motion details,...
- Combining SAR Simulators to Train ATR Models with Synthetic Data : Abstract: This work aims to train Deep Learning models to perform Automatic Target Recognition (ATR) on Synthetic Aperture Radar (SAR) images. To circumvent the lack of real labelled measurements, we ...
- DMVFC: Deep Learning Based Functionally Consistent Tractography Fiber Clustering Using Multimodal Diffusion MRI and Functional MRI : Abstract: Tractography fiber clustering using diffusion MRI (dMRI) is a crucial method for white matter (WM) parcellation to enable analysis of the brain's structural connectivity in health and disease. Cu...
- Confidence is Not Competence : Abstract: Large language models (LLMs) often exhibit a puzzling disconnect between their asserted confidence and actual problem-solving competence. We offer a mechanistic account of this decoupling by...
- Cross-Enhanced Multimodal Fusion of Eye-Tracking and Facial Features for Alzheimer's Disease Diagnosis : Abstract: Accurate diagnosis of Alzheimer's disease (AD) is essential for enabling timely intervention and slowing disease progression. Multimodal diagnostic approaches offer considerable promise by i...
- AI & Data Competencies: Scaffolding holistic AI literacy in Higher Education : Abstract: This chapter introduces the AI & Data Acumen Learning Outcomes Framework, a comprehensive tool designed to guide the integration of AI literacy across higher education. Developed through a c...
- ESCA: Enabling Seamless Codec Avatar Execution through Algorithm and Hardware Co-Optimization for Virtual Reality : Abstract: Photorealistic Codec Avatars (PCA), which generate high-fidelity human face renderings, are increasingly being used in Virtual Reality (VR) environments to enable immersive communication and...
- The Underappreciated Power of Vision Models for Graph Structural Understanding : Abstract: Graph Neural Networks operate through bottom-up message-passing, fundamentally differing from human visual perception, which intuitively captures global structures first. We investigate the ...
- PISA-Bench: The PISA Index as a Multilingual and Multimodal Metric for the Evaluation of Vision-Language Models : Abstract: Vision-language models (VLMs) have demonstrated remarkable progress in multimodal reasoning. However, existing benchmarks remain limited in terms of high-quality, human-verified examples. Ma...
- SwiftEmbed: Ultra-Fast Text Embeddings via Static Token Lookup for Real-Time Applications : Abstract: We present a static token lookup methodology for text embedding generation that achieves 1.12 ms p50 latency for single text embeddings while maintaining 60.6 MTEB average score across 8 rep...
- A Survey on Efficient Vision-Language-Action Models : Abstract: Vision-Language-Action models (VLAs) represent a significant frontier in embodied intelligence, aiming to bridge digital knowledge with physical-world interaction. While these models have de...
- Mutual Wanting in Human-AI Interaction: Empirical Evidence from Large-Scale Analysis of GPT Model Transitions : Abstract: The rapid evolution of large language models (LLMs) creates complex bidirectional expectations between users and AI systems that are poorly understood. We introduce the concept of "mutual wa...
- Large Language Models Report Subjective Experience Under Self-Referential Processing : Abstract: Large language models sometimes produce structured, first-person descriptions that explicitly reference awareness or subjective experience. To better understand this behavior, we investigate...
- Fortytwo: Swarm Inference with Peer-Ranked Consensus : Abstract: As centralized AI hits compute ceilings and diminishing returns from ever-larger training runs, meeting demand requires an inference layer that scales horizontally in both capacity and capab...
- From Narrative to Action: A Hierarchical LLM-Agent Framework for Human Mobility Generation : Abstract: Understanding and replicating human mobility requires not only spatial-temporal accuracy but also an awareness of the cognitive hierarchy underlying real-world travel decisions. Traditional ...
- MASPRM: Multi-Agent System Process Reward Model : Abstract: Practical deployment of Multi-Agent Systems (MAS) demands strong test-time performance, motivating methods that guide inference-time search and selectively spend compute to improve quality. ...
- CT-Less Attenuation Correction Using Multiview Ensemble Conditional Diffusion Model on High-Resolution Uncorrected PET Images : Abstract: Accurate quantification in positron emission tomography (PET) is essential for accurate diagnostic results and effective treatment tracking. A major issue encountered in PET imaging is atten...
- COMMUNITYNOTES: A Dataset for Exploring the Helpfulness of Fact-Checking Explanations : Abstract: Fact-checking on major platforms, such as X, Meta, and TikTok, is shifting from expert-driven verification to a community-based setup, where users contribute explanatory notes to clarify why...
- ProofSketch: Efficient Verified Reasoning for Large Language Models : Abstract: Reasoning methods such as chain-of-thought prompting and self-consistency have shown immense potential to improve the accuracy of large language models across various reasoning tasks. Howeve...
- DualCap: Enhancing Lightweight Image Captioning via Dual Retrieval with Similar Scenes Visual Prompts : Abstract: Recent lightweight retrieval-augmented image caption models often utilize retrieved data solely as text prompts, thereby creating a semantic gap by leaving the original visual features unenh...
- Deep Feature Optimization for Enhanced Fish Freshness Assessment : Abstract: Assessing fish freshness is vital for ensuring food safety and minimizing economic losses in the seafood industry. However, traditional sensory evaluation remains subjective, time-consuming,...
- Perception, Understanding and Reasoning, A Multimodal Benchmark for Video Fake News Detection : Abstract: The advent of multi-modal large language models (MLLMs) has greatly advanced research into applications for video fake news detection (VFND) tasks. Traditional video-based FND benchmarks typ...
- Towards a Method for Synthetic Generation of PWA Transcripts : Abstract: In aphasia research, Speech-Language Pathologists (SLPs) devote extensive time to manually coding speech samples using Correct Information Units (CIUs), a measure of how informative an indiv...
- SafeEditor: Unified MLLM for Efficient Post-hoc T2I Safety Editing : Abstract: With the rapid advancement of text-to-image (T2I) models, ensuring their safety has become increasingly critical. Existing safety approaches can be categorized into training-time and inferen...
- Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation : Abstract: We propose Ming-Flash-Omni, an upgraded version of Ming-Omni, built upon a sparser Mixture-of-Experts (MoE) variant of Ling-Flash-2.0 with 100 billion total parameters, of which only 6.1 bil...
Research Sources: 461 | Generated: 10/31/2025
