AI RESEARCH PAPERS & ACADEMIC SOURCES
- Fast Gradient Methods for Data-Consistent Local Super-Resolution of Medical Images : Abstract: In this work, we propose a new paradigm of iterative model-based reconstruction algorithms for providing a real-time solution for zooming in on and refining a region of interest in medical and cl...
- DualCamCtrl: Dual-Branch Diffusion Model for Geometry-Aware Camera-Controlled Video Generation : Abstract: This paper presents DualCamCtrl, a novel end-to-end diffusion model for camera-controlled video generation. Recent works have advanced this field by representing camera poses as ray-based co...
- InstanceV: Instance-Level Video Generation : Abstract: Recent advances in text-to-video diffusion models have enabled the generation of high-quality videos conditioned on textual descriptions. However, most existing text-to-video models rely sol...
- Cascaded Robust Rectification for Arbitrary Document Images : Abstract: Document rectification in real-world scenarios poses significant challenges due to extreme variations in camera perspectives and physical distortions. Driven by the insight that complex tran...
- Learning to Refuse: Refusal-Aware Reinforcement Fine-Tuning for Hard-Irrelevant Queries in Video Temporal Grounding : Abstract: Video Temporal Grounding (VTG) aims to localize a temporal segment in a video corresponding to a natural language query. However, existing VTG models assume that a relevant segment always ex...
- PowerCLIP: Powerset Alignment for Contrastive Pre-Training : Abstract: Contrastive vision-language pre-training frameworks such as CLIP have demonstrated impressive zero-shot performance across a range of vision-language tasks. Recent studies have shown that al...
- Fast Multi-view Consistent 3D Editing with Video Priors : Abstract: Text-driven 3D editing enables user-friendly 3D object or scene editing with text instructions. Due to the lack of multi-view consistency priors, existing methods typically resort to employi...
- GeoWorld: Unlocking the Potential of Geometry Models to Facilitate High-Fidelity 3D Scene Generation : Abstract: Previous works leveraging video models for image-to-3D scene generation tend to suffer from geometric distortions and blurry content. In this paper, we renovate the pipeline of image-to-3D s...
- Pathryoshka: Compressing Pathology Foundation Models via Multi-Teacher Knowledge Distillation with Nested Embeddings : Abstract: Pathology foundation models (FMs) have driven significant progress in computational pathology. However, these high-performing models can easily exceed a billion parameters and produce high-d...
- Zero-Shot Multi-Criteria Visual Quality Inspection for Semi-Controlled Industrial Environments via Real-Time 3D Digital Twin Simulation : Abstract: Early-stage visual quality inspection is vital for achieving Zero-Defect Manufacturing and minimizing production waste in modern industrial environments. However, the complexity of robust vi...
- Instruction Tuning of Large Language Models for Tabular Data Generation-in One Day : Abstract: Tabular instruction tuning has emerged as a promising research direction for improving LLMs' understanding of tabular data. However, the majority of existing works only consider question-answ...
- Robust 3DGS-based SLAM via Adaptive Kernel Smoothing : Abstract: In this paper, we challenge the conventional notion in 3DGS-SLAM that rendering quality is the primary determinant of tracking accuracy. We argue that, compared to solely pursuing a perfect ...
- DAONet-YOLOv8: An Occlusion-Aware Dual-Attention Network for Tea Leaf Pest and Disease Detection : Abstract: Accurate detection of tea leaf pests and diseases in real plantations remains challenging due to complex backgrounds, variable illumination, and frequent occlusions among dense branches and ...
- PointCNN++: Performant Convolution on Native Points : Abstract: Existing convolutional learning methods for 3D point cloud data are divided into two paradigms: point-based methods that preserve geometric precision but often face performance challenges, a...
- Language-guided 3D scene synthesis for fine-grained functionality understanding : Abstract: Functionality understanding in 3D, which aims to identify the functional element in a 3D scene to complete an action (e.g., the correct handle to "Open the second drawer of the cabinet near ...
- Unlocking Multilingual Reasoning Capability of LLMs and LVLMs through Representation Engineering : Abstract: Large Language Models (LLMs) and Large Vision-Language Models (LVLMs) demonstrate strong reasoning capabilities, yet their performance in English significantly outperforms that in low-resour...
- Synthetic Industrial Object Detection: GenAI vs. Feature-Based Methods : Abstract: Reducing the burden of data generation and annotation remains a major challenge for the cost-effective deployment of machine learning in industrial and robotics settings. While synthetic ren...
- FACT-GS: Frequency-Aligned Complexity-Aware Texture Reparameterization for 2D Gaussian Splatting : Abstract: Realistic scene appearance modeling has advanced rapidly with Gaussian Splatting, which enables real-time, high-quality rendering. Recent advances introduced per-primitive textures that inco...
- A Perceptually Inspired Variational Framework for Color Enhancement : Abstract: Basic phenomenology of human color vision has been widely taken as an inspiration to devise explicit color correction algorithms. The behavior of these models in terms of significative image...
- UniGeoSeg: Towards Unified Open-World Segmentation for Geospatial Scenes : Abstract: Instruction-driven segmentation in remote sensing generates masks from guidance, offering great potential for accessible and generalizable applications. However, existing methods suffer from...
- Markovian Scale Prediction: A New Era of Visual Autoregressive Generation : Abstract: Visual AutoRegressive modeling (VAR) based on next-scale prediction has revitalized autoregressive visual generation. Although its full-context dependency, i.e., modeling all previous scales...
- A Hierarchical Computer Vision Pipeline for Physiological Data Extraction from Bedside Monitors : Abstract: In many low-resource healthcare settings, bedside monitors remain standalone legacy devices without network connectivity, creating a persistent interoperability gap that prevents seamless in...
- SimScale: Learning to Drive via Real-World Simulation at Scale : Abstract: Achieving fully autonomous driving systems requires learning rational decisions in a wide span of scenarios, including safety-critical and out-of-distribution ones. However, such cases are u...
- DEAL-300K: Diffusion-based Editing Area Localization with a 300K-Scale Dataset and Frequency-Prompted Baseline : Abstract: Diffusion-based image editing has made semantic-level image manipulation easy for general users, but it also enables realistic local forgeries that are hard to localize. Existing benchmarks ...
- VQRAE: Representation Quantization Autoencoders for Multimodal Understanding, Generation and Reconstruction : Abstract: Unifying multimodal understanding, generation and reconstruction representation in a single tokenizer remains a key challenge in building unified models. Previous research predominantly atte...
- MANTA: Physics-Informed Generalized Underwater Object Tracking : Abstract: Underwater object tracking is challenging due to wavelength-dependent attenuation and scattering, which severely distort appearance across depths and water conditions. Existing trackers trai...
- DisMo: Disentangled Motion Representations for Open-World Motion Transfer : Abstract: Recent advances in text-to-video (T2V) and image-to-video (I2V) models have enabled the creation of visually compelling and dynamic videos from simple textual descriptions or initial frames...
- Hunyuan-GameCraft-2: Instruction-following Interactive Game World Model : Abstract: Recent advances in generative world models have enabled remarkable progress in creating open-ended game environments, evolving from static scene synthesis toward dynamic, interactive simulat...
- Object-Centric Data Synthesis for Category-level Object Detection : Abstract: Deep learning approaches to object detection have achieved reliable detection of specific object classes in images. However, extending a model's detection capability to new object classes re...
- Visual Generation Tuning : Abstract: Large Vision Language Models (VLMs) effectively bridge the modality gap through extensive pretraining, acquiring sophisticated visual representations aligned with language. However, it remai...
- AnyTalker: Scaling Multi-Person Talking Video Generation with Interactivity Refinement : Abstract: Recently, multi-person video generation has started to gain prominence. While a few preliminary works have explored audio-driven multi-person talking video generation, they often face challe...
- Video-CoM: Interactive Video Reasoning via Chain of Manipulations : Abstract: Recent multimodal large language models (MLLMs) have advanced video understanding, yet most still "think about videos", i.e., once a video is encoded, reasoning unfolds entirely in text, treatin...
- Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models : Abstract: Reasoning over dynamic visual content remains a central challenge for multimodal large language models. Recent thinking models generate explicit reasoning traces for interpretability; howeve...
- Comparing SAM 2 and SAM 3 for Zero-Shot Segmentation of 3D Medical Data : Abstract: Foundation models for promptable segmentation, including SAM, SAM 2, and the recently released SAM 3, have renewed interest in zero-shot segmentation of medical imaging. Although these model...
- GACELLE: GPU-accelerated tools for model parameter estimation and image reconstruction : Abstract: Quantitative MRI (qMRI) offers tissue-specific biomarkers that can be tracked over time or compared across populations; however, its adoption in clinical research is hindered by significant ...
- FIGROTD: A Friendly-to-Handle Dataset for Image Guided Retrieval with Optional Text : Abstract: Image-Guided Retrieval with Optional Text (IGROT) unifies visual retrieval (without text) and composed retrieval (with text). Despite its relevance in applications like Google Image and Bing...
- ColonAdapter: Geometry Estimation Through Foundation Model Adaptation for Colonoscopy : Abstract: Estimating 3D geometry from monocular colonoscopy images is challenging due to non-Lambertian surfaces, moving light sources, and large textureless regions. While recent 3D geometric foundat...
- UNION: A Lightweight Target Representation for Efficient Zero-Shot Image-Guided Retrieval with Optional Textual Queries : Abstract: Image-Guided Retrieval with Optional Text (IGROT) is a general retrieval setting where a query consists of an anchor image, with or without accompanying text, aiming to retrieve semantically...
- Content Adaptive Encoding For Interactive Game Streaming : Abstract: Video-on-demand streaming has benefitted from content-adaptive encoding (CAE), i.e., adaptation of resolution and/or quantization parameters for each scene based on convex hull opti...
- RealD²iff: Bridging Real-World Gap in Robot Manipulation via Depth Diffusion : Abstract: Robot manipulation in the real world is fundamentally constrained by the visual sim2real gap, where depth observations collected in simulation fail to reflect the complex noise patterns inhe...
- Hard Spatial Gating for Precision-Driven Brain Metastasis Segmentation: Addressing the Over-Segmentation Paradox in Deep Attention Networks : Abstract: Brain metastasis segmentation in MRI remains a formidable challenge due to diminutive lesion sizes (5-15 mm) and extreme class imbalance (less than 2% tumor volume). While soft-attention CNN...
- Structure-Preserving Unpaired Image Translation to Photometrically Calibrate JunoCam with Hubble Data : Abstract: Insights into Jupiter's atmospheric dynamics are vital for understanding planetary meteorology and exoplanetary gas giant atmospheres. To study these dynamics, we require high-resolution, ph...
- MARVO: Marine-Adaptive Radiance-aware Visual Odometry : Abstract: Underwater visual localization remains challenging due to wavelength-dependent attenuation, poor texture, and non-Gaussian sensor noise. We introduce MARVO, a physics-aware, learning-integra...
- SUPER-AD: Semantic Uncertainty-aware Planning for End-to-End Robust Autonomous Driving : Abstract: End-to-End (E2E) planning has become a powerful paradigm for autonomous driving, yet current systems remain fundamentally uncertainty-blind. They assume perception outputs are fully reliable...
- Geodiffussr: Generative Terrain Texturing with Elevation Fidelity : Abstract: Large-scale terrain generation remains a labor-intensive task in computer graphics. We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps while strictl...
- DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management : Abstract: Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM syste...
- Total Least Square Optimal Analytic Signal by Structure Tensor for N-D images : Abstract: We produce the analytic signal by using the Structure Tensor, which provides Total Least Squares optimal vectors for estimating orientation and scale locally. Together, these vectors represe...
- Source-free Video Domain Adaptation by Learning from Noisy Labels : Abstract: Despite the progress seen in classification methods, current approaches for handling videos with distribution shifts in source and target domains remain source-dependent as they require acce...
- Configurable Fairness: Direct Optimization of Parity Metrics via Vision-Language Models : Abstract: Performance disparities of image recognition across demographic groups are known to exist in deep learning-based models, due to imbalanced group representations or spurious correlation betwe...
- Enhancing Descriptive Image Quality Assessment with A Large-scale Multi-modal Dataset : Abstract: With the rapid advancement of Vision Language Models (VLMs), VLM-based Image Quality Assessment (IQA) seeks to describe image quality linguistically to align with human expression and captur...
- Neural Octahedral Field: Octahedral prior for simultaneous smoothing and sharp edge regularization : Abstract: Neural implicit representation, the parameterization of a continuous distance function as a Multi-Layer Perceptron (MLP), has emerged as a promising lead in tackling surface reconstruction f...
- PoseAdapt: Sustainable Human Pose Estimation via Continual Learning Benchmarks and Toolkit : Abstract: Human pose estimators are typically retrained from scratch or naively fine-tuned whenever keypoint sets, sensing modalities, or deployment domains change--an inefficient, compute-intensive p...
- DINO-Foresight: Looking into the Future with DINO : Abstract: Predicting future dynamics is crucial for applications like autonomous driving and robotics, where understanding the environment is key. Existing pixel-level methods are computationally expe...
- Accelerating Parallel Diffusion Model Serving with Residual Compression : Abstract: Diffusion models produce realistic images and videos but require substantial computational resources, necessitating multi-accelerator parallelism for real-time deployment. However, parallel ...
- Bridging 3D Deep Learning and Curation for Analysis and High-Quality Segmentation in Practice : Abstract: Accurate 3D microscopy image segmentation is critical for quantitative bioimage analysis but even state-of-the-art foundation models yield error-prone results. Therefore, manual curation is ...
- Creating Blank Canvas Against AI-enabled Image Forgery : Abstract: AIGC-based image editing technology has greatly simplified the realistic-level image modification, causing serious potential risks of image forgery. This paper introduces a new approach to t...
- TTSnap: Test-Time Scaling of Diffusion Models via Noise-Aware Pruning : Abstract: A prominent approach to test-time scaling for text-to-image diffusion models formulates the problem as a search over multiple noise seeds, selecting the one that maximizes a certain image-re...
- Semantic Anchoring for Robust Personalization in Text-to-Image Diffusion Models : Abstract: Text-to-image diffusion models have achieved remarkable progress in generating diverse and realistic images from textual descriptions. However, they still struggle with personalization, whic...
- Toward Diffusible High-Dimensional Latent Spaces: A Frequency Perspective : Abstract: Latent diffusion has become the default paradigm for visual generation, yet we observe a persistent reconstruction-generation trade-off as latent dimensionality increases: higher-capacity au...
- UMind-VL: A Generalist Ultrasound Vision-Language Model for Unified Grounded Perception and Comprehensive Interpretation : Abstract: Despite significant strides in medical foundation models, the ultrasound domain lacks a comprehensive solution capable of bridging low-level Ultrasound Grounded Perception (e.g., segmentatio...
- Can Protective Watermarking Safeguard the Copyright of 3D Gaussian Splatting? : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a powerful representation for 3D scenes, widely adopted due to its exceptional efficiency and high-fidelity visual quality. Given the significant ...
- DriveVGGT: Visual Geometry Transformer for Autonomous Driving : Abstract: Feed-forward reconstruction has recently gained significant attention, with VGGT being a notable example. However, directly applying VGGT to autonomous driving (AD) systems leads to sub-opti...
- The Collapse of Patches : Abstract: Observing certain patches in an image reduces the uncertainty of others. Their realization lowers the distribution entropy of each remaining patch feature, analogous to collapsing a particle...
- Match-and-Fuse: Consistent Generation from Unstructured Image Sets : Abstract: We present Match-and-Fuse - a zero-shot, training-free method for consistent controlled generation of unstructured image sets - collections that share a common visual element, yet differ in ...
- Small Object Detection for Birds with Swin Transformer : Abstract: Object detection is the task of detecting objects in an image. In this task, the detection of small objects is particularly difficult. Other than the small size, it is also accompanied by di...
- Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment : Abstract: Normalizing Flows (NFs) are a class of generative models distinguished by a mathematically invertible architecture, where the forward pass transforms data into a latent space for density est...
- INSIGHT: An Interpretable Neural Vision-Language Framework for Reasoning of Generative Artifacts : Abstract: The growing realism of AI-generated images produced by recent GAN and diffusion models has intensified concerns over the reliability of visual media. Yet, despite notable progress in deepfak...
- AnchorFlow: Training-Free 3D Editing via Latent Anchor-Aligned Flows : Abstract: Training-free 3D editing aims to modify 3D shapes based on human instructions without model finetuning. It plays a crucial role in 3D content creation. However, existing approaches often str...
- UAV-MM3D: A Large-Scale Synthetic Benchmark for 3D Perception of Unmanned Aerial Vehicles with Multi-Modal Data : Abstract: Accurate perception of UAVs in complex low-altitude environments is critical for airspace security and related intelligent systems. Developing reliable solutions requires large-scale, accura...
- DiffStyle360: Diffusion-Based 360° Head Stylization via Style Fusion Attention : Abstract: 3D head stylization has emerged as a key technique for reimagining realistic human heads in various artistic forms, enabling expressive character design and creative visual experiences in di...
- Wukong's 72 Transformations: High-fidelity Textured 3D Morphing via Flow Models : Abstract: We present WUKONG, a novel training-free framework for high-fidelity textured 3D morphing that takes a pair of source and target prompts (image or text) as input. Unlike conventional methods...
- Fin3R: Fine-tuning Feed-forward 3D Reconstruction Models via Monocular Knowledge Distillation : Abstract: We present Fin3R, a simple, effective, and general fine-tuning method for feed-forward 3D reconstruction models. The family of feed-forward reconstruction models regresses pointmaps of all inp...
- SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition : Abstract: Recent advances in skeleton-based action recognition increasingly leverage semantic priors from Large Language Models (LLMs) to enrich skeletal representations. However, the LLM is typically...
- ABounD: Adversarial Boundary-Driven Few-Shot Learning for Multi-Class Anomaly Detection : Abstract: Few-shot multi-class industrial anomaly detection remains a challenging task. Vision-language models need to be both category-adaptive and sharply discriminative, yet data scarcity often blu...
- Do You See What I Say? Generalizable Deepfake Detection based on Visual Speech Recognition : Abstract: Deepfake generation has witnessed remarkable progress, contributing to highly realistic generated images, videos, and audio. While technically intriguing, such progress has raised serious co...
- Beyond Real versus Fake Towards Intent-Aware Video Analysis : Abstract: The rapid advancement of generative models has led to increasingly realistic deepfake videos, posing significant societal and security risks. While existing detection methods focus on distin...
- ITS3D: Inference-Time Scaling for Text-Guided 3D Diffusion Models : Abstract: We explore inference-time scaling in text-guided 3D diffusion models to enhance generative quality without additional training. To this end, we introduce ITS3D, a framework that formulates t...
- Gaussians on Fire: High-Frequency Reconstruction of Flames : Abstract: We propose a method to reconstruct dynamic fire in 3D from a limited set of camera views with a Gaussian-based spatiotemporal representation. Capturing and reconstructing fire and its dynami...
- RoadSceneBench: A Lightweight Benchmark for Mid-Level Road Scene Understanding : Abstract: Understanding mid-level road semantics, which capture the structural and contextual cues that link low-level perception to high-level planning, is essential for reliable autonomous driving a...
- Hybrid, Unified and Iterative: A Novel Framework for Text-based Person Anomaly Retrieval : Abstract: Text-based person anomaly retrieval has emerged as a challenging task, with most existing approaches relying on complex deep-learning techniques. This raises a research question: How can the...
- Rethinking Cross-Generator Image Forgery Detection through DINOv3 : Abstract: As generative models become increasingly diverse and powerful, cross-generator detection has emerged as a new challenge. Existing detection methods often memorize artifacts of specific gener...
- AI killed the video star. Audio-driven diffusion model for expressive talking head generation : Abstract: We propose Dimitra++, a novel framework for audio-driven talking head generation, streamlined to learn lip motion, facial expression, as well as head pose motion. Specifically, we propose a ...
- SciPostGen: Bridging the Gap between Scientific Papers and Poster Layouts : Abstract: As the number of scientific papers continues to grow, there is a demand for approaches that can effectively convey research findings, with posters serving as a key medium for presenting pape...
- Fast3Dcache: Training-free 3D Geometry Synthesis Acceleration : Abstract: Diffusion models have achieved impressive generative quality across modalities like 2D images, videos, and 3D shapes, but their inference remains computationally expensive due to the iterati...
- Diff-ICMH: Harmonizing Machine and Human Vision in Image Compression with Generative Prior : Abstract: Image compression methods are usually optimized in isolation for either human perception or machine analysis tasks. We reveal fundamental commonalities between these objectives: preserving accurate s...
- Bringing Your Portrait to 3D Presence : Abstract: We present a unified framework for reconstructing animatable 3D human avatars from a single portrait across head, half-body, and full-body inputs. Our method tackles three bottlenecks: pose-...
- Text Condition Embedded Regression Network for Automated Dental Abutment Design : Abstract: The abutment is an important part of artificial dental implants, whose design process is time-consuming and labor-intensive. Long-term use of inappropriate dental implant abutments may resul...
- AnoRefiner: Anomaly-Aware Group-Wise Refinement for Zero-Shot Industrial Anomaly Detection : Abstract: Zero-shot industrial anomaly detection (ZSAD) methods typically yield coarse anomaly maps as vision transformers (ViTs) extract patch-level features only. To solve this, recent solutions att...
- MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory : Abstract: We present MG-Nav (Memory-Guided Navigation), a dual-scale framework for zero-shot visual navigation that unifies global memory-guided planning with local geometry-enhanced control. At its c...
- REASONEDIT: Towards Reasoning-Enhanced Image Editing Models : Abstract: Recent advances in image editing models have shown remarkable progress. A common architectural design couples a multimodal large language model (MLLM) encoder with a diffusion decoder, as se...
- GeoZero: Incentivizing Reasoning from Scratch on Geospatial Scenes : Abstract: Multimodal large language models (MLLMs) have undergone rapid development in advancing geospatial scene understanding. Recent studies have sought to enhance the reasoning capabilities of rem...
- Architecture Decoupling Is Not All You Need For Unified Multimodal Model : Abstract: Unified multimodal models for image generation and understanding represent a significant step toward AGI and have attracted widespread attention from researchers. The main challenge of this ...
- VaMP: Variational Multi-Modal Prompt Learning for Vision-Language Models : Abstract: Vision-language models (VLMs), such as CLIP, have shown strong generalization under zero-shot settings, yet adapting them to downstream tasks with limited supervision remains a significant c...
- A deep learning perspective on Rubens' attribution : Abstract: This study explores the use of deep learning for the authentication and attribution of paintings, focusing on the complex case of Peter Paul Rubens and his workshop. A convolutional neural n...
- Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield : Abstract: Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. Among these, Distribution Matching Distillation (DMD) and its var...
- Emergent Extreme-View Geometry in 3D Foundation Models : Abstract: 3D foundation models (3DFMs) have recently transformed 3D vision, enabling joint prediction of depths, poses, and point maps directly from images. Yet their ability to reason under extreme, ...
- Ar2Can: An Architect and an Artist Leveraging a Canvas for Multi-Human Generation : Abstract: Despite recent advances in text-to-image generation, existing models consistently fail to produce reliable multi-human scenes, often duplicating faces, merging identities, or miscounting ind...
- Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer : Abstract: The landscape of high-performance image generation models is currently dominated by proprietary systems, such as Nano Banana Pro and Seedream 4.0. Leading open-source alternatives, including...
- Splat-SAP: Feed-Forward Gaussian Splatting for Human-Centered Scene with Scale-Aware Point Map Reconstruction : Abstract: We present Splat-SAP, a feed-forward approach to render novel views of human-centered scenes from binocular cameras with large sparsity. Gaussian Splatting has shown its promising potential ...
- Fusion or Confusion? Assessing the impact of visible-thermal image fusion for automated wildlife detection : Abstract: Efficient wildlife monitoring methods are necessary for biodiversity conservation and management. The combination of remote sensing, aerial imagery and deep learning offers promising opportun...
- Alzheimer's Disease Prediction Using EffNetViTLoRA and BiLSTM with Multimodal Longitudinal MRI Data : Abstract: Alzheimer's disease (AD) is a prevalent neurodegenerative disorder that progressively impairs memory, decision-making, and overall cognitive function. As AD is irreversible, early prediction...
- World in a Frame: Understanding Culture Mixing as a New Challenge for Vision-Language Models : Abstract: In a globalized world, cultural elements from diverse origins frequently appear together within a single visual scene. We refer to these as culture mixing scenarios, yet how Large Vision-Lan...
- LC4-DViT: Land-cover Creation for Land-cover Classification with Deformable Vision Transformer : Abstract: Land-cover underpins ecosystem services, hydrologic regulation, disaster-risk reduction, and evidence-based land planning; timely, accurate land-cover maps are therefore critical for environ...
- Captain Safari: A World Engine : Abstract: World engines aim to synthesize long, 3D-consistent videos that support interactive exploration of a scene under user-controlled camera motion. However, existing systems struggle under aggre...
- Some Modalities are More Equal Than Others: Decoding and Architecting Multimodal Integration in MLLMs : Abstract: Despite remarkable advancements in Multimodal Large Language Models (MLLMs), a fundamental question remains: are MLLMs robust to contradicting modalities? To rigorously study this, we introd...
- Breaking the Visual Shortcuts in Multimodal Knowledge-Based Visual Question Answering : Abstract: Existing Multimodal Knowledge-Based Visual Question Answering (MKB-VQA) benchmarks suffer from "visual shortcuts", as the query image typically matches the primary subject entity of the targ...
- Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding : Abstract: Document understanding is a long-standing practical task. Vision Language Models (VLMs) have gradually become a primary approach in this domain, demonstrating effective performance on single...
- GLOW: Global Illumination-Aware Inverse Rendering of Indoor Scenes Captured with Dynamic Co-Located Light & Camera : Abstract: Inverse rendering of indoor scenes remains challenging due to the ambiguity between reflectance and lighting, exacerbated by inter-reflections among multiple objects. While natural illuminat...
- CoordSpeaker: Exploiting Gesture Captioning for Coordinated Caption-Empowered Co-Speech Gesture Generation : Abstract: Co-speech gesture generation has significantly advanced human-computer interaction, yet speaker movements remain constrained due to the omission of text-driven non-spontaneous gestures (e.g....
- Scalable Diffusion Transformer for Conditional 4D fMRI Synthesis : Abstract: Generating whole-brain 4D fMRI sequences conditioned on cognitive tasks remains challenging due to the high-dimensional, heterogeneous BOLD dynamics across subjects/acquisitions and the lack...
- CNN-Based Framework for Pedestrian Age and Gender Classification Using Far-View Surveillance in Mixed-Traffic Intersections : Abstract: Pedestrian safety remains a pressing concern in congested urban intersections, particularly in low- and middle-income countries where traffic is multimodal, and infrastructure often lacks fo...
- DM$^3$T: Harmonizing Modalities via Diffusion for Multi-Object Tracking : Abstract: Multi-object tracking (MOT) is a fundamental task in computer vision with critical applications in autonomous driving and robotics. Multimodal MOT that integrates visible light and thermal i...
- From Points to Clouds: Learning Robust Semantic Distributions for Multi-modal Prompts : Abstract: Multimodal Prompt Learning (MPL) has emerged as a pivotal technique for adapting large-scale Visual Language Models (VLMs). However, current MPL methods are fundamentally limited by their op...
- See, Rank, and Filter: Important Word-Aware Clip Filtering via Scene Understanding for Moment Retrieval and Highlight Detection : Abstract: Video moment retrieval (MR) and highlight detection (HD) with natural language queries aim to localize relevant moments and key highlights in video clips. However, existing methods overloo...
- ViGG: Robust RGB-D Point Cloud Registration using Visual-Geometric Mutual Guidance : Abstract: Point cloud registration is a fundamental task in 3D vision. Most existing methods only use geometric information for registration. Recently proposed RGB-D registration methods primarily foc...
- NeuMatC: A General Neural Framework for Fast Parametric Matrix Operation : Abstract: Matrix operations (e.g., inversion and singular value decomposition (SVD)) are fundamental in science and engineering. In many emerging real-world applications (such as wireless communicatio...
- Robust Image Self-Recovery against Tampering using Watermark Generation with Pixel Shuffling : Abstract: The rapid growth of Artificial Intelligence-Generated Content (AIGC) raises concerns about the authenticity of digital media. In this context, image self-recovery, reconstructing original co...
- Barcode and QR Code Object Detection: An Experimental Study on YOLOv8 Models : Abstract: This research presents an in-depth evaluation of the YOLOv8 (You Only Look Once) algorithm's efficiency in object detection, specifically focusing on barcode and QR code recognition. Uti...
- DenoiseGS: Gaussian Reconstruction Model for Burst Denoising : Abstract: Burst denoising methods are crucial for enhancing images captured on handheld devices, but they often struggle with large motion or suffer from prohibitive computational costs. In this paper...
- One-to-All Animation: Alignment-Free Character Animation and Image Pose Transfer : Abstract: Recent advances in diffusion models have greatly improved pose-driven character animation. However, existing methods are limited to spatially aligned reference-pose pairs with matched skelet...
- Do We Need Perfect Data? Leveraging Noise for Domain Generalized Segmentation : Abstract: Domain generalization in semantic segmentation faces challenges from domain shifts, particularly under adverse conditions. While diffusion-based data generation methods show promise, they in...
- RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video : Abstract: Accurate robot segmentation is a fundamental capability for robotic perception. It enables precise visual servoing for VLA systems, scalable robot-centric data augmentation, accurate real-to...
- Contrastive Heliophysical Image Pretraining for Solar Dynamics Observatory Records : Abstract: Deep learning has revolutionized solar image analysis, yet most approaches train task-specific encoders from scratch or rely on natural-image pretraining that ignores the unique characterist...
- HMR3D: Hierarchical Multimodal Representation for 3D Scene Understanding with Large Vision-Language Model : Abstract: Recent advances in large vision-language models (VLMs) have shown significant promise for 3D scene understanding. Existing VLM-based approaches typically align 3D scene features with the VLM...
- Taming the Light: Illumination-Invariant Semantic 3DGS-SLAM : Abstract: Extreme exposure degrades both the 3D map reconstruction and semantic segmentation accuracy, which is particularly detrimental to tightly-coupled systems. To achieve illumination invariance,...
- BlockVid: Block Diffusion for High-Quality and Consistent Minute-Long Video Generation : Abstract: Generating minute-long videos is a critical step toward developing world models, providing a foundation for realistic extended scenes and advanced AI simulators. The emerging semi-autoregres...
- McSc: Motion-Corrective Preference Alignment for Video Generation with Self-Critic Hierarchical Reasoning : Abstract: Text-to-video (T2V) generation has achieved remarkable progress in producing high-quality videos aligned with textual prompts. However, aligning synthesized videos with nuanced human prefere...
- Convolutional Feature Noise Reduction for 2D Cardiac MR Image Segmentation : Abstract: Noise reduction constitutes a crucial operation within Digital Signal Processing. Unfortunately, it is frequently neglected in the processing of convolutional features in s...
- MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation : Abstract: Recent text-to-image generation models have acquired the ability to perform multi-reference generation and editing: the ability to inherit the appearance of subjects from multiple reference images a...
- Guiding Visual Autoregressive Models through Spectrum Weakening : Abstract: Classifier-free guidance (CFG) has become a widely adopted and practical approach for enhancing generation quality and improving condition alignment. Recent studies have explored guidance me...
- Optimizer Sensitivity in Vision Transformer-Based Iris Recognition: AdamW vs SGD vs RMSprop : Abstract: The security of biometric authentication is increasingly critical as digital identity systems expand. Iris recognition offers high reliability due to its distinctive and stable texture patte...
- MrGS: Multi-modal Radiance Fields with 3D Gaussian Splatting for RGB-Thermal Novel View Synthesis : Abstract: Recent advances in Neural Radiance Fields (NeRFs) and 3D Gaussian Splatting (3DGS) have achieved considerable performance in RGB scene reconstruction. However, multi-modal rendering that inc...
- JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization : Abstract: Agent-based editing models have substantially advanced interactive experiences, processing quality, and creative flexibility. However, two critical challenges persist: (1) instruction halluc...
- Geometry-Consistent 4D Gaussian Splatting for Sparse-Input Dynamic View Synthesis : Abstract: Gaussian Splatting has emerged as a novel approach to view synthesis of dynamic scenes and shows great potential in AIoT applications such as digital twins. However, recent dynamic Ga...
- GOATex: Geometry & Occlusion-Aware Texturing : Abstract: We present GOATex, a diffusion-based method for 3D mesh texturing that generates high-quality textures for both exterior and interior surfaces. While existing methods perform well on visible...
- Image Valuation in NeRF-based 3D reconstruction : Abstract: Data valuation and monetization are becoming increasingly important across domains such as eXtended Reality (XR) and digital media. In the context of 3D scene reconstruction from a set of im...
- Implementation of a Skin Lesion Detection System for Managing Children with Atopic Dermatitis Based on Ensemble Learning : Abstract: The amendments to the Data 3 Act and the impact of COVID-19 have fostered the growth of the digital healthcare market and promoted the use of medical data in artificial intelligence in South Ko...
- NumeriKontrol: Adding Numeric Control to Diffusion Transformers for Instruction-based Image Editing : Abstract: Instruction-based image editing enables intuitive manipulation through natural language commands. However, text instructions alone often lack the precision required for fine-grained control ...
- Analyzing Image Beyond Visual Aspect: Image Emotion Classification via Multiple-Affective Captioning : Abstract: Image emotion classification (IEC) is a longstanding research field that has received increasing attention with the rapid progress of deep learning. Although recent advances have leveraged t...
- DNA-Prior: Unsupervised Denoise Anything via Dual-Domain Prior : Abstract: Medical imaging pipelines critically rely on robust denoising to stabilise downstream tasks such as segmentation and reconstruction. However, many existing denoisers depend on large annotate...
- Bridging the Modality Gap by Similarity Standardization with Pseudo-Positive Samples : Abstract: Advances in vision-language models (VLMs) have enabled effective cross-modality retrieval. However, when both text and images exist in the database, similarity scores would differ in scale b...
- C$^2$DLM: Causal Concept-Guided Diffusion Large Language Models : Abstract: Autoregressive (AR) language models and Diffusion Language Models (DLMs) constitute the two principal paradigms of large language models. However, both paradigms suffer from insufficient rea...
- Lips-Jaw and Tongue-Jaw Articulatory Tradeoff in DYNARTmo : Abstract: This paper investigates how the dynamic articulatory model DYNARTmo accounts for articulatory tradeoffs between primary and secondary articulators, with a focus on lips-jaw and tongue-jaw co...
- RefineBench: Evaluating Refinement Capability of Language Models via Checklists : Abstract: Can language models (LMs) self-refine their own responses? This question is increasingly relevant as a wide range of real-world user interactions involve refinement requests. However, prior ...
- Beyond Query-Level Comparison: Fine-Grained Reinforcement Learning for Text-to-SQL with Automated Interpretable Critiques : Abstract: Text-to-SQL, a pivotal natural language processing (NLP) task that converts textual queries into executable SQL, has seen substantial progress in recent years. However, existing evaluation a...
- Token-Level Marginalization for Multi-Label LLM Classifiers : Abstract: This paper addresses the critical challenge of deriving interpretable confidence scores from generative language models (LLMs) when applied to multi-label content safety classification. Whil...
- Sentiment Analysis Of Shopee Product Reviews Using Distilbert : Abstract: The rapid growth of digital commerce has led to the accumulation of a massive number of consumer reviews on online platforms. Shopee, as one of the largest e-commerce platforms in Southeast ...
- Named Entity Recognition for the Kurdish Sorani Language: Dataset Creation and Comparative Analysis : Abstract: This work contributes towards balancing the inclusivity and global applicability of natural language processing techniques by proposing the first 'named entity recognition' dataset for Kurdis...
- Joint Speech and Text Training for LLM-Based End-to-End Spoken Dialogue State Tracking : Abstract: End-to-end spoken dialogue state tracking (DST) is made difficult by the combination of speech input and data scarcity. Combining speech foundation encoders and large language mo...
- Extension Condition "violations" and Merge optimality constraints : Abstract: We analyze, using the mathematical formulation of Merge within the Strong Minimalist Thesis framework, a set of linguistic phenomena, including head-to-head movement, phrasal affixes and syn...
- Smarter, not Bigger: Fine-Tuned RAG-Enhanced LLMs for Automotive HIL Testing : Abstract: Hardware-in-the-Loop (HIL) testing is essential for automotive validation but suffers from fragmented and underutilized test artifacts. This paper presents HIL-GPT, a retrieval-augmented gen...
- Improving LLM-based Ontology Matching with fine-tuning on synthetic data : Abstract: Large Language Models (LLMs) are increasingly being integrated into various components of Ontology Matching pipelines. This paper investigates the capability of LLMs to perform ontology matc...
- Modeling Romanized Hindi and Bengali: Dataset Creation and Multilingual LLM Integration : Abstract: The development of robust transliteration techniques to enhance the effectiveness of transforming Romanized scripts into native scripts is crucial for Natural Language Processing tasks, incl...
- RAG System for Supporting Japanese Litigation Procedures: Faithful Response Generation Complying with Legal Norms : Abstract: This study discusses the essential components that a Retrieval-Augmented Generation (RAG)-based LLM system should possess in order to support Japanese medical litigation procedures complying...
- JBE-QA: Japanese Bar Exam QA Dataset for Assessing Legal Domain Knowledge : Abstract: We introduce JBE-QA, a Japanese Bar Exam Question-Answering dataset to evaluate large language models' legal knowledge. Derived from the multiple-choice (tanto-shiki) section of the Japanese...
- FEANEL: A Benchmark for Fine-Grained Error Analysis in K-12 English Writing : Abstract: Large Language Models (LLMs) have transformed artificial intelligence, offering profound opportunities for educational applications. However, their ability to provide fine-grained educationa...
- Visual Puns from Idioms: An Iterative LLM-T2IM-MLLM Framework : Abstract: We study idiom-based visual puns--images that align an idiom's literal and figurative meanings--and present an iterative framework that coordinates a large language model (LLM), a text-to-im...
- Training-Free Loosely Speculative Decoding: Accepting Semantically Correct Drafts Beyond Exact Match : Abstract: Large language models (LLMs) achieve strong performance across diverse tasks but suffer from high inference latency due to their autoregressive generation. Speculative Decoding (SPD) mitigat...
- ShoppingComp: Are LLMs Really Ready for Your Shopping Cart? : Abstract: We present ShoppingComp, a challenging real-world benchmark for rigorously evaluating LLM-powered shopping agents on three core capabilities: precise product retrieval, expert-level report g...
- Social Perceptions of English Spelling Variation on Twitter: A Comparative Analysis of Human and LLM Responses : Abstract: Spelling variation (e.g. funnnn vs. fun) can influence the social perception of texts and their writers: we often have various associations with different forms of writing (is the text infor...
- Decoding the Past: Explainable Machine Learning Models for Dating Historical Texts : Abstract: Accurately dating historical texts is essential for organizing and interpreting cultural heritage collections. This article addresses temporal text classification using interpretable, featur...
- Accent Placement Models for Rigvedic Sanskrit Text : Abstract: The Rigveda, among the oldest Indian texts in Vedic Sanskrit, employs a distinctive pitch-accent system (udātta, anudātta, svarita) whose marks encode melodic and interpretive cues but are o...
- Dripper: Token-Efficient Main HTML Extraction with a Lightweight LM : Abstract: Accurately and efficiently extracting main content from general web pages is of great significance for obtaining training data for large models. Using well-pre-trained decoder-only generativ...
- Are LLMs Good Safety Agents or a Propaganda Engine? : Abstract: Large Language Models (LLMs) are trained to refuse to respond to harmful content. However, systematic analyses of whether this behavior is truly a reflection of their safety policies or an ind...
- Behavior-Equivalent Token: Single-Token Replacement for Long Prompts in LLMs : Abstract: Carefully engineered system prompts play a critical role in guiding the behavior of LLM agents, but their considerable length introduces significant drawbacks, including increased inference ...
- MCP vs RAG vs NLWeb vs HTML: A Comparison of the Effectiveness and Efficiency of Different Agent Interfaces to the Web (Technical Report) : Abstract: Large language model agents are increasingly used to automate web tasks such as product search, offer comparison, and checkout. Current research explores different interfaces through which t...
- Tackling a Challenging Corpus for Early Detection of Gambling Disorder: UNSL at MentalRiskES 2025 : Abstract: Gambling disorder is a complex behavioral addiction that is challenging to understand and address, with severe physical, psychological, and social consequences. Early Risk Detection (ERD) on...
- Scaling HuBERT for African Languages: From Base to Large and XL : Abstract: Despite recent progress in multilingual speech processing, African languages remain under-represented in both research and deployed systems, particularly when it comes to strong, open-weight...
- Optimizing Multimodal Language Models through Attention-based Interpretability : Abstract: Modern large language models are becoming multimodal, analyzing various data formats like text and images. While fine-tuning is effective for adapting these multimodal language models (MLMs) to do...
- Ambiguity Awareness Optimization: Towards Semantic Disambiguation for Direct Preference Optimization : Abstract: Direct Preference Optimization (DPO) is a widely used reinforcement learning from human feedback (RLHF) method across various domains. Recent research has increasingly focused on the role of...
- PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel : Abstract: LLM serving is increasingly dominated by decode attention, which is a memory-bound operation due to massive KV cache loading from global memory. Meanwhile, real-world workloads exhibit subst...
- Mechanistic Finetuning of Vision-Language-Action Models via Few-Shot Demonstrations : Abstract: Vision-Language-Action (VLA) models promise to extend the remarkable success of vision-language models (VLMs) to robotics. Yet, unlike VLMs in the vision-language domain, VLAs for robotics ...
- PRISM: Privacy-Aware Routing for Adaptive Cloud-Edge LLM Inference via Semantic Sketch Collaboration : Abstract: Large Language Models (LLMs) demonstrate impressive capabilities in natural language understanding and generation, but incur high communication overhead and privacy risks in cloud deployment...
- Artwork Interpretation with Vision Language Models: A Case Study on Emotions and Emotion Symbols : Abstract: Emotions are a fundamental aspect of artistic expression. Due to their abstract nature, there is a broad spectrum of emotion realization in artworks. These are subject to historical change a...
- Is Passive Expertise-Based Personalization Enough? A Case Study in AI-Assisted Test-Taking : Abstract: Novice and expert users have different systematic preferences in task-oriented dialogues. However, whether catering to these preferences actually improves user experience and task performanc...
- AutoHall: Automated Factuality Hallucination Dataset Generation for Large Language Models : Abstract: Large language models (LLMs) have gained broad applications across various domains but still struggle with hallucinations. Currently, hallucinations occur frequently in the generation of fac...
- Fine-grained and Explainable Factuality Evaluation for Multimodal Summarization : Abstract: Multimodal summarization aims to generate a concise summary based on the input text and image. However, existing methods may produce unfactual outputs. To evaluate the factual...
- Linguistically-Controlled Paraphrase Generation : Abstract: Controlled paraphrase generation produces paraphrases that preserve meaning while allowing precise control over linguistic attributes of the output. We introduce LingConv, an encoder-decoder...
- More Documents, Same Length: Isolating the Challenge of Multiple Documents in RAG : Abstract: Retrieval-Augmented Generation (RAG) enhances the accuracy of Large Language Model (LLM) responses by leveraging relevant external documents during generation. Although previous studies note...
- TrackList: Tracing Back Query Linguistic Diversity for Head and Tail Knowledge in Open Large Language Models : Abstract: Large Language Models (LLMs) have proven effective at giving definition-type answers to user input queries. While for humans giving various types of answers, such as examples and paraphrases...
- Exploring the Human-LLM Synergy in Advancing Theory-driven Qualitative Analysis : Abstract: Qualitative coding is a demanding yet crucial research method in the field of Human-Computer Interaction (HCI). While recent studies have shown the capability of large language models (LLMs)...
- UniArt: Unified 3D Representation for Generating 3D Articulated Objects with Open-Set Articulation : Abstract: Articulated 3D objects play a vital role in realistic simulation and embodied robotics, yet manually constructing such assets remains costly and difficult to scale. In this paper, we present...
- Interpretable Multimodal Cancer Prototyping with Whole Slide Images and Incompletely Paired Genomics : Abstract: Multimodal approaches that integrate histology and genomics hold strong potential for precision oncology. However, phenotypic and genotypic heterogeneity limits the quality of intra-modal re...
- AmodalGen3D: Generative Amodal 3D Object Reconstruction from Sparse Unposed Views : Abstract: Reconstructing 3D objects from a few unposed and partially occluded views is a common yet challenging problem in real-world scenarios, where many object surfaces are never directly observed....
- TAPVid-360: Tracking Any Point in 360 from Narrow Field of View Video : Abstract: Humans excel at constructing panoramic mental models of their surroundings, maintaining object permanence and inferring scene structure beyond visible regions. In contrast, current artificia...
- PAT3D: Physics-Augmented Text-to-3D Scene Generation : Abstract: We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision-language models with physics-based simulation to produce physically plausible, si...
- PPBoost: Progressive Prompt Boosting for Text-Driven Medical Image Segmentation : Abstract: Text-prompted foundation models for medical image segmentation offer an intuitive way to delineate anatomical structures from natural language queries, but their predictions often lack spati...
- Can Multi-Modal LLMs Provide Live Step-by-Step Task Guidance? : Abstract: Multi-modal Large Language Models (LLMs) have advanced conversational abilities but struggle with providing live, interactive step-by-step guidance, a key capability for future AI assistants....
- StreamFlow: Theory, Algorithm, and Implementation for High-Efficiency Rectified Flow Generation : Abstract: New technologies such as Rectified Flow and Flow Matching have significantly improved the performance of generative models in the past two years, especially in terms of control accuracy, gen...
- Intra-Class Probabilistic Embeddings for Uncertainty Estimation in Vision-Language Models : Abstract: Vision-language models (VLMs), such as CLIP, have gained popularity for their strong open vocabulary classification performance, but they are prone to assigning high confidence scores to mis...
- Layover or Direct Flight: Rethinking Audio-Guided Image Segmentation : Abstract: Understanding human instructions is essential for enabling smooth human-robot interaction. In this work, we focus on object grounding, i.e., localizing an object of interest in a visual scen...
- PAGen: Phase-guided Amplitude Generation for Domain-adaptive Object Detection : Abstract: Unsupervised domain adaptation (UDA) greatly facilitates the deployment of neural networks across diverse environments. However, most state-of-the-art approaches are overly complex, relying ...
- SparseWorld-TC: Trajectory-Conditioned Sparse Occupancy World Model : Abstract: This paper introduces a novel architecture for trajectory-conditioned forecasting of future 3D scene occupancy. In contrast to methods that rely on variational autoencoders (VAEs) to generat...
- TPCNet: Triple physical constraints for Low-light Image Enhancement : Abstract: Low-light image enhancement is an essential computer vision task to improve image contrast and to decrease the effects of color bias and noise. Many existing interpretable deep-learning algo...
- OralGPT-Omni: A Versatile Dental Multimodal Large Language Model : Abstract: Multimodal Large Language Models (MLLMs) have exhibited immense potential across numerous medical specialties; yet, dentistry remains underexplored, in part due to limited domain-specific da...
- DNA: Dual-branch Network with Adaptation for Open-Set Online Handwriting Generation : Abstract: Online handwriting generation (OHG) enhances handwriting recognition models by synthesizing diverse, human-like samples. However, existing OHG methods struggle to generate unseen characters,...
- WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation : Abstract: Video diffusion models have recently achieved remarkable progress in realism and controllability. However, achieving seamless video translation across different perspectives, such as first-p...
- MoE3D: Mixture of Experts meets Multi-Modal 3D Understanding : Abstract: Multi-modal 3D understanding is a fundamental task in computer vision. Previous multi-modal fusion methods typically employ a single, dense fusion network, struggling to handle the significa...
- HyperST: Hierarchical Hyperbolic Learning for Spatial Transcriptomics Prediction : Abstract: Spatial Transcriptomics (ST) merges the benefits of pathology images and gene expression, linking molecular profiles with tissue structure to analyze spot-level function comprehensively. Pre...
- PROMPTMINER: Black-Box Prompt Stealing against Text-to-Image Generative Models via Reinforcement Learning and Fuzz Optimization : Abstract: Text-to-image (T2I) generative models such as Stable Diffusion and FLUX can synthesize realistic, high-quality images directly from textual prompts. The resulting image quality depends criti...
- GoPrune: Accelerated Structured Pruning with $\ell_{2,p}$-Norm Optimization : Abstract: Convolutional neural networks (CNNs) suffer from rapidly increasing storage and computational costs as their depth grows, which severely hinders their deployment on resource-constrained edge...
- Cue3D: Quantifying the Role of Image Cues in Single-Image 3D Generation : Abstract: Humans and traditional computer vision methods rely on a diverse set of monocular cues to infer 3D structure from a single image, such as shading, texture, silhouette, etc. While recent deep...
- GA2-CLIP: Generic Attribute Anchor for Efficient Prompt Tuning in Video-Language Models : Abstract: Visual and textual soft prompt tuning can effectively improve the adaptability of Vision-Language Models (VLMs) in downstream tasks. However, fine-tuning on video tasks impairs the model's g...
- DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action : Abstract: To build a generalizable Vision-Language-Action (VLA) model with strong reasoning ability, a common strategy is to first train a specialist VLA on robot demonstrations to acquire reliable ma...
- EASL: Multi-Emotion Guided Semantic Disentanglement for Expressive Sign Language Generation : Abstract: Large language models have revolutionized sign language generation by automatically transforming text into high-quality sign language videos, providing accessible communication for the Deaf ...
- SemOD: Semantic Enabled Object Detection Network under Various Weather Conditions : Abstract: In the field of autonomous driving, camera-based perception models are mostly trained on clear weather data. Models that focus on addressing specific weather challenges are unable to adapt t...
- Partially Shared Concept Bottleneck Models : Abstract: Concept Bottleneck Models (CBMs) enhance interpretability by introducing a layer of human-understandable concepts between inputs and predictions. While recent methods automate concept genera...
- BrepGPT: Autoregressive B-rep Generation with Voronoi Half-Patch : Abstract: Boundary representation (B-rep) is the de facto standard for CAD model representation in modern industrial design. The intricate coupling between geometric and topological elements in B-rep ...
- Guiding the Inner Eye: A Framework for Hierarchical and Flexible Visual Grounded Reasoning : Abstract: Models capable of "thinking with images" by dynamically grounding their reasoning in visual evidence represent a major leap in multimodal AI. However, replicating and advancing this ability ...
- Shoe Style-Invariant and Ground-Aware Learning for Dense Foot Contact Estimation : Abstract: Foot contact plays a critical role in human interaction with the world, and thus exploring foot contact can advance our understanding of human movement and physical interaction. Despite its ...
- HybridWorldSim: A Scalable and Controllable High-fidelity Simulator for Autonomous Driving : Abstract: Realistic and controllable simulation is critical for advancing end-to-end autonomous driving, yet existing approaches often struggle to support novel view synthesis under large viewpoint ch...
- Controllable 3D Object Generation with Single Image Prompt : Abstract: Recently, the impressive generative capabilities of diffusion models have been demonstrated, producing images with remarkable fidelity. Particularly, existing methods for the 3D object gener...
- IE-SRGS: An Internal-External Knowledge Fusion Framework for High-Fidelity 3D Gaussian Splatting Super-Resolution : Abstract: Reconstructing high-resolution (HR) 3D Gaussian Splatting (3DGS) models from low-resolution (LR) inputs remains challenging due to the lack of fine-grained textures and geometry. Existing me...
- Graph Laplacian-based Bayesian Multi-fidelity Modeling : Abstract: We present a novel probabilistic approach for generating multi-fidelity data while accounting for errors inherent in both low- and high-fidelity data. In this approach a graph Laplacian cons...
- Rapid optimization in high dimensional space by deep kernel learning augmented genetic algorithms : Abstract: Exploration of complex high-dimensional spaces presents significant challenges in fields such as molecular discovery, process optimization, and supply chain management. Genetic Algorithms (G...
- Interpretability for Time Series Transformers using A Concept Bottleneck Framework : Abstract: Mechanistic interpretability focuses on reverse engineering the internal mechanisms learned by neural networks. We extend our focus and propose to mechanistically forward engineer using our ...
- Predicting Market Trends with Enhanced Technical Indicator Integration and Classification Models : Abstract: Thanks to the high potential for profit, trading has become increasingly attractive to investors as the cryptocurrency and stock markets rapidly expand. However, because financial markets ar...
- Anomaly Resilient Temporal QoS Prediction using Hypergraph Convoluted Transformer Network : Abstract: Quality-of-Service (QoS) prediction is a critical task in the service lifecycle, enabling precise and adaptive service recommendations by anticipating performance variations over time in res...
- Linearly Constrained Diffusion Implicit Models : Abstract: We introduce Linearly Constrained Diffusion Implicit Models (CDIM), a fast and accurate approach to solving noisy linear inverse problems using diffusion models. Traditional diffusion-based ...
- A Trio Neural Model for Dynamic Entity Relatedness Ranking : Abstract: Measuring entity relatedness is a fundamental task for many natural language processing and information retrieval applications. Prior work often studies entity relatedness in static settings...
- Self-concordant smoothing in proximal quasi-Newton algorithms for large-scale convex composite optimization : Abstract: We introduce a notion of self-concordant smoothing for minimizing the sum of two convex functions, one of which is smooth and the other nonsmooth. The key highlight is a natural property of ...
- Improved Generalization Bounds for Transductive Learning by Transductive Local Complexity and Its Applications : Abstract: We introduce Transductive Local Complexity (TLC) as a new tool for analyzing the generalization performance of transductive learning methods. Our work extends the classical Local Rademacher ...
- Fast multiplication by two's complement addition of numbers represented as a set of polynomial radix 2 indexes, stored as an integer list for massively parallel computation : Abstract: We demonstrate a multiplication method based on numbers represented as a set of polynomial radix 2 indices stored as an integer list. The 'polynomial integer index multiplication' method is a ...
- Split Conformal Prediction under Data Contamination : Abstract: Conformal prediction is a non-parametric technique for constructing prediction intervals or sets from arbitrary predictive models under the assumption that the data is exchangeable. It is po...
- Detecting Masquerade Attacks in Controller Area Networks Using Graph Machine Learning : Abstract: Modern vehicles rely on a myriad of electronic control units (ECUs) interconnected via controller area networks (CANs) for critical operations. Despite their ubiquitous use and reliability, ...
- A Proximal Modified Quasi-Newton Method for Nonsmooth Regularized Optimization : Abstract: We develop R2N, a modified quasi-Newton method for minimizing the sum of a $\mathcal{C}^1$ function $f$ and a lower semi-continuous prox-bounded $h$. Both $f$ and $h$ may be nonconvex. At ea...
- The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks : Abstract: We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive ...
- Learning and composing of classical music using restricted Boltzmann machines : Abstract: We investigate how machine learning models acquire the ability to compose music and how musical information is internally represented within such models. We develop a composition algorithm b...
- Neural Audio Codecs for Prompt-Driven Universal Sound Separation : Abstract: Text-guided sound separation supports flexible audio editing across media and assistive applications, but existing models like AudioSep are too compute-heavy for edge deployment. Neural audi...
- JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction : Abstract: Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimated evaluation and restricted model generalization. To address this issue, ...
- On the Cross-lingual Transferability of Pre-trained wav2vec2-based Models : Abstract: Using representations provided by a large pre-trained model has become the primary strategy for achieving state-of-the-art results in a wide range of tasks. A recently proposed large pre-tra...
- Insight-A: Attribution-aware for Multimodal Misinformation Detection : Abstract: AI-generated content (AIGC) technology has emerged as a prevalent alternative to create multimodal misinformation on social media platforms, posing unprecedented threats to societal safety. ...
- An Optimized Machine Learning Classifier for Detecting Fake Reviews Using Extracted Features : Abstract: It is well known that fraudulent reviews cast doubt on the legitimacy and dependability of online purchases. The most recent development that leads customers towards darkness is the appearan...
- CrossCheck-Bench: Diagnosing Compositional Failures in Multimodal Conflict Resolution : Abstract: Multimodal Large Language Models are primarily trained and evaluated on aligned image-text pairs, which leaves their ability to detect and resolve real-world inconsistencies largely unexplor...
- When Harmless Words Harm: A New Threat to LLM Safety via Conceptual Triggers : Abstract: Recent research on large language model (LLM) jailbreaks has primarily focused on techniques that bypass safety mechanisms to elicit overtly harmful outputs. However, such efforts often over...
- AD-CDO: A Lightweight Ontology for Representing Eligibility Criteria in Alzheimer's Disease Clinical Trials : Abstract: Objective: This study introduces the Alzheimer's Disease Common Data Element Ontology for Clinical Trials (AD-CDO), a lightweight, semantically enriched ontology designed to represent and s...
- Scaling Competence, Shrinking Reasoning: Cognitive Signatures in Language Model Learning : Abstract: We analyze reasoning in language models during task-specific fine-tuning and draw a parallel between reasoning tokens (intermediate steps generated while solving a problem) and the human working...
- DELTA: Language Diffusion-based EEG-to-Text Architecture : Abstract: Electroencephalogram (EEG)-to-text remains challenging due to high-dimensional noise, subject variability, and error accumulation in autoregressive decoding. We introduce DELTA, which pairs ...
- Dissecting the Ledger: Locating and Suppressing "Liar Circuits" in Financial Large Language Models : Abstract: Large Language Models (LLMs) are increasingly deployed in high-stakes financial domains, yet they suffer from specific, reproducible hallucinations when performing arithmetic operations. Cur...
- LLMs for Low-Resource Dialect Translation Using Context-Aware Prompting: A Case Study on Sylheti : Abstract: Large Language Models (LLMs) have demonstrated strong translation abilities through prompting, even without task-specific training. However, their effectiveness in dialectal and low-resource...
- A Customer Journey in the Land of Oz: Leveraging the Wizard of Oz Technique to Model Emotions in Customer Service Interactions : Abstract: Emotion-aware customer service needs in-domain conversational data, rich annotations, and predictive capabilities, but existing resources for emotion recognition are often out-of-domain, nar...
- Tracing How Annotators Think: Augmenting Preference Judgments with Reading Processes : Abstract: We propose an annotation approach that captures not only labels but also the reading process underlying annotators' decisions, e.g., what parts of the text they focus on, re-read or skim. Us...
- A Comparative Study of LLM Prompting and Fine-Tuning for Cross-genre Authorship Attribution on Chinese Lyrics : Abstract: We propose a novel study on authorship attribution for Chinese lyrics, a domain where clean, public datasets are sorely lacking. Our contributions are twofold: (1) we create a new, balanced ...
- Start Making Sense(s): A Developmental Probe of Attention Specialization Using Lexical Ambiguity : Abstract: Despite an in-principle understanding of self-attention matrix operations in Transformer language models (LMs), it remains unclear precisely how these operations map onto interpretable compu...
- Early Risk Prediction with Temporally and Contextually Grounded Clinical Language Processing : Abstract: Clinical notes in Electronic Health Records (EHRs) capture rich temporal information on events, clinician reasoning, and lifestyle factors often missing from structured data. Leveraging them...
- A Hybrid Theory and Data-driven Approach to Persuasion Detection with Large Language Models : Abstract: Traditional psychological models of belief revision focus on face-to-face interactions, but with the rise of social media, more effective models are needed to capture belief revision at scal...
- CORGI: GNNs with Convolutional Residual Global Interactions for Lagrangian Simulation : Abstract: Partial differential equations (PDEs) are central to dynamical systems modeling, particularly in hydrodynamics, where traditional solvers often struggle with nonlinearity and computational c...
- Experts are all you need: A Composable Framework for Large Language Model Inference : Abstract: Large Language Models (LLMs) have achieved state-of-the-art accuracies in a variety of natural language processing (NLP) tasks. However, this success comes at the cost of increased model siz...
- A Trainable Centrality Framework for Modern Data : Abstract: Measuring how central or typical a data point is underpins robust estimation, ranking, and outlier detection, but classical depth notions become expensive and unstable in high dimensions and...
- A Modular Framework for Rapidly Building Intrusion Predictors : Abstract: We study automated intrusion prediction in an IT system using statistical learning methods. The focus is on developing online attack predictors that detect attacks in real time and identify ...
- Masked Diffusion for Generative Recommendation : Abstract: Generative recommendation (GR) with semantic IDs (SIDs) has emerged as a promising alternative to traditional recommendation approaches due to its performance gains, capitalization on semant...
- Spectral Concentration at the Edge of Stability: Information Geometry of Kernel Associative Memory : Abstract: High-capacity kernel Hopfield networks exhibit a "Ridge of Optimization" characterized by extreme stability. While previously linked to "Spectral Concentration," its origin remains elusive. ...
- Freeze, Diffuse, Decode: Geometry-Aware Adaptation of Pretrained Transformer Embeddings for Antimicrobial Peptide Design : Abstract: Pretrained transformers provide rich, general-purpose embeddings, which are transferred to downstream tasks. However, current transfer strategies (fine-tuning and probing) either distort the...
- Automated Discovery of Laser Dicing Processes with Bayesian Optimization for Semiconductor Manufacturing : Abstract: Laser dicing of semiconductor wafers is a critical step in microelectronic manufacturing, where multiple sequential laser passes precisely separate individual dies from the wafer. Adapting t...
- Adapting Neural Audio Codecs to EEG : Abstract: EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet, we show that pretrained neural audio codecs can serve as effective starting p...
- A Theoretical Framework for Discovering Groups and Unitary Representations via Tensor Factorization : Abstract: We analyze the HyperCube model, an operator-valued tensor factorization architecture that discovers group structures and their unitary representations. We provide a rigorous theoret...
- Estimating the Event-Related Potential from Few EEG Trials : Abstract: Event-related potentials (ERP) are measurements of brain activity with wide applications in basic and clinical neuroscience that are typically estimated using the average of many trials of ...
- Energy-Efficient Vision Transformer Inference for Edge-AI Deployment : Abstract: The growing deployment of Vision Transformers (ViTs) on energy-constrained devices requires evaluation methods that go beyond accuracy alone. We present a two-stage pipeline for assessing Vi...
- SDE-Attention: Latent Attention in SDE-RNNs for Irregularly Sampled Time Series with Missing Data : Abstract: Irregularly sampled time series with substantial missing observations are common in healthcare and sensor networks. We introduce SDE-Attention, a family of SDE-RNNs equipped with channel-lev...
- Towards Understanding Transformers in Learning Random Walks : Abstract: Transformers have proven highly effective across various applications, especially in handling sequential data such as natural languages and time series. However, transformer models often lac...
- Heteroscedastic Neural Networks for Path Loss Prediction with Link-Specific Uncertainty : Abstract: Traditional and modern machine learning-based path loss models typically assume a constant prediction variance. We propose a neural network that jointly predicts the mean and link-specific v...
- An Improved and Generalised Analysis for Spectral Clustering : Abstract: We revisit the theoretical performances of Spectral Clustering, a classical algorithm for graph partitioning that relies on the eigenvectors of a matrix representation of the graph. Informal...
- BanglaSentNet: An Explainable Hybrid Deep Learning Framework for Multi-Aspect Sentiment Analysis with Cross-Domain Transfer Learning : Abstract: Multi-aspect sentiment analysis of Bangla e-commerce reviews remains challenging due to limited annotated datasets, morphological complexity, code-mixing phenomena, and domain shift issues, ...
- Beyond Curve Fitting: Neuro-Symbolic Agents for Context-Aware Epidemic Forecasting : Abstract: Effective surveillance of hand, foot and mouth disease (HFMD) requires forecasts accounting for epidemiological patterns and contextual drivers like school calendars and weather. While class...
- Closing the Generalization Gap in Parameter-efficient Federated Edge Learning : Abstract: Federated edge learning (FEEL) provides a promising foundation for edge artificial intelligence (AI) by enabling collaborative model training while preserving data privacy. However, limited ...
- Transformer-Driven Triple Fusion Framework for Enhanced Multimodal Author Intent Classification in Low-Resource Bangla : Abstract: The expansion of the Internet and social networks has led to an explosion of user-generated content. Author intent understanding plays a crucial role in interpreting social media content. Th...
- Emergent Coordination and Phase Structure in Independent Multi-Agent Reinforcement Learning : Abstract: A clearer understanding of when coordination emerges, fluctuates, or collapses in decentralized multi-agent reinforcement learning (MARL) is increasingly sought in order to characterize the ...
- Distributed Dynamic Associative Memory via Online Convex Optimization : Abstract: An associative memory (AM) enables cue-response recall, and it has recently been recognized as a key mechanism underlying modern neural architectures such as Transformers. In this work, we i...
- Learning-Augmented Online Bipartite Matching in the Random Arrival Order Model : Abstract: We study the online unweighted bipartite matching problem in the random arrival order model, with $n$ offline and $n$ online vertices, in the learning-augmented setting: The algorithm is pro...
- Quantized-Tinyllava: a new multimodal foundation model enables efficient split learning : Abstract: Split learning is well known as a method for resolving data privacy concerns by training a model on distributed devices, thereby avoiding data sharing that raises privacy issues. However, hi...
- Accelerated Execution of Bayesian Neural Networks using a Single Probabilistic Forward Pass and Code Generation : Abstract: Machine learning models perform well across domains such as diagnostics, weather forecasting, NLP, and autonomous driving, but their limited uncertainty handling restricts use in safety-crit...
- Provable Benefits of Sinusoidal Activation for Modular Addition : Abstract: This paper studies the role of activation functions in learning modular addition with two-layer neural networks. We first establish a sharp expressivity gap: sine MLPs admit width-$2$ exact ...
- SmallWorlds: Assessing Dynamics Understanding of World Models in Isolated Environments : Abstract: Current world models lack a unified and controlled setting for systematic evaluation, making it difficult to assess whether they truly capture the underlying rules that govern environment dy...
- ThetaEvolve: Test-time Learning on Open Problems : Abstract: Recent advances in large language models (LLMs) have enabled breakthroughs in mathematical discovery, exemplified by AlphaEvolve, a closed-source system that evolves programs to improve boun...
- 47B Mixture-of-Experts Beats 671B Dense Models on Chinese Medical Examinations : Abstract: The rapid advancement of large language models (LLMs) has prompted significant interest in their potential applications in medical domains. This paper presents a comprehensive benchmark evalu...
- Addressing Stereotypes in Large Language Models: A Critical Examination and Mitigation : Abstract: Large Language Models (LLMs), such as ChatGPT, have gained popularity in recent years with the advancement of Natural Language Processing (NLP), with use cases spanning many disciplines and ...
- DNNs, Dataset Statistics, and Correlation Functions : Abstract: This paper argues that dataset structure is important in image recognition tasks (among other tasks). Specifically, we focus on the nature and genesis of correlational structure in the actua...
- PeerCoPilot: A Language Model-Powered Assistant for Behavioral Health Organizations : Abstract: Behavioral health conditions, which include mental health and substance use disorders, are the leading disease burden in the United States. Peer-run behavioral health organizations (PROs) cr...
- A Multiscale Geometric Method for Capturing Relational Topic Alignment : Abstract: Interpretable topic modeling is essential for tracking how research interests evolve within co-author communities. In scientific corpora, where novelty is prized, identifying underrepresente...
- Orchestrating Dual-Boundaries: An Arithmetic Intensity Inspired Acceleration Framework for Diffusion Language Models : Abstract: Diffusion-based large language models (dLLMs) have recently gained significant attention for their exceptional performance and inherent potential for parallel decoding. Existing frameworks f...
- Automated Statistical and Machine Learning Platform for Biological Research : Abstract: Research increasingly relies on computational methods to analyze experimental data and predict molecular properties. Current approaches often require researchers to use a variety of tools fo...
- Beyond Membership: Limitations of Add/Remove Adjacency in Differential Privacy : Abstract: Training machine learning models with differential privacy (DP) limits an adversary's ability to infer sensitive information about the training data. It can be interpreted as a bound on adve...
- Saddle-Free Guidance: Improved On-Manifold Sampling without Labels or Additional Training : Abstract: Score-based generative models require guidance in order to generate plausible, on-manifold samples. The most popular guidance method, Classifier-Free Guidance (CFG), is only applicable in se...
- Invited to Develop: Institutional Belonging and the Counterfactual Architecture of Development : Abstract: This paper examines how institutional belonging shapes long-term development by comparing Spain and Uruguay, two small democracies with similar historical endowments whose trajectories diver...
- Differential privacy from axioms : Abstract: Differential privacy (DP) is the de facto notion of privacy both in theory and in practice. However, despite its popularity, DP imposes strict requirements which guard against strong worst-c...
- Sparse Multiple Kernel Learning: Alternating Best Response and Semidefinite Relaxations : Abstract: We study Sparse Multiple Kernel Learning (SMKL), which is the problem of selecting a sparse convex combination of prespecified kernels for support vector binary classification. Unlike prevai...
- Algorithms and Scientific Software for Quasi-Monte Carlo, Fast Gaussian Process Regression, and Scientific Machine Learning : Abstract: Most scientific domains elicit the development of efficient algorithms and accessible scientific software. This thesis unifies our developments in three broad domains: Quasi-Monte Carlo (QMC...
- Digital Elevation Model Estimation from RGB Satellite Imagery using Generative Deep Learning : Abstract: Digital Elevation Models (DEMs) are vital datasets for geospatial applications such as hydrological modeling and environmental monitoring. However, conventional methods to generate DEM, such...
- A Sensitivity Approach to Causal Inference Under Limited Overlap : Abstract: Limited overlap between treated and control groups is a key challenge in observational analysis. Standard approaches like trimming importance weights can reduce variance but introduce a fund...
- On the Effect of Regularization on Nonparametric Mean-Variance Regression : Abstract: Uncertainty quantification is vital for decision-making and risk assessment in machine learning. Mean-variance regression models, which predict both a mean and residual noise for each data p...
- ResearchArcade: Graph Interface for Academic Tasks : Abstract: Academic research generates diverse data sources, and as researchers increasingly use machine learning to assist research tasks, a crucial question arises: Can we build a unified data interf...
- Support Vector Machine Classifier with Rescaled Huberized Pinball Loss : Abstract: Support vector machines are widely used in machine learning classification tasks, but traditional SVM models suffer from sensitivity to outliers and instability in resampling, which limits t...
- MRI-Based Brain Age Estimation with Supervised Contrastive Learning of Continuous Representation : Abstract: MRI-based brain age estimation models aim to assess a subject's biological brain age based on information, such as neuroanatomical features. Various factors, including neurodegenerative dise...
- Autonomous labeling of surgical resection margins using a foundation model : Abstract: Assessing resection margins is central to pathological specimen evaluation and has profound implications for patient outcomes. Current practice employs physical inking, which is applied vari...
- Real-PGDN: A Two-level Classification Method for Full-Process Recognition of Newly Registered Pornographic and Gambling Domain Names : Abstract: Online pornography and gambling have consistently posed regulatory challenges for governments, threatening both personal assets and privacy. Therefore, it is imperative to research the class...
- Towards Understanding Generalization in DP-GD: A Case Study in Training Two-Layer CNNs : Abstract: Modern deep learning techniques focus on extracting intricate information from data to achieve accurate predictions. However, the training datasets may be crowdsourced and include sensitive ...
- UCB for Large-Scale Pure Exploration: Beyond Sub-Gaussianity : Abstract: Selecting the best alternative from a finite set represents a broad class of pure exploration problems. Traditional approaches to pure exploration have predominantly relied on Gaussian or su...
- GLA-Grad++: An Improved Griffin-Lim Guided Diffusion Model for Speech Synthesis : Abstract: Recent advances in diffusion models have positioned them as powerful generative frameworks for speech synthesis, demonstrating substantial improvements in audio quality and stability. Nevert...
- Structure is Supervision: Multiview Masked Autoencoders for Radiology : Abstract: Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVM...
- Data-driven informative priors for Bayesian inference with quasi-periodic data : Abstract: Bayesian computational strategies for inference can be inefficient in approximating the posterior distribution in models that exhibit some form of periodicity. This is because the probabilit...
- Unexplored flaws in multiple-choice VQA evaluations : Abstract: Multimodal Large Language Models (MLLMs) demonstrate strong capabilities in handling image-text inputs. A common way to assess this ability is through multiple-choice Visual Question Answeri...
- Benchmarking machine learning models for multi-class state recognition in double quantum dot data : Abstract: Semiconductor quantum dots (QDs) are a leading platform for scalable quantum processors. However, scaling to large arrays requires reliable, automated tuning strategies for devices' bootstra...
- The Machine Learning Approach to Moment Closure Relations for Plasma: A Review : Abstract: The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a ...
- What Shape Is Optimal for Masks in Text Removal? : Abstract: The advent of generative models has dramatically improved the accuracy of image inpainting. In particular, by removing specific text from document images, reconstructing original images is e...
- AdS/Deep-Learning made easy II: neural network-based approaches to holography and inverse problems : Abstract: We apply physics-informed machine learning (PIML) to solve inverse problems in holography and classical mechanics, focusing on neural ordinary differential equations (Neural ODEs) and physic...
- DisCEdge: Distributed Context Management for Large Language Models at the Edge : Abstract: Deploying Large Language Model (LLM) services at the edge benefits latency-sensitive and privacy-aware applications. However, the stateless nature of LLMs makes managing user context (e.g., ...
- Stable-Drift: A Patient-Aware Latent Drift Replay Method for Stabilizing Representations in Continual Learning : Abstract: When deep learning models are sequentially trained on new data, they tend to abruptly lose performance on previously learned tasks, a critical failure known as catastrophic forgetting. This ...
- Generative models for crystalline materials : Abstract: Understanding structure-property relationships in materials is fundamental in condensed matter physics and materials science. Over the past few years, machine learning (ML) has emerged as a ...
- An Efficient Privacy-preserving Intrusion Detection Scheme for UAV Swarm Networks : Abstract: The rapid proliferation of unmanned aerial vehicles (UAVs) and their applications in diverse domains, such as surveillance, disaster management, agriculture, and defense, has revolutionized...
- From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images : Abstract: While Multimodal Large Language Models (MLLMs) are adept at answering what is in an image (identifying objects and describing scenes), they often lack the ability to understand how an image fee...
- Mitigating Semantic Drift: Evaluating LLMs' Efficacy in Psychotherapy through MI Dialogue Summarization : Abstract: Recent advancements in large language models (LLMs) have shown their potential across both general and domain-specific tasks. However, there is a growing concern regarding their lack of sens...
- Resolving Sharp Gradients of Unstable Singularities to Machine Precision via Neural Networks : Abstract: Recent work introduced a robust computational framework combining embedded mathematical structures, advanced optimization, and neural network architecture, leading to the discovery of multip...
- ClearGCD: Mitigating Shortcut Learning For Robust Generalized Category Discovery : Abstract: In open-world scenarios, Generalized Category Discovery (GCD) requires identifying both known and novel categories within unlabeled data. However, existing methods often suffer from prototyp...
- Language-conditioned world model improves policy generalization by reading environmental descriptions : Abstract: To interact effectively with humans in the real world, it is important for agents to understand language that describes the dynamics of the environment--that is, how the environment behaves-...
- Optical diffraction neural networks assisted computational ghost imaging through dynamic scattering media : Abstract: Ghost imaging leverages a single-pixel detector with no spatial resolution to acquire object echo intensity signals, which are correlated with illumination patterns to reconstruct an image. ...
- Maritime Activities Observed Through Open-Access Positioning Data: Moving and Stationary Vessels in the Baltic Sea : Abstract: Understanding past and present maritime activity patterns is critical for navigation safety, environmental assessment, and commercial operations. An increasing number of services now openly ...
- Adaptive Factor Graph-Based Tightly Coupled GNSS/IMU Fusion for Robust Positioning : Abstract: Reliable positioning in GNSS-challenged environments remains a critical challenge for navigation systems. Tightly coupled GNSS/IMU fusion improves robustness but remains vulnerable to non-Ga...
- Time Extrapolation with Graph Convolutional Autoencoder and Tensor Train Decomposition : Abstract: Graph autoencoders have gained attention in nonlinear reduced-order modeling of parameterized partial differential equations defined on unstructured grids. Although they provide a geometrical...
- Standard Occupation Classifier -- A Natural Language Processing Approach : Abstract: Standard Occupational Classifiers (SOC) are systems used to categorize and classify different types of jobs and occupations based on their similarities in terms of job duties, skills, and qu...
- Buffer replay enhances the robustness of multimodal learning under missing-modality : Abstract: Missing modalities consistently lead to significant performance degradation in multimodal models. Existing approaches either synthesize missing modalities at high computational cost or apply...
- Constraining dark matter halo profiles with symbolic regression : Abstract: Dark matter haloes are typically characterised by radial density profiles with fixed forms motivated by simulations (e.g. NFW). However, simulation predictions depend on uncertain dark matte...
- MathSight: A Benchmark Exploring Have Vision-Language Models Really Seen in University-Level Mathematical Reasoning? : Abstract: Recent advances in Vision-Language Models (VLMs) have achieved impressive progress in multimodal mathematical reasoning. Yet, how much visual information truly contributes to reasoning remai...
- db-SP: Accelerating Sparse Attention for Visual Generative Models with Dual-Balanced Sequence Parallelism : Abstract: Scaling Diffusion Transformer (DiT) inference via sequence parallelism is critical for reducing latency in visual generation, but is severely hampered by workload imbalance when applied to m...
- Machine learning for violence prediction: a systematic review and critical appraisal : Abstract: Purpose: To conduct a systematic review of machine learning models for predicting violent behaviour by synthesising and appraising their validity, usefulness, and performance. Methods: We sy...
- Fault-Tolerant MARL for CAVs under Observation Perturbations for Highway On-Ramp Merging : Abstract: Multi-Agent Reinforcement Learning (MARL) holds significant promise for enabling cooperative driving among Connected and Automated Vehicles (CAVs). However, its practical application is hind...
- Clustering Malware at Scale: A First Full-Benchmark Study : Abstract: Recent years have shown that malware attacks still happen with high frequency. Malware experts seek to categorize and classify incoming samples to confirm their trustworthiness or prove thei...
- A PLS-Integrated LASSO Method with Application in Index Tracking : Abstract: In traditional multivariate data analysis, dimension reduction and regression have been treated as distinct endeavors. Established techniques such as principal component regression (PCR) and...
- Asymptotic Theory and Phase Transitions for Variable Importance in Quantile Regression Forests : Abstract: Quantile Regression Forests (QRF) are widely used for non-parametric conditional quantile estimation, yet statistical inference for variable importance measures remains challenging due to th...
- Nonstabilizerness Estimation using Graph Neural Networks : Abstract: This article proposes a Graph Neural Network (GNN) approach to estimate nonstabilizerness in quantum circuits, measured by the stabilizer Rényi entropy (SRE). Nonstabilizerness is a fundamen...
- TWEO: Transformers Without Extreme Outliers Enables FP8 Training And Quantization For Dummies : Abstract: Native FP8 support in modern hardware is essential for training large Transformers, but is severely hindered by extreme activation outliers. Existing solutions either rely on complex mixed-p...
- OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning : Abstract: Existing reinforcement learning (RL)-based post-training methods for large language models have advanced rapidly, yet their design has largely been guided by heuristics rather than systemati...
- A Sampling-Based Domain Generalization Study with Diffusion Generative Models : Abstract: In this work, we investigate the domain generalization capabilities of diffusion models in the context of synthesizing images that are distinct from the training data. Instead of fine-tuning...
- Accelerating Diffusion Models with Parallel Sampling: Inference at Sub-Linear Time Complexity : Abstract: Diffusion models have become a leading method for generative modeling of both image and scientific data. As these models are costly to train and evaluate, reducing the inference cost ...
- Scaling Equitable Reflection Assessment in Education via Large Language Models and Role-Based Feedback Agents : Abstract: Formative feedback is widely recognized as one of the most effective drivers of student learning, yet it remains difficult to implement equitably at scale. In large or low-resource courses, ...
- NegBLEURT Forest: Leveraging Inconsistencies for Detecting Jailbreak Attacks : Abstract: Jailbreak attacks designed to bypass safety mechanisms pose a serious threat by prompting LLMs to generate harmful or inappropriate content, despite alignment with ethical guidelines. Crafti...
- A Layered Protocol Architecture for the Internet of Agents : Abstract: Large Language Models (LLMs) have demonstrated remarkable performance improvements and the ability to learn domain-specific languages (DSLs), including APIs and tool interfaces. This capabil...
- Beluga: A CXL-Based Memory Architecture for Scalable and Efficient LLM KVCache Management : Abstract: The rapid increase in LLM model sizes and the growing demand for long-context inference have made memory a critical bottleneck in GPU-accelerated serving systems. Although high-bandwidth mem...
- Artificial intelligence for methane detection: from continuous monitoring to verified mitigation : Abstract: Methane is a potent greenhouse gas, responsible for roughly 30% of warming since pre-industrial times. A small number of large point sources account for a disproportionate share of emission...
- Physics-Informed Spiking Neural Networks via Conservative Flux Quantization : Abstract: Real-time, physically consistent predictions on low-power edge devices are critical for next-generation embodied AI systems, yet they remain a major challenge. Physics-Informed Neural Netw...
- Dynamical Implicit Neural Representations : Abstract: Implicit Neural Representations (INRs) provide a powerful continuous framework for modeling complex visual and geometric signals, but spectral bias remains a fundamental challenge, limiting ...
- Multiclass threshold-based classification and model evaluation : Abstract: In this paper, we introduce a threshold-based framework for multiclass classification that generalizes the standard argmax rule. This is done by replacing the probabilistic interpretation of...
- The Double-Edged Nature of the Rashomon Set for Trustworthy Machine Learning : Abstract: Real-world machine learning (ML) pipelines rarely produce a single model; instead, they produce a Rashomon set of many near-optimal ones. We show that this multiplicity reshapes key aspects ...
- Unsupervised Anomaly Detection for Smart IoT Devices: Performance and Resource Comparison : Abstract: The rapid expansion of Internet of Things (IoT) deployments across diverse sectors has significantly enhanced operational efficiency, yet concurrently elevated cybersecurity vulnerabilities ...
- Massively Parallel Imitation Learning of Mouse Forelimb Musculoskeletal Reaching Dynamics : Abstract: The brain has evolved to effectively control the body, and in order to understand the relationship we need to model the sensorimotor transformations underlying embodied control. As part of a...
- Lightweight ML-Based Air Quality Prediction for IoT and Embedded Applications : Abstract: This study investigates the effectiveness and efficiency of two variants of the XGBoost regression model, the full-capacity and lightweight (tiny) versions, for predicting the concentrations...
- Closed-Loop Transformers: Autoregressive Modeling as Iterative Latent Equilibrium : Abstract: Contemporary autoregressive transformers operate in open loop: each hidden state is computed in a single forward pass and never revised, causing errors to propagate uncorrected through the s...
- Physically Interpretable Representation Learning with Gaussian Mixture Variational AutoEncoder (GM-VAE) : Abstract: Extracting compact, physically interpretable representations from high-dimensional scientific data is a persistent challenge due to the complex, nonlinear structures inherent in physical sys...
- Exploring Fusion Strategies for Multimodal Vision-Language Systems : Abstract: Modern machine learning models often combine multiple input streams of data to more accurately capture the information that informs their decisions. In multimodal machine learning, choosing ...
- Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings : Abstract: Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions (Zhang et al., 2025), where imperceptible pe...
- Beyond Atoms: Evaluating Electron Density Representation for 3D Molecular Learning : Abstract: Machine learning models for 3D molecular property prediction typically rely on atom-based representations, which may overlook subtle physical information. Electron density maps -- the direct...
- Multi-Modal Machine Learning for Early Trust Prediction in Human-AI Interaction Using Face Image and GSR Bio Signals : Abstract: Predicting human trust in AI systems is crucial for safe integration of AI-based decision support tools, especially in healthcare. This study proposes a multi-modal machine learning framewor...
- Modeling Quantum Autoencoder Trainable Kernel for IoT Anomaly Detection : Abstract: Escalating cyber threats and the high-dimensional complexity of IoT traffic have outpaced classical anomaly detection methods. While deep learning offers improvements, computational bottlene...
- Breaking Algorithmic Collusion in Human-AI Ecosystems : Abstract: AI agents are increasingly deployed in ecosystems where they repeatedly interact not only with each other but also with humans. In this work, we study these human-AI ecosystems from a theore...
- Deep Learning Architectures for Code-Modulated Visual Evoked Potentials Detection : Abstract: Non-invasive Brain-Computer Interfaces (BCIs) based on Code-Modulated Visual Evoked Potentials (C-VEPs) require highly robust decoding methods to address temporal variability and session-dep...
- CTR Prediction on Alibaba's Taobao Advertising Dataset Using Traditional and Deep Learning Models : Abstract: Click-through rate prediction is critical in modern advertising systems, where ranking relevance and user engagement directly impact platform efficiency and business value. In this project,...
- MOTIF-RF: Multi-template On-chip Transformer Synthesis Incorporating Frequency-domain Self-transfer Learning for RFIC Design Automation : Abstract: This paper presents a systematic study on developing multi-template machine learning (ML) surrogate models and applying them to the inverse design of transformers (XFMRs) in radio-frequency ...
- Distance-based Learning of Hypertrees : Abstract: We study the problem of learning hypergraphs with shortest-path queries (SP-queries), and present the first provably optimal online algorithm for a broad and natural class of hypertrees that...
- Equilibrium Propagation Without Limits : Abstract: We liberate Equilibrium Propagation (EP) from the limit of infinitesimal perturbations by establishing a finite-nudge foundation for local credit assignment. By modeling network states as Gi...
- Calibration-Free EEG-based Driver Drowsiness Detection with Online Test-Time Adaptation : Abstract: Drowsy driving is a growing cause of traffic accidents, prompting recent exploration of electroencephalography (EEG)-based drowsiness detection systems. However, the inherent variability of ...
- Convergence Dynamics of Over-Parameterized Score Matching for a Single Gaussian : Abstract: Score matching has become a central training objective in modern generative modeling, particularly in diffusion models, where it is used to learn high-dimensional data distributions through ...
- ARES: Anomaly Recognition Model For Edge Streams : Abstract: Many real-world scenarios involving streaming information can be represented as temporal graphs, where data flows through dynamic changes in edges over time. Anomaly detection in this contex...
- Quantum Bayesian Optimization for Quality Improvement in Fuselage Assembly : Abstract: Recent efforts in smart manufacturing have enhanced aerospace fuselage assembly processes, particularly by innovating shape adjustment techniques to minimize dimensional gaps between assembl...
- Adaptive Dueling Double Deep Q-networks in Uniswap V3 Replication and Extension with Mamba : Abstract: The report goes through the main steps of replicating and improving the article "Adaptive Liquidity Provision in Uniswap V3 with Deep Reinforcement Learning." The replication part includes h...
- Representative Action Selection for Large Action Space: From Bandits to MDPs : Abstract: We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundament...
- Energy Efficient Sleep Mode Optimization in 5G mmWave Networks via Multi Agent Deep Reinforcement Learning : Abstract: Dynamic sleep mode optimization (SMO) in millimeter-wave (mmWave) networks is essential for maximizing energy efficiency (EE) under stringent quality-of-service (QoS) constraints. However, e...
- An energy-efficient spiking neural network with continuous learning for self-adaptive brain-machine interface : Abstract: The number of simultaneously recorded neurons follows an exponentially increasing trend in implantable brain-machine interfaces (iBMIs). Integrating the neural decoder in the implant is an e...
- Toward Data-Driven Surrogates of the Solar Wind with Spherical Fourier Neural Operator : Abstract: The solar wind, a continuous stream of charged particles from the Sun's corona, shapes the heliosphere and impacts space systems near Earth. Variations such as high-speed streams and coronal...
- IVGAE: Handling Incomplete Heterogeneous Data with a Variational Graph Autoencoder : Abstract: Handling missing data remains a fundamental challenge in real-world tabular datasets, especially when data are heterogeneous with both numerical and categorical features. Existing imputation...
- A Variational Manifold Embedding Framework for Nonlinear Dimensionality Reduction : Abstract: Dimensionality reduction algorithms like principal component analysis (PCA) are workhorses of machine learning and neuroscience, but each has well-known limitations. Variants of PCA are simp...
- Benchmarking In-context Experiential Learning Through Repeated Product Recommendations : Abstract: To reliably navigate ever-shifting real-world environments, agents must grapple with incomplete knowledge and adapt their behavior through experience. However, current evaluations largely fo...
- Probabilistic Digital Twin for Misspecified Structural Dynamical Systems via Latent Force Modeling and Bayesian Neural Networks : Abstract: This work presents a probabilistic digital twin framework for response prediction in dynamical systems governed by misspecified physics. The approach integrates Gaussian Process Latent Force...
- TinyLLM: Evaluation and Optimization of Small Language Models for Agentic Tasks on Edge Devices : Abstract: This paper investigates the effectiveness of small language models (SLMs) for agentic tasks (function/tool/API calling) with a focus on running agents on edge devices without reliance on clo...
- From Topology to Retrieval: Decoding Embedding Spaces with Unified Signatures : Abstract: Studying how embeddings are organized in space not only enhances model interpretability but also uncovers factors that drive downstream task performance. In this paper, we present a comprehe...
- Designing Instance-Level Sampling Schedules via REINFORCE with James-Stein Shrinkage : Abstract: Most post-training methods for text-to-image samplers focus on model weights: either fine-tuning the backbone for alignment or distilling it for few-step efficiency. We take a different rout...
- BiCQL-ML: A Bi-Level Conservative Q-Learning Framework for Maximum Likelihood Inverse Reinforcement Learning : Abstract: Offline inverse reinforcement learning (IRL) aims to recover a reward function that explains expert behavior using only fixed demonstration data, without any additional online interaction. W...
- FedRE: A Representation Entanglement Framework for Model-Heterogeneous Federated Learning : Abstract: Federated learning (FL) enables collaborative training across clients without compromising privacy. While most existing FL methods assume homogeneous model architectures, client heterogeneit...
- TreeCoder: Systematic Exploration and Optimisation of Decoding and Constraints for LLM Code Generation : Abstract: Large language models (LLMs) have shown remarkable ability to generate code, yet their outputs often violate syntactic or semantic constraints when guided only through natural language promp...
- The Hidden Cost of Approximation in Online Mirror Descent : Abstract: Online mirror descent (OMD) is a fundamental algorithmic paradigm that underlies many algorithms in optimization, machine learning and sequential decision-making. The OMD iterates are define...
- Online Dynamic Pricing of Complementary Products : Abstract: Traditional pricing paradigms, once dominated by static models and rule-based heuristics, are increasingly being replaced by dynamic, data-driven approaches powered by machine learning algor...
- FLUX: Efficient Descriptor-Driven Clustered Federated Learning under Arbitrary Distribution Shifts : Abstract: Federated Learning (FL) enables collaborative model training across multiple clients while preserving data privacy. Traditional FL methods often use a global model to fit all clients, assumi...
- DeXposure: A Dataset and Benchmarks for Inter-protocol Credit Exposure in Decentralized Financial Networks : Abstract: We curate the DeXposure dataset, the first large-scale dataset for inter-protocol credit exposure in decentralized financial networks, covering global markets of 43.7 million entries across ...
- SingleQuant: Efficient Quantization of Large Language Models in a Single Pass : Abstract: Large Language Models (LLMs) quantization facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization trunc...
- Cleaning the Pool: Progressive Filtering of Unlabeled Pools in Deep Active Learning : Abstract: Existing active learning (AL) strategies capture fundamentally different notions of data value, e.g., uncertainty or representativeness. Consequently, the effectiveness of strategies can var...
- AutoTailor: Automatic and Efficient Adaptive Model Deployment for Diverse Edge Devices : Abstract: On-device machine learning (ML) has become a fundamental component of emerging mobile applications. Adaptive model deployment delivers efficient inference for heterogeneous device capabiliti...
- Efficient-Husformer: Efficient Multimodal Transformer Hyperparameter Optimization for Stress and Cognitive Loads : Abstract: Transformer-based models have gained considerable attention in the field of physiological signal analysis. They leverage long-range dependencies and complex patterns in temporal signals, all...
- Predicting and Interpolating Spatiotemporal Environmental Data: A Case Study of Groundwater Storage in Bangladesh : Abstract: Geospatial observational datasets are often limited to point measurements, making temporal prediction and spatial interpolation essential for constructing continuous fields. This study evalu...
- TS2Vec-Ensemble: An Enhanced Self-Supervised Framework for Time Series Forecasting : Abstract: Self-supervised representation learning, particularly through contrastive methods like TS2Vec, has advanced the analysis of time series data. However, these models often falter in forecastin...
- Improving Stochastic Action-Constrained Reinforcement Learning via Truncated Distributions : Abstract: In reinforcement learning (RL), it is often advantageous to consider additional constraints on the action space to ensure safety or action relevance. Existing work on such action-constrained...
- PISA: Prioritized Invariant Subgraph Aggregation : Abstract: Recent work has extended the invariance principle for out-of-distribution (OOD) generalization from Euclidean to graph data, where challenges arise due to complex structures and diverse dist...
- An Efficient Embedding Based Ad Retrieval with GPU-Powered Feature Interaction : Abstract: In large-scale advertising recommendation systems, retrieval serves as a critical component, aiming to efficiently select a subset of candidate ads relevant to user behaviors from a massive ...
- Adversarial Flow Models : Abstract: We present adversarial flow models, a class of generative models that unifies adversarial models and flow models. Our method supports native one-step or multi-step generation and is trained ...
- Enhancing Trustworthiness with Mixed Precision: Benchmarks, Opportunities, and Challenges : Abstract: Large language models (LLMs) have shown promising performance across various tasks. However, their autoregressive decoding process poses significant challenges for efficient deployment on ex...
- Space Explanations of Neural Network Classification : Abstract: We present a novel logic-based concept called Space Explanations for classifying neural networks that gives provable guarantees of the behavior of the network in continuous areas of the inpu...
- Privacy-Utility-Bias Trade-offs for Privacy-Preserving Recommender Systems : Abstract: Recommender systems (RSs) output ranked lists of items, such as movies or restaurants, that users may find interesting, based on the user's past ratings and ratings from other users. RSs inc...
- List-Decodable Regression via Expander Sketching : Abstract: We introduce an expander-sketching framework for list-decodable linear regression that achieves sample complexity $\tilde{O}((d+\log(1/δ))/α)$, list size $O(1/α)$, and near input-sparsity ru...
- Entropy is all you need for Inter-Seed Cross-Play in Hanabi : Abstract: We find that in Hanabi, one of the most complex and popular benchmarks for zero-shot coordination and ad-hoc teamplay, a standard implementation of independent PPO with a slightly higher ent...
- The Multiclass Score-Oriented Loss (MultiSOL) on the Simplex : Abstract: In the supervised binary classification setting, score-oriented losses have been introduced with the aim of optimizing a chosen performance metric directly during the training phase, thus av...
- LLM-Cave: A benchmark and light environment for large language models reasoning and decision-making system : Abstract: Large language models (LLMs) such as ChatGPT o1, ChatGPT o3, and DeepSeek R1 have shown great potential in solving difficult problems. However, current LLM evaluation benchmarks are limited ...
- Federated Learning Survey: A Multi-Level Taxonomy of Aggregation Techniques, Experimental Insights, and Future Frontiers : Abstract: The integration of IoT and AI has unlocked innovation across industries, but growing privacy concerns and data isolation hinder progress. Traditional centralized ML struggles to overcome the...
- Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning : Abstract: Adapting large-scale foundation flow and diffusion generative models to optimize task-specific objectives while preserving prior information is crucial for real-world applications such as mo...
- Spatially Aware Dictionary-Free Eigenfunction Identification for Modeling and Control of Nonlinear Dynamical Systems : Abstract: A new approach to data-driven discovery of Koopman eigenfunctions without a pre-defined set of basis functions is proposed. The approach is based on a reference trajectory, for which the Koo...
- Structure-aware Hybrid-order Similarity Learning for Multi-view Unsupervised Feature Selection : Abstract: Multi-view unsupervised feature selection (MUFS) has recently emerged as an effective dimensionality reduction method for unlabeled multi-view data. However, most existing methods mainly use...
- Difficulties with Evaluating a Deception Detector for AIs : Abstract: Building reliable deception detectors for AI systems -- methods that could predict when an AI system is being strategically deceptive without necessarily requiring behavioural evidence -- wo...
- Foundation Models and Fine-Tuning: Towards a New Generation of Models for Time Series Forecasting : Abstract: Inspired by recent advances in large language models, foundation models have been developed for zero-shot time series forecasting, enabling prediction on datasets unseen during pretraining. ...
- Generative Anchored Fields: Controlled Data Generation via Emergent Velocity Fields and Transport Algebra : Abstract: We present Generative Anchored Fields (GAF), a generative model that learns independent endpoint predictors $J$ (noise) and $K$ (data) rather than a trajectory predictor. The velocity field ...
- Integrated Transcriptomic-proteomic Biomarker Identification for Radiation Response Prediction in Non-small Cell Lung Cancer Cell Lines : Abstract: To develop an integrated transcriptome-proteome framework for identifying concurrent biomarkers predictive of radiation response, as measured by survival fraction at 2 Gy (SF2), in non-small...
- GSpaRC: Gaussian Splatting for Real-time Reconstruction of RF Channels : Abstract: Channel state information (CSI) is essential for adaptive beamforming and maintaining robust links in wireless communication systems. However, acquiring CSI incurs significant overhead, cons...
- Can Synthetic Data Improve Symbolic Regression Extrapolation Performance? : Abstract: Many machine learning models perform well when making predictions within the training data range, but often struggle when required to extrapolate beyond it. Symbolic regression (SR) using ge...
- Intelligent Neural Networks: From Layered Architectures to Graph-Organized Intelligence : Abstract: Biological neurons exhibit remarkable intelligence: they maintain internal states, communicate selectively with other neurons, and self-organize into complex graphs rather than rigid hierarc...
- PerfMamba: Performance Analysis and Pruning of Selective State Space Models : Abstract: Recent advances in sequence modeling have introduced selective SSMs as promising alternatives to Transformer architectures, offering theoretical computational efficiency and sequence process...
- TARFVAE: Efficient One-Step Generative Time Series Forecasting via TARFLOW based VAE : Abstract: Time series data is ubiquitous, with forecasting applications spanning from finance to healthcare. Beyond popular deterministic methods, generative models are gaining attention due to advanc...
- CRAwDAD: Causal Reasoning Augmentation with Dual-Agent Debate : Abstract: When people reason about cause and effect, they often consider many competing "what if" scenarios before deciding which explanation fits best. Analogously, advanced language models capable o...
- Bridging Modalities via Progressive Re-alignment for Multimodal Test-Time Adaptation : Abstract: Test-time adaptation (TTA) enables online model adaptation using only unlabeled test data, aiming to bridge the gap between source and target distributions. However, in multimodal scenarios,...
- ARM-Explainer -- Explaining and improving graph neural network predictions for the maximum clique problem using node features and association rule mining : Abstract: Numerous graph neural network (GNN)-based algorithms have been proposed to solve graph-based combinatorial optimization problems (COPs), but methods to explain their predictions remain large...
- Covering-Space Normalizing Flows: Approximating Pushforwards on Lens Spaces : Abstract: We construct pushforward distributions via the universal covering map rho: S^3 -> L(p;q) with the goal of approximating these distributions using flows on L(p;q). We highlight that our metho...
- Modeling Chaotic Pedestrian Behavior Using Chaos Indicators and Supervised Learning : Abstract: As cities around the world aim to improve walkability and safety, understanding the irregular and unpredictable nature of pedestrian behavior has become increasingly important. This study in...
- Time Series Forecasting via Direct Per-Step Probability Distribution Modeling : Abstract: Deep neural network-based time series prediction models have recently demonstrated superior capabilities in capturing complex temporal dependencies. However, it is challenging for these mode...
- Simultaneous Image Quality Improvement and Artefacts Correction in Accelerated MRI : Abstract: MR data are acquired in the frequency domain, known as k-space. Acquiring high-quality and high-resolution MR images can be time-consuming, posing a significant challenge when multiple seque...
- Machine Learning for Scientific Visualization: Ensemble Data Analysis : Abstract: Scientific simulations and experimental measurements produce vast amounts of spatio-temporal data, yet extracting meaningful insights remains challenging due to high dimensionality, complex ...
- Hard-Constrained Neural Networks with Physics-Embedded Architecture for Residual Dynamics Learning and Invariant Enforcement in Cyber-Physical Systems : Abstract: This paper presents a framework for physics-informed learning in complex cyber-physical systems governed by differential equations with both unknown dynamics and algebraic invariants. First,...
- Toward Automatic Safe Driving Instruction: A Large-Scale Vision Language Model Approach : Abstract: Large-scale Vision Language Models (LVLMs) exhibit advanced capabilities in tasks that require visual information, including object detection. These capabilities have promising applications ...
- Every Token Counts: Generalizing 16M Ultra-Long Context in Large Language Models : Abstract: This work explores the challenge of building "Machines that Can Remember", framing long-term memory as the problem of efficient ultra-long context modeling. We argue that this requires thr...
- Towards Improving Interpretability of Language Model Generation through a Structured Knowledge Discovery Approach : Abstract: Knowledge-enhanced text generation aims to enhance the quality of generated text by utilizing internal or external knowledge sources. While language models have demonstrated impressive capab...
- ParaGate: Parasitic-Driven Domain Adaptation Transfer Learning for Netlist Performance Prediction : Abstract: In traditional EDA flows, layout-level performance metrics are only obtainable after placement and routing, hindering global optimization at earlier stages. Although some neural-network-base...
- Flow Straighter and Faster: Efficient One-Step Generative Modeling via MeanFlow on Rectified Trajectories : Abstract: Flow-based generative models have recently demonstrated strong performance, yet sampling typically relies on expensive numerical integration of ordinary differential equations (ODEs). Rectif...
- MegaChat: A Synthetic Persian Q&A Dataset for High-Quality Sales Chatbot Evaluation : Abstract: Small and medium-sized enterprises (SMEs) in Iran increasingly leverage Telegram for sales, where real-time engagement is essential for conversion. However, developing AI-driven chatbots for...
- LFM2 Technical Report : Abstract: We present LFM2, a family of Liquid Foundation Models designed for efficient on-device deployment and strong task capabilities. Using hardware-in-the-loop architecture search under edge late...
- Evaluating LLMs for One-Shot Patching of Real and Artificial Vulnerabilities : Abstract: Automated vulnerability patching is crucial for software security, and recent advancements in Large Language Models (LLMs) present promising capabilities for automating this task. However, e...
- ASTRO: Adaptive Stitching via Dynamics-Guided Trajectory Rollouts : Abstract: Offline reinforcement learning (RL) enables agents to learn optimal policies from pre-collected datasets. However, datasets containing suboptimal and fragmented trajectories present challeng...
- Physics-Informed Neural Networks for Thermophysical Property Retrieval : Abstract: Inverse heat problems refer to the estimation of material thermophysical properties given observed or known heat diffusion behaviour. Inverse heat problems have wide-ranging uses, but a crit...
- The Price of Progress: Algorithmic Efficiency and the Falling Cost of AI Inference : Abstract: Language models have seen enormous progress on advanced benchmarks in recent years, but much of this progress has only been possible by using more costly models. Benchmarks may therefore pre...
- Learning Rules from Rewards : Abstract: Humans can flexibly generalize knowledge across domains by leveraging structured relational representations. While prior research has shown how such representations support analogical reason...
- Extensible Multi-Granularity Fusion Network and Transferable Curriculum Learning for Aspect-based Sentiment Analysis : Abstract: Aspect-based Sentiment Analysis (ABSA) aims to determine sentiment polarity toward specific aspects in text. Existing methods enrich semantic and syntactic representations through external k...
- Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models : Abstract: Capability evaluations play a crucial role in assessing and regulating frontier AI systems. The effectiveness of these evaluations faces a significant challenge: strategic underperformance, ...
- Domain adaptation of large language models for geotechnical applications : Abstract: The rapid advancement of large language models (LLMs) is transforming opportunities in geotechnical engineering, where workflows rely on complex, text-rich data. While general-purpose LLMs d...
- New-Onset Diabetes Assessment Using Artificial Intelligence-Enhanced Electrocardiography : Abstract: Diabetes has a long asymptomatic period which can often remain undiagnosed for multiple years. In this study, we trained a deep learning model to detect new-onset diabetes using 12-lead ECG ...
- Continual Learning with Global Alignment : Abstract: Continual learning aims to sequentially learn new tasks without forgetting previous tasks' knowledge (catastrophic forgetting). One factor that can cause forgetting is the interference betwe...
- Data efficient surrogate modeling for engineering design: Ensemble-free batch mode deep active learning for regression : Abstract: High fidelity design evaluation processes such as Computational Fluid Dynamics and Finite Element Analysis are often replaced with data driven surrogates to reduce computational cost in engi...
- Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey : Abstract: The integration of biomolecular modeling with natural language (BL) has emerged as a promising interdisciplinary area at the intersection of artificial intelligence, chemistry and biology. T...
- Towards Responsible Development of Generative AI for Education: An Evaluation-Driven Approach : Abstract: A major challenge facing the world is the provision of equitable and universal access to quality education. Recent advances in generative AI (gen AI) have created excitement about the potent...
- Event Stream-based Sign Language Translation: A High-Definition Benchmark Dataset and A Novel Baseline : Abstract: Sign Language Translation (SLT) is a core task in the field of AI-assisted disability. Traditional SLT methods are typically based on visible light videos, which are easily affected by facto...
- Model-Based Reward Shaping for Adversarial Inverse Reinforcement Learning in Stochastic Environments : Abstract: In this paper, we aim to tackle the limitation of the Adversarial Inverse Reinforcement Learning (AIRL) method in stochastic environments where theoretical results cannot hold and performanc...
- Reranking partisan animosity in algorithmic social media feeds alters affective polarization : Abstract: Today, social media platforms hold sole power to study the effects of feed ranking algorithms. We developed a platform-independent method that reranks participants' feeds in real-time and us...
- FairPO: Robust Preference Optimization for Fair Multi-Label Learning : Abstract: Multi-label classification (MLC) often suffers from performance disparities across labels. We propose FairPO, a framework combining preference-based loss and group-robust optimizati...
- PathGene: Benchmarking Driver Gene Mutations and Exon Prediction Using Multicenter Lung Cancer Histopathology Image Dataset : Abstract: Accurately predicting gene mutations, mutation subtypes and their exons in lung cancer is critical for personalized treatment planning and prognostic assessment. Faced with regional disparit...
- Dual-Model Weight Selection and Self-Knowledge Distillation for Medical Image Classification : Abstract: We propose a novel medical image classification method that integrates dual-model weight selection with self-knowledge distillation (SKD). In real-world medical settings, deploying large-sca...
- Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space : Abstract: Simulating the long-timescale dynamics of biomolecules is a central challenge in computational science. While enhanced sampling methods can accelerate these simulations, they rely on pre-def...
- Real-Time Obstacle Avoidance for a Mobile Robot Using CNN-Based Sensor Fusion : Abstract: Obstacle avoidance is a critical component of the navigation stack required for mobile robots to operate effectively in complex and unknown environments. In this research, three end-to-end C...
- iSeal: Encrypted Fingerprinting for Reliable LLM Ownership Verification : Abstract: Given the high cost of large language model (LLM) training from scratch, safeguarding LLM intellectual property (IP) has become increasingly crucial. As the standard paradigm for IP ownershi...
- MTR-VP: Towards End-to-End Trajectory Planning through Context-Driven Image Encoding and Multiple Trajectory Prediction : Abstract: We present a method for trajectory planning for autonomous driving, learning image-based context embeddings that align with motion prediction frameworks and planning-based intention input. W...
- ARPGNet: Appearance- and Relation-aware Parallel Graph Attention Fusion Network for Facial Expression Recognition : Abstract: The key to facial expression recognition is to learn discriminative spatial-temporal representations that embed facial expression dynamics. Previous studies predominantly rely on pre-trained...
- PULSE-ICU: A Pretrained Unified Long-Sequence Encoder for Multi-task Prediction in Intensive Care Units : Abstract: Intensive care unit (ICU) data are highly irregular, heterogeneous, and temporally fragmented, posing challenges for generalizable clinical prediction. We present PULSE-ICU, a self-supervise...
- 3D-Consistent Multi-View Editing by Diffusion Guidance : Abstract: Recent advancements in diffusion models have greatly improved text-based image editing, yet methods that edit images independently often produce geometrically and photometrically inconsisten...
- From Compound Figures to Composite Understanding: Developing a Multi-Modal LLM from Biomedical Literature with Medical Multiple-Image Benchmarking and Validation : Abstract: Multi-modal large language models (MLLMs) have shown promise in advancing healthcare. However, most existing models remain confined to single-image understanding, which greatly limits their ...
- DeepPNI: Language- and graph-based model for mutation-driven protein-nucleic acid energetics : Abstract: The interaction between proteins and nucleic acids is crucial for processes that sustain cellular function, including DNA maintenance and the regulation of gene expression and translation. A...
- Evaluating Embedding Models and Pipeline Optimization for AI Search Quality : Abstract: We evaluate the performance of various text embedding models and pipeline configurations for AI-driven search systems. We compare sentence-transformer and generative embedding models (e.g., ...
- An interpretable unsupervised representation learning for high precision measurement in particle physics : Abstract: Unsupervised learning has been widely applied to various tasks in particle physics. However, existing models lack precise control over their learned representations, limiting physical interp...
- Efficiency and Effectiveness of SPLADE Models on Billion-Scale Web Document Title : Abstract: This paper presents a comprehensive comparison of BM25, SPLADE, and Expanded-SPLADE models in the context of large-scale web document retrieval. We evaluate the effectiveness and efficiency ...
- Adaptive tumor growth forecasting via neural & universal ODEs : Abstract: Forecasting tumor growth is critical for optimizing treatment. Classical growth models such as the Gompertz and Bertalanffy equations capture general tumor dynamics but may fail to adapt to ...
- RELiQ: Scalable Entanglement Routing via Reinforcement Learning in Quantum Networks : Abstract: Quantum networks are becoming increasingly important because of advancements in quantum computing and quantum sensing, such as recent developments in distributed quantum computing and federa...
- Prompt-based Consistent Video Colorization : Abstract: Existing video colorization methods struggle with temporal flickering or demand extensive manual input. We propose a novel approach automating high-fidelity video colorization using rich sem...
- On the Condition Number Dependency in Bilevel Optimization : Abstract: Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of find...
- Edge Deployment of Small Language Models, a comprehensive comparison of CPU, GPU and NPU backends : Abstract: Edge computing processes data where it is generated, enabling faster decisions, lower bandwidth usage, and improved privacy. However, edge devices typically operate under strict constraints ...
- Test Time Training for AC Power Flow Surrogates via Physics and Operational Constraint Refinement : Abstract: Power Flow (PF) calculation based on machine learning (ML) techniques offers significant computational advantages over traditional numerical methods but often struggle to maintain full physic...
- BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands : Abstract: Open-vocabulary mobile manipulation (OVMM) requires robots to follow language instructions, navigate, and manipulate while updating their world representation under dynamic environmental cha...
- SuRe: Surprise-Driven Prioritised Replay for Continual LLM Learning : Abstract: Continual learning, one's ability to adapt to a sequence of tasks without forgetting previously acquired knowledge, remains a major challenge in machine learning and a key gap between artifi...
- Distributed Knowing How : Abstract: Distributed knowledge is a key concept in the standard epistemic logic of knowledge-that. In this paper, we propose a corresponding notion of distributed knowledge-how and study its logic. O...
- Conditionals Based on Selection Functions, Modal Operators and Probabilities : Abstract: Methods for probability updating, of which Bayesian conditionalization is the most well-known and widely used, are modeling tools that aim to represent the process of modifying an initial ep...
- Graded Distributed Belief : Abstract: We introduce a new logic of graded distributed belief that allows us to express the fact that a group of agents distributively believe that a certain fact holds with at least strength k. We ...
- Asking like Socrates: Socrates helps VLMs understand remote sensing images : Abstract: Recent multimodal reasoning models, inspired by DeepSeek-R1, have significantly advanced vision-language systems. However, in remote sensing (RS) tasks, we observe widespread pseudo reasonin...
- Mapping Clinical Doubt: Locating Linguistic Uncertainty in LLMs : Abstract: Large Language Models (LLMs) are increasingly used in clinical settings, where sensitivity to linguistic uncertainty can influence diagnostic interpretation and decision-making. Yet little i...
- MATCH: Engineering Transparent and Controllable Conversational XAI Systems through Composable Building Blocks : Abstract: While the increased integration of AI technologies into interactive systems enables them to solve an increasing number of tasks, the black-box problem of AI models continues to spread throug...
- FastFHE: Packing-Scalable and Depthwise-Separable CNN Inference Over FHE : Abstract: Deep learning (DL) has been penetrating daily life in many domains, and keeping DL model inference secure and samples private in an encrypted environment has become an urgent and incr...
- GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents : Abstract: Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision Language Models (LVLMs)...
- What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$ : Abstract: Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and ...
- Exploring Performance Variations in Finetuned Translators of Ultra-Low Resource Languages: Do Linguistic Differences Matter? : Abstract: Finetuning pre-trained language models with small amounts of data is a commonly-used method to create translators for ultra-low resource languages such as endangered Indigenous languages. Ho...
- HW-GNN: Homophily-Aware Gaussian-Window Constrained Graph Spectral Network for Social Network Bot Detection : Abstract: Social bots are increasingly polluting online platforms by spreading misinformation and engaging in coordinated manipulation, posing severe threats to cybersecurity. Graph Neural Networks (G...
- DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA : Abstract: Document visual question answering (DocVQA) requires models to jointly reason over textual content and spatial layout, yet current systems exhibit a sharp accuracy-efficiency trade-off: lar...
- CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving : Abstract: Vision-Language-Action (VLA) models have recently attracted growing attention in end-to-end autonomous driving for their strong reasoning capabilities and rich world knowledge. However, exis...
- Where to Measure: Epistemic Uncertainty-Based Sensor Placement with ConvCNPs : Abstract: Accurate sensor placement is critical for modeling spatio-temporal systems such as environmental and climate processes. Neural Processes (NPs), particularly Convolutional Conditional Neural ...
- Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization : Abstract: We study how different Chain-of-Thought (CoT) designs affect the acquisition of the generalizable visual reasoning ability in vision-language models (VLMs). While CoT data, especially long o...
- HarmoCLIP: Harmonizing Global and Regional Representations in Contrastive Vision-Language Models : Abstract: Contrastive Language-Image Pre-training (CLIP) has demonstrated remarkable generalization ability and strong performance across a wide range of vision-language tasks. However, due to the lac...
- GazeTrack: High-Precision Eye Tracking Based on Regularization and Spatial Computing : Abstract: Eye tracking has become increasingly important in virtual and augmented reality applications; however, the current gaze accuracy falls short of meeting the requirements for spatial computing...
- Variational analysis of determinantal varieties : Abstract: Determinantal varieties -- the sets of bounded-rank matrices or tensors -- have attracted growing interest in low-rank optimization. The tangent cone to low-rank sets is widely studied and u...
- Automated Design Optimization via Strategic Search with Large Language Models : Abstract: Traditional optimization methods excel in well-defined search spaces but struggle with design problems where transformations and design parameters are difficult to define. Large language mod...
- Foundations of Quantum Granular Computing with Effect-Based Granules, Algebraic Properties and Reference Architectures : Abstract: This paper develops the foundations of Quantum Granular Computing (QGC), extending classical granular computing including fuzzy, rough, and shadowed granules to the quantum regime. Quantum g...
- Test-time scaling of diffusions with flow maps : Abstract: A common recipe to improve diffusion models at test-time so that samples score highly against a user-specified reward is to introduce the gradient of the reward into the dynamics of the diff...
- Probabilistic Fusion and Calibration of Neural Speaker Diarization Models : Abstract: End-to-End Neural Diarization (EEND) systems produce frame-level probabilistic speaker activity estimates, yet since evaluation focuses primarily on Diarization Error Rate (DER), the reliabi...
- CoFiRec: Coarse-to-Fine Tokenization for Generative Recommendation : Abstract: In web environments, user preferences are often refined progressively as users move from browsing broad categories to exploring specific items. However, existing generative recommenders over...
- ReAG: Reasoning-Augmented Generation for Knowledge-based Visual Question Answering : Abstract: Multimodal Large Language Models (MLLMs) have shown impressive capabilities in jointly understanding text, images, and videos, often evaluated via Visual Question Answering (VQA). However, e...
- All Centers Are at most a Few Tokens Apart: Knowledge Distillation with Domain Invariant Prompt Tuning : Abstract: Domain generalization is critical in computational pathology (CPath) due to inherent domain shifts caused by variations in staining protocols, scanner devices, and imaging settings across cl...
- VeriDispatcher: Multi-Model Dispatching through Pre-Inference Difficulty Prediction for RTL Generation Optimization : Abstract: Large Language Models (LLMs) show strong performance in RTL generation, but different models excel on different tasks because of architecture and training differences. Prior work mainly prom...
- Exact Learning of Arithmetic with Differentiable Agents : Abstract: We explore the possibility of exact algorithmic learning with gradient-based methods and introduce a differentiable framework capable of strong length generalization on arithmetic tasks. Our...
- MammoRGB: Dual-View Mammogram Synthesis Using Denoising Diffusion Probabilistic Models : Abstract: Purpose: This study aims to develop and evaluate a three-channel denoising diffusion probabilistic model (DDPM) for synthesizing single-breast dual-view mammograms and to assess the impact o...
- CAPE: Context-Aware Diffusion Policy Via Proximal Mode Expansion for Collision Avoidance : Abstract: In robotics, diffusion models can capture multi-modal trajectories from demonstrations, making them a transformative approach in imitation learning. However, achieving optimal performance fo...
- Improving Robotic Manipulation Robustness via NICE Scene Surgery : Abstract: Learning robust visuomotor policies for robotic manipulation remains a challenge in real-world settings, where visual distractors can significantly degrade performance and safety. In this wo...
- Distracted Robot: How Visual Clutter Undermine Robotic Manipulation : Abstract: In this work, we propose an evaluation protocol for examining the performance of robotic manipulation policies in cluttered scenes. Contrary to prior works, we approach evaluation from a psy...
- The Hidden AI Race: Tracking Environmental Costs of Innovation : Abstract: The past decade has seen a massive rise in the popularity of AI systems, mainly owing to the developments in Gen AI, which has revolutionized numerous industries and applications. However, t...
- AI summaries in online search influence users' attitudes : Abstract: This study examined how AI-generated summaries, which have become visually prominent in online search results, affect how users think about different issues. In a preregistered randomized co...
- A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees : Abstract: Weakly supervised learning has emerged as a practical alternative to fully supervised learning when complete and accurate labels are costly or infeasible to acquire. However, many existing m...
- CausalProfiler: Generating Synthetic Benchmarks for Rigorous and Transparent Evaluation of Causal Machine Learning : Abstract: Causal machine learning (Causal ML) aims to answer "what if" questions using machine learning algorithms, making it a promising tool for high-stakes decision-making. Yet, empirical evaluatio...
- Escaping Barren Plateaus in Variational Quantum Algorithms Using Negative Learning Rate in Quantum Internet of Things : Abstract: Variational Quantum Algorithms (VQAs) are becoming the primary computational primitive for next-generation quantum computers, particularly those embedded as resource-constrained accelerators...
- Serving Heterogeneous LoRA Adapters in Distributed LLM Inference Systems : Abstract: Low-Rank Adaptation (LoRA) has become the de facto method for parameter-efficient fine-tuning of large language models (LLMs), enabling rapid adaptation to diverse domains. In production, Lo...
- Adversarial Training for Process Reward Models : Abstract: Process Reward Models (PRMs) enhance the reasoning ability of LLMs by providing step-level supervision. However, their widespread adoption is limited by expensive manual step-level annotatio...
- Switching-time bioprocess control with pulse-width-modulated optogenetics : Abstract: Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowin...
- Leveraging Textual Compositional Reasoning for Robust Change Captioning : Abstract: Change captioning aims to describe changes between a pair of images. However, existing works rely on visual features alone, which often fail to capture subtle but meaningful changes because ...
- MICCAI STS 2024 Challenge: Semi-Supervised Instance-Level Tooth Segmentation in Panoramic X-ray and CBCT Images : Abstract: Orthopantomograms (OPGs) and Cone-Beam Computed Tomography (CBCT) are vital for dentistry, but creating large datasets for automated tooth segmentation is hindered by the labor-intensive proc...
- AgentShield: Make MAS more secure and efficient : Abstract: Large Language Model (LLM)-based Multi-Agent Systems (MAS) offer powerful cooperative reasoning but remain vulnerable to adversarial attacks, where compromised agents can undermine the syste...
- EnECG: Efficient Ensemble Learning for Electrocardiogram Multi-task Foundation Model : Abstract: Electrocardiogram (ECG) analysis plays a vital role in the early detection, monitoring, and management of various cardiovascular conditions. While existing models have achieved notable succe...
- Bandit Guided Submodular Curriculum for Adaptive Subset Selection : Abstract: Traditional curriculum learning proceeds from easy to hard samples, yet defining a reliable notion of difficulty remains elusive. Prior work has used submodular functions to induce difficult...
- Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary : Abstract: Enabling humanoid robots to follow free-form language commands is critical for seamless human-robot interaction, collaborative task execution, and general-purpose embodied intelligence. Whil...
- Pooling Attention: Evaluating Pretrained Transformer Embeddings for Deception Classification : Abstract: This paper investigates fake news detection as a downstream evaluation of Transformer representations, benchmarking encoder-only and decoder-only pre-trained models (BERT, GPT-2, Transformer...
- Ovis-Image Technical Report : Abstract: We introduce Ovis-Image, a 7B text-to-image model specifically optimized for high-quality text rendering, designed to operate efficiently under stringent computational constraints...
- MIMM-X: Disentangling Spurious Correlations for Medical Image Analysis : Abstract: Deep learning models can excel on medical tasks, yet often experience spurious correlations, known as shortcut learning, leading to poor generalization in new environments. Particularly in m...
- A transfer learning approach for automatic conflicts detection in software requirement sentence pairs based on dual encoders : Abstract: Software Requirement Documents (RDs) typically contain tens of thousands of individual requirements, and ensuring consistency among these requirements is critical for the success of software e...
- From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning : Abstract: Recent advances in vision-language reasoning underscore the importance of thinking with images, where models actively ground their reasoning in visual evidence. Yet, prevailing frameworks tr...
- Delta-XAI: A Unified Framework for Explaining Prediction Changes in Online Time Series Monitoring : Abstract: Explaining online time series monitoring models is crucial across sensitive domains such as healthcare and finance, where temporal and contextual prediction dynamics underpin critical decisi...
- High-Resolution Probabilistic Data-Driven Weather Modeling with a Stretched-Grid : Abstract: We present a probabilistic data-driven weather model capable of providing an ensemble of high spatial resolution realizations of 87 variables at arbitrary forecast length and ensemble size. ...
- Conveying Imagistic Thinking in TCM Translation: A Prompt Engineering and LLM-Based Evaluation Framework : Abstract: Traditional Chinese Medicine theory is built on imagistic thinking, in which medical principles and diagnostic and therapeutic logic are structured through metaphor and metonymy. However, ex...
- Evaluating the Clinical Impact of Generative Inpainting on Bone Age Estimation : Abstract: Generative foundation models can remove visual artifacts through realistic image inpainting, but their impact on medical AI performance remains uncertain. Pediatric hand radiographs often co...
- Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding : Abstract: Reading scene text, that is, text appearing in images, has numerous application areas, including assistive technology, search, and e-commerce. Although scene text recognition in English has ...
- What If They Took the Shot? A Hierarchical Bayesian Framework for Counterfactual Expected Goals : Abstract: This study develops a hierarchical Bayesian framework that integrates expert domain knowledge to quantify player-specific effects in expected goals (xG) estimation, addressing a limitation o...
- SpaceMind: Camera-Guided Modality Fusion for Spatial Reasoning in Vision-Language Models : Abstract: Large vision-language models (VLMs) show strong multimodal understanding but still struggle with 3D spatial reasoning, such as distance estimation, size comparison, and cross-view consistenc...
- Fairness in the Multi-Secretary Problem : Abstract: This paper bridges two perspectives: it studies the multi-secretary problem through the fairness lens of social choice, and examines multi-winner elections from the viewpoint of online decis...
- Mind Reading or Misreading? LLMs on the Big Five Personality Test : Abstract: We evaluate large language models (LLMs) for automatic personality prediction from text under the binary Five Factor Model (BIG5). Five models -- including GPT-4 and lightweight open-source ...
- Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models : Abstract: The complex reasoning ability of Large Language Models (LLMs) poses a critical bottleneck for their practical applications. Test-time expansion methods such as Tree-of-Thought (ToT) and Grap...
- Automated Generation of MDPs Using Logic Programming and LLMs for Robotic Applications : Abstract: We present a novel framework that integrates Large Language Models (LLMs) with automated planning and formal verification to streamline the creation and use of Markov Decision Processes (MDP...
- REVEAL: Reasoning-enhanced Forensic Evidence Analysis for Explainable AI-generated Image Detection : Abstract: With the rapid advancement of generative models, visually realistic AI-generated images have become increasingly difficult to distinguish from authentic ones, posing severe threats to social...
- AI for software engineering: from probable to provable : Abstract: Vibe coding, the much-touted use of AI techniques for programming, faces two overwhelming obstacles: the difficulty of specifying goals ("prompt engineering" is a form of requirements engine...
- Identification of Malicious Posts on the Dark Web Using Supervised Machine Learning : Abstract: Given the constant growth and increasing sophistication of cyberattacks, cybersecurity can no longer rely solely on traditional defense techniques and tools. Proactive detection of cyber thr...
- Listwise Preference Optimization with Element-wise Confusions for Aspect Sentiment Quad Prediction : Abstract: Aspect sentiment quad prediction (ASQP) is inherently challenging because it requires predicting a structured quadruple with four core sentiment elements, including aspect term (a), aspect category (c), opinion...
- Obstruction reasoning for robotic grasping : Abstract: Successful robotic grasping in cluttered environments not only requires a model to visually ground a target object but also to reason about obstructions that must be cleared beforehand. Whil...
- Vision Bridge Transformer at Scale : Abstract: We introduce Vision Bridge Transformer (ViBT), a large-scale instantiation of Brownian Bridge Models designed for conditional generation. Unlike traditional diffusion models that transform n...
- GAVINA: flexible aggressive undervolting for bit-serial mixed-precision DNN acceleration : Abstract: Voltage overscaling, or undervolting, is an enticing approximate technique in the context of energy-efficient Deep Neural Network (DNN) acceleration, given the quadratic relationship between...
- Tourism Question Answer System in Indian Language using Domain-Adapted Foundation Models : Abstract: This article presents the first comprehensive study on designing a baseline extractive question-answering (QA) system for the Hindi tourism domain, with a specialized focus on the Varanasi-a...
- Learning to Predict Aboveground Biomass from RGB Images with 3D Synthetic Scenes : Abstract: Forests play a critical role in global ecosystems by supporting biodiversity and mitigating climate change via carbon sequestration. Accurate aboveground biomass (AGB) estimation is essentia...
- One-Shot Secure Aggregation: A Hybrid Cryptographic Protocol for Private Federated Learning in IoT : Abstract: Federated Learning (FL) offers a promising approach to collaboratively train machine learning models without centralizing raw data, yet its scalability is often throttled by excessive commun...
- Robust HRRP Recognition under Interrupted Sampling Repeater Jamming using a Prior Jamming Information-Guided Network : Abstract: Radar automatic target recognition (RATR) based on high-resolution range profile (HRRP) has attracted increasing attention due to its ability to capture fine-grained structural features. How...
- Agentic AI Framework for Smart Inventory Replenishment : Abstract: In contemporary retail, the variety of products available (e.g., clothing, groceries, cosmetics, frozen goods) makes it difficult to predict demand, prevent stockouts, and find high-potent...
- Hierarchical AI-Meteorologist: LLM-Agent System for Multi-Scale and Explainable Weather Forecast Reporting : Abstract: We present the Hierarchical AI-Meteorologist, an LLM-agent system that generates explainable weather reports using hierarchical forecast reasoning and weather keyword generation. Unlike st...
- Towards Continuous Intelligence Growth: Self-Training, Continual Learning, and Dual-Scale Memory in SuperIntelliAgent : Abstract: We introduce SuperIntelliAgent, an agentic learning framework that couples a trainable small diffusion model (the learner) with a frozen large language model (the verifier) to enable continu...
- Thinking by Doing: Building Efficient World Model Reasoning in LLMs via Multi-turn Interaction : Abstract: Developing robust world model reasoning is crucial for large language model (LLM) agents to plan and interact in complex environments. While multi-turn interaction offers a superior understa...
- $\mathcal{E}_0$: Enhancing Generalization and Fine-Grained Control in VLA Models via Continuized Discrete Diffusion : Abstract: Vision-Language-Action (VLA) models offer a unified framework for robotic manipulation by integrating visual perception, language understanding, and control generation. Yet existing VLA mode...
- EvalCards: A Framework for Standardized Evaluation Reporting : Abstract: Evaluation has long been a central concern in NLP, and transparent reporting practices are more critical than ever in today's landscape of rapidly released open-access models. Drawing on a s...
- TIP and Polish: Text-Image-Prototype Guided Multi-Modal Generation via Commonality-Discrepancy Modeling and Refinement : Abstract: Multi-modal generation struggles to ensure thematic coherence and style consistency. Semantically, existing methods suffer from cross-modal mismatch and lack explicit modeling of commonality...
- Cacheback: Speculative Decoding With Nothing But Cache : Abstract: We present Cacheback Decoding, a training-free and model-agnostic speculative decoding method that exploits the locality in language to accelerate Large Language Model (LLM) inference. Cache...
- CSV-Decode: Certifiable Sub-Vocabulary Decoding for Efficient Large Language Model Inference : Abstract: Large language models face significant computational bottlenecks during inference due to the expensive output layer computation over large vocabularies. We present CSV-Decode, a novel approa...
- Evaluating Embedding Generalization: How LLMs, LoRA, and SLERP Shape Representational Geometry : Abstract: We investigate the generalization properties of dense text embeddings when the embedding backbone is a large language model (LLM) versus when it is a non-LLM encoder, and we study the extent...
- A General Highly Accurate Online Planning Method Integrating Large Language Models into Nested Rollout Policy Adaptation for Dialogue Tasks : Abstract: In goal-oriented dialogue tasks, the main challenge is to steer the interaction towards a given goal within a limited number of turns. Existing approaches either rely on elaborate prompt eng...
- Sensing and Understanding the World over Air: A Large Multimodal Model for Mobile Networks : Abstract: Large models (LMs), such as ChatGPT, have made a significant impact across diverse domains and hold great potential to facilitate the evolution of network intelligence. Wireless-native multi...
- Lost in the Pipeline: How Well Do Large Language Models Handle Data Preparation? : Abstract: Large language models have recently demonstrated their exceptional capabilities in supporting and automating various tasks. Among the tasks worth exploring for testing large language model c...
- Quantifying and Mitigating Selection Bias in LLMs: A Transferable LoRA Fine-Tuning and Efficient Majority Voting Approach : Abstract: Multiple Choice Question (MCQ) answering is a widely used method for evaluating the performance of Large Language Models (LLMs). However, LLMs often exhibit selection bias in MCQ tasks, wher...
- EulerESG: Automating ESG Disclosure Analysis with LLMs : Abstract: Environmental, Social, and Governance (ESG) reports have become central to how companies communicate climate risk, social impact, and governance practices, yet they are still published prima...
- GPS: General Per-Sample Prompter : Abstract: LLMs are sensitive to prompting, with task performance often hinging on subtle, sometimes imperceptible variations in phrasing. As a result, crafting effective prompts manually remains chall...
- German General Personas: A Survey-Derived Persona Prompt Collection for Population-Aligned LLM Studies : Abstract: The use of Large Language Models (LLMs) for simulating human perspectives via persona prompting is gaining traction in computational social science. However, well-curated, empirically ground...
- PromptTailor: Multi-turn Intent-Aligned Prompt Synthesis for Lightweight LLMs : Abstract: Lightweight language models remain attractive for on-device and privacy-sensitive applications, but their responses are highly sensitive to prompt quality. For open-ended generation, non-exp...
- Goal-Directed Search Outperforms Goal-Agnostic Memory Compression in Long-Context Memory Tasks : Abstract: How to enable human-like long-term memory in large language models (LLMs) has been a central question for unlocking more general capabilities such as few-shot generalization. Existing memory...
- Affective Multimodal Agents with Proactive Knowledge Grounding for Emotionally Aligned Marketing Dialogue : Abstract: Recent advances in large language models (LLMs) have enabled fluent dialogue systems, but most remain reactive and struggle in emotionally rich, goal-oriented settings such as marketing conv...
- Beyond Component Strength: Synergistic Integration and Adaptive Calibration in Multi-Agent RAG Systems : Abstract: Building reliable retrieval-augmented generation (RAG) systems requires more than adding powerful components; it requires understanding how they interact. Using ablation studies on 50 querie...
- A Benchmark for Procedural Memory Retrieval in Language Agents : Abstract: Current AI agents excel in familiar settings, but fail sharply when faced with novel tasks with unseen vocabularies -- a core limitation of procedural memory systems. We present the first be...
- Identifying Quantum Structure in AI Language: Evidence for Evolutionary Convergence of Human and Artificial Cognition : Abstract: We present the results of cognitive tests on conceptual combinations, performed using specific Large Language Models (LLMs) as test subjects. In the first test, performed with ChatGPT and Ge...
- HUMORCHAIN: Theory-Guided Multi-Stage Reasoning for Interpretable Multimodal Humor Generation : Abstract: Humor, as both a creative human activity and a social binding mechanism, has long posed a major challenge for AI generation. Although producing humor requires complex cognitive reasoning and...
- RoSA: Enhancing Parameter-Efficient Fine-Tuning via RoPE-aware Selective Adaptation in Large Language Models : Abstract: Fine-tuning large language models is essential for task-specific adaptation, yet it remains computationally prohibitive. Parameter-Efficient Fine-Tuning (PEFT) methods have emerged as a solu...
- Asking LLMs to Verify First is Almost Free Lunch : Abstract: To enhance the reasoning capabilities of Large Language Models (LLMs) without high costs of training, nor extensive test-time sampling, we introduce Verification-First (VF), a strategy that ...
- Closing the Performance Gap Between AI and Radiologists in Chest X-Ray Reporting : Abstract: AI-assisted report generation offers the opportunity to reduce radiologists' workload stemming from expanded screening guidelines, complex cases and workforce shortages, while maintaining di...
- R2Q: Towards Robust 2-Bit Large Language Models via Residual Refinement Quantization : Abstract: The rapid progress of Large Language Models (LLMs) has brought substantial computational and memory demands, spurring the adoption of low-bit quantization. While 8-bit and 4-bit formats have...
- Polarity-Aware Probing for Quantifying Latent Alignment in Language Models : Abstract: Advances in unsupervised probes such as Contrast-Consistent Search (CCS), which reveal latent beliefs without relying on token outputs, raise the question of whether these methods can reliab...
- The Rapid Growth of AI Foundation Model Usage in Science : Abstract: We present the first large-scale analysis of AI foundation model usage in science - not just citations or keywords. We find that adoption has grown rapidly, at nearly-exponential rates, with...
- Decoding inner speech with an end-to-end brain-to-text neural interface : Abstract: Speech brain-computer interfaces (BCIs) aim to restore communication for people with paralysis by translating neural activity into text. Most systems use cascaded frameworks that decode phon...
- EduMod-LLM: A Modular Approach for Designing Flexible and Transparent Educational Assistants : Abstract: With the growing use of Large Language Model (LLM)-based Question-Answering (QA) systems in education, it is critical to evaluate their performance across individual pipeline components. In ...
- A Lightweight Approach to Detection of AI-Generated Texts Using Stylometric Features : Abstract: A growing number of AI-generated texts raise serious concerns. Most existing approaches to AI-generated text detection rely on fine-tuning large transformer models or building ensembles, whi...
- QuantumChem-200K: A Large-Scale Open Organic Molecular Dataset for Quantum-Chemistry Property Screening and Language Model Benchmarking : Abstract: The discovery of next-generation photoinitiators for two-photon polymerization (TPP) is hindered by the absence of large, open datasets containing the quantum-chemical and photophysical prop...
- Building Domain-Specific Small Language Models via Guided Data Generation : Abstract: Large Language Models (LLMs) have shown remarkable success in supporting a wide range of knowledge-intensive tasks. In specialized domains, there is growing interest in leveraging LLMs to as...
- Proactive Defense: Compound AI for Detecting Persuasion Attacks and Measuring Inoculation Effectiveness : Abstract: This paper introduces BRIES, a novel compound AI architecture designed to detect and measure the effectiveness of persuasion attacks across information environments. We present a system with...
- SO-Bench: A Structural Output Evaluation of Multimodal LLMs : Abstract: Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct, but also conform to predefined data schemas. Despit...
- Semantics as a Shield: Label Disguise Defense (LDD) against Prompt Injection in LLM Sentiment Classification : Abstract: Large language models are increasingly used for text classification tasks such as sentiment analysis, yet their reliance on natural language prompts exposes them to prompt injection attacks....
- Extracting Disaster Impacts and Impact Related Locations in Social Media Posts Using Large Language Models : Abstract: Large-scale disasters can often result in catastrophic consequences on people and infrastructure. Situation awareness about such disaster impacts generated by authoritative data from in-situ...
- Who Owns the Knowledge? Copyright, GenAI, and the Future of Academic Publishing : Abstract: The integration of generative artificial intelligence (GenAI) and large language models (LLMs) into scientific research and higher education presents a paradigm shift, offering revolutionizi...
- Medical Malice: A Dataset for Context-Aware Safety in Healthcare LLMs : Abstract: The integration of Large Language Models (LLMs) into healthcare demands a safety paradigm rooted in primum non nocere. However, current alignment techniques rely on generic definiti...
- A Longitudinal Measurement of Privacy Policy Evolution for Large Language Models : Abstract: Large language model (LLM) services have been rapidly integrated into people's daily lives as chatbots and agentic systems. They are nourished by collecting rich streams of data, raising pri...
- fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding : Abstract: Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending such capability to brain imaging remains largely unex...
- Factors That Support Grounded Responses in LLM Conversations: A Rapid Review : Abstract: Large language models (LLMs) may generate outputs that are misaligned with user intent, lack contextual grounding, or exhibit hallucinations during conversation, which compromises the reliab...
- LAYER: A Quantitative Explainable AI Framework for Decoding Tissue-Layer Drivers of Myofascial Low Back Pain : Abstract: Myofascial pain (MP) is a leading cause of chronic low back pain, yet its tissue-level drivers remain poorly defined and lack reliable image biomarkers. Existing studies focus predominantly ...
- BeeRNA: tertiary structure-based RNA inverse folding using Artificial Bee Colony : Abstract: The Ribonucleic Acid (RNA) inverse folding problem, designing nucleotide sequences that fold into specific tertiary structures, is a fundamental computational biology problem with important ...
- Reducing research bureaucracy in UK higher education: Can generative AI assist with the internal evaluation of quality? : Abstract: This paper examines the potential for generative artificial intelligence (GenAI) to assist with internal review processes for research quality evaluations in UK higher education and particul...
- Tacit Bidder-Side Collusion: Artificial Intelligence in Dynamic Auctions : Abstract: We study whether large language models acting as autonomous bidders can tacitly collude by coordinating when to accept platform posted payouts in repeated Dutch auctions, without any communi...
- Dark Speculation: Combining Qualitative and Quantitative Understanding in Frontier AI Risk Analysis : Abstract: Estimating catastrophic harms from frontier AI is hindered by deep ambiguity: many of its risks are not only unobserved but unanticipated by analysts. The central limitation of current risk ...
- FLAWS: A Benchmark for Error Identification and Localization in Scientific Papers : Abstract: The identification and localization of errors is a core task in peer review, yet the exponential growth of scientific output has made it increasingly difficult for human reviewers to reliabl...
- LILAD: Learning In-context Lyapunov-stable Adaptive Dynamics Models : Abstract: System identification in control theory aims to approximate dynamical systems from trajectory data. While neural networks have demonstrated strong predictive accuracy, they often fail to pre...
- Improving Score Reliability of Multiple Choice Benchmarks with Consistency Evaluation and Altered Answer Choices : Abstract: In this work we present the Consistency-Rebalanced Accuracy (CoRA) metric, improving the reliability of Large Language Model (LLM) scores computed on multiple choice (MC) benchmarks. Our met...
- Towards a Foundation Model for Partial Differential Equations Across Physics Domains : Abstract: We present PDE-FM, a modular foundation model for physics-informed machine learning that unifies spatial, spectral, and temporal reasoning across heterogeneous partial differential equation ...
- Advancing Marine Bioacoustics with Deep Generative Models: A Hybrid Augmentation Strategy for Southern Resident Killer Whale Detection : Abstract: Automated detection and classification of marine mammal vocalizations is critical for conservation and management efforts but is hindered by limited annotated datasets and the acoustic comp...
- LLM-Empowered Event-Chain Driven Code Generation for ADAS in SDV systems : Abstract: This paper presents an event-chain-driven, LLM-empowered workflow for generating validated automotive code from natural-language requirements. A Retrieval-Augmented Generation (RAG) layer r...
- Bridging Planning and Execution: Multi-Agent Path Finding Under Real-World Deadlines : Abstract: The Multi-Agent Path Finding (MAPF) problem aims to find collision-free paths for multiple agents while optimizing objectives such as the sum of costs or makespan. MAPF has wide applications...
- Standardized Threat Taxonomy for AI Security, Governance, and Regulatory Compliance : Abstract: The accelerating deployment of artificial intelligence systems across regulated sectors has exposed critical fragmentation in risk assessment methodologies. A significant "language barrier" ...
- PathReasoning: A multimodal reasoning agent for query-based ROI navigation on whole-slide images : Abstract: Deciphering tumor microenvironment from Whole Slide Images (WSIs) is intriguing as it is key to cancer diagnosis, prognosis and treatment response. While these gigapixel images on one hand o...
- Adaptive Parameter Optimization for Robust Remote Photoplethysmography : Abstract: Remote photoplethysmography (rPPG) enables contactless vital sign monitoring using standard RGB cameras. However, existing methods rely on fixed parameters optimized for particular lighting ...
- Toward Automated and Trustworthy Scientific Analysis and Visualization with LLM-Generated Code : Abstract: As modern science becomes increasingly data-intensive, the ability to analyze and visualize large-scale, complex datasets is critical to accelerating discovery. However, many domain scientis...
- Exploring Dynamic Properties of Backdoor Training Through Information Bottleneck : Abstract: Understanding how backdoor data influences neural network training dynamics remains a complex and underexplored challenge. In this paper, we present a rigorous analysis of the impact of back...
- Prompted Policy Search: Reinforcement Learning through Linguistic and Numerical Reasoning in LLMs : Abstract: Reinforcement Learning (RL) traditionally relies on scalar reward signals, limiting its ability to leverage the rich semantic knowledge often available in real-world tasks. In contrast, huma...
- Does the Model Say What the Data Says? A Simple Heuristic for Model Data Alignment : Abstract: In this work, we propose a simple and computationally efficient framework to evaluate whether machine learning models align with the structure of the data they learn from; that is, whether ...
- Heterogeneous Multi-Agent Reinforcement Learning with Attention for Cooperative and Scalable Feature Transformation : Abstract: Feature transformation enhances downstream task performance by generating informative features through mathematical feature crossing. Despite the advancements in deep learning, feature trans...
- WalkCLIP: Multimodal Learning for Urban Walkability Prediction : Abstract: Urban walkability is a cornerstone of public health, sustainability, and quality of life. Traditional walkability assessments rely on surveys and field audits, which are costly and difficult...
- ABLE: Using Adversarial Pairs to Construct Local Models for Explaining Model Predictions : Abstract: Machine learning models are increasingly used in critical applications but are mostly "black boxes" due to their lack of transparency. Local explanation approaches, such as LIME, address thi...
- DeepGI: Explainable Deep Learning for Gastrointestinal Image Classification : Abstract: This paper presents a comprehensive comparative model analysis on a novel gastrointestinal medical imaging dataset, comprised of 4,000 endoscopic images spanning four critical disease classe...
- The Risk-Adjusted Intelligence Dividend: A Quantitative Framework for Measuring AI Return on Investment Integrating ISO 42001 and Regulatory Exposure : Abstract: Organizations investing in artificial intelligence face a fundamental challenge: traditional return on investment calculations fail to capture the dual nature of AI implementations, which si...
- DialBench: Towards Accurate Reading Recognition of Pointer Meter using Large Foundation Models : Abstract: The precise reading recognition of pointer meters plays a key role in smart power systems, but existing approaches remain fragile due to challenges like reflections, occlusions, dynamic view...
- A Safety and Security Framework for Real-World Agentic Systems : Abstract: This paper introduces a dynamic and actionable framework for securing agentic AI systems in enterprise deployment. We contend that safety and security are not merely fixed attributes of indi...
- Joint Estimation of Sea State and Vessel Parameters Using a Mass-Spring-Damper Equivalence Model : Abstract: Real-time sea state estimation is vital for applications like shipbuilding and maritime safety. Traditional methods rely on accurate wave-vessel transfer functions to estimate wave spectra f...
- When Do Domain-Specific Foundation Models Justify Their Cost? A Systematic Evaluation Across Retinal Imaging Tasks : Abstract: Large vision foundation models have been widely adopted for retinal disease classification without systematic evidence justifying their parameter requirements. In the present work we address...
- AfriStereo: A Culturally Grounded Dataset for Evaluating Stereotypical Bias in Large Language Models : Abstract: Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and enabling harmful stereotypes in applications across various domains...
- MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis : Abstract: Accurate medical diagnosis often involves progressive visual focusing and iterative reasoning, characteristics commonly observed in clinical workflows. While recent vision-language models de...
- Predicting Public Health Impacts of Electricity Usage : Abstract: The electric power sector is a leading source of air pollutant emissions, impacting the public health of nearly every community. Although regulatory measures have reduced air pollutants, fos...
- Distillability of LLM Security Logic: Predicting Attack Success Rate of Outline Filling Attack via Ranking Regression : Abstract: In the realm of black-box jailbreak attacks on large language models (LLMs), the feasibility of constructing a narrow safety proxy, a lightweight model designed to predict the attack success...
- ICM-SR: Image-Conditioned Manifold Regularization for Image Super-Resolution : Abstract: Real world image super-resolution (Real-ISR) often leverages the powerful generative priors of text-to-image diffusion models by regularizing the output to lie on their learned manifold. How...
- A Multi-View Multi-Timescale Hypergraph-Empowered Spatiotemporal Framework for EV Charging Forecasting : Abstract: Accurate electric vehicle (EV) charging demand forecasting is essential for stable grid operation and proactive EV participation in the electricity market. Existing forecasting methods, particul...
- A Fast and Flat Federated Learning Method via Weighted Momentum and Sharpness-Aware Minimization : Abstract: In federated learning (FL), models must converge quickly under tight communication budgets while generalizing across non-IID client distributions. These twin requirements have ...
- Binary-30K: A Heterogeneous Dataset for Deep Learning in Binary Analysis and Malware Detection : Abstract: Deep learning research for binary analysis faces a critical infrastructure gap. Today, existing datasets target single platforms, require specialized tooling, or provide only hand-engineered...
- Decomposed Trust: Exploring Privacy, Adversarial Robustness, Fairness, and Ethics of Low-Rank LLMs : Abstract: Large language models (LLMs) have driven major advances across domains, yet their massive size hinders deployment in resource-constrained settings. Model compression addresses this challenge...
- Stacked Ensemble of Fine-Tuned CNNs for Knee Osteoarthritis Severity Grading : Abstract: Knee Osteoarthritis (KOA) is a musculoskeletal condition that can cause significant limitations and impairments in daily activities, especially among older individuals. To evaluate the sever...
- RemedyGS: Defend 3D Gaussian Splatting against Computation Cost Attacks : Abstract: As a mainstream technique for 3D reconstruction, 3D Gaussian splatting (3DGS) has been applied in a wide range of applications and services. Recent studies have revealed critical vulnerabili...
- Towards Heterogeneous Quantum Federated Learning: Challenges and Solutions : Abstract: Quantum federated learning (QFL) combines quantum computing and federated learning to enable decentralized model training while maintaining data privacy. QFL can improve computational effici...
- A Theoretically Grounded Hybrid Ensemble for Reliable Detection of LLM-Generated Text : Abstract: The rapid proliferation of Large Language Models (LLMs) has blurred the line between human and machine authorship, creating practical risks for academic integrity and information reliability...
- IMTalker: Efficient Audio-driven Talking Face Generation with Implicit Motion Transfer : Abstract: Talking face generation aims to synthesize realistic speaking portraits from a single image, yet existing methods often rely on explicit optical flow and local warping, which fail to model c...
- Real-Time Long Horizon Air Quality Forecasting via Group-Relative Policy Optimization : Abstract: Accurate long horizon forecasting of particulate matter (PM) concentration fields is essential for operational public health decisions. However, achieving reliable forecasts remains challeng...
- Focused Chain-of-Thought: Efficient LLM Reasoning via Structured Input Information : Abstract: Recent large language models achieve strong reasoning performance by generating detailed chain-of-thought traces, but this often leads to excessive token use and high inference latency. Exis...
- Enhanced Graph Convolutional Network with Chebyshev Spectral Graph and Graph Attention for Autism Spectrum Disorder Classification : Abstract: ASD is a complicated neurodevelopmental disorder marked by variation in symptom presentation and neurological underpinnings, making early and objective diagnosis extremely problematic. This ...
- Aligning Artificial Superintelligence via a Multi-Box Protocol : Abstract: We propose a novel protocol for aligning artificial superintelligence (ASI) based on mutual verification among multiple isolated systems that self-modify to achieve alignment. The protocol o...
- Evaluating Strategies for Synthesizing Clinical Notes for Medical Multimodal AI : Abstract: Multimodal (MM) learning is emerging as a promising paradigm in biomedical artificial intelligence (AI) applications, integrating complementary modalities, which highlight different aspects of...
- Pathology-Aware Prototype Evolution via LLM-Driven Semantic Disambiguation for Multicenter Diabetic Retinopathy Diagnosis : Abstract: Diabetic retinopathy (DR) grading plays a critical role in early clinical intervention and vision preservation. Recent explorations predominantly focus on visual lesion feature extraction th...
- Real-Time Procedural Learning From Experience for AI Agents : Abstract: Learning how to do things from trial and error in real time is a hallmark of biological intelligence, yet most LLM-based agents lack mechanisms to acquire procedural knowledge after deployme...
- Hybrid Stackelberg Game and Diffusion-based Auction for Two-tier Agentic AI Task Offloading in Internet of Agents : Abstract: The Internet of Agents (IoA) is rapidly gaining prominence as a foundational architecture for interconnected intelligent systems, designed to facilitate seamless discovery, communication, an...
- A perceptual bias of AI Logical Argumentation Ability in Writing : Abstract: Can machines think? This is a central question in artificial intelligence research. However, there is a substantial divergence of views on the answer to this question. Why do people have suc...
- WearVQA: A Visual Question Answering Benchmark for Wearables in Egocentric Authentic Real-world scenarios : Abstract: We introduce WearVQA, the first benchmark specifically designed to evaluate the Visual Question Answering (VQA) capabilities of multimodal AI assistants on wearable devices like smart glasse...
- Embedded Universal Predictive Intelligence: a coherent framework for multi-agent learning : Abstract: The standard theory of model-free reinforcement learning assumes that the environment dynamics are stationary and that agents are decoupled from their environment, such that policies are tre...
- Training High-Level Schedulers with Execution-Feedback Reinforcement Learning for Long-Horizon GUI Automation : Abstract: The rapid development of large vision-language models (VLMs) has greatly advanced research on GUI agents. However, GUI agents still face significant challenges in handling long-horizon task...
- Co-Evolving Agents: Learning from Failures as Hard Negatives : Abstract: The rapid progress of large foundation models has accelerated the development of task-specialized agents across diverse domains. However, the effectiveness of agents remains tightly coupled ...
- RecToM: A Benchmark for Evaluating Machine Theory of Mind in LLM-based Conversational Recommender Systems : Abstract: Large language models are revolutionizing conversational recommender systems through their impressive capabilities in instruction comprehension, reasoning, and human interaction. A core ...
- When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming : Abstract: Numerical simulations have revolutionized the industrial design process by reducing prototyping costs, design iterations, and enabling product engineers to explore the design space more effi...
- Enhanced Conditional Generation of Double Perovskite by Knowledge-Guided Language Model Feedback : Abstract: Double perovskites (DPs) are promising candidates for sustainable energy technologies due to their compositional tunability and compatibility with low-energy fabrication, yet their vast desi...
- Swarms of Large Language Model Agents for Protein Sequence Design with Experimental Validation : Abstract: Designing proteins de novo with tailored structural, physicochemical, and functional properties remains a grand challenge in biotechnology, medicine, and materials science, due to the vastne...
- Tracing Footsteps of Similar Cities: Modeling Urban Economic Vitality with Dynamic Inter-City Graph Embeddings : Abstract: Urban economic vitality is a crucial indicator of a city's long-term growth potential, comprising key metrics such as the annual number of new companies and the population employed. However,...
- On the Complexity of the Grounded Semantics for Infinite Argumentation Frameworks : Abstract: Argumentation frameworks, consisting of arguments and an attack relation representing conflicts, are fundamental for formally studying reasoning under conflicting information. We use methods...
- Who is Afraid of Minimal Revision? : Abstract: The principle of minimal change in belief revision theory requires that, when accepting new information, one keeps one's belief state as close to the initial belief state as possible. This i...
- Structured Extraction from Business Process Diagrams Using Vision-Language Models : Abstract: Business Process Model and Notation (BPMN) is a widely adopted standard for representing complex business workflows. While BPMN diagrams are often exchanged as visual images, existing method...
- A Computable Game-Theoretic Framework for Multi-Agent Theory of Mind : Abstract: Originating in psychology, Theory of Mind (ToM) has attracted significant attention across multiple research communities, especially logic, economics, and robotics. Most psycholog...
- Counting Still Counts: Understanding Neural Complex Query Answering Through Query Relaxation : Abstract: Neural methods for Complex Query Answering (CQA) over knowledge graphs (KGs) are widely believed to learn patterns that generalize beyond explicit graph structure, allowing them to infer ans...
- DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning : Abstract: Large language models have made significant progress in mathematical reasoning, which serves as an important testbed for AI and could impact scientific research if further advanced. By scali...
- AI Deception: Risks, Dynamics, and Controls : Abstract: As intelligence increases, so does its shadow. AI deception, in which systems induce false beliefs to secure self-beneficial outcomes, has evolved from a speculative concern to an empiricall...
- Optimized Agent Shift Scheduling Using Multi-Phase Allocation Approach : Abstract: Effective agent shift scheduling is crucial for businesses, especially in the Contact Center as a Service (CCaaS) industry, to ensure seamless operations and fulfill employee needs. Most stu...
- Geometrically-Constrained Agent for Spatial Reasoning : Abstract: Vision Language Models (VLMs) exhibit a fundamental semantic-to-geometric gap in spatial reasoning: they excel at qualitative semantic inference but their reasoning operates within a lossy s...
- Solving Context Window Overflow in AI Agents : Abstract: Large Language Models (LLMs) have become increasingly capable of interacting with external tools, granting access to specialized knowledge beyond their training data - critical in dynamic, k...
- Agentic AI Framework for Individuals with Disabilities and Neurodivergence: A Multi-Agent System for Healthy Eating, Daily Routines, and Inclusive Well-Being : Abstract: The paper presents a detailed Agentic Artificial Intelligence (AI) model that would enable people with disabilities and neurodivergence to lead healthier lives and have more regular days. Th...
- Agentic AI Framework for Cloudburst Prediction and Coordinated Response : Abstract: The challenge is growing towards extreme and short-duration rainfall events like a cloudburst that are peculiar to the traditional forecasting systems, in which the predictions and the respo...
- Fast dynamical similarity analysis : Abstract: To understand how neural systems process information, it is often essential to compare one circuit with another, one brain with another, or data with a model. Traditional similarity measures...
- InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents : Abstract: Data analysis has become an indispensable part of scientific research. To discover the latent knowledge and insights hidden within massive datasets, we need to perform deep exploratory analy...
- ORION: Teaching Language Models to Reason Efficiently in the Language of Thought : Abstract: Large Reasoning Models (LRMs) achieve strong performance in mathematics, code generation, and task planning, but their reliance on long chains of verbose "thinking" tokens leads to high late...
- TIM-PRM: Verifying multimodal reasoning with Tool-Integrated PRM : Abstract: Multimodal Large Language Models (MLLMs) have achieved impressive performances in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that...
- MindPower: Enabling Theory-of-Mind Reasoning in VLM-based Embodied Agents : Abstract: Theory of Mind (ToM) refers to the ability to infer others' mental states, such as beliefs, desires, and intentions. Current vision-language embodied agents lack ToM-based decision-making, a...
- Does Self-Evaluation Enable Wireheading in Language Models? : Abstract: Self-evaluation is increasingly central to language model training, from constitutional AI to self-refinement. We investigate whether coupling self-evaluation to reward signals creates incen...
- Evolutionary Discovery of Heuristic Policies for Traffic Signal Control : Abstract: Traffic Signal Control (TSC) involves a challenging trade-off: classic heuristics are efficient but oversimplified, while Deep Reinforcement Learning (DRL) achieves high performance yet suff...
- Peer-to-Peer Energy Trading in Dairy Farms using Multi-Agent Reinforcement Learning : Abstract: The integration of renewable energy resources in rural areas, such as dairy farming communities, enables decentralized energy management through Peer-to-Peer (P2P) energy trading. This resea...
- AgriCoT: A Chain-of-Thought Benchmark for Evaluating Reasoning in Vision-Language Models for Agriculture : Abstract: Recent advancements in Vision-Language Models (VLMs) have significantly transformed various industries. In agriculture, these dual-modal capabilities offer promising applications such as pre...
- Adapting Like Humans: A Metacognitive Agent with Test-time Reasoning : Abstract: Recent Vision-Language Models (VLMs) exhibit strong perceptual reasoning abilities, yet they often struggle to adapt efficiently when encountering novel tasks at test time. In contrast, huma...
- OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning : Abstract: High-quality and carefully curated data is a cornerstone of training medical large language models, as it directly impacts both generalization and robustness to unseen clinical tasks. We inv...
- Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering : Abstract: In this paper, we propose a novel Multi-Modal Scene Graph with Kolmogorov-Arnold Expert Network for Audio-Visual Question Answering (SHRIKE). The task aims to mimic human reasoning by extrac...
Research Sources: 666 | Generated: 12/1/2025
