AI RESEARCH PAPERS & ACADEMIC SOURCES
- FMVP: Masked Flow Matching for Adversarial Video Purification : Abstract: Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regr...
- SLGNet: Synergizing Structural Priors and Language-Guided Modulation for Multimodal Object Detection : Abstract: Multimodal object detection leveraging RGB and Infrared (IR) images is pivotal for robust perception in all-weather scenarios. While recent adapter-based approaches efficiently transfer RGB-...
- DiffProxy: Multi-View Human Mesh Recovery via Diffusion-Generated Dense Proxies : Abstract: Human mesh recovery from multi-view images faces a fundamental challenge: real-world datasets contain imperfect ground-truth annotations that bias the models' training, while synthetic data ...
- InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams : Abstract: The grand vision of enabling persistent, large-scale 3D visual geometry understanding is shackled by the irreconcilable demands of scalability and long-term stability. While offline models l...
- Rank-based Geographical Regularization: Revisiting Contrastive Self-Supervised Learning for Multispectral Remote Sensing Imagery : Abstract: Self-supervised learning (SSL) has become a powerful paradigm for learning from large, unlabeled datasets, particularly in computer vision (CV). However, applying SSL to multispectral remote...
- SortWaste: A Densely Annotated Dataset for Object Detection in Industrial Waste Sorting : Abstract: The increasing production of waste, driven by population growth, has created challenges in managing and recycling materials effectively. Manual waste sorting is a common practice; however, i...
- 360DVO: Deep Visual Odometry for Monocular 360-Degree Camera : Abstract: Monocular omnidirectional visual odometry (OVO) systems leverage 360-degree cameras to overcome field-of-view limitations of perspective VO systems. However, existing methods, reliant on han...
- Prithvi-Complimentary Adaptive Fusion Encoder (CAFE): unlocking full-potential for flood inundation mapping : Abstract: Geo-Foundation Models (GFMs) have proven effective in diverse downstream applications, including semantic segmentation, classification, and regression tasks. However, in case of flood mappi...
- Fusion2Print: Deep Flash-Non-Flash Fusion for Contactless Fingerprint Matching : Abstract: Contactless fingerprint recognition offers a hygienic and convenient alternative to contact-based systems, enabling rapid acquisition without latent prints, pressure artifacts, or hygiene ri...
- BEDS: Bayesian Emergent Dissipative Structures : Abstract: We present BEDS (Bayesian Emergent Dissipative Structures), a theoretical framework that unifies concepts from non-equilibrium thermodynamics, Bayesian inference, information geometry, and m...
- Joint Semantic and Rendering Enhancements in 3D Gaussian Modeling with Anisotropic Local Encoding : Abstract: Recent works propose extending 3DGS with semantic feature vectors for simultaneous semantic segmentation and image rendering. However, these methods often treat the semantic and rendering br...
- Talk2Move: Reinforcement Learning for Text-Instructed Object-Level Geometric Transformation in Scenes : Abstract: We introduce Talk2Move, a reinforcement learning (RL) based diffusion framework for text-instructed spatial transformation of objects within scenes. Spatially manipulating objects in a scene...
- VINO: A Unified Visual Generator with Interleaved OmniModal Context : Abstract: We present VINO, a unified visual generator that performs image and video generation and editing within a single framework. Instead of relying on task-specific models or independent modules ...
- ExposeAnyone: Personalized Audio-to-Expression Diffusion Models Are Robust Zero-Shot Face Forgery Detectors : Abstract: Detecting unknown deepfake manipulations remains one of the most challenging problems in face forgery detection. Current state-of-the-art approaches fail to generalize to unseen manipulation...
- MetaFormer-driven Encoding Network for Robust Medical Semantic Segmentation : Abstract: Semantic segmentation is crucial for medical image analysis, enabling precise disease diagnosis and treatment planning. However, many advanced models employ complex architectures, limiting t...
- Simulations of MRI Guided and Powered Ferric Applicators for Tetherless Delivery of Therapeutic Interventions : Abstract: Magnetic Resonance Imaging (MRI) is a well-established modality for pre-operative planning and is also explored for intra-operative guidance of procedures such as intravascular interventions...
- Uncertainty-Calibrated Explainable AI for Fetal Ultrasound Plane Classification : Abstract: Fetal ultrasound standard-plane classification underpins reliable prenatal biometry and anomaly screening, yet real-world deployment is limited by domain shift, image noise, and poor calibra...
- YODA: Yet Another One-step Diffusion-based Video Compressor : Abstract: While one-step diffusion models have recently excelled in perceptual image compression, their application to video remains limited. Prior efforts typically rely on pretrained 2D autoencoders...
- DST-Calib: A Dual-Path, Self-Supervised, Target-Free LiDAR-Camera Extrinsic Calibration Network : Abstract: LiDAR-camera extrinsic calibration is essential for multi-modal data fusion in robotic perception systems. However, existing approaches typically rely on handcrafted calibration targets (e.g...
- An Energy-Efficient Smart Bus Transport Management System with Blind-Spot Collision Detection Ability : Abstract: Public bus transport systems in developing countries often suffer from a lack of real-time location updates for users, making commuting inconvenient and unreliable for passengers. Furthe...
- Image Synthesis Using Spintronic Deep Convolutional Generative Adversarial Network : Abstract: The computational requirements of generative adversarial networks (GANs) exceed the limit of conventional Von Neumann architectures, necessitating energy efficient alternatives such as neuro...
- Sim2Real SAR Image Restoration: Metadata-Driven Models for Joint Despeckling and Sidelobes Reduction : Abstract: Synthetic aperture radar (SAR) provides valuable information about the Earth's surface under all weather and illumination conditions. However, the inherent phenomenon of speckle and the pres...
- OpenRT: An Open-Source Red Teaming Framework for Multimodal LLMs : Abstract: The rapid integration of Multimodal Large Language Models (MLLMs) into critical applications is increasingly hindered by persistent safety vulnerabilities. However, existing red-teaming benc...
- AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving : Abstract: End-to-end autonomous driving has rapidly progressed, enabling joint perception and planning in complex environments. In the planning stage, state-of-the-art (SOTA) end-to-end autonomous dri...
- DisCo-FLoc: Using Dual-Level Visual-Geometric Contrasts to Disambiguate Depth-Aware Visual Floorplan Localization : Abstract: Since floorplan data is readily available, long-term persistent, and robust to changes in visual appearance, visual Floorplan Localization (FLoc) has garnered significant attention. Existing...
- SketchRodGS: Sketch-based Extraction of Slender Geometries for Animating Gaussian Splatting Scenes : Abstract: Physics simulation of slender elastic objects often requires discretization as a polyline. However, constructing a polyline from Gaussian splatting is challenging as Gaussian splatting lacks...
- Dancing Points: Synthesizing Ballroom Dancing with Three-Point Inputs : Abstract: Ballroom dancing is a structured yet expressive motion category. Its highly diverse movement and complex interactions between leader and follower dancers make the understanding and synthesis...
- Answering from Sure to Uncertain: Uncertainty-Aware Curriculum Learning for Video Question Answering : Abstract: While significant advancements have been made in video question answering (VideoQA), the potential benefits of enhancing model generalization through tailored difficulty scheduling have been...
- Attire-Based Anomaly Detection in Restricted Areas Using YOLOv8 for Enhanced CCTV Security : Abstract: This research introduces an innovative security enhancement approach, employing advanced image analysis and soft computing. The focus is on an intelligent surveillance system that detects un...
- RaffeSDG: Random Frequency Filtering enabled Single-source Domain Generalization for Medical Image Segmentation : Abstract: Deep learning models often encounter challenges in making accurate inferences when there are domain shifts between the source and target data. This issue is particularly pronounced in clinic...
- PrevMatch: Revisiting and Maximizing Temporal Knowledge in Semi-Supervised Semantic Segmentation : Abstract: In semi-supervised semantic segmentation, the Mean Teacher- and co-training-based approaches are employed to mitigate confirmation bias and coupling problems. However, despite their high per...
- Neural Surface Reconstruction from Sparse Views Using Epipolar Geometry : Abstract: Reconstructing accurate surfaces from sparse multi-view images remains challenging due to severe geometric ambiguity and occlusions. Existing generalizable neural surface reconstruction meth...
- Training-Free Video Editing via Optical Flow-Enhanced Score Distillation : Abstract: The rapid advancement in visual generation, particularly the emergence of pre-trained text-to-image and text-to-video models, has catalyzed growing interest in training-free video editing re...
- Towards Vision-Language Geo-Foundation Model: A Survey : Abstract: Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks, such as image captioning, image-text retrieval, visual question answering, and visual gro...
- RAD: A Dataset and Benchmark for Real-Life Anomaly Detection with Robotic Observations : Abstract: Anomaly detection is a core capability for robotic perception and industrial inspection, yet most existing benchmarks are collected under controlled conditions with fixed viewpoints and stab...
- MotionCharacter: Fine-Grained Motion Controllable Human Video Generation : Abstract: Recent advancements in personalized Text-to-Video (T2V) generation have made significant strides in synthesizing character-specific content. However, these methods face a critical limitation...
- AdaVLN: Towards Visual Language Navigation in Continuous Indoor Environments with Moving Humans : Abstract: Visual Language Navigation is a task that challenges robots to navigate in realistic environments based on natural language instructions. While previous research has largely focused on stati...
- SJTU: Spatial judgments in multimodal models towards unified segmentation through coordinate detection : Abstract: Despite significant advances in vision-language understanding, implementing image segmentation within multimodal architectures remains a fundamental challenge in modern artificial intelligen...
- Bridging Geometry and Appearance: Topological Features for Robust Self-Supervised Segmentation : Abstract: Self-supervised semantic segmentation methods often fail when faced with appearance ambiguities. We argue that this is due to an over-reliance on unstable, appearance-based features such as ...
- Point Cloud to Mesh Reconstruction: Methods, Trade-offs, and Implementation Guide : Abstract: Reconstructing meshes from point clouds is a fundamental task in computer vision with applications spanning robotics, autonomous systems, and medical imaging. Selecting an appropriate learni...
- RingMo-Agent: A Unified Remote Sensing Foundation Model for Multi-Platform and Multi-Modal Reasoning : Abstract: Remote sensing (RS) images from multiple modalities and platforms exhibit diverse details due to differences in sensor characteristics and imaging perspectives. Existing vision-language rese...
- COMPASS: High-Efficiency Deep Image Compression with Arbitrary-scale Spatial Scalability : Abstract: Recently, neural network (NN)-based image compression has been actively studied and has shown impressive performance in comparison to traditional methods. However, most of the works ha...
- Enhancing Blind Video Quality Assessment with Rich Quality-aware Features : Abstract: Blind video quality assessment (BVQA) is a highly challenging task due to the intrinsic complexity of video content and visual distortions, especially given the high popularity of social med...
- Energy Propagation in Scattering Convolution Networks Can Be Arbitrarily Slow : Abstract: We analyze energy decay for deep convolutional neural networks employed as feature extractors, including Mallat's wavelet scattering transform. For time-frequency scattering transforms based...
- P2U-SLAM: A Monocular Wide-FoV SLAM System Based on Point Uncertainty and Pose Uncertainty : Abstract: This paper presents P2U-SLAM, a visual Simultaneous Localization And Mapping (SLAM) system with a wide Field of View (FoV) camera, which utilizes pose uncertainty and point uncertainty. Whil...
- Cross-Layer Attentive Feature Upsampling for Low-latency Semantic Segmentation : Abstract: Semantic segmentation is a fundamental problem in computer vision and it requires high-resolution feature maps for dense prediction. Current coordinate-guided low-resolution feature interpol...
- CardioMOD-Net: A Modal Decomposition-Neural Network Framework for Diagnosis and Prognosis of HFpEF from Echocardiography Cine Loops : Abstract: Introduction: Heart failure with preserved ejection fraction (HFpEF) arises from diverse comorbidities and progresses through prolonged subclinical stages, making early diagnosis and prognos...
- GenCAMO: Scene-Graph Contextual Decoupling for Environment-aware and Mask-free Camouflage Image-Dense Annotation Generation : Abstract: Conceal dense prediction (CDP), especially RGB-D camouflage object detection and open-vocabulary camouflage object segmentation, plays a crucial role in advancing the understanding and reaso...
- Crowded Video Individual Counting Informed by Social Grouping and Spatial-Temporal Displacement Priors : Abstract: Video Individual Counting (VIC) is a recently introduced task aiming to estimate pedestrian flux from a video. It extends Video Crowd Counting (VCC) beyond the per-frame pedestrian count. In...
- MS-ISSM: Objective Quality Assessment of Point Clouds Using Multi-scale Implicit Structural Similarity : Abstract: The unstructured and irregular nature of point clouds poses a significant challenge for objective quality assessment (PCQA), particularly in establishing accurate perceptual feature correspo...
- XStreamVGGT: Extremely Memory-Efficient Streaming Vision Geometry Grounded Transformer with KV Cache Compression : Abstract: Learning-based 3D visual geometry models have benefited substantially from large-scale transformers. Among these, StreamVGGT leverages frame-wise causal attention for strong streaming recons...
- Real-Time LiDAR Point Cloud Densification for Low-Latency Spatial Data Transmission : Abstract: To realize a low-latency spatial transmission system for immersive telepresence, two major problems must be solved: capturing dynamic 3D scenes densely and processing them in real time. LiDAR sensor...
- UniSH: Unifying Scene and Human Reconstruction in a Feed-Forward Pass : Abstract: We present UniSH, a unified, feed-forward framework for joint metric-scale 3D scene and human reconstruction. A key challenge in this domain is the scarcity of large-scale, annotated real-wo...
- HyDRA: Hybrid Denoising Regularization for Measurement-Only DEQ Training : Abstract: Solving image reconstruction problems of the form \(\mathbf{A} \mathbf{x} = \mathbf{y}\) remains challenging due to ill-posedness and the lack of large-scale supervised datasets. Deep Equili...
- RFAssigner: A Generic Label Assignment Strategy for Dense Object Detection : Abstract: Label assignment is a critical component in training dense object detectors. State-of-the-art methods typically assign each training sample a positive and a negative weight, optimizing the a...
- S2M-Net: Spectral-Spatial Mixing for Medical Image Segmentation with Morphology-Aware Adaptive Loss : Abstract: Medical image segmentation requires balancing local precision for boundary-critical clinical applications, global context for anatomical coherence, and computational efficiency for deploymen...
- VReID-XFD: Video-based Person Re-identification at Extreme Far Distance Challenge Results : Abstract: Person re-identification (ReID) across aerial and ground views at extreme far distances introduces a distinct operating regime where severe resolution degradation, extreme viewpoint changes,...
- Achieving Fine-grained Cross-modal Understanding through Brain-inspired Hierarchical Representation Learning : Abstract: Understanding neural responses to visual stimuli remains challenging due to the inherent complexity of brain representations and the modality gap between neural data and visual inputs. Exist...
- Advanced Machine Learning Approaches for Enhancing Person Re-Identification Performance : Abstract: Person re-identification (ReID) plays a critical role in intelligent surveillance systems by linking identities across multiple cameras in complex environments. However, ReID faces significa...
- Garment Inertial Denoiser (GID): Endowing Accurate Motion Capture via Loose IMU Denoiser : Abstract: Wearable inertial motion capture (MoCap) provides a portable, occlusion-free, and privacy-preserving alternative to camera-based systems, but its accuracy depends on tightly attached sensors...
- Unsupervised SE(3) Disentanglement for in situ Macromolecular Morphology Identification from Cryo-Electron Tomography : Abstract: Cryo-electron tomography (cryo-ET) provides direct 3D visualization of macromolecules inside the cell, enabling analysis of their in situ morphology. This morphology can be regarded as an SE...
- Evaluation of Convolutional Neural Network For Image Classification with Agricultural and Urban Datasets : Abstract: This paper presents the development and evaluation of a custom Convolutional Neural Network (CustomCNN) created to study how architectural design choices affect multi-domain image classifica...
- Mask-Guided Multi-Task Network for Face Attribute Recognition : Abstract: Face Attribute Recognition (FAR) plays a crucial role in applications such as person re-identification, face retrieval, and face editing. Conventional multi-task attribute recognition method...
- AirSpatialBot: A Spatially-Aware Aerial Agent for Fine-Grained Vehicle Attribute Recognization and Retrieval : Abstract: Despite notable advancements in remote sensing vision-language models (VLMs), existing models often struggle with spatial understanding, limiting their effectiveness in real-world applicatio...
- DreamID-V: Bridging the Image-to-Video Gap for High-Fidelity Face Swapping via Diffusion Transformer : Abstract: Video Face Swapping (VFS) requires seamlessly injecting a source identity into a target video while meticulously preserving the original pose, expression, lighting, background, and dynamic i...
- EdgeNeRF: Edge-Guided Regularization for Neural Radiance Fields from Sparse Views : Abstract: Neural Radiance Fields (NeRF) achieve remarkable performance in dense multi-view scenarios, but their reconstruction quality degrades significantly under sparse inputs due to geometric artif...
- In defense of the two-stage framework for open-set domain adaptive semantic segmentation : Abstract: Open-Set Domain Adaptation for Semantic Segmentation (OSDA-SS) presents a significant challenge, as it requires both domain adaptation for known classes and the distinction of unknowns. Exis...
- PartImageNet++ Dataset: Enhancing Visual Models with High-Quality Part Annotations : Abstract: To address the scarcity of high-quality part annotations in existing datasets, we introduce PartImageNet++ (PIN++), a dataset that provides detailed part annotations for all categories in Im...
- Language as Prior, Vision as Calibration: Metric Scale Recovery for Monocular Depth Estimation : Abstract: Relative-depth foundation models transfer well, yet monocular metric depth remains ill-posed due to unidentifiable global scale and heightened domain-shift sensitivity. Under a frozen-backbo...
- Domain Adaptation of Carotid Ultrasound Images using Generative Adversarial Network : Abstract: Deep learning has been extensively used in medical imaging applications, assuming that the test and training datasets belong to the same probability distribution. However, a common challenge...
- Robust Ship Detection and Tracking Using Modified ViBe and Backwash Cancellation Algorithm : Abstract: In this paper, we propose a robust real-time method for detecting and tracking ships in coastal video sequences. Since coastal scenarios are unpredictable and scenes have dynamic...
- Unified Generation and Self-Verification for Vision-Language Models via Advantage Decoupled Preference Optimization : Abstract: Parallel test-time scaling typically trains separate generation and verification models, incurring high training and inference costs. We propose Advantage Decoupled Preference Optimization (...
- Higher-Order Domain Generalization in Magnetic Resonance-Based Assessment of Alzheimer's Disease : Abstract: Despite progress in deep learning for Alzheimer's disease (AD) diagnostics, models trained on structural magnetic resonance imaging (sMRI) often do not perform well when applied to new cohor...
- DiffKD-DCIS: Predicting Upgrade of Ductal Carcinoma In Situ with Diffusion Augmentation and Knowledge Distillation : Abstract: Accurately predicting the upgrade of ductal carcinoma in situ (DCIS) to invasive ductal carcinoma (IDC) is crucial for surgical planning. However, traditional deep learning methods face chal...
- BARE: Towards Bias-Aware and Reasoning-Enhanced One-Tower Visual Grounding : Abstract: Visual Grounding (VG), which aims to locate a specific region referred to by expressions, is a fundamental yet challenging task in the multimodal understanding fields. While recent grounding...
- Improving Flexible Image Tokenizers for Autoregressive Image Generation : Abstract: Flexible image tokenizers aim to represent an image using an ordered 1D variable-length token sequence. This flexible tokenization is typically achieved through nested dropout, where a porti...
- FAR-AMTN: Attention Multi-Task Network for Face Attribute Recognition : Abstract: To enhance the generalization performance of Multi-Task Networks (MTN) in Face Attribute Recognition (FAR), it is crucial to share relevant information across multiple related prediction tas...
- Beyond Patches: Global-aware Autoregressive Model for Multimodal Few-Shot Font Generation : Abstract: Manual font design is an intricate process that transforms a stylistic visual concept into a coherent glyph set. This challenge persists in automated Few-shot Font Generation (FFG), where mo...
- Guiding Token-Sparse Diffusion Models : Abstract: Diffusion models deliver high quality in image synthesis but remain expensive during training and inference. Recent works have leveraged the inherent redundancy in visual content to make tra...
- CAP-IQA: Context-Aware Prompt-Guided CT Image Quality Assessment : Abstract: Prompt-based methods, which encode medical priors through descriptive text, have been only minimally explored for CT Image Quality Assessment (IQA). While such prompts can embed prior knowle...
- An Empirical Study of Monocular Human Body Measurement Under Weak Calibration : Abstract: Estimating human body measurements from monocular RGB imagery remains challenging due to scale ambiguity, viewpoint sensitivity, and the absence of explicit depth information. This work pres...
- Animated 3DGS Avatars in Diverse Scenes with Consistent Lighting and Shadows : Abstract: We present a method for consistent lighting and shadows when animated 3D Gaussian Splatting (3DGS) avatars interact with 3DGS scenes or with dynamic objects inserted into otherwise static sc...
- LabelAny3D: Label Any Object 3D in the Wild : Abstract: Detecting objects in 3D space from monocular input is crucial for applications ranging from robotics to scene understanding. Despite advanced performance in the indoor and autonomous driving...
- Trustworthy Data-Driven Wildfire Risk Prediction and Understanding in Western Canada : Abstract: In recent decades, the intensification of wildfire activity in western Canada has resulted in substantial socio-economic and environmental losses. Accurate wildfire risk prediction is hinder...
- Evaluating Deep Learning-Based Face Recognition for Infants and Toddlers: Impact of Age Across Developmental Stages : Abstract: Face recognition for infants and toddlers presents unique challenges due to rapid facial morphology changes, high inter-class similarity, and limited dataset availability. This study evaluat...
- Mitigating Longitudinal Performance Degradation in Child Face Recognition Using Synthetic Data : Abstract: Longitudinal face recognition in children remains challenging due to rapid and nonlinear facial growth, which causes template drift and increasing verification errors over time. This work in...
- Learnability-Driven Submodular Optimization for Active Roadside 3D Detection : Abstract: Roadside perception datasets are typically constructed via cooperative labeling between synchronized vehicle and roadside frame pairs. However, real deployment often requires annotation of r...
- Real-Time Lane Detection via Efficient Feature Alignment and Covariance Optimization for Low-Power Embedded Systems : Abstract: Real-time lane detection in embedded systems encounters significant challenges due to subtle and sparse visual signals in RGB images, often constrained by limited computational resources and...
- FFP-300K: Scaling First-Frame Propagation for Generalizable Video Editing : Abstract: First-Frame Propagation (FFP) offers a promising paradigm for controllable video editing, but existing methods are hampered by a reliance on cumbersome run-time guidance. We identify the roo...
- Point-SRA: Self-Representation Alignment for 3D Representation Learning : Abstract: Masked autoencoders (MAE) have become a dominant paradigm in 3D representation learning, setting new performance benchmarks across various downstream tasks. Existing methods with fixed mask ...
- MANGO: Natural Multi-speaker 3D Talking Head Generation via 2D-Lifted Enhancement : Abstract: Current audio-driven 3D head generation methods mainly focus on single-speaker scenarios, lacking natural, bidirectional listen-and-speak interaction. Achieving seamless conversational behav...
- CTIS-QA: Clinical Template-Informed Slide-level Question Answering for Pathology : Abstract: In this paper, we introduce a clinical diagnosis template-based pipeline to systematically collect and structure pathological information. In collaboration with pathologists and guided by th...
- DDNet: A Dual-Stream Graph Learning and Disentanglement Framework for Temporal Forgery Localization : Abstract: The rapid evolution of AIGC technology makes it possible to mislead viewers by tampering with only small segments of a video, rendering video-level detection inaccurate and unpersuasive. Consequently, ...
- Causality-Aware Temporal Projection for Video Understanding in Video-LLMs : Abstract: Recent Video Large Language Models (Video-LLMs) have shown strong multimodal reasoning capabilities, yet remain challenged by video understanding tasks that require consistent temporal order...
- Robust Egocentric Visual Attention Prediction Through Language-guided Scene Context-aware Learning : Abstract: As the demand for analyzing egocentric videos grows, egocentric visual attention prediction, anticipating where a camera wearer will attend, has garnered increasing attention. However, it re...
- ESGaussianFace: Emotional and Stylized Audio-Driven Facial Animation via 3D Gaussian Splatting : Abstract: Most current audio-driven facial animation research primarily focuses on generating videos with neutral emotions. While some studies have addressed the generation of facial videos driven by ...
- GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection : Abstract: Feature-based anomaly detection is widely adopted in industrial inspection due to the strong representational power of large pre-trained vision encoders. While most existing methods focus on...
- RRNet: Configurable Real-Time Video Enhancement with Arbitrary Local Lighting Variations : Abstract: With the growing demand for real-time video enhancement in live applications, existing methods often struggle to balance speed and effective exposure control, particularly under uneven light...
- Entity-Guided Multi-Task Learning for Infrared and Visible Image Fusion : Abstract: Existing text-driven infrared and visible image fusion approaches often rely on textual information at the sentence level, which can lead to semantic noise from redundant text and fail to fu...
- Agentic AI in Remote Sensing: Foundations, Taxonomy, and Emerging Systems : Abstract: The paradigm of Earth Observation analysis is shifting from static deep learning models to autonomous agentic AI. Although recent vision foundation models and multimodal large language model...
- Learning Action Hierarchies via Hybrid Geometric Diffusion : Abstract: Temporal action segmentation is a critical task in video understanding, where the goal is to assign action labels to each frame in a video. While recent advances leverage iterative refinemen...
- TalkPhoto: A Versatile Training-Free Conversational Assistant for Intelligent Image Editing : Abstract: Thanks to the powerful language comprehension capabilities of Large Language Models (LLMs), existing instruction-based image editing methods have introduced Multimodal Large Language Models ...
- AR-MOT: Autoregressive Multi-object Tracking : Abstract: As multi-object tracking (MOT) tasks continue to evolve toward more general and multi-modal scenarios, the rigid and task-specific architectures of existing MOT methods increasingly hinder t...
- MacVQA: Adaptive Memory Allocation and Global Noise Filtering for Continual Visual Question Answering : Abstract: Visual Question Answering (VQA) requires models to reason over multimodal information, combining visual and textual data. With the development of continual learning, significant progress has...
- Face Normal Estimation from Rags to Riches : Abstract: Although recent approaches to face normal estimation have achieved promising results, their effectiveness heavily depends on large-scale paired data for training. This paper concentrates on ...
- MotionAdapter: Video Motion Transfer via Content-Aware Attention Customization : Abstract: Recent advances in diffusion-based text-to-video models, particularly those built on the diffusion transformer architecture, have achieved remarkable progress in generating high-quality and ...
- AFTER: Mitigating the Object Hallucination of LVLM via Adaptive Factual-Guided Activation Editing : Abstract: Large Vision-Language Models (LVLMs) have achieved substantial progress in cross-modal tasks. However, due to language bias, LVLMs are susceptible to object hallucination, which can be prima...
- Thinking with Blueprints: Assisting Vision-Language Models in Spatial Reasoning via Structured Object Representation : Abstract: Spatial reasoning -- the ability to perceive and reason about relationships in space -- advances vision-language models (VLMs) from visual perception toward spatial semantic understanding. E...
- API: Empowering Generalizable Real-World Image Dehazing via Adaptive Patch Importance Learning : Abstract: Real-world image dehazing is a fundamental yet challenging task in low-level vision. Existing learning-based methods often suffer from significant performance degradation when applied to com...
- Nighttime Hazy Image Enhancement via Progressively and Mutually Reinforcing Night-Haze Priors : Abstract: Enhancing the visibility of nighttime hazy images is challenging due to the complex degradation distributions. Existing methods mainly address a single type of degradation (e.g., haze or low...
- Towards Any-Quality Image Segmentation via Generative and Adaptive Latent Space Enhancement : Abstract: Segment Anything Models (SAMs), known for their exceptional zero-shot segmentation performance, have garnered significant attention in the research community. Nevertheless, their performance...
- Adapting Depth Anything to Adverse Imaging Conditions with Events : Abstract: Robust depth estimation under dynamic and adverse lighting conditions is essential for robotic systems. Currently, depth foundation models, such as Depth Anything, achieve great success in i...
- Leveraging 2D-VLM for Label-Free 3D Segmentation in Large-Scale Outdoor Scene Understanding : Abstract: This paper presents a novel 3D semantic segmentation method for large-scale point cloud data that does not require annotated 3D training data or paired RGB images. The proposed approach proj...
- AlignVTOFF: Texture-Spatial Feature Alignment for High-Fidelity Virtual Try-Off : Abstract: Virtual Try-Off (VTOFF) is a challenging multimodal image generation task that aims to synthesize high-fidelity flat-lay garments under complex geometric deformation and rich high-frequency ...
- PhysSFI-Net: Physics-informed Geometric Learning of Skeletal and Facial Interactions for Orthognathic Surgical Outcome Prediction : Abstract: Orthognathic surgery repositions jaw bones to restore occlusion and enhance facial aesthetics. Accurate simulation of postoperative facial morphology is essential for preoperative planning. ...
- MCD-Net: A Lightweight Deep Learning Baseline for Optical-Only Moraine Segmentation : Abstract: Glacial segmentation is essential for reconstructing past glacier dynamics and evaluating climate-driven landscape change. However, weak optical contrast and the limited availability of high...
- InpaintHuman: Reconstructing Occluded Humans with Multi-Scale UV Mapping and Identity-Preserving Diffusion Inpainting : Abstract: Reconstructing complete and animatable 3D human avatars from monocular videos remains challenging, particularly under severe occlusions. While 3D Gaussian Splatting has enabled photorealisti...
- 360-GeoGS: Geometrically Consistent Feed-Forward 3D Gaussian Splatting Reconstruction for 360 Images : Abstract: 3D scene reconstruction is fundamental for spatial intelligence applications such as AR, robotics, and digital twins. Traditional multi-view stereo struggles with sparse viewpoints or low-te...
- HeadLighter: Disentangling Illumination in Generative 3D Gaussian Heads via Lightstage Captures : Abstract: Recent 3D-aware head generative models based on 3D Gaussian Splatting achieve real-time, photorealistic and view-consistent head synthesis. However, a fundamental limitation persists: the de...
- MagicFight: Personalized Martial Arts Combat Video Generation : Abstract: Amid the surge in generic text-to-video generation, the field of personalized human video generation has witnessed notable advancements, primarily concentrated on single-person scenarios. Ho...
- Beyond Segmentation: An Oil Spill Change Detection Framework Using Synthetic SAR Imagery : Abstract: Marine oil spills are urgent environmental hazards that demand rapid and reliable detection to minimise ecological and economic damage. While Synthetic Aperture Radar (SAR) imagery has becom...
- Efficient Unrolled Networks for Large-Scale 3D Inverse Problems : Abstract: Deep learning-based methods have revolutionized the field of imaging inverse problems, yielding state-of-the-art performance across various imaging domains. The best performing networks inco...
- Why Commodity WiFi Sensors Fail at Multi-Person Gait Identification: A Systematic Analysis Using ESP32 : Abstract: WiFi Channel State Information (CSI) has shown promise for single-person gait identification, with numerous studies reporting high accuracy. However, multi-person identification remains larg...
- Parameter-Efficient Domain Adaption for CSI Crowd-Counting via Self-Supervised Learning with Adapter Modules : Abstract: Device-free crowd-counting using WiFi Channel State Information (CSI) is a key enabling technology for a new generation of privacy-preserving Internet of Things (IoT) applications. However, ...
- Unraveling MMDiT Blocks: Training-free Analysis and Enhancement of Text-conditioned Diffusion : Abstract: Recent breakthroughs of transformer-based diffusion models, particularly with Multimodal Diffusion Transformers (MMDiT) driven models like FLUX and Qwen Image, have facilitated thrilling exp...
- Prior-Guided DETR for Ultrasound Nodule Detection : Abstract: Accurate detection of ultrasound nodules is essential for the early diagnosis and treatment of thyroid and breast cancers. However, this task remains challenging due to irregular nodule shap...
- CSCBench: A PVC Diagnostic Benchmark for Commodity Supply Chain Reasoning : Abstract: Large Language Models (LLMs) have achieved remarkable success in general benchmarks, yet their competence in commodity supply chains (CSCs) -- a domain governed by institutional rule systems...
- Towards Automated Lexicography: Generating and Evaluating Definitions for Learner's Dictionaries : Abstract: We study dictionary definition generation (DDG), i.e., the generation of non-contextualized definitions for given headwords. Dictionary definitions are an essential resource for learning wor...
- Judging with Personality and Confidence: A Study on Personality-Conditioned LLM Relevance Assessment : Abstract: Recent studies have shown that prompting can enable large language models (LLMs) to simulate specific personality traits and produce behaviors that align with those traits. However, there is...
- DermoGPT: Open Weights and Open Data for Morphology-Grounded Dermatological Reasoning MLLMs : Abstract: Multimodal Large Language Models (MLLMs) show promise for medical applications, yet progress in dermatology lags due to limited training data, narrow task coverage, and lack of clinically-gr...
- Agentic Memory: Learning Unified Long-Term and Short-Term Memory Management for Large Language Model Agents : Abstract: Large language model (LLM) agents face fundamental limitations in long-horizon reasoning due to finite context windows, making effective memory management critical. Existing methods typicall...
- CSF: Contrastive Semantic Features for Direct Multilingual Sign Language Generation : Abstract: Sign language translation systems typically require English as an intermediary language, creating barriers for non-English speakers in the global deaf community. We present Canonical Semanti...
- Hidden State Poisoning Attacks against Mamba-based Language Models : Abstract: State space models (SSMs) like Mamba offer efficient alternatives to Transformer-based language models, with linear time complexity. Yet, their adversarial robustness remains critically unex...
- Towards Multi-Level Transcript Segmentation: LoRA Fine-Tuning for Table-of-Contents Generation : Abstract: Segmenting speech transcripts into thematic sections benefits both downstream processing and users who depend on written text for accessibility. We introduce a novel approach to hierarchical...
- Confidence Estimation for LLMs in Multi-turn Interactions : Abstract: While confidence estimation is a promising direction for mitigating hallucinations in Large Language Models (LLMs), current research dominantly focuses on single-turn settings. The dynamics ...
- Toward Global Large Language Models in Medicine : Abstract: Despite continuous advances in medical technology, the global distribution of health care resources remains uneven. The development of large language models (LLMs) has transformed the landsc...
- ARCADE: A City-Scale Corpus for Fine-Grained Arabic Dialect Tagging : Abstract: The Arabic language is characterized by a rich tapestry of regional dialects that differ substantially in phonetics and lexicon, reflecting the geographic and cultural diversity of its speak...
- From XAI to Stories: A Factorial Study of LLM-Generated Explanation Quality : Abstract: Explainable AI (XAI) methods like SHAP and LIME produce numerical feature attributions that remain inaccessible to non-expert users. Prior work has shown that Large Language Models (LLMs) ca...
- CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models : Abstract: Autoregressive large language models achieve strong results on many benchmarks, but decoding remains fundamentally latency-limited by sequential dependence on previously generated tokens. Di...
- Power-of-Two Quantization-Aware-Training (PoT-QAT) in Large Language Models (LLMs) : Abstract: In Large Language Models (LLMs), the number of parameters has grown exponentially in the past few years, e.g., from 1.5 billion parameters in GPT-2 to 175 billion in GPT-3 to possibly more t...
- Classifying several dialectal Nawatl varieties : Abstract: Mexico is a country with a large number of indigenous languages, among which the most widely spoken is Nawatl, with more than two million people currently speaking it (mainly in North and Ce...
- Estimating Text Temperature : Abstract: Autoregressive language models typically use a temperature parameter at inference to shape the probability distribution and control the randomness of the text generated. After the text was gen...
- Robust Persona-Aware Toxicity Detection with Prompt Optimization and Learned Ensembling : Abstract: Toxicity detection is inherently subjective, shaped by the diverse perspectives and social priors of different demographic groups. While "pluralistic" modeling as used in economics and the...
- 600K-KS-OCR: A Large-Scale Synthetic Dataset for Optical Character Recognition in Kashmiri Script : Abstract: This technical report presents the 600K-KS-OCR Dataset, a large-scale synthetic corpus comprising approximately 602,000 word-level segmented images designed for training and evaluating optic...
- Entity-Aware and Secure Query Optimization in Database Using Named Entity Recognition : Abstract: Cloud storage has become the backbone of modern data infrastructure, yet privacy and efficient data retrieval remain significant challenges. Traditional privacy-preserving approaches primari...
- SAFE-QAQ: End-to-End Slow-Thinking Audio-Text Fraud Detection via Reinforcement Learning : Abstract: Existing fraud detection methods predominantly rely on transcribed text, suffering from ASR errors and missing crucial acoustic cues like vocal tone and environmental context. This limits th...
- SWE-Lego: Pushing the Limits of Supervised Fine-tuning for Software Issue Resolving : Abstract: We present SWE-Lego, a supervised fine-tuning (SFT) recipe designed to achieve state-of-the-art performance in software engineering (SWE) issue resolving. In contrast to prevalent methods tha...
- The Gray Area: Characterizing Moderator Disagreement on Reddit : Abstract: Volunteer moderators play a crucial role in sustaining online dialogue, but they often disagree about what should or should not be allowed. In this paper, we study the complexity of content ...
- LACONIC: Dense-Level Effectiveness for Scalable Sparse Retrieval via a Two-Phase Training Curriculum : Abstract: While dense retrieval models have become the standard for state-of-the-art information retrieval, their deployment is often constrained by high memory requirements and reliance on GPU accele...
- Context-aware Decoding Reduces Hallucination in Query-focused Summarization : Abstract: Query-focused summarization (QFS) aims to provide a summary of a single document or multiple documents that can satisfy the information needs of a given query. It is useful for various real-world ...
- RiTeK: A Dataset for Large Language Models Complex Reasoning over Textual Knowledge Graphs in Medicine : Abstract: Answering complex real-world questions in the medical domain often requires accurate retrieval from medical Textual Knowledge Graphs (medical TKGs), as the relational path information from T...
- RankMamba: Benchmarking Mamba's Document Ranking Performance in the Era of Transformers : Abstract: Transformer structure has achieved great success in multiple applied machine learning communities, such as natural language processing (NLP), computer vision (CV) and information retrieval (...
- From Bench to Bedside: A Review of Clinical Trials in Drug Discovery and Development : Abstract: Clinical trials are an indispensable part of the drug development process, bridging the gap between basic research and clinical application. During the development of new drugs, clinical tri...
- Can Generative Models Actually Forge Realistic Identity Documents? : Abstract: Generative image models have recently shown significant progress in image realism, leading to public concerns about their potential misuse for document forgery. This paper explores whether c...
- Unified Review and Benchmark of Deep Segmentation Architectures for Cardiac Ultrasound on CAMUS : Abstract: Although several review papers summarize cardiac imaging and DL advances, few works connect this overview to a unified and reproducible experimental benchmark. In this study, we combine a focused rev...
- Motion-Compensated Latent Semantic Canvases for Visual Situational Awareness on Edge : Abstract: We propose Motion-Compensated Latent Semantic Canvases (MCLSC) for visual situational awareness on resource-constrained edge devices. The core idea is to maintain persistent semantic metadat...
- VL-OrdinalFormer: Vision Language Guided Ordinal Transformers for Interpretable Knee Osteoarthritis Grading : Abstract: Knee osteoarthritis (KOA) is a leading cause of disability worldwide, and accurate severity assessment using the Kellgren Lawrence (KL) grading system is critical for clinical decision makin...
- VideoCuRL: Video Curriculum Reinforcement Learning with Orthogonal Difficulty Decomposition : Abstract: Reinforcement Learning (RL) is crucial for empowering VideoLLMs with complex spatiotemporal reasoning. However, current RL paradigms predominantly rely on random data shuffling or naive curr...
- Comparative Evaluation of CNN Architectures for Neural Style Transfer in Indonesian Batik Motif Generation: A Comprehensive Study : Abstract: Neural Style Transfer (NST) provides a computational framework for the digital preservation and generative exploration of Indonesian batik motifs; however, existing approaches remain largely...
- Four-Stage Alzheimer's Disease Classification from MRI Using Topological Feature Extraction, Feature Selection, and Ensemble Learning : Abstract: Accurate and efficient classification of Alzheimer's disease (AD) severity from brain magnetic resonance imaging (MRI) remains a critical challenge, particularly when limited data and model ...
- ShadowGS: Shadow-Aware 3D Gaussian Splatting for Satellite Imagery : Abstract: 3D Gaussian Splatting (3DGS) has emerged as a novel paradigm for 3D reconstruction from satellite imagery. However, in multi-temporal satellite images, prevalent shadows exhibit significant ...
- Learning to Segment Liquids in Real-world Images : Abstract: Different types of liquids such as water, wine and medicine appear in all aspects of daily life. However, limited attention has been given to the task, hindering the ability of robots to avo...
- PhyEduVideo: A Benchmark for Evaluating Text-to-Video Models for Physics Education : Abstract: Generative AI models, particularly Text-to-Video (T2V) systems, offer a promising avenue for transforming science education by automating the creation of engaging and intuitive visual explan...
- A Deep Learning Approach for Automated Skin Lesion Diagnosis with Explainable AI : Abstract: Skin cancer is one of the most common and dangerous types of cancer in the world and requires timely and precise diagnosis. In this paper, a deep-learning architecture of the multi-cla...
- Few-Shot Video Object Segmentation in X-Ray Angiography Using Local Matching and Spatio-Temporal Consistency Loss : Abstract: We introduce a novel FSVOS model that employs a local matching strategy to restrict the search space to the most relevant neighboring pixels. Rather than relying on inefficient standard im2c...
- UnrealPose: Leveraging Game Engine Kinematics for Large-Scale Synthetic Human Pose Data : Abstract: Diverse, accurately labeled 3D human pose data is expensive and studio-bound, while in-the-wild datasets lack known ground truth. We introduce UnrealPose-Gen, an Unreal Engine 5 pipeline bui...
- DVGBench: Implicit-to-Explicit Visual Grounding Benchmark in UAV Imagery with Large Vision-Language Models : Abstract: Remote sensing (RS) large vision-language models (LVLMs) have shown strong promise across visual grounding (VG) tasks. However, existing RS VG datasets predominantly rely on explicit referri...
- Lightweight Channel Attention for Efficient CNNs : Abstract: Attention mechanisms have become integral to modern convolutional neural networks (CNNs), delivering notable performance improvements with minimal computational overhead. However, the effici...
- Mono3DV: Monocular 3D Object Detection with 3D-Aware Bipartite Matching and Variational Query DeNoising : Abstract: While DETR-like architectures have demonstrated significant potential for monocular 3D object detection, they are often hindered by a critical limitation: the exclusion of 3D attributes from...
- Deepfake Detection with Multi-Artifact Subspace Fine-Tuning and Selective Layer Masking : Abstract: Deepfake detection still faces significant challenges in cross-dataset and real-world complex scenarios. The root cause lies in the high diversity of artifact distributions introduced by dif...
- Efficient Hyperspectral Image Reconstruction Using Lightweight Separate Spectral Transformers : Abstract: Hyperspectral imaging (HSI) is essential across various disciplines for its capacity to capture rich spectral information. However, efficiently reconstructing hyperspectral images from compr...
- A UAV-Based Multispectral and RGB Dataset for Multi-Stage Paddy Crop Monitoring in Indian Agricultural Fields : Abstract: We present a large-scale unmanned aerial vehicle (UAV)-based RGB and multispectral image dataset collected over paddy fields in the Vijayawada region, Andhra Pradesh, India, covering nursery...
- Histogram Assisted Quality Aware Generative Model for Resolution Invariant NIR Image Colorization : Abstract: We present HAQAGen, a unified generative model for resolution-invariant NIR-to-RGB colorization that balances chromatic realism with structural fidelity. The proposed model introduces (i) a ...
- ManiBox: Enhancing Embodied Spatial Generalization via Scalable Simulation Data Generations : Abstract: Embodied agents require robust spatial intelligence to execute precise real-world manipulations. However, this remains a significant challenge, as current methods often struggle to accuratel...
- Dynamic Graph Neural Networks for Physiological Based Pharmacokinetic Modeling: A Novel Data Driven Approach to Drug Concentration Prediction : Abstract: Physiologically Based Pharmacokinetic (PBPK) modeling is a key tool in drug development for predicting drug concentration dynamics across organs. Traditional PBPK approaches rely on ordinary...
- Effects of algorithmic flagging on fairness: quasi-experimental evidence from Wikipedia : Abstract: Online community moderators often rely on social signals such as whether or not a user has an account or a profile page as clues that users may cause problems. Reliance on these clues can le...
- ETDock: A Novel Equivariant Transformer for Protein-Ligand Docking : Abstract: Predicting the docking between proteins and ligands is a crucial and challenging task for drug discovery. However, traditional docking methods mainly rely on scoring functions, and deep lear...
- Development of a high-resolution indoor radon map using a new machine learning-based probabilistic model and German radon survey data : Abstract: Accurate knowledge of indoor radon concentration is crucial for assessing radon-related health effects or identifying radon-prone areas. Indoor radon concentration at the national scale is u...
- Matrix Manifold Neural Networks++ : Abstract: Deep neural networks (DNNs) on Riemannian manifolds have garnered increasing interest in various applied areas. For instance, DNNs on spherical and hyperbolic manifolds have been designed to...
- On the social bias of speech self-supervised models : Abstract: Self-supervised learning (SSL) speech models have achieved remarkable performance in various tasks, yet the biased outcomes, especially affecting marginalized groups, raise significant conce...
- Design and Scheduling of an AI-based Queueing System : Abstract: To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other j...
- Consistency for Large Neural Networks: Regression and Classification : Abstract: Although overparameterized models have achieved remarkable practical success, their theoretical properties, particularly their generalization behavior, remain incompletely understood. The we...
- Bayesian uncertainty-aware deep learning with noisy labels: Tackling annotation ambiguity in EEG seizure detection : Abstract: Deep learning is advancing EEG processing for automated epileptic seizure detection and onset zone localization, yet its performance relies heavily on high-quality annotated training data. H...
- Causal Multi-fidelity Surrogate Forward and Inverse Models for ICF Implosions : Abstract: Continued progress in inertial confinement fusion (ICF) requires solving inverse problems relating experimental observations to simulation input parameters, followed by design optimization. ...
- Rate-Distortion Analysis of Compressed Query Delegation with Low-Rank Riemannian Updates : Abstract: Bounded-context agents fail when intermediate reasoning exceeds an effective working-memory budget. We study compressed query delegation (CQD): (i) compress a high-dimensional latent reasoni...
- HyperJoin: LLM-augmented Hypergraph Link Prediction for Joinable Table Discovery : Abstract: As a pivotal task in data lake management, joinable table discovery has attracted widespread interest. While existing language model-based methods achieve remarkable performance by combining...
- KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs : Abstract: While LLMs are powerful embedding backbones, their application in training-free settings faces two structural challenges: causal attention restricts early tokens from accessing subsequent co...
- Unsupervised Text Style Transfer for Controllable Intensity : Abstract: Unsupervised Text Style Transfer (UTST) aims to build a system to transfer the stylistic properties of a given text without parallel text pairs. Compared with text transfer between style pol...
- EmoLoom-2B: Fast Base-Model Screening for Emotion Classification and VAD with Lexicon-Weak Supervision and KV-Off Evaluation : Abstract: We introduce EmoLoom-2B, a lightweight and reproducible pipeline that turns small language models under 2B parameters into fast screening candidates for joint emotion classification and Vale...
- Listen, Attend, Understand: a Regularization Technique for Stable E2E Speech Translation Training on High Variance labels : Abstract: End-to-End Speech Translation often shows slower convergence and worse performance when target transcriptions exhibit high variance and semantic ambiguity. We propose Listen, Attend, Underst...
- RoboPhD: Self-Improving Text-to-SQL Through Autonomous Agent Evolution : Abstract: We present RoboPhD, a system where AI agents autonomously conduct research to improve Text-to-SQL performance. RoboPhD implements a closed-loop evolution cycle with two coordinated component...
- KOS-TL (Knowledge Operation System Type Logic) : Abstract: This paper introduces KOS-TL (Knowledge Operation System Type Logic), a novel constructive framework designed to provide a rigorous logical foundation for autonomous and executable knowledge...
- SongSage: A Large Musical Language Model with Lyric Generative Pre-training : Abstract: Large language models have achieved significant success in various domains, yet their understanding of lyric-centric knowledge has not been fully explored. In this work, we first introduce P...
- DHI: Leveraging Diverse Hallucination Induction for Enhanced Contrastive Factuality Control in Large Language Models : Abstract: Large language models (LLMs) frequently produce inaccurate or fabricated information, known as "hallucinations," which compromises their reliability. Existing approaches often train an "Evil...
- Almost Clinical: Linguistic properties of synthetic electronic health records : Abstract: This study evaluates the linguistic and clinical suitability of synthetic electronic health records (EHRs) in the field of mental health. First, we describe the rationale and the methodology...
- Racka: Efficient Hungarian LLM Adaptation on Academic Infrastructure : Abstract: We present Racka, a lightweight, continually pretrained large language model designed to bridge the resource gap between Hungarian and high-resource languages such as English and German. Rac...
- Reasoning Over Recall: Evaluating the Efficacy of Generalist Architectures vs. Specialized Fine-Tunes in RAG-Based Mental Health Dialogue Systems : Abstract: The deployment of Large Language Models (LLMs) in mental health counseling faces the dual challenges of hallucinations and lack of empathy. While the former may be mitigated by RAG (retrieva...
- FC-CONAN: An Exhaustively Paired Dataset for Robust Evaluation of Retrieval Systems : Abstract: Hate speech (HS) is a critical issue in online discourse, and one promising strategy to counter it is through the use of counter-narratives (CNs). Datasets linking HS with CNs are essential ...
- EternalMath: A Living Benchmark of Frontier Mathematics that Evolves with Human Discovery : Abstract: Current evaluations of mathematical reasoning in large language models (LLMs) are dominated by static benchmarks, either derived from competition-style problems or curated through costly exp...
- From Emotion Classification to Emotional Reasoning: Enhancing Emotional Intelligence in Large Language Models : Abstract: This work investigates whether synthetic emotional chain-of-thought data can improve the emotional reasoning abilities of smaller open large language models (LLMs). We design a multi-agent g...
- Bridging the gap: A comparative exploration of Speech-LLM and end-to-end architecture for multilingual conversational ASR : Abstract: The INTERSPEECH 2025 Challenge on Multilingual Conversational Speech Language Models (MLC-SLM) promotes multilingual conversational ASR with large language models (LLMs). Our previous SHNU-m...
- Can Legislation Be Made Machine-Readable in PROLEG? : Abstract: The anticipated positive social impact of regulatory processes requires both the accuracy and efficiency of their application. Modern artificial intelligence technologies, including natural ...
- From Failure to Mastery: Generating Hard Samples for Tool-use Agents : Abstract: The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random samp...
- EmoHarbor: Evaluating Personalized Emotional Support by Simulating the User's Internal World : Abstract: Current evaluation paradigms for emotional support conversations tend to reward generic empathetic responses, yet they fail to assess whether the support is genuinely personalized to users' ...
- HalluZig: Hallucination Detection using Zigzag Persistence : Abstract: The factual reliability of Large Language Models (LLMs) remains a critical barrier to their adoption in high-stakes domains due to their propensity to hallucinate. Current detection methods ...
- Steerability of Instrumental-Convergence Tendencies in LLMs : Abstract: We examine two properties of AI systems: capability (what a system can do) and steerability (how reliably one can shift behavior toward intended outcomes). In our experiments, higher capabil...
- How Does Prefix Matter in Reasoning Model Tuning? : Abstract: Recent alignment studies commonly remove introductory boilerplate phrases from supervised fine-tuning (SFT) datasets. This work challenges that assumption. We hypothesize that safety- and re...
- A Training-Free Large Reasoning Model-based Knowledge Tracing Framework for Unified Prediction and Prescription : Abstract: Knowledge Tracing (KT) aims to estimate a learner's evolving mastery based on interaction histories. Recent studies have explored Large Language Models (LLMs) for KT via autoregressive natur...
- Can LLMs Track Their Output Length? A Dynamic Feedback Mechanism for Precise Length Regulation : Abstract: Precisely controlling the length of generated text is a common requirement in real-world applications. However, despite significant advancements in following human instructions, Large Langua...
- BanglaIPA: Towards Robust Text-to-IPA Transcription with Contextual Rewriting in Bengali : Abstract: Despite its widespread use, Bengali lacks a robust automated International Phonetic Alphabet (IPA) transcription system that effectively supports both standard language and regional dialecta...
- Game of Coding: Coding Theory in the Presence of Rational Adversaries, Motivated by Decentralized Machine Learning : Abstract: Coding theory plays a crucial role in enabling reliable communication, storage, and computation. Classical approaches assume a worst-case adversarial model and ensure error correction and da...
- Heterogeneous Low-Bandwidth Pre-Training of LLMs : Abstract: Pre-training large language models (LLMs) increasingly requires distributed compute, yet bandwidth constraints make it difficult to scale beyond well-provisioned datacenters, especially when ...
- ChronoPlastic Spiking Neural Networks : Abstract: Spiking neural networks (SNNs) offer a biologically grounded and energy-efficient alternative to conventional neural architectures; however, they struggle with long-range temporal dependenci...
- Energy-Efficient Eimeria Parasite Detection Using a Two-Stage Spiking Neural Network Architecture : Abstract: Coccidiosis, a disease caused by the Eimeria parasite, represents a major threat to the poultry and rabbit industries, demanding rapid and accurate diagnostic tools. While deep learning mode...
- Autonomous battery research: Principles of heuristic operando experimentation : Abstract: Unravelling the complex processes governing battery degradation is critical to the energy transition, yet the efficacy of operando characterisation is severely constrained by a lack of Relia...
- Physically-Constrained Autoencoder-Assisted Bayesian Optimization for Refinement of High-Dimensional Defect-Sensitive Single Crystalline Structure : Abstract: Physical properties and functionalities of materials are dictated by global crystal structures as well as local defects. To establish a structure-property relationship, not only the crystall...
- Deep versus Broad Technology Search and the Timing of Innovation Impact : Abstract: This study offers a new perspective on the depth-versus-breadth debate in innovation strategy, by modeling inventive search within dynamic collective knowledge systems, and underscoring the ...
- Towards eco friendly cybersecurity: machine learning based anomaly detection with carbon and energy metrics : Abstract: The rising energy footprint of artificial intelligence has become a measurable component of US data center emissions, yet cybersecurity research seldom considers its environmental cost. This...
- Deep Learning Framework for RNA Inverse Folding with Geometric Structure Potentials : Abstract: RNA's diverse biological functions stem from its structural versatility, yet accurately predicting and designing RNA sequences given a 3D conformation (inverse folding) remains a challenge. ...
- Investigation into U.S. Citizen and Non-Citizen Worker Health Insurance and Employment : Abstract: Socioeconomic integration is a critical dimension of social equity, yet persistent disparities remain in access to health insurance, education, and employment across different demographic gr...
- Noise-Aware and Dynamically Adaptive Federated Defense Framework for SAR Image Target Recognition : Abstract: As a critical application of computational intelligence in remote sensing, deep learning-based synthetic aperture radar (SAR) image target recognition facilitates intelligent perception but ...
- Deep Deterministic Nonlinear ICA via Total Correlation Minimization with Matrix-Based Entropy Functional : Abstract: Blind source separation, particularly through independent component analysis (ICA), is widely utilized across various signal processing domains for disentangling underlying components from o...
- Security Hardening Using FABRIC: Implementing a Unified Compliance Aggregator for Linux Servers : Abstract: This paper presents a unified framework for evaluating Linux security hardening on the FABRIC testbed through aggregation of heterogeneous security auditing tools. We deploy three Ubuntu 22....
- Clean-GS: Semantic Mask-Guided Pruning for 3D Gaussian Splatting : Abstract: 3D Gaussian Splatting produces high-quality scene reconstructions but generates hundreds of thousands of spurious Gaussians (floaters) scattered throughout the environment. These artifacts o...
- Deep Clustering with Associative Memories : Abstract: Deep clustering - joint representation learning and latent space clustering - is a well-studied problem, especially in computer vision and text processing under the deep learning framework. W...
- Dynamic Accuracy Estimation in a Wi-Fi-based Positioning System : Abstract: The paper presents a concept of a dynamic accuracy estimation method, in which the localization errors are derived based on the measurement results used by the positioning algorithm. The con...
- Evaluating transfer learning strategies for improving dairy cattle body weight prediction in small farms using depth-image and point-cloud data : Abstract: Computer vision provides automated, non-invasive, and scalable tools for monitoring dairy cattle, thereby supporting management, health assessment, and phenotypic data collection. Although t...
- Byzantine-Robust Federated Learning Framework with Post-Quantum Secure Aggregation for Real-Time Threat Intelligence Sharing in Critical IoT Infrastructure : Abstract: The proliferation of Internet of Things devices in critical infrastructure has created unprecedented cybersecurity challenges, necessitating collaborative threat detection mechanisms that pr...
- Fibonacci-Driven Recursive Ensembles: Algorithms, Convergence, and Learning Dynamics : Abstract: This paper develops the algorithmic and dynamical foundations of recursive ensemble learning driven by Fibonacci-type update flows. In contrast with classical boosting Freund and Schapire (...
- NarrativeTrack: Evaluating Video Language Models Beyond the Frame : Abstract: Multimodal large language models (MLLMs) have achieved impressive progress in vision-language reasoning, yet their ability to understand temporally unfolding narratives in videos remains und...
- Neural Networks on Symmetric Spaces of Noncompact Type : Abstract: Recent works have demonstrated promising performances of neural networks on hyperbolic spaces and symmetric positive definite (SPD) manifolds. These spaces belong to a family of Riemannian m...
- Conformal Blindness: A Note on $A$-Cryptic change-points : Abstract: Conformal Test Martingales (CTMs) are a standard method within the Conformal Prediction framework for testing the crucial assumption of data exchangeability by monitoring deviations from uni...
- Gradient-Free Approaches is a Key to an Efficient Interaction with Markovian Stochasticity : Abstract: This paper deals with stochastic optimization problems involving Markovian noise with a zero-order oracle. We present and analyze a novel derivative-free method for solving such problems in ...
- Promptable Foundation Models for SAR Remote Sensing: Adapting the Segment Anything Model for Snow Avalanche Segmentation : Abstract: Remote sensing solutions for avalanche segmentation and mapping are key to supporting risk forecasting and mitigation in mountain regions. Synthetic Aperture Radar (SAR) imagery from Sentine...
- NeuroSSM: Multiscale Differential State-Space Modeling for Context-Aware fMRI Analysis : Abstract: Accurate fMRI analysis requires sensitivity to temporal structure across multiple scales, as BOLD signals encode cognitive processes that emerge from fast transient dynamics to slower, large...
- Evidence Slopes and Effective Dimension in Singular Linear Models : Abstract: Bayesian model selection commonly relies on Laplace approximation or the Bayesian Information Criterion (BIC), which assume that the effective model dimension equals the number of parameters...
- Stochastic Control Methods for Optimization : Abstract: In this work, we investigate a stochastic control framework for global optimization over both finite-dimensional Euclidean spaces and the Wasserstein space of probability measures. In the Eu...
- Making MoE based LLM inference resilient with Tarragon : Abstract: Mixture-of-Experts (MoE) models are increasingly used to serve LLMs at scale, but failures become common as deployment scale grows. Existing systems exhibit poor failure resilience: even a s...
- Concave Certificates: Geometric Framework for Distributionally Robust Risk and Complexity Analysis : Abstract: Distributionally Robust (DR) optimization aims to certify worst-case risk within a Wasserstein uncertainty set. Current certifications typically rely either on global Lipschitz bounds, which...
- AppellateGen: A Benchmark for Appellate Legal Judgment Generation : Abstract: Legal judgment generation is a critical task in legal intelligence. However, existing research in legal judgment generation has predominantly focused on first-instance trials, relying on sta...
- FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness : Abstract: Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing t...
- A New Framework for Explainable Rare Cell Identification in Single-Cell Transcriptomics Data : Abstract: The detection of rare cell types in single-cell transcriptomics data is crucial for elucidating disease pathogenesis and tissue development dynamics. However, a critical gap that persists in...
- Investigating the Multilingual Calibration Effects of Language Model Instruction-Tuning : Abstract: Ensuring that deep learning models are well-calibrated in terms of their predictive uncertainty is essential in maintaining their trustworthiness and reliability, yet despite increasing adva...
- SGD with Dependent Data: Optimal Estimation, Regret, and Inference : Abstract: This work investigates the performance of the final iterate produced by stochastic gradient descent (SGD) under temporally dependent data. We consider two complementary sources of dependence...
- Bayesian Negative Binomial Regression of Afrobeats Chart Persistence : Abstract: Afrobeats songs compete for attention on streaming platforms, where chart visibility can influence both revenue and cultural impact. This paper examines whether collaborations help songs rem...
- LANCET: Neural Intervention via Structural Entropy for Mitigating Faithfulness Hallucinations in LLMs : Abstract: Large Language Models have revolutionized information processing, yet their reliability is severely compromised by faithfulness hallucinations. While current approaches attempt to mitigate t...
- Efficient Cover Construction for Ball Mapper via Accelerated Range Queries : Abstract: Ball Mapper is a widely used tool in topological data analysis for summarizing the structure of high-dimensional data through metric-based coverings and graph representations. A central com...
- Fast Gibbs Sampling on Bayesian Hidden Markov Model with Missing Observations : Abstract: The Hidden Markov Model (HMM) is a widely-used statistical model for handling sequential data. However, the presence of missing observations in real-world datasets often complicates the appl...
- iFlip: Iterative Feedback-driven Counterfactual Example Refinement : Abstract: Counterfactual examples are minimal edits to an input that alter a model's prediction. They are widely employed in explainable AI to probe model behavior and in natural language processing (...
- Segmentation and Processing of German Court Decisions from Open Legal Data : Abstract: The availability of structured legal data is important for advancing Natural Language Processing (NLP) techniques for the German legal system. One of the most widely used datasets, Open Lega...
- Modeling Information Blackouts in Missing Not-At-Random Time Series Data : Abstract: Large-scale traffic forecasting relies on fixed sensor networks that often exhibit blackouts: contiguous intervals of missing measurements caused by detector or communication failures. These...
- Four Quadrants of Difficulty: A Simple Categorisation and its Limits : Abstract: Curriculum Learning (CL) aims to improve the outcome of model training by estimating the difficulty of samples and scheduling them accordingly. In NLP, difficulty is commonly approximated us...
- A Novel Deep Learning Method for Segmenting the Left Ventricle in Cardiac Cine MRI : Abstract: This research aims to develop a novel deep learning network, GBU-Net, utilizing a group-batch-normalized U-Net framework, specifically designed for the precise semantic segmentation of the l...
- Learning Relationship between Quantum Walks and Underdamped Langevin Dynamics : Abstract: Fast computational algorithms are in constant demand, and their development has been driven by advances such as quantum speedup and classical acceleration. This paper intends to study search...
- Identifying recurrent flows in high-dimensional dissipative chaos from low-dimensional embeddings : Abstract: Unstable periodic orbits (UPOs) are the non-chaotic, dynamical building blocks of spatio-temporal chaos, motivating a first-principles based theory for turbulence ever since the discovery of...
- Variance-Reduced Diffusion Sampling via Conditional Score Expectation Identity : Abstract: We introduce and prove a Conditional Score Expectation (CSE) identity: an exact relation for the marginal score of affine diffusion processes that links scores across time via a con...
- Deep Linear Discriminant Analysis Revisited : Abstract: We show that for unconstrained Deep Linear Discriminant Analysis (LDA) classifiers, maximum-likelihood training admits pathological solutions in which class means drift together, covariances...
- Simplex Deep Linear Discriminant Analysis : Abstract: We revisit Deep Linear Discriminant Analysis (Deep LDA) from a likelihood-based perspective. While classical LDA is a simple Gaussian model with linear decision boundaries, attaching an LDA ...
- Hidden costs for inference with deep network on embedded system devices : Abstract: This study evaluates the inference performance of various deep learning models under an embedded system environment. In previous works, Multiply-Accumulate operation is typically used to mea...
- Reinforcement Learning for Option Hedging: Static Implied-Volatility Fit versus Shortfall-Aware Performance : Abstract: We extend the Q-learner in Black-Scholes (QLBS) framework by incorporating risk aversion and trading costs, and propose a novel Replication Learning of Option Pricing (RLOP) approach. Both m...
- Latent Space Element Method : Abstract: How can we build surrogate solvers that train on small domains but scale to larger ones without intrusive access to PDE operators? Inspired by the Data-Driven Finite Element Method (DD-FEM) ...
- Sparse Convex Biclustering : Abstract: Biclustering is an essential unsupervised machine learning technique for simultaneously clustering rows and columns of a data matrix, with widespread applications in genomics, transcriptomic...
- Machine learning modularity : Abstract: Based on a transformer-based sequence-to-sequence architecture combined with a dynamic batching algorithm, this work introduces a machine learning framework for automatically simplifying com...
- SRAS: A Lightweight Reinforcement Learning-based Document Selector for Edge-Native RAG Pipelines : Abstract: Retrieval-Augmented Generation (RAG) systems often rely on fixed top-k document selection mechanisms that ignore downstream generation quality and impose computational overheads. We propose ...
- Aspect Extraction from E-Commerce Product and Service Reviews : Abstract: Aspect Extraction (AE) is a key task in Aspect-Based Sentiment Analysis (ABSA), yet it remains difficult to apply in low-resource and code-switched contexts like Taglish, a mix of Tagalog an...
- Random-Matrix-Induced Simplicity Bias in Over-parameterized Variational Quantum Circuits : Abstract: Over-parameterization is commonly used to increase the expressivity of variational quantum circuits (VQCs), yet deeper and more highly parameterized circuits often exhibit poor trainability ...
- SafeLoad: Efficient Admission Control Framework for Identifying Memory-Overloading Queries in Cloud Data Warehouses : Abstract: Memory overload is a common form of resource exhaustion in cloud data warehouses. When database queries fail due to memory overload, it not only wastes critical resources such as CPU time bu...
- Forget Less by Learning from Parents Through Hierarchical Relationships : Abstract: Custom Diffusion Models (CDMs) offer impressive capabilities for personalization in generative modeling, yet they remain vulnerable to catastrophic forgetting when learning new concepts sequ...
- Efficient temporal prediction of compressible flows in irregular domains using Fourier neural operators : Abstract: This paper investigates the temporal evolution of high-speed compressible fluids in irregular flow fields using the Fourier Neural Operator (FNO). We reconstruct the irregular flow field poi...
- Forget Less by Learning Together through Concept Consolidation : Abstract: Custom Diffusion Models (CDMs) have gained significant attention due to their remarkable ability to personalize generative processes. However, existing CDMs suffer from catastrophic forgetti...
- A Multilayered Approach to Classifying Customer Responsiveness and Credit Risk : Abstract: This study evaluates the performance of various classifiers in three distinct models: response, risk, and response-risk, concerning credit card mail campaigns and default prediction. In the ...
- MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics : Abstract: Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains a highly specialized and time-consuming task...
- Car Drag Coefficient Prediction from 3D Point Clouds Using a Slice-Based Surrogate Model : Abstract: The automotive industry's pursuit of enhanced fuel economy and performance necessitates efficient aerodynamic design. However, traditional evaluation methods such as computational fluid dyna...
- Feature-based Inversion of 2.5D Controlled Source Electromagnetic Data using Generative Priors : Abstract: In this study, we investigate feature-based 2.5D controlled source marine electromagnetic (mCSEM) data inversion using generative priors. Two-and-a-half-dimensional modeling using finite diffe...
- QuIC: A Quantum-Inspired Interaction Classifier for Revitalizing Shallow CNNs in Fine-Grained Recognition : Abstract: Deploying deep learning models for Fine-Grained Visual Classification (FGVC) on resource-constrained edge devices remains a significant challenge. While deep architectures achieve high accur...
- Mind the Gap: Continuous Magnification Sampling for Pathology Foundation Models : Abstract: In histopathology, pathologists examine both tissue architecture at low magnification and fine-grained morphology at high magnification. Yet, the performance of pathology foundation models a...
- From Mice to Trains: Amortized Bayesian Inference on Graph Data : Abstract: Graphs arise across diverse domains, from biology and chemistry to social and information networks, as well as in transportation and logistics. Inference on graph-structured data requires me...
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation : Abstract: Visual generation is dominated by three paradigms: AutoRegressive (AR), diffusion, and Visual AutoRegressive (VAR) models. Unlike AR and diffusion, VARs operate on heterogeneous input struct...
- Improved Accuracy for Private Continual Cardinality Estimation in Fully Dynamic Streams via Matrix Factorization : Abstract: We study differentially-private statistics in the fully dynamic continual observation model, where many updates can arrive at each time step and updates to a stream can involve both insertio...
- Predicting Early and Complete Drug Release from Long-Acting Injectables Using Explainable Machine Learning : Abstract: Polymer-based long-acting injectables (LAIs) have transformed the treatment of chronic diseases by enabling controlled drug delivery, thus reducing dosing frequency and extending therapeutic...
- Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction : Abstract: Out-of-distribution (OOD) prediction is often approached by restricting models to causal or invariant covariates, avoiding non-causal spurious associations that may be unstable across enviro...
- Hunting for "Oddballs" with Machine Learning: Detecting Anomalous Exoplanets Using a Deep-Learned Low-Dimensional Representation of Transit Spectra with Autoencoders : Abstract: This study explores the application of autoencoder-based machine learning techniques for anomaly detection to identify exoplanet atmospheres with unconventional chemical signatures using a l...
- Meta-Learning Guided Pruning for Few-Shot Plant Pathology on Edge Devices : Abstract: Farmers in remote areas need quick and reliable methods for identifying plant diseases, yet they often lack access to laboratories or high-performance computing resources. Deep learning mode...
- Sample Path Regularity of Gaussian Processes from the Covariance Kernel : Abstract: Gaussian processes (GPs) are the most common formalism for defining probability distributions over spaces of functions. While applications of GPs are myriad, a comprehensive understanding of...
- Stochastic Online Optimization for Cyber-Physical and Robotic Systems : Abstract: We propose a novel gradient-based online optimization framework for solving stochastic programming problems that frequently arise in the context of cyber-physical and robotic systems. Our pr...
- Echo State Networks for Spatio-Temporal Area-Level Data : Abstract: Spatio-temporal area-level datasets play a critical role in official statistics, providing valuable insights for policy-making and regional planning. Accurate modeling and forecasting of the...
- Harvesting AlphaEarth: Benchmarking the Geospatial Foundation Model for Agricultural Downstream Tasks : Abstract: Geospatial foundation models (GFMs) have emerged as a promising approach to overcoming the limitations in existing featurization methods. More recently, Google DeepMind has introduced AlphaE...
- Universal Battery Degradation Forecasting Driven by Foundation Model Across Diverse Chemistries and Conditions : Abstract: Accurate forecasting of battery capacity fade is essential for the safety, reliability, and long-term efficiency of energy storage systems. However, the strong heterogeneity across cell chem...
- Selective Imperfection as a Generative Framework for Analysis, Creativity and Discovery : Abstract: We introduce materiomusic as a generative framework linking the hierarchical structures of matter with the compositional logic of music. Across proteins, spider webs and flame dynamics, vibr...
- Distribution Matching for Graph Quantification Under Structural Covariate Shift : Abstract: Graphs are commonly used in machine learning to model relationships between instances. Consider the task of predicting the political preferences of users in a social network; to solve this t...
- Quantum Machine Learning Approaches for Coordinated Stealth Attack Detection in Distributed Generation Systems : Abstract: Coordinated stealth attacks are a serious cybersecurity threat to distributed generation systems because they modify control and measurement signals while remaining close to normal behavior,...
- Outlier Detection Using Vector Cosine Similarity by Adding a Dimension : Abstract: We propose a new outlier detection method for multi-dimensional data. The method detects outliers based on vector cosine similarity, using a new dataset constructed by adding a dimension wit...
- FANoS: Friction-Adaptive Nosé-Hoover Symplectic Momentum for Stiff Objectives : Abstract: We study a physics-inspired optimizer, FANoS (Friction-Adaptive Nosé-Hoover Symplectic momentum), which combines (i) a momentum update written as a discretized second-order dynamical...
- Hierarchical topological clustering : Abstract: Topological methods have the potential of exploring data clouds without making assumptions on their structure. Here we propose a hierarchical topological clustering algorithm that can be...
- When to Ponder: Adaptive Compute Allocation for Code Generation via Test-Time Training : Abstract: Large language models apply uniform computation to all inputs, regardless of difficulty. We propose PonderTTT, a gating strategy using the TTT layer's self-supervised reconstruction loss to ...
- Dichotomous Diffusion Policy Optimization : Abstract: Diffusion-based policies have gained growing popularity in solving a wide range of decision-making tasks due to their superior expressiveness and controllable generation during inference. Ho...
- Latent-Constrained Conditional VAEs for Augmenting Large-Scale Climate Ensembles : Abstract: Large climate-model ensembles are computationally expensive; yet many downstream analyses would benefit from additional, statistically consistent realizations of spatiotemporal climate varia...
- Enhanced Data-Driven Product Development via Gradient Based Optimization and Conformalized Monte Carlo Dropout Uncertainty Estimation : Abstract: Data-Driven Product Development (DDPD) leverages data to learn the relationship between product design specifications and resulting properties. To discover improved designs, we train a neura...
- Reliability Under Randomness: An Empirical Analysis of Sparse and Dense Language Models Across Decoding Temperatures : Abstract: The increasing prevalence of sparse Mixture-of-Experts (MoE) architectures in large language models raises important questions regarding their reliability under stochastic decoding. While co...
- Explainability-Guided Defense: Attribution-Aware Model Refinement Against Adversarial Data Attacks : Abstract: The growing reliance on deep learning models in safety-critical domains such as healthcare and autonomous navigation underscores the need for defenses that are both robust to adversarial per...
- Zero-shot Forecasting by Simulation Alone : Abstract: Zero-shot time-series forecasting holds great promise, but is still in its infancy, hindered by limited and biased data corpora, leakage-prone evaluation, and privacy and licensing constrain...
- Contractive Diffusion Policies: Robust Action Diffusion via Contractive Score-Based Sampling with Differential Equations : Abstract: Diffusion policies have emerged as powerful generative models for offline policy learning, whose sampling process can be rigorously characterized by a score function guiding a Stochastic Dif...
- Expanding the Chaos: Neural Operator for Stochastic (Partial) Differential Equations : Abstract: Stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) are fundamental tools for modeling stochastic dynamics across the natural sciences and modern m...
- Wireless Dataset Similarity: Measuring Distances in Supervised and Unsupervised Machine Learning : Abstract: This paper introduces a task- and model-aware framework for measuring similarity between wireless datasets, enabling applications such as dataset selection/augmentation, simulation-to-real (...
- Coarse-Grained Kullback-Leibler Control of Diffusion-Based Generative AI : Abstract: Diffusion models and score-based generative models provide a powerful framework for synthesizing high-quality images from noise. However, there is still no satisfactory theory that describes...
- Tiny Machine Learning for Real-Time Aquaculture Monitoring: A Case Study in Morocco : Abstract: Aquaculture, the farming of aquatic organisms, is a rapidly growing industry facing challenges such as water quality fluctuations, disease outbreaks, and inefficient feed management. Traditi...
- Revisiting Weighted Strategy for Non-stationary Parametric Bandits and MDPs : Abstract: Non-stationary parametric bandits have attracted much attention recently. There are three principled ways to deal with non-stationarity, including sliding-window, weighted, and restart strat...
- Discount Model Search for Quality Diversity Optimization in High-Dimensional Measure Spaces : Abstract: Quality diversity (QD) optimization searches for a collection of solutions that optimize an objective while attaining diverse outputs of a user-specified, vector-valued measure function. Con...
- Central Dogma Transformer: Towards Mechanism-Oriented AI for Cellular Understanding : Abstract: Understanding cellular mechanisms requires integrating information across DNA, RNA, and protein - the three molecular systems linked by the Central Dogma of molecular biology. While domain-s...
- Community-Based Early-Stage Chronic Kidney Disease Screening using Explainable Machine Learning for Low-Resource Settings : Abstract: Early detection of chronic kidney disease (CKD) is essential for preventing progression to end-stage renal disease. However, existing screening tools - primarily developed using populations ...
- Self-Training the Neurochaos Learning Algorithm : Abstract: In numerous practical applications, acquiring substantial quantities of labelled data is challenging and expensive, but unlabelled data is readily accessible. Conventional supervised learnin...
- Evo-TFS: Evolutionary Time-Frequency Domain-Based Synthetic Minority Oversampling Approach to Imbalanced Time Series Classification : Abstract: Time series classification is a fundamental machine learning task with broad real-world applications. Although many deep learning methods have proven effective in learning time-series data f...
- Sparse Bayesian Message Passing under Structural Uncertainty : Abstract: Semi-supervised learning on real-world graphs is frequently challenged by heterophily, where the observed graph is unreliable or label-disassortative. Many existing graph neural networks eit...
- Adaptive Conformal Prediction via Bayesian Uncertainty Weighting for Hierarchical Healthcare Data : Abstract: Clinical decision-making demands uncertainty quantification that provides both distribution-free coverage guarantees and risk-adaptive precision, requirements that existing methods fail to j...
- The Dependency Divide: An Interpretable Machine Learning Framework for Profiling Student Digital Satisfaction in the Bangladesh Context : Abstract: Background: While digital access has expanded rapidly in resource-constrained contexts, satisfaction with digital learning platforms varies significantly among students with seemingly equal ...
- Accelerated Full Waveform Inversion by Deep Compressed Learning : Abstract: We propose and test a method to reduce the dimensionality of Full Waveform Inversion (FWI) inputs as a computational cost mitigation approach. Given modern seismic acquisition systems, the dat...
- The Alchemy of Thought: Understanding In-Context Learning Through Supervised Classification : Abstract: In-context learning (ICL) has become a prominent paradigm to rapidly customize LLMs to new tasks without fine-tuning. However, despite the empirical evidence of its usefulness, we still do n...
- Sobolev Approximation of Deep ReLU Network in Log-weighted Barron Space : Abstract: Universal approximation theorems show that neural networks can approximate any continuous function; however, the number of parameters may grow exponentially with the ambient dimension, so th...
- Towards a Principled Muon under $\mu\mathsf{P}$: Ensuring Spectral Conditions throughout Training : Abstract: The $μ$-parameterization ($μ$P) provides a principled foundation for large language model (LLM) training by prescribing width-independent learning dynamics, which in turn enables predictable...
- Spectral-Window Hybrid (SWH) : Abstract: Scaling sequence modeling to extreme contexts requires balancing computational efficiency with representational expressivity. While Transformers provide precise retrieval via the attention m...
- Towards LLM-enabled autonomous combustion research: A literature-aware agent for self-corrective modeling workflows : Abstract: The rapid evolution of large language models (LLMs) is transforming artificial intelligence into autonomous research partners, yet a critical gap persists in complex scientific domains such ...
- Causal discovery for linear causal model with correlated noise: an Adversarial Learning Approach : Abstract: Causal discovery from data with unmeasured confounding factors is a challenging problem. This paper proposes an approach based on the f-GAN framework, learning the binary causal structure in...
- A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory : Abstract: We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We prove a depth hierarchy, wherein width $Ω\big(d^{1+\frac{1}{2^{k-2}-1...
- Unveiling the Heart-Brain Connection: An Analysis of ECG in Cognitive Performance : Abstract: Understanding the interaction of neural and cardiac systems during cognitive activity is critical to advancing physiological computing. Although EEG has been the gold standard for assessing ...
- Leveraging Flatness to Improve Information-Theoretic Generalization Bounds for SGD : Abstract: Information-theoretic (IT) generalization bounds have been used to study the generalization of learning algorithms. These bounds are intrinsically data- and algorithm-dependent so that one c...
- Multi-Subspace Multi-Modal Modeling for Diffusion Models: Estimation, Convergence and Mixture of Experts : Abstract: Recently, diffusion models have achieved great performance with a small dataset of size $n$ and a fast optimization process. However, the estimation error of diffusion models suffers from ...
- SGD-Based Knowledge Distillation with Bayesian Teachers: Theory and Guidelines : Abstract: Knowledge Distillation (KD) is a central paradigm for transferring knowledge from a large teacher network to a typically smaller student model, often by leveraging soft probabilistic outputs...
- Accelerating Decentralized Optimization via Overlapping Local Steps : Abstract: Decentralized optimization has emerged as a critical paradigm for distributed learning, enabling scalable training while preserving data privacy through peer-to-peer collaboration. However, ...
- Advanced Global Wildfire Activity Modeling with Hierarchical Graph ODE : Abstract: Wildfires, as an integral component of the Earth system, are governed by a complex interplay of atmospheric, oceanic, and terrestrial processes spanning a vast range of spatiotemporal scales...
- Real Time NILM Based Power Monitoring of Identical Induction Motors Representing Cutting Machines in Textile Industry : Abstract: The textile industry in Bangladesh is one of the most energy-intensive sectors, yet its monitoring practices remain largely outdated, resulting in inefficient power usage and high operationa...
- Communication-Efficient Federated AUC Maximization with Cyclic Client Participation : Abstract: Federated AUC maximization is a powerful approach for learning from imbalanced data in federated learning (FL). However, existing methods typically assume full client availability, which is ...
- Who is the Winning Algorithm? Rank Aggregation for Comparative Studies : Abstract: Consider a collection of m competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best performing algorithm. Specifically, w...
- HeurekaBench: A Benchmarking Framework for AI Co-scientist : Abstract: LLM-based reasoning models have enabled the development of agentic systems that act as co-scientists, assisting in multi-step scientific analysis. However, evaluating these systems is challe...
- DiMEx: Breaking the Cold Start Barrier in Data-Free Model Extraction via Latent Diffusion Priors : Abstract: Model stealing attacks pose an existential threat to Machine Learning as a Service (MLaaS), allowing adversaries to replicate proprietary models for a fraction of their training cost. While ...
- Enhanced Multi-model Online Conformal Prediction : Abstract: Conformal prediction is a framework for uncertainty quantification that constructs prediction sets for previously unseen data, guaranteeing coverage of the true label with a specified probab...
- Entropy-Aligned Decoding of LMs for Better Writing and Reasoning : Abstract: Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low quality generations. Decod...
- Context-Free Recognition with Transformers : Abstract: Transformers excel on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax....
- UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk : Abstract: The ever-increasing adoption of Large Language Models in critical sectors like finance, healthcare, and government raises privacy concerns regarding the handling of sensitive Personally Iden...
- Distributed Federated Learning by Alternating Periods of Training : Abstract: Federated learning is a privacy-focused approach towards machine learning where models are trained on client devices with locally available data and aggregated at a central server. However, ...
- RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data : Abstract: Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottl...
- FAROS: Robust Federated Learning with Adaptive Scaling against Backdoor Attacks : Abstract: Federated Learning (FL) enables multiple clients to collaboratively train a shared model without exposing local data. However, backdoor attacks pose a significant threat to FL. These attacks...
- Tackling Resource-Constrained and Data-Heterogeneity in Federated Learning with Double-Weight Sparse Pack : Abstract: Federated learning has drawn widespread interest from researchers, yet the data heterogeneity across edge clients remains a key challenge, often degrading model performance. Existing methods...
- High-Order Epistasis Detection Using Factorization Machine with Quadratic Optimization Annealing and MDR-Based Evaluation : Abstract: Detecting high-order epistasis is a fundamental challenge in genetic association studies due to the combinatorial explosion of candidate locus combinations. Although multifactor dimensionali...
- FedBiCross: A Bi-Level Optimization Framework to Tackle Non-IID Challenges in Data-Free One-Shot Federated Learning on Medical Data : Abstract: Data-free knowledge distillation-based one-shot federated learning (OSFL) trains a model in a single communication round without sharing raw data, making OSFL attractive for privacy-sensitiv...
- TT-FSI: Scalable Faithful Shapley Interactions via Tensor-Train : Abstract: The Faithful Shapley Interaction (FSI) index uniquely satisfies the faithfulness axiom among Shapley interaction indices, but computing FSI requires $O(d^\ell \cdot 2^d)$ time and existing i...
- Distorted Distributional Policy Evaluation for Offline Reinforcement Learning : Abstract: While Distributional Reinforcement Learning (DRL) methods have demonstrated strong performance in online settings, its success in offline scenarios remains limited. We hypothesize that a key...
- SynRXN: An Open Benchmark and Curated Dataset for Computational Reaction Modeling : Abstract: We present SynRXN, a unified benchmarking framework and open-data resource for computer-aided synthesis planning (CASP). SynRXN decomposes end-to-end synthesis planning into five task famili...
- SerpentFlow: Generative Unpaired Domain Alignment via Shared-Structure Decomposition : Abstract: Domain alignment refers broadly to learning correspondences between data distributions from distinct domains. In this work, we focus on a setting where domains share underlying structural pa...
- Prior Diffusiveness and Regret in the Linear-Gaussian Bandit : Abstract: We prove that Thompson sampling exhibits $\tilde{O}(σd \sqrt{T} + d r \sqrt{\mathrm{Tr}(Σ_0)})$ Bayesian regret in the linear-Gaussian bandit with a $\mathcal{N}(μ_0, Σ_0)$ prior distributio...
- GDRO: Group-level Reward Post-training Suitable for Diffusion Models : Abstract: Recent advancements adopt online reinforcement learning (RL) from LLMs to text-to-image rectified flow diffusion models for reward alignment. The use of group-level rewards successfully alig...
- Multivariate Time-series Anomaly Detection via Dynamic Model Pool & Ensembling : Abstract: Multivariate time-series (MTS) anomaly detection is critical in domains such as service monitoring, IoT, and network security. While multi-model methods based on selection or ensembling outperf...
- Explore the Ideology of Deep Learning in ENSO Forecasts : Abstract: The El Niño-Southern Oscillation (ENSO) exerts profound influence on global climate variability, yet its prediction remains a grand challenge. Recent advances in deep learning have signif...
- A Differentiable Adversarial Framework for Task-Aware Data Subsampling : Abstract: The proliferation of large-scale datasets poses a major computational challenge to model training. The traditional data subsampling method works as a static, task independent preprocessing s...
- Horizon Activation Mapping for Neural Networks in Time Series Forecasting : Abstract: Neural networks for time series forecasting have relied on error metrics and architecture-specific interpretability approaches for model selection that don't apply across models of different...
- Prototype-Based Learning for Healthcare: A Demonstration of Interpretable AI : Abstract: Despite recent advances in machine learning and explainable AI, a gap remains in personalized preventive healthcare: predictions, interventions, and recommendations should be both understand...
- Edge-aware GAT-based protein binding site prediction : Abstract: Accurate identification of protein binding sites is crucial for understanding biomolecular interaction mechanisms and for the rational design of drug targets. Traditional predictive methods ...
- Learning with Monotone Adversarial Corruptions : Abstract: We study the extent to which standard machine learning algorithms rely on exchangeability and independence of data by introducing a monotone adversarial corruption model. In this model, an a...
- ACDZero: Graph-Embedding-Based Tree Search for Mastering Automated Cyber Defense : Abstract: Automated cyber defense (ACD) seeks to protect computer networks with minimal or no human intervention, reacting to intrusions by taking corrective actions such as isolating hosts, resetting...
- CORE: Code-based Inverse Self-Training Framework with Graph Expansion for Virtual Agents : Abstract: The development of Multimodal Virtual Agents has made significant progress through the integration of Multimodal Large Language Models. However, mainstream training paradigms face key challe...
- Quantized SO(3)-Equivariant Graph Neural Networks for Efficient Molecular Property Prediction : Abstract: Deploying 3D graph neural networks (GNNs) that are equivariant to 3D rotations (the group SO(3)) on edge devices is challenging due to their high computational cost. This paper addresses the...
- ELLA: Efficient Lifelong Learning for Adapters in Large Language Models : Abstract: Large Language Models (LLMs) suffer severe catastrophic forgetting when adapted sequentially to new tasks in a continual learning (CL) setting. Existing approaches are fundamentally limited:...
- Neuro-Channel Networks: A Multiplication-Free Architecture by Biological Signal Transmission : Abstract: The rapid proliferation of Deep Learning is increasingly constrained by its heavy reliance on high-performance hardware, particularly Graphics Processing Units (GPUs). These specialized acce...
- POSEIDON: Physics-Optimized Seismic Energy Inference and Detection Operating Network : Abstract: Earthquake prediction and seismic hazard assessment remain fundamental challenges in geophysics, with existing machine learning approaches often operating as black boxes that ignore establis...
- Differential Privacy for Transformer Embeddings of Text with Nonparametric Variational Information Bottleneck : Abstract: We propose a privacy-preserving method for sharing text data by sharing noisy versions of their transformer embeddings. It has been shown that hidden representations learned by deep models c...
- Temporal Kolmogorov-Arnold Networks (T-KAN) for High-Frequency Limit Order Book Forecasting: Efficiency, Interpretability, and Alpha Decay : Abstract: High-Frequency trading (HFT) environments are characterised by large volumes of limit order book (LOB) data, which is notoriously noisy and non-linear. Alpha decay represents a significant c...
- On Pitfalls of RemOve-And-Retrain: Data Processing Inequality Perspective : Abstract: Approaches for appraising feature importance approximations, alternatively referred to as attribution methods, have been established across an extensive array of contexts. The development of...
- Mem-Rec: Memory Efficient Recommendation System using Alternative Representation : Abstract: Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models to provide high-quality personalized recommendations. Training data used for modern recommendation systems ...
- GRACE: Discriminator-Guided Chain-of-Thought Reasoning : Abstract: In the context of multi-step reasoning, e.g., with chain-of-thought, language models (LMs) can easily assign a high likelihood to incorrect steps. As a result, decoding strategies that optim...
- HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization : Abstract: Domain Generalization (DG) endeavors to create machine learning models that excel in unseen scenarios by learning invariant features. In DG, the prevalent practice of constraining models to ...
- Beyond Expectations: Learning with Stochastic Dominance Made Practical : Abstract: Stochastic dominance serves as a general framework for modeling a broad spectrum of decision preferences under uncertainty, with risk aversion as one notable example, as it naturally capture...
- Convergence of a L2 regularized Policy Gradient Algorithm for the Multi Armed Bandit : Abstract: Although Multi Armed Bandit (MAB) on one hand and the policy gradient approach on the other hand are among the most used frameworks of Reinforcement Learning, the theoretical properties of t...
- Posets and Bounded Probabilities for Discovering Order-inducing Features in Event Knowledge Graphs : Abstract: Event knowledge graphs (EKG) extend the classical notion of a trace to capture multiple, interacting views of a process execution. In this paper, we tackle the open problem of automating EKG...
- Opportunities and Challenges of Large Language Models for Low-Resource Languages in Humanities Research : Abstract: Low-resource languages serve as invaluable repositories of human history, embodying cultural evolution and intellectual diversity. Despite their significance, these languages face critical c...
- Affordance-Guided Coarse-to-Fine Exploration for Base Placement in Open-Vocabulary Mobile Manipulation : Abstract: In open-vocabulary mobile manipulation (OVMM), task success often hinges on the selection of an appropriate base placement for the robot. Existing approaches typically navigate to proximity-...
- Pedagogical Reflections on the Holistic Cognitive Development (HCD) Framework and AI-Augmented Learning in Creative Computing : Abstract: This paper presents an expanded account of the Holistic Cognitive Development (HCD) framework for reflective and creative learning in computing education. The HCD framework integrates design...
- InfoDecom: Decomposing Information for Defending Against Privacy Leakage in Split Inference : Abstract: Split inference (SI) enables users to access deep learning (DL) services without directly transmitting raw data. However, recent studies reveal that data reconstruction attacks (DRAs) can re...
- Dynamical Mechanisms for Coordinating Long-term Working Memory Based on the Precision of Spike-timing in Cortical Neurons : Abstract: In the last century, most sensorimotor studies of cortical neurons relied on average firing rates. Rate coding is efficient for fast sensorimotor processing that occurs within a few seconds....
- Horizon Reduction as Information Loss in Offline Reinforcement Learning : Abstract: Horizon reduction is a common design strategy in offline reinforcement learning (RL), used to mitigate long-horizon credit assignment, improve stability, and enable scalable learning through...
- ShrimpXNet: A Transfer Learning Framework for Shrimp Disease Classification with Augmented Regularization, Adversarial Training, and Explainable AI : Abstract: Shrimp is one of the most widely consumed aquatic species globally, valued for both its nutritional content and economic importance. Shrimp farming represents a significant source of income ...
- SLO-Conditioned Action Routing for Retrieval-Augmented Generation: Objective Ablation and Failure Modes : Abstract: Retrieval-augmented generation (RAG) introduces a practical control problem: retrieval depth and generation behavior must be chosen per query to satisfy service-level objectives (SLOs) such ...
- You Only Need Your Transformer 25% of the Time: Meaning-First Execution for Eliminating Unnecessary Inference : Abstract: Modern AI inference systems treat transformer execution as mandatory, conflating model capability with execution necessity. We reframe inference as a control-plane decision problem: determin...
- EdgeJury: Cross-Reviewed Small-Model Ensembles for Truthful Question Answering on Serverless Edge Inference : Abstract: Hallucinations hinder reliable question answering, especially in resource-constrained deployments where frontier-scale models or retrieval pipelines may be impractical. We present EdgeJury, ...
- Emergent Introspective Awareness in Large Language Models : Abstract: We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be d...
- ARIES: A Scalable Multi-Agent Orchestration Framework for Real-Time Epidemiological Surveillance and Outbreak Monitoring : Abstract: Global health surveillance is currently facing a challenge of Knowledge Gaps. While general-purpose AI has proliferated, it remains fundamentally unsuited for the high-stakes epidemiological...
- Yukthi Opus: A Multi-Chain Hybrid Metaheuristic for Large-Scale NP-Hard Optimization : Abstract: We present Yukthi Opus (YO), a multi-chain hybrid metaheuristic designed for NP-hard optimization under explicit evaluation budget constraints. YO integrates three complementary mechanisms i...
- RSwinV2-MD: An Enhanced Residual SwinV2 Transformer for Monkeypox Detection from Skin Images : Abstract: In this paper, a deep learning approach for Mpox diagnosis named Customized Residual SwinTransformerV2 (RSwinV2) has been proposed, trying to enhance the capability of lesion classification ...
- The Machine Learning Canvas: Empirical Findings on Why Strategy Matters More Than AI Code Generation : Abstract: Despite the growing popularity of AI coding assistants, over 80% of machine learning (ML) projects fail to deliver real business value. This study creates and tests a Machine Learning Canvas...
- MORE: Multi-Objective Adversarial Attacks on Speech Recognition : Abstract: The emergence of large-scale automatic speech recognition (ASR) models such as Whisper has greatly expanded their adoption across diverse real-world applications. Ensuring robustness against...
- CogFlow: Bridging Perception and Reasoning through Knowledge Internalization for Visual Mathematical Problem Solving : Abstract: Despite significant progress, multimodal large language models continue to struggle with visual mathematical problem solving. Some recent works recognize that visual perception is a bottlene...
- Safety at One Shot: Patching Fine-Tuned LLMs with A Single Instance : Abstract: Fine-tuning safety-aligned large language models (LLMs) can substantially compromise their safety. Previous approaches require many safety samples or calibration sets, which not only incur s...
- Tackling the Inherent Difficulty of Noise Filtering in RAG : Abstract: Retrieval-Augmented Generation (RAG) has become a widely adopted approach to enhance Large Language Models (LLMs) by incorporating external knowledge and reducing hallucinations. However, no...
- Evaluating Feature Dependent Noise in Preference-based Reinforcement Learning : Abstract: Learning from Preferences in Reinforcement Learning (PbRL) has gained attention recently, as it serves as a natural fit for complicated tasks where the reward function is not easily availabl...
- Nodule-DETR: A Novel DETR Architecture with Frequency-Channel Attention for Ultrasound Thyroid Nodule Detection : Abstract: Thyroid cancer is the most common endocrine malignancy, and its incidence is rising globally. While ultrasound is the preferred imaging modality for detecting thyroid nodules, its diagnostic...
- A Defect is Being Born: How Close Are We? A Time Sensitive Forecasting Approach : Abstract: Background. Defect prediction has been a highly active topic among researchers in the Empirical Software Engineering field. Previous literature has successfully achieved the most accurate pr...
- Theoretical Convergence of SMOTE-Generated Samples : Abstract: Imbalanced data affects a wide range of machine learning applications, from healthcare to network security. As SMOTE is one of the most popular approaches to addressing this issue, it is imp...
- MCGI: Manifold-Consistent Graph Indexing for Billion-Scale Disk-Resident Vector Search : Abstract: Graph-based Approximate Nearest Neighbor (ANN) search often suffers from performance degradation in high-dimensional spaces due to the "Euclidean-Geodesic mismatch," where greedy routing d...
- DéjàQ: Open-Ended Evolution of Diverse, Learnable and Verifiable Problems : Abstract: Recent advances in reasoning models have yielded impressive results in mathematics and coding. However, most approaches rely on static datasets, which have been suggested to encourage memori...
- Visualizing the Structure of Lenia Parameter Space : Abstract: Continuous cellular automata are rocketing in popularity, yet developing a theoretical understanding of their behaviour remains a challenge. In the case of Lenia, a few fundamental open prob...
- The Invisible Hand of AI Libraries Shaping Open Source Projects and Communities : Abstract: In the early 1980s, Open Source Software emerged as a revolutionary concept amidst the dominance of proprietary software. What began as a revolutionary idea has now become the cornerstone of...
- Refinement Provenance Inference: Detecting LLM-Refined Training Prompts from Model Behavior : Abstract: Instruction tuning increasingly relies on LLM-based prompt refinement, where prompts in the training corpus are selectively rewritten by an external refiner to improve clarity and instructio...
- VIT-Ped: Visionary Intention Transformer for Pedestrian Behavior Analysis : Abstract: Pedestrian Intention prediction is one of the key technologies in the transition from level 3 to level 4 autonomous driving. To understand pedestrian crossing behaviour, several elements and...
- Exploring Diversity, Novelty, and Popularity Bias in ChatGPT's Recommendations : Abstract: ChatGPT has emerged as a versatile tool, demonstrating capabilities across diverse domains. Given these successes, the Recommender Systems (RSs) community has begun investigating its applica...
- Exploring Approaches for Detecting Memorization of Recommender System Data in Large Language Models : Abstract: Large Language Models (LLMs) are increasingly applied in recommendation scenarios due to their strong natural language understanding and generation capabilities. However, they are trained on...
- A neural network for modeling human concept formation, understanding and communication : Abstract: A remarkable capability of the human brain is to form more abstract conceptual representations from sensorimotor experiences and flexibly apply them independent of direct sensory inputs. How...
- Surprisal and Metaphor Novelty: Moderate Correlations and Divergent Scaling Effects : Abstract: Novel metaphor comprehension involves complex semantic processes and linguistic creativity, making it an interesting task for studying language models (LMs). This study investigates whether ...
- Enhancing Object Detection with Privileged Information: A Model-Agnostic Teacher-Student Approach : Abstract: This paper investigates the integration of the Learning Using Privileged Information (LUPI) paradigm in object detection to exploit fine-grained, descriptive information available during tra...
- Not All Needles Are Found: How Fact Distribution and Don't Make It Up Prompts Shape Literal Extraction, Logical Inference, and Hallucination Risks in Long-Context LLMs : Abstract: Large language models (LLMs) increasingly support very long input contexts. Yet it remains unclear how reliably they extract and infer information at scale. Performance varies with context l...
- Output Embedding Centering for Stable LLM Pretraining : Abstract: Pretraining of large language models is not only expensive but also prone to certain training instabilities. A specific instability that often occurs for large learning rates at the end of t...
- The New Compiler Stack: A Survey on the Synergy of LLMs and Compilers : Abstract: This survey has provided a systematic overview of the emerging field of LLM-enabled compilation by addressing several key research questions. We first answered how LLMs are being integrated ...
- Agentic Retoucher for Text-To-Image Generation : Abstract: Text-to-image (T2I) diffusion models such as SDXL and FLUX have achieved impressive photorealism, yet small-scale distortions remain pervasive in limbs, face, text and so on. Existing refine...
- Perish or Flourish? A Holistic Evaluation of Large Language Models for Code Generation in Functional Programming : Abstract: Functional programming provides strong foundations for developing reliable and secure software systems, yet its adoption remains not widespread due to the steep learning curve. Recent advanc...
- Cost-Efficient Cross-Lingual Retrieval-Augmented Generation for Low-Resource Languages: A Case Study in Bengali Agricultural Advisory : Abstract: Access to reliable agricultural advisory remains limited in many developing regions due to a persistent language barrier: authoritative agricultural manuals are predominantly written in Engl...
- Deferred Commitment Decoding for Diffusion Language Models with Confidence-Aware Sliding Windows : Abstract: Diffusion language models (DLMs) have recently emerged as a strong alternative to autoregressive models by enabling parallel text generation. To improve inference efficiency and KV-cache com...
- The Homogeneity Trap: Spectral Collapse in Doubly-Stochastic Deep Networks : Abstract: Doubly-stochastic matrices (DSM) are increasingly utilized in structure-preserving deep architectures -- such as Optimal Transport layers and Sinkhorn-based attention -- to enforce numerical...
- Vision-Based Early Fault Diagnosis and Self-Recovery for Strawberry Harvesting Robots : Abstract: Strawberry harvesting robots faced persistent challenges such as low integration of visual perception, fruit-gripper misalignment, empty grasping, and strawberry slippage from the gripper du...
- LION-DG: Layer-Informed Initialization with Deep Gradient Protocols for Accelerated Neural Network Training : Abstract: Weight initialization remains decisive for neural network optimization, yet existing methods are largely layer-agnostic. We study initialization for deeply-supervised architectures with auxi...
- Inferring Network Evolutionary History via Structure-State Coupled Learning : Abstract: Inferring a network's evolutionary history from a single final snapshot with limited temporal annotations is fundamental yet challenging. Existing approaches predominantly rely on topology a...
- DeCode: Decoupling Content and Delivery for Medical QA : Abstract: Large language models (LLMs) exhibit strong medical knowledge and can generate factually accurate responses. However, existing models often fail to account for individual patient contexts, p...
- SingingBot: An Avatar-Driven System for Robotic Face Singing Performance : Abstract: Equipping robotic faces with singing capabilities is crucial for empathetic Human-Robot Interaction. However, existing robotic face driving research primarily focuses on conversations or mim...
- Remote Sensing Change Detection via Weak Temporal Supervision : Abstract: Semantic change detection in remote sensing aims to identify land cover changes between bi-temporal image pairs. Progress in this area has been limited by the scarcity of annotated datasets,...
- Routing by Analogy: kNN-Augmented Expert Assignment for Mixture-of-Experts : Abstract: Mixture-of-Experts (MoE) architectures scale large language models efficiently by employing a parametric "router" to dispatch tokens to a sparse subset of experts. Typically, this router is ...
- BiPrompt: Bilateral Prompt Optimization for Visual and Textual Debiasing in Vision-Language Models : Abstract: Vision language foundation models such as CLIP exhibit impressive zero-shot generalization yet remain vulnerable to spurious correlations across visual and textual modalities. Existing debia...
- AI-enhanced tuning of quantum dot Hamiltonians toward Majorana modes : Abstract: We propose a neural network-based model capable of learning the broad landscape of working regimes in quantum dot simulators, and using this knowledge to autotune these devices - based on tr...
- Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting : Abstract: Supervised Fine-Tuning (SFT) is the standard paradigm for domain adaptation, yet it frequently incurs the cost of catastrophic forgetting. In sharp contrast, on-policy Reinforcement Learning...
- FormationEval, an open multiple-choice benchmark for petroleum geoscience : Abstract: This paper presents FormationEval, an open multiple-choice question benchmark for evaluating language models on petroleum geoscience and subsurface disciplines. The dataset contains 505 ques...
- Code for Machines, Not Just Humans: Quantifying AI-Friendliness with Code Health Metrics : Abstract: We are entering a hybrid era in which human developers and AI coding agents work in the same codebases. While industry practice has long optimized code for human comprehension, it is increas...
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation : Abstract: We present NextFlow, a unified decoder-only autoregressive transformer trained on 6 trillion interleaved text-image discrete tokens. By leveraging a unified vision representation within a un...
- Seeing the Unseen: Zooming in the Dark with Event Cameras : Abstract: This paper addresses low-light video super-resolution (LVSR), aiming to restore high-resolution videos from low-light, low-resolution (LR) inputs. Existing LVSR methods often struggle to rec...
- LLM-Empowered Functional Safety and Security by Design in Automotive Systems : Abstract: This paper presents an LLM-empowered workflow to support Software Defined Vehicle (SDV) software development, covering the aspects of security-aware system topology design, as well as event-dri...
- VIBE: Visual Instruction Based Editor : Abstract: Instruction-based image editing is among the fastest developing areas in generative AI. Over the past year, the field has reached a new level, with dozens of open-source models released alon...
- A Comparative Study of Custom CNNs, Pre-trained Models, and Transfer Learning Across Multiple Visual Datasets : Abstract: Convolutional Neural Networks (CNNs) are a standard approach for visual recognition due to their capacity to learn hierarchical representations from raw pixels. In practice, practitioners of...
- TopoLoRA-SAM: Topology-Aware Parameter-Efficient Adaptation of Foundation Segmenters for Thin-Structure and Cross-Domain Binary Semantic Segmentation : Abstract: Foundation segmentation models such as the Segment Anything Model (SAM) exhibit strong zero-shot generalization through large-scale pretraining, but adapting them to domain-specific semantic...
- pdfQA: Diverse, Challenging, and Realistic Question Answering over PDFs : Abstract: PDFs are the second-most used document type on the internet (after HTML). Yet, existing QA datasets commonly start from text sources or only address specific domains. In this paper, we prese...
- Placement Semantics for Distributed Deep Learning: A Systematic Framework for Analyzing Parallelism Strategies : Abstract: Training large language models requires distributing computation across many accelerators, yet practitioners select parallelism strategies (data, tensor, pipeline, ZeRO) through trial and er...
- DatBench: Discriminative, Faithful, and Efficient VLM Evaluations : Abstract: Empirical evaluation serves as the primary compass guiding research progress in foundation models. Despite a large body of work focused on training frontier vision-language models (VLMs), ap...
- DARC: Drum accompaniment generation with fine-grained rhythm control : Abstract: In music creation, rapid prototyping is essential for exploring and refining ideas, yet existing generative tools often fall short when users require both structural control and stylistic fl...
- On the Representation of Pairwise Causal Background Knowledge and Its Applications in Causal Inference : Abstract: Pairwise causal background knowledge about the existence or absence of causal edges and paths is frequently encountered in observational studies. Such constraints allow the shared directed a...
- Geometry-induced Regularization in Deep ReLU Neural Networks : Abstract: Neural networks with a large number of parameters often do not overfit, owing to implicit regularization that favors 'good' networks. Other related and puzzling phenomena include prop...
- Uncertainty Quantification of Surrogate Models using Conformal Prediction : Abstract: Data-driven surrogate models offer quick approximations to complex numerical and experimental systems but typically lack uncertainty quantification, limiting their reliability in safety-crit...
- UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models : Abstract: Large language models (LLMs) are shifting from answer providers to intelligent tutors in educational settings, yet current supervised fine-tuning methods only learn surface teaching patterns...
- AI-Powered Deepfake Detection Using CNN and Vision Transformer Architectures : Abstract: The increasing use of artificial intelligence generated deepfakes creates major challenges in maintaining digital authenticity. Four AI-based models, consisting of three CNNs and one Vision ...
- PyBatchRender: A Python Library for Batched 3D Rendering at Up to One Million FPS : Abstract: Reinforcement learning from pixels is often bottlenecked by the performance and complexity of 3D rendered environments. Researchers face a trade-off between high-speed, low-level engines and...
- Diffusion Timbre Transfer Via Mutual Information Guided Inpainting : Abstract: We study timbre transfer as an inference-time editing problem for music audio. Starting from a strong pre-trained latent diffusion model, we introduce a lightweight procedure that requires n...
- Aggressive Compression Enables LLM Weight Theft : Abstract: As frontier AIs become more powerful and costly to develop, adversaries have increasing incentives to steal model weights by mounting exfiltration attacks. In this work, we consider exfiltra...
- ARGUS: Adaptive Rotation-Invariant Geometric Unsupervised System : Abstract: Detecting distributional drift in high-dimensional data streams presents fundamental challenges: global comparison methods scale poorly, projection-based approaches lose geometric structure,...
- Warp-Cortex: An Asynchronous, Memory-Efficient Architecture for Million-Agent Cognitive Scaling on Consumer Hardware : Abstract: Current multi-agent Large Language Model (LLM) frameworks suffer from linear memory scaling, rendering "System 2" parallel reasoning impractical on consumer hardware. We present Warp Cortex,...
- T3C: Test-Time Tensor Compression with Consistency Guarantees : Abstract: We present T3C, a train-once, test-time budget-conditioned compression framework that exposes rank and precision as a controllable deployment knob. T3C combines elastic tensor factorization ...
- Quantifying Local Strain Field and Deformation in Active Contraction of Bladder Using a Pretrained Transformer Model: A Speckle-Free Approach : Abstract: Accurate quantification of local strain fields during bladder contraction is essential for understanding the biomechanics of bladder micturition, in both health and disease. Conventional dig...
- Adaptive Hierarchical Evaluation of LLMs and SAST tools for CWE Prediction in Python : Abstract: Large Language Models have become integral to software development, yet they frequently generate vulnerable code. Existing code vulnerability detection benchmarks employ binary classificatio...
- LinMU: Multimodal Understanding Made Linear : Abstract: Modern Vision-Language Models (VLMs) achieve impressive performance but are limited by the quadratic complexity of self-attention, which prevents their deployment on edge devices and makes t...
- From Classification to Generation: An Open-Ended Paradigm for Adverse Drug Reaction Prediction Based on Graph-Motif Feature Fusion : Abstract: Computational biology offers immense potential for reducing the high costs and protracted cycles of new drug development through adverse drug reaction (ADR) prediction. However, current meth...
- Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding : Abstract: Producing prompt-faithful videos that preserve a user-specified identity remains challenging: models need to extrapolate facial dynamics from sparse reference while balancing the tension bet...
- UltraEval-Audio: A Unified Framework for Comprehensive Evaluation of Audio Foundation Models : Abstract: The development of audio foundation models has accelerated rapidly since the emergence of GPT-4o. However, the lack of comprehensive evaluation has become a critical bottleneck for further p...
- Data Complexity-aware Deep Model Performance Forecasting : Abstract: Deep learning models are widely used across computer vision and other domains. When working on the model induction, selecting the right architecture for a given dataset often relies on repet...
- ParkGaussian: Surround-view 3D Gaussian Splatting for Autonomous Parking : Abstract: Parking is a critical task for autonomous driving systems (ADS), with unique challenges in crowded parking slots and GPS-denied environments. However, existing works focus on 2D parking slot...
- Scale-Adaptive Power Flow Analysis with Local Topology Slicing and Multi-Task Graph Learning : Abstract: Developing deep learning models with strong adaptability to topological variations is of great practical significance for power flow analysis. To enhance model performance under variable sys...
- A Graph-based Framework for Online Time Series Anomaly Detection Using Model Ensemble : Abstract: With the increasing volume of streaming data in industrial systems, online anomaly detection has become a critical task. The diverse and rapidly evolving data patterns pose significant chall...
- SwinIFS: Landmark Guided Swin Transformer For Identity Preserving Face Super Resolution : Abstract: Face super-resolution aims to recover high-quality facial images from severely degraded low-resolution inputs, but remains challenging due to the loss of fine structural details and identity...
- Reliable Grid Forecasting: State Space Models for Safety-Critical Energy Systems : Abstract: Accurate grid load forecasting is safety-critical: under-predictions risk supply shortfalls, while symmetric error metrics mask this operational asymmetry. We introduce a grid-specific evalu...
- Online Estimation and Manipulation of Articulated Objects : Abstract: From refrigerators to kitchen drawers, humans interact with articulated objects effortlessly every day while completing household chores. For automating these tasks, service robots must be c...
- Bayesian Subspace Gradient Estimation for Zeroth-Order Optimization of Large Language Models : Abstract: Fine-tuning large language models (LLMs) with zeroth-order (ZO) optimization reduces memory by approximating gradients through function evaluations, but existing methods rely on one-step gra...
- Rethinking Multimodal Few-Shot 3D Point Cloud Segmentation: From Fused Refinement to Decoupled Arbitration : Abstract: In this paper, we revisit multimodal few-shot 3D point cloud semantic segmentation (FS-PCS), identifying a conflict in "Fuse-then-Refine" paradigms: the "Plasticity-Stability Dilemma." In ad...
- Accelerating Storage-Based Training for Graph Neural Networks : Abstract: Graph neural networks (GNNs) have achieved breakthroughs in various real-world downstream tasks due to their powerful expressiveness. As the scale of real-world graphs has been continuously ...
- DeepInv: A Novel Self-supervised Learning Approach for Fast and Accurate Diffusion Inversion : Abstract: Diffusion inversion is a task of recovering the noise of an image in a diffusion model, which is vital for controllable diffusion image editing. At present, diffusion inversion still remains...
- Distortion Instead of Hallucination: The Effect of Reasoning Under Strict Constraints : Abstract: With the widespread adoption of large language models (LLMs), hallucinations, which are non-factual fabrications in model outputs, have become serious concerns. Reasoning capabilities have r...
- The Optimal Sample Complexity of Linear Contracts : Abstract: In this paper, we settle the problem of learning optimal linear contracts from data in the offline setting, where agent types are drawn from an unknown distribution and the principal's goal ...
- FastV-RAG: Towards Fast and Fine-Grained Video QA with Retrieval-Augmented Generation : Abstract: Vision-Language Models (VLMs) excel at visual reasoning but still struggle with integrating external knowledge. Retrieval-Augmented Generation (RAG) is a promising solution, but current meth...
- DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving : Abstract: Video generation models, as one form of world models, have emerged as one of the most exciting frontiers in AI, promising agents the ability to imagine the future by modeling the temporal ev...
- Bridging the Data Gap: Creating a Hindi Text Summarization Dataset from the English XSUM : Abstract: Current advancements in Natural Language Processing (NLP) have largely favored resource-rich languages, leaving a significant gap in high-quality datasets for low-resource languages like Hin...
- EscherVerse: An Open World Benchmark and Dataset for Teleo-Spatial Intelligence with Physical-Dynamic and Intent-Driven Understanding : Abstract: The ability to reason about spatial dynamics is a cornerstone of intelligence, yet current research overlooks the human intent behind spatial changes. To address these limitations, we introd...
- MOSS Transcribe Diarize: Accurate Transcription with Speaker Diarization : Abstract: Speaker-Attributed, Time-Stamped Transcription (SATS) aims to transcribe what is said and to precisely determine the timing of each speaker, which is particularly valuable for meeting transc...
- Utilizing Earth Foundation Models to Enhance the Simulation Performance of Hydrological Models with AlphaEarth Embeddings : Abstract: Predicting river flow in places without streamflow records is challenging because basins respond differently to climate, terrain, vegetation, and soils. Traditional basin attributes describe...
- MM-Sonate: Multimodal Controllable Audio-Video Generation with Zero-Shot Voice Cloning : Abstract: Joint audio-video generation aims to synthesize synchronized multisensory content, yet current unified models struggle with fine-grained acoustic control, particularly for identity-preservin...
- OpenNovelty: An LLM-powered Agentic System for Verifiable Scholarly Novelty Assessment : Abstract: Evaluating novelty is critical yet challenging in peer review, as reviewers must assess submissions against a vast, rapidly evolving literature. This report presents OpenNovelty, an LLM-powe...
- HanoiWorld: A Joint Embedding Predictive Architecture Based World Model for Autonomous Vehicle Controller : Abstract: Current attempts of Reinforcement Learning for Autonomous Controller are data-demanding while the results are under-performed, unstable, and unable to grasp and anchor on the concept of safe...
- The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs : Abstract: Self-reflection capabilities emerge in Large Language Models after RL post-training, with multi-turn RL achieving substantial gains over SFT counterparts. Yet the mechanism of how a unified ...
- CONSENT: A Negotiation Framework for Leveraging User Flexibility in Vehicle-to-Building Charging under Uncertainty : Abstract: The growth of Electric Vehicles (EVs) creates a conflict in vehicle-to-building (V2B) settings between building operators, who face high energy costs from uncoordinated charging, and drivers...
- From Theory of Mind to Theory of Environment: Counterfactual Simulation of Latent Environmental Dynamics : Abstract: The vertebrate motor system employs dimensionality-reducing strategies to limit the complexity of movement coordination, for efficient motor control. But when environments are dense with hid...
- REE-TTT: Highly Adaptive Radar Echo Extrapolation Based on Test-Time Training : Abstract: Precipitation nowcasting is critically important for meteorological forecasting. Deep learning-based Radar Echo Extrapolation (REE) has become a predominant nowcasting approach, yet it suffe...
- JMedEthicBench: A Multi-Turn Conversational Benchmark for Evaluating Medical Safety in Japanese Large Language Models : Abstract: As Large Language Models (LLMs) are increasingly deployed in the healthcare field, it becomes essential to carefully evaluate their medical safety before clinical use. However, existing safety b...
- Learning Resilient Elections with Adversarial GNNs : Abstract: In the face of adverse motives, it is indispensable to achieve a consensus. Elections have been the canonical way by which modern democracy has operated since the 17th century. Nowadays, the...
- UniCrop: A Universal, Multi-Source Data Engineering Pipeline for Scalable Crop Yield Prediction : Abstract: Accurate crop yield prediction relies on diverse data streams, including satellite, meteorological, soil, and topographic information. However, despite rapid advances in machine learning, ex...
- Length-Aware Adversarial Training for Variable-Length Trajectories: Digital Twins for Mall Shopper Paths : Abstract: We study generative modeling of variable-length trajectories -- sequences of visited locations/items with associated timestamps -- for downstream simulation and counterfactual analysi...
- Adversarial Instance Generation and Robust Training for Neural Combinatorial Optimization with Multiple Objectives : Abstract: Deep reinforcement learning (DRL) has shown great promise in addressing multi-objective combinatorial optimization problems (MOCOPs). Nevertheless, the robustness of these learning-based sol...
- EHRSummarizer: A Privacy-Aware, FHIR-Native Architecture for Structured Clinical Summarization of Electronic Health Records : Abstract: Clinicians routinely navigate fragmented electronic health record (EHR) interfaces to assemble a coherent picture of a patient's problems, medications, recent encounters, and longitudinal tr...
- Exposing Hidden Interfaces: LLM-Guided Type Inference for Reverse Engineering macOS Private Frameworks : Abstract: Private macOS frameworks underpin critical services and daemons but remain undocumented and distributed only as stripped binaries, complicating security analysis. We present MOTIF, an agenti...
- Lying with Truths: Open-Channel Multi-Agent Collusion for Belief Manipulation via Generative Montage : Abstract: As large language models (LLMs) transition to autonomous agents synthesizing real-time information, their reasoning capabilities introduce an unexpected attack surface. This paper introduces...
- FALCON: Few-Shot Adversarial Learning for Cross-Domain Medical Image Segmentation : Abstract: Precise delineation of anatomical and pathological structures within 3D medical volumes is crucial for accurate diagnosis, effective surgical planning, and longitudinal disease monitoring. D...
- Digital Twin-Driven Communication-Efficient Federated Anomaly Detection for Industrial IoT : Abstract: Anomaly detection is increasingly becoming crucial for maintaining the safety, reliability, and efficiency of industrial systems. Recently, with the advent of digital twins and data-driven d...
- Beyond Homophily: Community Search on Heterophilic Graphs : Abstract: Community search aims to identify a refined set of nodes that are most relevant to a given query, supporting tasks ranging from fraud detection to recommendation. Unlike homophilic graphs, m...
- Explicit World Models for Reliable Human-Robot Collaboration : Abstract: This paper addresses the topic of robustness under sensing noise, ambiguous instructions, and human-robot interaction. We take a radically different tack to the issue of reliable embodied AI...
- RelayGR: Scaling Long-Sequence Generative Recommendation via Cross-Stage Relay-Race Inference : Abstract: Real-time recommender systems execute multi-stage cascades (retrieval, pre-processing, fine-grained ranking) under strict tail-latency SLOs, leaving only tens of milliseconds for ranking. Ge...
- K-EXAONE Technical Report : Abstract: This technical report presents K-EXAONE, a large-scale multilingual language model developed by LG AI Research. K-EXAONE is built on a Mixture-of-Experts architecture with 236B total paramet...
- Multi-granularity Interactive Attention Framework for Residual Hierarchical Pronunciation Assessment : Abstract: Automatic pronunciation assessment plays a crucial role in computer-assisted pronunciation training systems. Due to the ability to perform multiple pronunciation tasks simultaneously, multi-...
- Crafting Adversarial Inputs for Large Vision-Language Models Using Black-Box Optimization : Abstract: Recent advancements in Large Vision-Language Models (LVLMs) have shown groundbreaking capabilities across diverse multimodal tasks. However, these models remain vulnerable to adversarial jai...
- Query-Document Dense Vectors for LLM Relevance Judgment Bias Analysis : Abstract: Large Language Models (LLMs) have been used as relevance assessors for Information Retrieval (IR) evaluation collection creation due to reduced cost and increased scalability as compared to ...
- MergeRec: Model Merging for Data-Isolated Cross-Domain Sequential Recommendation : Abstract: Modern recommender systems trained on domain-specific data often struggle to generalize across multiple domains. Cross-domain sequential recommendation has emerged as a promising research di...
- LIA: Supervised Fine-Tuning of Large Language Models for Automatic Issue Assignment : Abstract: Issue assignment is a critical process in software maintenance, where new issue reports are validated and assigned to suitable developers. However, manual issue assignment is often inconsist...
- Subimage Overlap Prediction: Task-Aligned Self-Supervised Pretraining For Semantic Segmentation In Remote Sensing Imagery : Abstract: Self-supervised learning (SSL) methods have become a dominant paradigm for creating general purpose models whose capabilities can be transferred to downstream supervised learning tasks. Howe...
- HyperCLOVA X 8B Omni : Abstract: In this report, we present HyperCLOVA X 8B Omni, the first any-to-any omnimodal model in the HyperCLOVA X family that supports text, audio, and vision as both inputs and outputs. By consolid...
- VerLM: Explaining Face Verification Using Natural Language : Abstract: Face verification systems have seen substantial advancements; however, they often lack transparency in their decision-making processes. In this paper, we introduce an innovative Vision-Langu...
- Sparse Threats, Focused Defense: Criticality-Aware Robust Reinforcement Learning for Safe Autonomous Driving : Abstract: Reinforcement learning (RL) has shown considerable potential in autonomous driving (AD), yet its vulnerability to perturbations remains a critical barrier to real-world deployment. As a prim...
- Moments Matter: Stabilizing Policy Optimization using Return Distributions : Abstract: Deep Reinforcement Learning (RL) agents often learn policies that achieve the same episodic return yet behave very differently, due to a combination of environmental (random transitions, ini...
- Adaptive Hybrid Optimizer based Framework for Lumpy Skin Disease Identification : Abstract: Lumpy Skin Disease (LSD) is a contagious viral infection that significantly deteriorates livestock health, thereby posing a serious threat to the global economy and food security. Owing to i...
- FedSCAM (Federated Sharpness-Aware Minimization with Clustered Aggregation and Modulation): Scam-resistant SAM for Robust Federated Optimization in Heterogeneous Environments : Abstract: Federated Learning (FL) enables collaborative model training across decentralized edge devices while preserving data privacy. However, statistical heterogeneity among clients, often manifest...
- Path Integral Solution for Dissipative Generative Dynamics : Abstract: Can purely mechanical systems generate intelligent language? We prove that dissipative quantum dynamics with analytically tractable non-local context aggregation produce coherent text genera...
- A-PINN: Auxiliary Physics-informed Neural Networks for Structural Vibration Analysis in Continuous Euler-Bernoulli Beam : Abstract: Recent advancements in physics-informed neural networks (PINNs) and their variants have garnered substantial focus from researchers due to their effectiveness in solving both forward and inv...
- The Silicon Psyche: Anthropomorphic Vulnerabilities in Large Language Models : Abstract: Large Language Models (LLMs) are rapidly transitioning from conversational assistants to autonomous agents embedded in critical organizational functions, including Security Operations Center...
- SmartFlow Reinforcement Learning and Agentic AI for Bike-Sharing Optimisation : Abstract: SmartFlow is a multi-layered framework that integrates Reinforcement Learning and Agentic AI to address the dynamic rebalancing problem in urban bike-sharing services. Its architecture separ...
- LLMize: A Framework for Large Language Model-Based Numerical Optimization : Abstract: Large language models (LLMs) have recently shown strong reasoning capabilities beyond traditional language tasks, motivating their use for numerical optimization. This paper presents LLMize,...
- LearnAD: Learning Interpretable Rules for Brain Networks in Alzheimer's Disease Classification : Abstract: We introduce LearnAD, a neuro-symbolic method for predicting Alzheimer's disease from brain magnetic resonance imaging data, learning fully interpretable rules. LearnAD applies statistical m...
- Enhancing Retrieval-Augmented Generation with Topic-Enriched Embeddings: A Hybrid Approach Integrating Traditional NLP Techniques : Abstract: Retrieval-augmented generation (RAG) systems rely on accurate document retrieval to ground large language models (LLMs) in external knowledge, yet retrieval quality often degrades in corpora...
- CornViT: A Multi-Stage Convolutional Vision Transformer Framework for Hierarchical Corn Kernel Analysis : Abstract: Accurate grading of corn kernels is critical for seed certification, directional seeding, and breeding, yet it is still predominantly performed by manual inspection. This work introduces Cor...
- Evaluating Contextual Intelligence in Recyclability: A Comprehensive Study of Image-Based Reasoning Systems : Abstract: While the importance of efficient recycling is widely acknowledged, accurately determining the recyclability of items and their proper disposal remains a complex task for the general public....
- Placenta Accreta Spectrum Detection using Multimodal Deep Learning : Abstract: Placenta Accreta Spectrum (PAS) is a life-threatening obstetric complication involving abnormal placental invasion into the uterine wall. Early and accurate prenatal diagnosis is essential t...
- Conformal Prediction Under Distribution Shift: A COVID-19 Natural Experiment : Abstract: Conformal prediction guarantees degrade under distribution shift. We study this using COVID-19 as a natural experiment across 8 supply chain tasks. Despite identical severe feature turnover ...
- Device-Native Autonomous Agents for Privacy-Preserving Negotiations : Abstract: Automated negotiations in insurance and business-to-business (B2B) commerce encounter substantial challenges. Current systems force a trade-off between convenience and privacy by routing sen...
- The Discovery Gap: How Product Hunt Startups Vanish in LLM Organic Discovery Queries : Abstract: When someone asks ChatGPT to recommend a project management tool, which products show up in the response? And more importantly for startup founders: will their newly launched product ever ap...
- Attention Needs to Focus: A Unified Perspective on Attention Allocation : Abstract: The Transformer architecture, a cornerstone of modern Large Language Models (LLMs), has achieved extraordinary success in sequence modeling, primarily due to its attention mechanism. However...
- MODE: Efficient Time Series Prediction with Mamba Enhanced by Low-Rank Neural ODEs : Abstract: Time series prediction plays a pivotal role across diverse domains such as finance, healthcare, energy systems, and environmental modeling. However, existing approaches often struggle to bal...
- Practical Geometric and Quantum Kernel Methods for Predicting Skeletal Muscle Outcomes in chronic obstructive pulmonary disease : Abstract: Skeletal muscle dysfunction is a clinically relevant extra-pulmonary manifestation of chronic obstructive pulmonary disease (COPD) and is closely linked to systemic and airway inflammation. ...
- Complexity-based code embeddings : Abstract: This paper presents a generic method for transforming the source code of various algorithms to numerical embeddings, by dynamically analysing the behaviour of computer programs against diffe...
- Application of deep learning techniques in non-contrast computed tomography pulmonary angiogram for pulmonary embolism diagnosis : Abstract: Pulmonary embolism is a life-threatening disease; early detection and treatment can significantly reduce mortality. In recent years, many studies have been using deep learning in the diagnos...
- MACA: A Framework for Distilling Trustworthy LLMs into Efficient Retrievers : Abstract: Modern enterprise retrieval systems must handle short, underspecified queries such as "foreign transaction fee refund" and "recent check status". In these cases, semantic nuance and meta...
- Measuring Social Media Polarization Using Large Language Models and Heuristic Rules : Abstract: Understanding affective polarization in online discourse is crucial for evaluating the societal impact of social media interactions. This study presents a novel framework that leverages larg...
- Analyzing the Shopping Journey: Computing Shelf Browsing Visits in a Physical Retail Store : Abstract: Motivated by recent challenges in the deployment of robots into customer-facing roles within retail, this work introduces a study of customer activity in physical stores as a step toward aut...
- AlignUSER: Human-Aligned LLM Agents via World Models for Recommender System Evaluation : Abstract: Evaluating recommender systems remains challenging due to the gap between offline metrics and real user behavior, as well as the scarcity of interaction data. Recent work explores large lang...
- LOFA: Online Influence Maximization under Full-Bandit Feedback using Lazy Forward Selection : Abstract: We study the problem of influence maximization (IM) in an online setting, where the goal is to select a subset of nodes, called the seed set, at each time step ...
- Improving Code-Switching Speech Recognition with TTS Data Augmentation : Abstract: Automatic speech recognition (ASR) for conversational code-switching speech remains challenging due to the scarcity of realistic, high-quality labeled speech data. This paper explores multil...
- Emoji-Based Jailbreaking of Large Language Models : Abstract: Large Language Models (LLMs) are integral to modern AI applications, but their safety alignment mechanisms can be bypassed through adversarial prompt engineering. This study investigates emo...
- Comparative Analysis of Formula and Structure Prediction from Tandem Mass Spectra : Abstract: Liquid chromatography mass spectrometry (LC-MS)-based metabolomics and exposomics aim to measure detectable small molecules in biological samples. The results facilitate hypothesis-generatin...
- Adapting Feature Attenuation to NLP : Abstract: Transformer classifiers such as BERT deliver impressive closed-set accuracy, yet they remain brittle when confronted with inputs from unseen categories--a common scenario for deployed NLP sy...
- Value Vision-Language-Action Planning & Search : Abstract: Vision-Language-Action (VLA) models have emerged as powerful generalist policies for robotic manipulation, yet they remain fundamentally limited by their reliance on behavior cloning, leadin...
- WildIng: A Wildlife Image Invariant Representation Model for Geographical Domain Shift : Abstract: Wildlife monitoring is crucial for studying biodiversity loss and climate change. Camera trap images provide a non-intrusive method for analyzing animal populations and identifying ecologica...
- VEAT Quantifies Implicit Associations in Text-to-Video Generator Sora and Reveals Challenges in Bias Mitigation : Abstract: Text-to-Video (T2V) generators such as Sora raise concerns about whether generated content reflects societal bias. We extend embedding-association tests from words and images to video by int...
- Scale-aware Adaptive Supervised Network with Limited Medical Annotations : Abstract: Medical image segmentation faces critical challenges in semi-supervised learning scenarios due to severe annotation scarcity requiring expert radiological knowledge, significant inter-annota...
- An Explainable Agentic AI Framework for Uncertainty-Aware and Abstention-Enabled Acute Ischemic Stroke Imaging Decisions : Abstract: Artificial intelligence models have shown strong potential in acute ischemic stroke imaging, particularly for lesion detection and segmentation using computed tomography and magnetic resonan...
- Data-Driven Assessment of Concrete Mixture Compositions on Chloride Transport via Standalone Machine Learning Algorithms : Abstract: This paper employs a data-driven approach to determine the impact of concrete mixture compositions on the temporal evolution of chloride in concrete structures. This is critical for assessin...
- Intention Collapse: Intention-Level Metrics for Reasoning in Language Models : Abstract: Every act of language generation compresses a rich internal state into a single token sequence. We call this process intention collapse: a many-to-one projection from a high dimensional inte...
- Geometric and Dynamic Scaling in Deep Transformers : Abstract: Despite their empirical success, pushing Transformer architectures to extreme depth often leads to a paradoxical failure: representations become increasingly redundant, lose rank, and ultima...
- Improving Variational Autoencoder using Random Fourier Transformation: An Aviation Safety Anomaly Detection Case-Study : Abstract: In this study, we focus on the training process and inference improvements of deep neural networks (DNNs), specifically Autoencoders (AEs) and Variational Autoencoders (VAEs), using Random F...
- Decoupling Amplitude and Phase Attention in Frequency Domain for RGB-Event based Visual Object Tracking : Abstract: Existing RGB-Event visual object tracking approaches primarily rely on conventional feature-level fusion, failing to fully exploit the unique advantages of event cameras. In particular, the ...
- ITSELF: Attention Guided Fine-Grained Alignment for Vision-Language Retrieval : Abstract: Vision Language Models (VLMs) have rapidly advanced and show strong promise for text-based person search (TBPS), a task that requires capturing fine-grained relationships between images and ...
- Enhanced Leukemic Cell Classification Using Attention-Based CNN and Data Augmentation : Abstract: We present a reproducible deep learning pipeline for leukemic cell classification, focusing on system architecture, experimental robustness, and software design choices for medical image ana...
- A Platform for Interactive AI Character Experiences : Abstract: From movie characters to modern science fiction - bringing characters into interactive, story-driven conversations has captured imaginations across generations. Achieving this vision is high...
- Beyond Demand Estimation: Consumer Surplus Evaluation via Cumulative Propensity Weights : Abstract: This paper develops a practical framework for using observational data to audit the consumer surplus effects of AI-driven decisions, specifically in targeted pricing and algorithmic lending....
- Multi-Dimensional Prompt Chaining to Improve Open-Domain Dialogue Generation : Abstract: Small language models (SLMs) offer significant deployment advantages but often struggle to match the dialogue quality of larger models in open-domain settings. In this paper, we propose a mu...
- EgoGrasp: World-Space Hand-Object Interaction Estimation from Egocentric Videos : Abstract: We propose EgoGrasp, the first method to reconstruct world-space hand-object interactions (W-HOI) from egocentric monocular videos with dynamic cameras in the wild. Accurate W-HOI reconstruc...
- Enhancing Histopathological Image Classification via Integrated HOG and Deep Features with Robust Noise Performance : Abstract: The era of digital pathology has advanced histopathological examinations, making automated image analysis essential in clinical practice. This study evaluates the classification performance ...
- A UCB Bandit Algorithm for General ML-Based Estimators : Abstract: We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary machine learning models into multi-armed bandit frameworks. A fundamental challenge in deploying s...
- SPoRC-VIST: A Benchmark for Evaluating Generative Natural Narrative in Vision-Language Models : Abstract: Vision-Language Models (VLMs) have achieved remarkable success in descriptive tasks such as image captioning and visual question answering (VQA). However, their ability to generate engaging,...
- Gendered Pathways in AI Companionship: Cross-Community Behavior and Toxicity Patterns on Reddit : Abstract: AI-companionship platforms are rapidly reshaping how people form emotional, romantic, and parasocial bonds with non-human agents, raising new questions about how these relationships intersec...
- Flow Equivariant World Models: Memory for Partially Observed Dynamic Environments : Abstract: Embodied systems experience the world as 'a symphony of flows': a combination of many continuous streams of sensory input coupled to self-motion, interwoven with the dynamics of external obj...
- Scalable Data-Driven Reachability Analysis and Control via Koopman Operators with Conformal Coverage Guarantees : Abstract: We propose a scalable reachability-based framework for probabilistic, data-driven safety verification of unknown nonlinear dynamics. We use Koopman theory with a neural network (NN) lifting ...
- Luminark: Training-free, Probabilistically-Certified Watermarking for General Vision Generative Models : Abstract: In this paper, we introduce Luminark, a training-free and probabilistically-certified watermarking method for general vision generative models. Our approach is built upon a novel wate...
- Harm in AI-Driven Societies: An Audit of Toxicity Adoption on Chirper.ai : Abstract: Large Language Models (LLMs) are increasingly embedded in autonomous agents that participate in online social ecosystems, where interactions are sequential, cumulative, and only partially co...
- ks-lit-3m: A 3.1 million word kashmiri text dataset for large language model pretraining : Abstract: Large Language Models (LLMs) demonstrate remarkable fluency across high-resource languages yet consistently fail to generate coherent text in Kashmiri, a language spoken by approximately sev...
- SoulSeek: Exploring the Use of Social Cues in LLM-based Information Seeking : Abstract: Social cues, which convey others' presence, behaviors, or identities, play a crucial role in human information seeking by helping individuals judge relevance and trustworthiness. However, ex...
- Evolving CNN Architectures: From Custom Designs to Deep Residual Models for Diverse Image Classification and Detection Tasks : Abstract: This paper presents a comparative study of a custom convolutional neural network (CNN) architecture against widely used pretrained and transfer learning CNN models across five real-world ima...
- ScienceDB AI: An LLM-Driven Agentic Recommender System for Large-Scale Scientific Data Sharing Services : Abstract: The rapid growth of AI for Science (AI4S) has underscored the significance of scientific datasets, leading to the establishment of numerous national scientific data centers and sharing platf...
- Learning from Historical Activations in Graph Neural Networks : Abstract: Graph Neural Networks (GNNs) have demonstrated remarkable success in various domains such as social networks, molecular chemistry, and more. A crucial component of GNNs is the pooling proced...
- Wittgenstein's Family Resemblance Clustering Algorithm : Abstract: This paper, introducing a novel method in philomatics, draws on Wittgenstein's concept of family resemblance from analytic philosophy to develop a clustering algorithm for machine learning. ...
- RovoDev Code Reviewer: A Large-Scale Online Evaluation of LLM-based Code Review Automation at Atlassian : Abstract: Large Language Models (LLMs)-powered code review automation has the potential to transform code review workflows. Despite the advances of LLM-powered code review comment generation approache...
- Generating Diverse TSP Tours via a Combination of Graph Pointer Network and Dispersion : Abstract: We address the Diverse Traveling Salesman Problem (D-TSP), a bi-criteria optimization challenge that seeks a set of $k$ distinct TSP tours. The objective requires every selected tour to have...
- AI-Powered Hybrid Intrusion Detection Framework for Cloud Security Using Novel Metaheuristic Optimization : Abstract: Cybersecurity poses considerable problems to Cloud Computing (CC), especially regarding Intrusion Detection Systems (IDSs), facing difficulties with skewed datasets and suboptimal classifica...
- Bridging the Semantic Gap for Categorical Data Clustering via Large Language Models : Abstract: Categorical data are prevalent in domains such as healthcare, marketing, and bioinformatics, where clustering serves as a fundamental tool for pattern discovery. A core challenge in categori...
- RefSR-Adv: Adversarial Attack on Reference-based Image Super-Resolution Models : Abstract: Single Image Super-Resolution (SISR) aims to recover high-resolution images from low-resolution inputs. Unlike SISR, Reference-based Super-Resolution (RefSR) leverages an additional high-res...
- MentalGame: Predicting Personality-Job Fitness for Software Developers Using Multi-Genre Games and Machine Learning Approaches : Abstract: Personality assessment in career guidance and personnel selection traditionally relies on self-report questionnaires, which are susceptible to response bias, fatigue, and intentional distort...
- Correctness isn't Efficiency: Runtime Memory Divergence in LLM-Generated Code : Abstract: Large language models (LLMs) can generate programs that pass unit tests, but passing tests does not guarantee reliable runtime behavior. We find that different correct solutions to the same ...
- Improved Object-Centric Diffusion Learning with Registers and Contrastive Alignment : Abstract: Slot Attention (SA) with pretrained diffusion models has recently shown promise for object-centric learning (OCL), but suffers from slot entanglement and weak alignment between object slots ...
- Stylometry Analysis of Human and Machine Text for Academic Integrity : Abstract: This work addresses critical challenges to academic integrity, including plagiarism, fabrication, and verification of authorship of educational content, by proposing a Natural Language Proce...
- Benchmarking the Computational and Representational Efficiency of State Space Models against Transformers on Long-Context Dyadic Sessions : Abstract: State Space Models (SSMs) have emerged as a promising alternative to Transformers for long-context sequence modeling, offering linear $O(N)$ computational complexity compared to the Transfor...
- Seamlessly Natural: Image Stitching with Natural Appearance Preservation : Abstract: This paper introduces SENA (SEamlessly NAtural), a geometry-driven image stitching approach that prioritizes structural fidelity in challenging real-world scenes characterized by parallax an...
- MambaFormer: Token-Level Guided Routing Mixture-of-Experts for Accurate and Efficient Clinical Assistance : Abstract: The deployment of large language models (LLMs) in real-world clinical applications is constrained by the fundamental trade-off between computational cost and the efficiency of linear-time mo...
- From Policy to Logic for Efficient and Interpretable Coverage Assessment : Abstract: Large Language Models (LLMs) have demonstrated strong capabilities in interpreting lengthy, complex legal and policy language. However, their reliability can be undermined by hallucinations ...
- LLM Collusion : Abstract: We study how delegating pricing to large language models (LLMs) can facilitate collusion in a duopoly when both sellers rely on the same pre-trained model. The LLM is characterized by (i) a ...
- Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory : Abstract: Graph structures are increasingly used in dialog memory systems, but empirical findings on their effectiveness remain inconsistent, making it unclear which design choices truly matter. We pr...
- Semantic Alignment of Multilingual Knowledge Graphs via Contextualized Vector Projections : Abstract: The paper presents our work on cross-lingual ontology alignment system which uses embedding based cosine similarity matching. The ontology entities are made contextually richer by creating d...
- MathLedger: A Verifiable Learning Substrate with Ledger-Attested Feedback : Abstract: Contemporary AI systems achieve extraordinary performance yet remain opaque and non-verifiable, creating a crisis of trust for safety-critical deployment. We introduce MathLedger, a substrat...
- Agentic AI for Autonomous, Explainable, and Real-Time Credit Risk Decision-Making : Abstract: Significant digitalization of financial services in a short period of time has led to an urgent demand to have autonomous, transparent and real-time credit risk decision making systems. The ...
- CogCanvas: Compression-Resistant Cognitive Artifacts for Long LLM Conversations : Abstract: Large language models face a fundamental tension between context window limits and information fidelity in long conversations. Existing approaches--truncation and summarization--either disca...
- Energy-Aware Routing to Large Reasoning Models : Abstract: Large reasoning models (LRMs) have heterogeneous inference energy costs based on which model is used and how much it reasons. To reduce energy, it is important to choose the right LRM and op...
- Decomposing LLM Self-Correction: The Accuracy-Correction Paradox and Error Depth Hypothesis : Abstract: Large Language Models (LLMs) are widely believed to possess self-correction capabilities, yet recent studies suggest that intrinsic self-correction--where models correct their own outputs wi...
- Can We Trust AI Explanations? Evidence of Systematic Underreporting in Chain-of-Thought Reasoning : Abstract: When AI systems explain their reasoning step-by-step, practitioners often assume these explanations reveal what actually influenced the AI's answer. We tested this assumption by embedding hi...
- OmniNeuro: A Multimodal HCI Framework for Explainable BCI Feedback via Generative AI and Sonification : Abstract: While Deep Learning has improved Brain-Computer Interface (BCI) decoding accuracy, clinical adoption is hindered by the "Black Box" nature of these algorithms, leading to user frustration an...
- Enhancing Temporal Awareness in LLMs for Temporal Point Processes : Abstract: Temporal point processes (TPPs) are crucial for analyzing events over time and are widely used in fields such as finance, healthcare, and social systems. These processes are particularly val...
- Temporal Attack Pattern Detection in Multi-Agent AI Workflows: An Open Framework for Training Trace-Based Security Models : Abstract: We present an openly documented methodology for fine-tuning language models to detect temporal attack patterns in multi-agent AI workflows using OpenTelemetry trace analysis. We curate a dat...
- Comment on: Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Tasks : Abstract: Recently published work titled Your Brain on ChatGPT: Accumulation of Cognitive Debt When Using an AI Assistant for Essay Writing Task by Kosmyna et al. (2025) has sparked a vivid debate on ...
- Cultural Encoding in Large Language Models: The Existence Gap in AI-Mediated Brand Discovery : Abstract: As artificial intelligence systems increasingly mediate consumer information discovery, brands face algorithmic invisibility. This study investigates Cultural Encoding in Large Language ...
- Universal Conditional Logic: A Formal Language for Prompt Engineering : Abstract: We present Universal Conditional Logic (UCL), a mathematical framework for prompt optimization that transforms prompt engineering from heuristic practice into systematic optimization. Throug...
- Counterfactual Self-Questioning for Stable Policy Optimization in Language Models : Abstract: Recent work on language model self-improvement shows that models can refine their own reasoning through reflection, verification, debate, or self-generated rewards. However, most existing ap...
- Context Collapse: In-Context Learning and Model Collapse : Abstract: This thesis investigates two key phenomena in large language models (LLMs): in-context learning (ICL) and model collapse. We study ICL in a linear transformer with tied weights trained on li...
- ElecTwit: A Framework for Studying Persuasion in Multi-Agent Social Systems : Abstract: This paper introduces ElecTwit, a simulation framework designed to study persuasion within multi-agent systems, specifically emulating the interactions on social media platforms during a pol...
- Reinforcement Learning Enhanced Multi-hop Reasoning for Temporal Knowledge Question Answering : Abstract: Temporal knowledge graph question answering (TKGQA) involves multi-hop reasoning over temporally constrained entity relationships in the knowledge graph to answer a given question. However, ...
- Accelerating Monte-Carlo Tree Search with Optimized Posterior Policies : Abstract: We introduce a recursive AlphaZero-style Monte-Carlo tree search algorithm, "RMCTS". The advantage of RMCTS over AlphaZero's MCTS-UCB is speed. In RMCTS, the search tree is explored in a br...
- Digital Twin AI: Opportunities and Challenges from Large Language Models to World Models : Abstract: Digital twins, as precise digital representations of physical systems, have evolved from passive simulation tools into intelligent and autonomous entities through the integration of artifici...
- Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale : Abstract: Large Language Models (LLMs) have rapidly advanced, with Gemini-3-Pro setting a new performance milestone. In this work, we explore collective intelligence as an alternative to monolithic sc...
- A unified multimodal understanding and generation model for cross-disciplinary scientific research : Abstract: Scientific discovery increasingly relies on integrating heterogeneous, high-dimensional data across disciplines nowadays. While AI models have achieved notable success across various scienti...
- KGCE: Knowledge-Augmented Dual-Graph Evaluator for Cross-Platform Educational Agent Benchmarking with Multimodal Language Models : Abstract: With the rapid adoption of multimodal large language models (MLMs) in autonomous agents, cross-platform task execution capabilities in educational settings have garnered significant attentio...
- Empowering Small Language Models with Factual Hallucination-Aware Reasoning for Financial Classification : Abstract: Small language models (SLMs) are increasingly used for financial classification due to their fast inference and local deployability. However, compared with large language models, SLMs are mo...
- A construction of an optimal base for conditional attribute and attributional condition implications in triadic contexts : Abstract: This article studies implications in triadic contexts. Specifically, we focus on those introduced by Ganter and Obiedkov, namely conditional attribute and attributional condition implication...
- Reading Between the Lines: Deconfounding Causal Estimates using Text Embeddings and Deep Learning : Abstract: Estimating causal treatment effects in observational settings is frequently compromised by selection bias arising from unobserved confounders. While traditional econometric methods struggle ...
- Bayesian Orchestration of Multi-LLM Agents for Cost-Aware Sequential Decision-Making : Abstract: Large language models (LLMs) are increasingly deployed as autonomous decision agents in settings with asymmetric error costs: hiring (missed talent vs wasted interviews), medical triage (mis...
- Aletheia: Quantifying Cognitive Conviction in Reasoning Models via Regularized Inverse Confusion Matrix : Abstract: In the progressive journey toward Artificial General Intelligence (AGI), current evaluation paradigms face an epistemological crisis. Static benchmarks measure knowledge breadth but fail to ...
- Improving Behavioral Alignment in LLM Social Simulations via Context Formation and Navigation : Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in experimental settings, but they systematically diverge from human decisions in complex decision-making enviro...
- Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement : Abstract: We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that represents one of the largest-scale ope...
- CaveAgent: Transforming LLMs into Stateful Runtime Operators : Abstract: LLM-based agents are increasingly capable of complex task execution, yet current agentic systems remain constrained by text-centric paradigms. Traditional approaches rely on procedural JSON-...
- Structured Decomposition for LLM Reasoning: Cross-Domain Validation and Semantic Web Integration : Abstract: Rule-based reasoning over natural language input arises in domains where decisions must be auditable and justifiable: clinical protocols specify eligibility criteria in prose, evidence rules...
- Yuan3.0 Flash: An Open Multimodal Large Language Model for Enterprise Applications : Abstract: We introduce Yuan3.0 Flash, an open-source Mixture-of-Experts (MoE) MultiModal Large Language Model featuring 3.7B activated parameters and 40B total parameters, specifically designed to enh...
- AI Agent Systems: Architectures, Applications, and Evaluation : Abstract: AI agents -- systems that combine foundation models with reasoning, planning, memory, and tool use -- are rapidly becoming a practical interface between natural-language intent and real-worl...
- A New Benchmark for the Appropriate Evaluation of RTL Code Optimization : Abstract: The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for genera...
- Can Large Language Models Solve Engineering Equations? A Systematic Comparison of Direct Prediction and Solver-Assisted Approaches : Abstract: Transcendental equations requiring iterative numerical solution pervade engineering practice, from fluid mechanics friction factor calculations to orbital position determination. We systemat...
- PsychEval: A Multi-Session and Multi-Therapy Benchmark for High-Realism and Comprehensive AI Psychological Counselor : Abstract: To develop a reliable AI for psychological assessment, we introduce PsychEval, a multi-session, multi-therapy, and highly realistic benchmark designed to address three key challenge...
- Admissibility Alignment : Abstract: This paper introduces Admissibility Alignment: a reframing of AI alignment as a property of admissible action and decision selection over distributions of outcomes under uncertainty, evaluat...
- COMPASS: A Framework for Evaluating Organization-Specific Policy Alignment in LLMs : Abstract: As large language models are deployed in high-stakes enterprise applications, from healthcare to finance, ensuring adherence to organization-specific policies has become essential. Yet exist...
- Clinical Knowledge Graph Construction and Evaluation with Multi-LLMs via Retrieval-Augmented Generation : Abstract: Large language models (LLMs) offer new opportunities for constructing knowledge graphs (KGs) from unstructured clinical narratives. However, existing approaches often rely on structured inpu...
- Jenius Agent: Towards Experience-Driven Accuracy Optimization in Real-World Scenarios : Abstract: As agent systems powered by large language models (LLMs) advance, improving the task performance of an autonomous agent, especially in context understanding, tool usage, and response generat...
- Toward Auditable Neuro-Symbolic Reasoning in Pathology: SQL as an Explicit Trace of Evidence : Abstract: Automated pathology image analysis is central to clinical diagnosis, but clinicians still ask which slide features drive a model's decision and why. Vision-language models can produce natura...
- Theory Trace Card: Theory-Driven Socio-Cognitive Evaluation of LLMs : Abstract: Socio-cognitive benchmarks for large language models (LLMs) often fail to predict real-world behavior, even when models achieve high benchmark scores. Prior work has attributed this evaluati...
- MMP-A*: Multimodal Perception Enhanced Incremental Heuristic Search on Path Planning : Abstract: Autonomous path planning requires a synergy between global reasoning and geometric precision, especially in complex or cluttered environments. While classical A* is valued for its optimality...
- OpenSocInt: A Multi-modal Training Environment for Human-Aware Social Navigation : Abstract: In this paper, we introduce OpenSocInt, an open-source software package providing a simulator for multi-modal social interactions and a modular architecture to train social agents. We descri...
- CNC-TP: Classifier Nominal Concept Based on Top-Pertinent Attributes : Abstract: Knowledge Discovery in Databases (KDD) aims to exploit the vast amounts of data generated daily across various domains of computer applications. Its objective is to extract hidden and meanin...
- ChaosBench-Logic: A Benchmark for Logical and Symbolic Reasoning on Chaotic Dynamical Systems : Abstract: Large language models (LLMs) excel at natural language tasks but remain brittle in domains requiring precise logical and symbolic reasoning. Chaotic dynamical systems provide an especially d...
- MindChat: A Privacy-preserving Large Language Model for Mental Health Support : Abstract: Large language models (LLMs) have shown promise for mental health support, yet training such models is constrained by the scarcity and sensitivity of real counseling dialogues. In this artic...
- XAI-MeD: Explainable Knowledge Guided Neuro-Symbolic Framework for Domain Generalization and Rare Class Detection in Medical Imaging : Abstract: Explainability, domain generalization, and rare class reliability are critical challenges in medical AI, where deep models often fail under real-world distribution shifts and exhibit bias again...
- Simulated Reasoning is Reasoning : Abstract: Reasoning has long been understood as a pathway between stages of understanding. Proper reasoning leads to understanding of a given subject. This reasoning was conceptualized as a process of...
- Higher-Order Action Regularization in Deep Reinforcement Learning: From Continuous Control to Building Energy Management : Abstract: Deep reinforcement learning agents often exhibit erratic, high-frequency control behaviors that hinder real-world deployment due to excessive energy consumption and mechanical wear. We syste...
- FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations : Abstract: Pharmaceutical three-dimensional (3D) printing is an advanced fabrication technology with the potential to enable truly personalised dosage forms. Recent studies have integrated artificial i...
- EverMemOS: A Self-Organizing Memory Operating System for Structured Long-Horizon Reasoning : Abstract: Large Language Models (LLMs) are increasingly deployed as long-term interactive agents, yet their limited context windows make it difficult to sustain coherent behavior over extended interac...
- Streaming Hallucination Detection in Long Chain-of-Thought Reasoning : Abstract: Long chain-of-thought (CoT) reasoning improves the performance of large language models, yet hallucinations in such settings often emerge subtly and propagate across reasoning steps. We sugg...
- Project Ariadne: A Structural Causal Framework for Auditing Faithfulness in LLM Agents : Abstract: As Large Language Model (LLM) agents are increasingly tasked with high-stakes autonomous decision-making, the transparency of their reasoning processes has become a critical safety concern. ...
- Falcon-H1R: Pushing the Reasoning Frontiers with a Hybrid Model for Efficient Test-Time Scaling : Abstract: This work introduces Falcon-H1R, a 7B-parameter reasoning-optimized model that establishes the feasibility of achieving competitive reasoning performance with small language models (SLMs). F...
- The Qualitative Laboratory: Theory Prototyping and Hypothesis Generation with Large Language Models : Abstract: A central challenge in social science is to generate rich qualitative hypotheses about how diverse social groups might interpret new information. This article introduces and illustrates a no...
- A Modular Reference Architecture for MCP-Servers Enabling Agentic BIM Interaction : Abstract: Agentic workflows driven by large language models (LLMs) are increasingly applied to Building Information Modelling (BIM), enabling natural-language retrieval, modification and generation of...
- Can Large Language Models Improve Venture Capital Exit Timing After IPO? : Abstract: Exit timing after an IPO is one of the most consequential decisions for venture capital (VC) investors, yet existing research focuses mainly on describing when VCs exit rather than evaluatin...
- Free Energy-Based Modeling of Emotional Dynamics in Video Advertisements : Abstract: Emotional responses during advertising video viewing are recognized as essential for understanding media effects because they have influenced attention, memory, and purchase intention. To es...
- Speak the Art: A Direct Speech to Image Generation Framework : Abstract: Direct speech-to-image generation has recently shown promising results. However, compared to text-to-image generation, there is still a large gap to enclose. Current approaches use two stage...
- A Knowledge Graph and Deep Learning-Based Semantic Recommendation Database System for Advertisement Retrieval and Personalization : Abstract: In modern digital marketing, the growing complexity of advertisement data demands intelligent systems capable of understanding semantic relationships among products, audiences, and advertisi...
- Intrinsic-Metric Physics-Informed Neural Networks (IM-PINN) for Reaction-Diffusion Dynamics on Complex Riemannian Manifolds : Abstract: Simulating nonlinear reaction-diffusion dynamics on complex, non-Euclidean manifolds remains a fundamental challenge in computational morphogenesis, constrained by high-fidelity mesh generat...
- Pediatric Pneumonia Detection from Chest X-Rays: A Comparative Study of Transfer Learning and Custom CNNs : Abstract: Pneumonia is a leading cause of mortality in children under five, with over 700,000 deaths annually. Accurate diagnosis from chest X-rays is limited by radiologist availability and variabili...
- A Global Atlas of Digital Dermatology to Map Innovation and Disparities : Abstract: The adoption of artificial intelligence in dermatology promises democratized access to healthcare, but model reliability depends on the quality and comprehensiveness of the data fueling thes...
- Value-guided action planning with JEPA world models : Abstract: Building deep learning models that can reason about their environment requires capturing its underlying dynamics. Joint-Embedded Predictive Architectures (JEPA) provide a promising framework...
Research Sources: 642 | Generated: 1/6/2026
