Schedule released: Program schedule for Jun 3, 2026, 7:30 AM-12:30 PM in Room 506, Denver.

GRAIL-V 2026 Papers

Accepted papers for GRAIL-V at CVPR 2026 are listed below. Links open the corresponding OpenReview pages.

#4 Long paper

CompAgent: An Agentic Framework for Visual Compliance Verification

Rahul Ghosh, Baishali Chaudhury, Hari Prasanna Das, Meghana Ashok, Ryan Razkenari, Long Chen, Sungmin Hong, Chun-Hao Liu

OpenReview
#5 Long paper

A Sanity Check on Composed Image Retrieval

Yikun Liu, Jiangchao Yao, Weidi Xie, Yanfeng Wang

OpenReview
#7 Long paper

Emotional Vocabulary as Semantic Grounding: How Language Register Affects Diffusion Efficiency in Video Generation

Scott Boudreaux

OpenReview
#9 Long paper

EFSA: Episodic Few-Shot Adaptation for Text-to-Image Retrieval

Muhammad Huzaifa, Yova Kementchedjhieva

OpenReview
#10 Long paper

ViSS-R1: Self-Supervised Reinforcement Video Reasoning

Bo Fang, YuXin Song, Haoyuan Sun, Xinyao Zhang, Qiangqiang Wu, Wenhao Wu, Antoni B. Chan

OpenReview
#11 Long paper

HIVE: Query, Hypothesize, Verify — A LLM Framework for Multimodal Reasoning-Intensive Retrieval

Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Abdelrahman Abdallah, Hyun Soo Kang

OpenReview
#12 Non-archival submission

MentalBlackboard: Evaluating Spatial Visualization via Mathematical Transformations

Nilay Yilmaz, Maitreya Patel, Naga Sai Abhiram Kusumba, Yixuan He, Yezhou Yang

OpenReview
#13 Long paper

RePlan-Bot: Multi-Level Replanning for Embodied Instruction Following

Xicheng Gong, Guozheng Sun, Peiran Xu, Yadong MU

OpenReview
#20 Long paper

Towards Context-Aware Image Anonymization with Multi-Agent Reasoning

Robert Aufschläger, Jakob Folz, Gautam Savaliya, Manjitha D Vidanalage, Michael Heigl, Martin Schramm

OpenReview
#21 Long paper

CoCoA-DVC: Consistency and Concept Aware Training for Dense Video Captioning

Jay Nitin Paranjape, Yue Guo, Sankar Venkataraman, Vishal M. Patel, Nataraj Jammalamadaka

OpenReview
#23 Long paper

DualProc: Dual-Process Prompting Reduces Confident Errors in Vision-Language Models for Grounded Retrieval and Agentic Pipelines

Aayam Bansal, Ishaan Gangwani

OpenReview
#24 Long paper

CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception

Miguel Carvalho, Helder Dias, Bruno Martins

OpenReview
#27 Long paper

BRIDGE: Multimodal-to-Text Retrieval via Reinforcement-Learned Query Alignment

Mohamed Darwish Mounis, Mohamed Mahmoud, Shaimaa Sedek, Mahmoud Abdalla, Mahmoud SalahEldin Kasem, Abdelrahman Abdallah, Hyun Soo Kang

OpenReview
#29 Long paper

Memory in Multimodal AI Agents: Hardware-Software Co-Design for KV Caches, Attention IO, and Retrieval Stores

Shubham Khandelwal

OpenReview
#30 Short paper

ChatUMM: Robust Context Tracking for Conversational Interleaved Generation

Wenxun Dai, Zhiyuan Zhao, Yule Zhong, Yiji Cheng, Jian-Wei Zhang, Linqing Wang, Shiyi Zhang, Yunlong Lin, Runze He, Fellix Song, Wayne Zhuang, Yong Liu, Haoji Zhang, Yansong Tang, Chunyu Wang

OpenReview
#31 Long paper

Gaze-Regularized Vision-Language-Action Models for Robotic Manipulation

Anupam Pani, Yanchao Yang

OpenReview
#33 Non-archival submission

WIST: Web-Grounded Iterative Self-Play Tree for Domain-Targeted Reasoning Improvement

Fangyuan Li, Pengfei Li, Shijie Wang, Junqi Gao, Jianxing Liu, Biqing Qi, Yuqiang Li

OpenReview
#35 Non-archival submission

Investigating VLM Hallucination from a Cognitive Psychology Perspective: A First Step Toward Interpretation with Intriguing Observations

Xiangrui Liu, Man Luo, Agneet Chatterjee, Hua Wei, Chitta Baral, Yezhou Yang

OpenReview
#36 Long paper

HTEF: Holistic Brand-Theme Alignment Scoring as a Catalog Gate for Grounded Conversational Recommendation

Md Mahmudur Rahman, Dhruv Garg, Rishabh Rathod, Sanket Bindle

OpenReview
#40 Long paper

The Race between Agentic AI Capabilities and Data Quality Control in Online Surveys

Sourav Panda, Hillmer Chona, Rupak Kumar Das, Shreyash Kale, Shikha Soneji, Jonathan Dodge

OpenReview
#43 Non-archival submission

Evaluating Reasoning Fidelity in Visual Text Generation

Jiajun Hong, Jiawei Zhou

OpenReview
#46 Non-archival submission

Lightweight and Production-Ready PDF Visual Element Parsing

Meizhu Liu, Yassi Abbasi, Matthew Rowe, M. Avendi, Paul Li

OpenReview
#51 Non-archival submission

M3Grounder: Mask-Based Multi-Span and Multi-Granular Grounding for Document QA

Venkata Kesav Venna, Sai Madhusudan Gunda, Jyothi Swaroopa Jinka, Hrithik Sagar Rachakonda, Anirudh Srinivasan, Ravi Kiran Sarvadevabhatla

OpenReview
#52 Long paper

SHOE - Semantic HOI Open-vocabulary Evaluation metric

Maja Noack, Qinqian Lei, Taipeng Tian, Bihan Dong, Robby T. Tan, Yixin Chen, John Young, Saijun Zhang, Bo Wang

OpenReview
#53 Long paper

RAGENT: Robust Optimization for Grounded Vision-Language Retrieval

Kathy Wu, Sarthak Srivastava

OpenReview
#55 Short paper

Learning to Mix Flat and Curved Representations for Vision-Language Retrieval

Kathy Wu, Sarthak Srivastava

OpenReview
#56 Long paper

Neural-Symbolic Intention Refinement with User Feedback for Text-to-Image Retrieval

BAI YU, Lei Zhang, Xiaoyan Hu, Feng Zhu, Rui Zhao

OpenReview
#59 Long paper

Knowledge or Action? Automation Boundary Prediction with Intent Discovery and Knowledge Use-Case Enablement for Agentic Enterprise Support

Kumar Mayank, Ipseeta Sahu, Sajeetha Jaganathan

OpenReview
#61 Long paper

Negation Matters: Training-Free Negation-Aware Image Retrieval

Aashish Pokhrel, Shivanand Venkanna Sheshappanavar

OpenReview
#63 Long paper

Seeing without Looking: Do Vision-Language Benchmarks Really Test Vision?

Zixuan Lan, Luzhe Sun, Matthew Walter, Jiawei Zhou

OpenReview
#64 Long paper

CMAG: Concept-Scaffolded Retrieval for Marketplace Avatar Generation

Rajeev Goel, Jason Ding, Phani Harish Wajjala, Pavan K. Turaga, Tejaswi Gowda, Krishna C. Garikipati

OpenReview
#65 Long paper

A Multi-Agent Framework for Grounding Medical AI in Expert Clinical Knowledge under Domain Shift

Midhat Urooj, Ayan Banerjee, Sandeep Gupta

OpenReview
#68 Long paper

SlotVTG: Object-Centric Adapter for Generalizable Video Temporal Grounding

Jiwook Han, Geo Ahn, Youngrae Kim, Jinwoo Choi

OpenReview
#70 Long paper

Towards Robust Zero-Shot Video Temporal Grounding

Nutthadech Banditakkarakul, Bo Chen, Stephen Gould

OpenReview
#71 Long paper

AgenticRAG-Driven Floorplan Parsing for Assistive Indoor Navigation for Blind and Low-Vision Users

Aydin Ayanzadeh, Tim Oates

OpenReview
#72 Long paper

EVICT: Evidence-Sufficiency Verification via Counterfactual Dropout for Visually-Grounded Selective Question Answering

Varun Kotte

OpenReview
#73 Long paper

CALIBRA: Calibration-Aware Multi-Agent Verification for Contactless Physiological Monitoring

Shadman Sakib, Gaurav Shinde, Nirmalya Roy

OpenReview
