GRAIL-V 2026 Thank you for joining us at CVPR 2026. Slides, award winners, accepted papers, and proceedings are now available.

Grounded Retrieval and Agentic Intelligence for Vision-Language

plan > retrieve > reason > verify

A CVPR 2026 workshop for researchers and practitioners building grounded multimodal retrieval, reranking, and verification systems that can be deployed with confidence.

Workshop starts Jun 3, 2026 · 7:30 AM

P-R-R-V Loop

Plan tasks and tool use, retrieve candidates, rerank with grounded reasoning, then verify with calibrated evidence.

Confirmed program

June 3 Agenda

7:30 AM-12:30 PM · Room 506 · Half-day workshop

07:30 AM - 07:45 AM

Welcome & Opening Remarks

Organizing Committee

07:45 AM - 08:15 AM

Invited Talk 1

Dan Roth · University of Pennsylvania / Oracle AI

AI for Data and Data for AI

08:15 AM - 08:45 AM

Invited Talk 2

Kristen Grauman · University of Texas at Austin

Grounding Temporal Reasoning in Video Evidence

08:45 AM - 09:00 AM

Coffee Break

Coffee available during this short break.

09:00 AM - 09:30 AM

Invited Talk 3

MetaCLIP: Open, Scalable Data Curation for Vision-Language Models

09:30 AM - 10:00 AM

Invited Talk 4

Mohit Bansal · University of North Carolina at Chapel Hill

Long-Horizon Video Reasoning and Generation

10:00 AM - 11:00 AM

Coffee Break & Poster Session

Coffee available during this window.

Posters will be in Exhibit Hall A, boards 188-207.

11:00 AM - 11:15 AM

Paper Presentation

11:15 AM - 12:15 PM

Industry Panel

Sujith Ravi · GVP, Oracle AI
Vijay Krishnan · Founder and CTO, Turing
Kenneth Marino · University of Utah / Ex-DeepMind
Ming-Hsuan Yang · UC Merced / DeepMind

From Understanding to Action: Building the next AI frontier with Multimodal Agents, World Models, and Real-world Intelligence

Moderated by Sujith Ravi with panelists Vijay Krishnan, Scott Wen-Tau Yih, Kenneth Marino, and Ming-Hsuan Yang.

12:15 PM - 12:30 PM

Thanks & Closing Remarks

Organizing Committee

Speakers

Kristen Grauman

University of Texas at Austin

Leads work on egocentric video understanding and retrieval at scale, with a focus on long-horizon grounding and efficient perception.

Mohit Bansal

University of North Carolina at Chapel Hill

Researches multimodal language/vision agents, grounded reasoning, and controllable generation for real-world tasks.

Dan Roth

University of Pennsylvania / Oracle AI

Pioneer in structured, grounded reasoning and robust inference for language/vision systems deployed in real settings.

Scott Wen-Tau Yih

Meta

Research Scientist at Meta FAIR and affiliate professor at the University of Washington. His work spans NLP, ML, and information retrieval, including DPR and RAG, and he was named an ACL Fellow in 2024.

Focus areas at a glance

We’re looking for work that ties multimodal agentic tools and evidence to decisions, scales retrieval and reranking, and evaluates real deployment constraints.

Primary Track

Vision-language models · Grounded reasoning · Multimodal agents · Agentic memory

Secondary Track

Detection & retrieval · Video understanding · Document & chart analysis · Foundation models

Visual Grounding & Evidence

Region grounding · Temporal grounding · Evidence overlays · Citation provenance

Retrieval & Ranking

Hybrid search · Dense/sparse retrieval · Structured & unstructured sources · Rerankers · Long-context retrieval

Agentic Tools & Planning

Tool routing · Structured queries · Layout parsing · Guardrails

Evaluation & Efficiency

Benchmarks · Reproducibility · Latency & cost · Energy & memory

Why now

Grounded evidence is the missing piece for agentic vision

Vision-language agents increasingly rely on a loop of plan, retrieve, rerank, and verify before acting. But how we measure evidence grounding, calibration, and end-to-end efficiency is still fragmented across communities.

GRAIL-V brings together CV, IR, NLP, HCI, and systems researchers and practitioners working on evidence-centric retrieval and verification for deployable agentic systems.

Why submit or attend

  • Get feedback on grounded retrieval, reranking, and verification.
  • Share real-world evaluations, demos, and deployment lessons.
  • Connect with researchers building agentic VLM pipelines.
  • Help shape community benchmarks and best practices.

Important dates

Timeline (Anywhere on Earth)

Mar 8, 2026 (extended from Mar 7, 2026)

CVPR 2026 workshop submission deadline

Extended deadline. OpenReview submission closes at 23:59 AoE.

Mar 18, 2026

Notification to authors

Decisions released via OpenReview.

Apr 5, 2026

Camera-ready due

Final versions for CVPR workshop proceedings.

Jun 3, 2026 · 7:30 AM

Workshop day in Room 506, Denver

Confirmed half-day program with invited talks, paper presentations, posters, and an industry panel.

Updates

Latest announcements

Jun 3, 2026
GRAIL-V Best Paper and Outstanding Paper announced

Congratulations to SHOE - Semantic HOI Open-vocabulary Evaluation metric, winner of the $2,000 Best Paper award, and HTEF: Holistic Brand-Theme Alignment Scoring as a Catalog Gate for Grounded Conversational Recommendation, winner of the $1,500 Outstanding Paper award.

Jun 1, 2026
Poster session location and boards posted

The poster session will run on June 3, 2026 from 10:00 AM to 11:00 AM local time in Exhibit Hall A, boards 188-207.

In-person with hybrid support

Location

Room 506, Denver, USA. Confirmed workshop start: Jun 3, 2026 at 7:30 AM local time.

Hybrid plan

Hybrid participation follows CVPR guidance; details will be posted when available.

Registration

Registration is handled via CVPR. We will link the official registration page when it opens.

Accessibility

Reserved front-row seating, mic runners, and captioning support.

Stay connected

General inquiries

Reach out to the organizers with questions about submissions, sponsorship, or program.