Kristen Grauman
University of Texas at Austin
Leads work on egocentric video understanding and retrieval at scale, with a focus on long-horizon grounding and efficient perception.
A CVPR 2026 workshop for researchers and practitioners building grounded multimodal retrieval, reranking, and verification systems that can be deployed with confidence.
Plan tasks and tool use, retrieve candidates, rerank with grounded reasoning, then verify with calibrated evidence.
Confirmed program
Kristen Grauman
University of Texas at Austin
Grounding Temporal Reasoning in Video Evidence
Coffee available during this short break.
Scott Wen-Tau Yih
Meta
MetaCLIP: Open, Scalable Data Curation for Vision-Language Models
Mohit Bansal
University of North Carolina at Chapel Hill
Long-Horizon Video Reasoning and Generation
Coffee available during this window.
Posters will be in Exhibit Hall A, boards 188-207.
Sujith Ravi
GVP, Oracle AI
Vijay Krishnan
Founder and CTO, Turing
Scott Wen-Tau Yih
Meta
Kenneth Marino
University of Utah / Ex-DeepMind
Ming-Hsuan Yang
UC Merced / DeepMind
From Understanding to Action: Building the next AI frontier with Multimodal Agents, World Models, and Real-world Intelligence
Moderated by Sujith Ravi with panelists Vijay Krishnan, Scott Wen-Tau Yih, Kenneth Marino, and Ming-Hsuan Yang.
Organizing Committee
University of Texas at Austin
Leads work on egocentric video understanding and retrieval at scale, with a focus on long-horizon grounding and efficient perception.
University of North Carolina at Chapel Hill
Researches multimodal language/vision agents, grounded reasoning, and controllable generation for real-world tasks.
University of Pennsylvania / Oracle AI
Pioneer in structured, grounded reasoning and robust inference for language/vision systems deployed in real settings.
Meta
Research Scientist at Meta FAIR and affiliate professor at the University of Washington. His work spans NLP, ML, and information retrieval, including DPR and RAG, and he was named an ACL Fellow in 2024.
We’re looking for work that ties multimodal agentic tools and evidence to decisions, scales retrieval and reranking, and evaluates real deployment constraints.
Why now
Vision-language agents increasingly rely on a loop of plan, retrieve, rerank, and verify before acting. But how we measure evidence grounding, calibration, and end-to-end efficiency is still fragmented across communities.
GRAIL-V brings together CV, IR, NLP, HCI, and systems researchers and practitioners working on evidence-centric retrieval and verification for deployable agentic systems.
Important dates
Extended deadline. OpenReview submission closes at 23:59 AoE.
Decisions released via OpenReview.
Final versions for CVPR workshop proceedings.
Confirmed half-day program with invited talks, paper presentations, posters, and an industry panel.
Updates
Congratulations to SHOE - Semantic HOI Open-vocabulary Evaluation metric, winner of the $2,000 Best Paper award, and HTEF: Holistic Brand-Theme Alignment Scoring as a Catalog Gate for Grounded Conversational Recommendation, winner of the $1,500 Outstanding Paper award.
The poster session will run on June 3, 2026 from 10:00 AM to 11:00 AM local time in Exhibit Hall A, boards 188-207.
Room 506, Denver, USA. Confirmed workshop start: Jun 3, 2026 at 7:30 AM local time.
Hybrid participation follows CVPR guidance; details will be posted when available.
Registration is handled via CVPR. We will link the official registration page when it opens.
Reserved front-row seating, mic runners, captioning support.
Reach out to the organizers with questions about submissions, sponsorship, or the program.