Multimodal Retrieval
Scaling search across images, video, charts, and UI with hybrid dense/lexical methods.
At GRAIL-V we bring together the CV, IR, NLP, and HCI communities to advance unified methods and evaluation practices for visual–text search, multimodal tool use, and calibrated decision-making. This is the CVPR 2026 workshop CFP for multimodal generative and understanding models and agents that plan, retrieve, reason, and verify within evidence-centric systems.
We invite papers and demos on grounded vision-language systems that plan, retrieve, reason, and verify with evidence. We welcome submissions that bridge multimodal perception and actionable decision-making. Please follow the CVPR 2026 author guidelines.
Scaling search across images, video, charts, and UI with hybrid dense/lexical methods.
Understanding tool routing, memory tracking, and safe multi-step workflows.
Incorporating image/video/text understanding, generation, and editing tools into agentic systems.
Supporting citation provenance, evidence overlays, and audit-ready faithfulness.
Benchmarking reproducibility, latency, and cost efficiency.
We invite archival papers and demos on grounded multimodal retrieval, reranking, and verification for agentic vision systems. We especially welcome work that reports grounded evidence (region/page/moment provenance), calibration or abstention behavior, and real deployment constraints (latency, memory, cost).
Technical merit, grounded evaluation, efficiency, reproducibility, and broader safety considerations.
Accepted papers appear as posters or short orals. At least one author presents in person.
Same-employer, advisor-advisee, recent co-author, or close personal conflicts.
Important dates
OpenReview submission closes at 23:59 AoE.
Decisions released via OpenReview.
Final versions for CVPR workshop proceedings.
Full-day program with keynotes, panels, and posters.
Topics of interest
Work on grounded multimodal retrieval, reranking, and verification for agentic vision-language systems: region/page/moment evidence, hybrid structured+unstructured retrieval, tool use, calibration/abstention, robustness, and deployment efficiency.
Clear evidence grounding (citations or region/page/moment provenance), realistic efficiency reporting (latency/memory/cost), and reproducible artifacts with licenses when possible.
Yes. Accepted papers appear in the CVPR 2026 Workshop Proceedings (archival).
See the Important Dates section on this page for the exact deadline and timeline updates.
Yes. We welcome demos and industry showcases. Indicate this in your submission.
CVPR policy prefers in-person presentations. Remote exceptions require documentation.
Strongly encouraged. Please include artifact links and licenses when possible.
Reach out to the organizers with questions about submissions, sponsorship, or program.