CReLeRI: Explainable, Concept-centric, Representation, Learning, Reasoning, and Interaction Video Analysis System

Michael Francis Perez, Yichi Yang, Yuheng Zha, Enze Ma, Danish Nisar Ahmed Tamboli, Haodi Ma, Reza Shahriari, Vyom Pathak, Dzmitry Kasinets, Rohith Venkatakrishnan, Daisy Zhe Wang, Jaime Ruiz, Eric Ragan, Zhiting Hu, Eric P. Xing, Jun-Yan Zhu

ACM MM 2025 (Demo/Video Track), 2025


Abstract

Existing video analysis models often lack explainability, struggle with long videos, and hallucinate. Commercial solutions are closed-source and costly. We introduce CReLeRI, an open-source system for action detection in untrimmed videos. CReLeRI integrates segmentation, action detection, argument detection, and grounding to improve interpretability and reduce hallucinations, enhancing transparency and trust in AI-driven video analysis. This paper is accompanied by a demonstration video.