Master Assignment
Understanding how video is classified
Type: Master CS
Student: Unassigned
Duration: TBD
If you are interested, please contact:
Background:
Explainable AI (XAI) helps identify which parts of the input a model attributes its predictions to. XAI methods for video models remain relatively limited, though some recent works have begun examining XAI in video, such as:
- A. Stergiou and N. Deligiannis. Leaping Into Memories: Space-Time Deep Feature Synthesis. In ICCV, 2023.
These XAI methods can be extended and adapted to better understand why certain methods fail, and which aspects of the data make accurate recognition and quality assessment challenging for the fine-grained video classification examined in this work.
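The attribution idea above can be sketched with a simple occlusion test: zero out one frame of a clip at a time and measure how much the model's score changes, yielding a per-frame importance signal. The linear scorer below is a hypothetical stand-in for a real video classifier, not a method from any of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical linear scorer over flattened per-frame features.
W = rng.normal(size=(3 * 8 * 8,))

def score(clip: np.ndarray) -> float:
    # clip: (T, 3, 8, 8) -> average frame features, then a linear score
    return float(clip.reshape(clip.shape[0], -1).mean(axis=0) @ W)

def occlusion_saliency(clip: np.ndarray) -> np.ndarray:
    """Per-frame attribution: score drop when each frame is occluded."""
    base = score(clip)
    sal = np.empty(clip.shape[0])
    for t in range(clip.shape[0]):
        masked = clip.copy()
        masked[t] = 0.0          # occlude frame t
        sal[t] = abs(base - score(masked))
    return sal

clip = rng.random((16, 3, 8, 8))  # T=16 frames, 3 channels, 8x8 pixels
sal = occlusion_saliency(clip)    # one saliency value per frame
```

For fine-grained video tasks, the same probe run per frame (or per spatio-temporal patch) can reveal which sub-actions drive or confuse a classifier.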
Recently, methods have been developed to understand state-of-the-art self-supervised learning (SSL) models, such as (among others):
- Self-Supervised Interpretable End-to-End Learning via Latent Functional Modularity
- Interpretable representations in explainable AI: from theory to practice
- A survey on self-supervised methods for visual representation learning
References:
- Yingwei Li, Yi Li, and Nuno Vasconcelos. Resound: Towards action recognition without representation bias. In ECCV, pages 513–528, 2018. (Diving48 dataset)
- Jinglin Xu, Yongming Rao, Xumin Yu, Guangyi Chen, Jie Zhou, and Jiwen Lu. FineDiving: A Fine-Grained Dataset for Procedure-Aware Action Quality Assessment. In CVPR, pages 2949–2958, 2022
- Dian Shao, Yue Zhao, Bo Dai, and Dahua Lin. FineGym: A Hierarchical Video Dataset for Fine-Grained Action Understanding. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
- Yuyang Gao, Siyi Gu, Junji Jiang, Sungsoo Ray Hong, Dazhou Yu, and Liang Zhao. Going Beyond XAI: A Systematic Survey for Explanation-Guided Learning. ACM Computing Surveys, 56(7):1–39, 2024.
- Expresso-AI: A framework for Explainable Video Based Deep Learning Models through gestures and expressions.
- Ex-VAD: Explainable Fine-grained Video Anomaly Detection Based on Visual-Language Models.

