Works cited by this work
34 works
Work: Video Captioning with Spatio-Temporal Graph Transformers
ActivityNet: A large-scale video benchmark for human activity understanding
Fabian Caba Heilbron, Víctor Escorcia, Bernard Ghanem +1
Article, 2015, 3 citations
Towards Automatic Learning of Procedures From Web Instructional Videos
Luowei Zhou, Chenliang Xu, Jason J. Corso
Article, 2018, 3 citations
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo +5
Article, 2023, 3 citations
Video Captioning Using Large Language Models
Priyanshu Malaviya, Dhruvit Patel, Santosh Kumar Bharti
Article, 2024, 3 citations
CIDEr: Consensus-based image description evaluation
Ramakrishna Vedantam, C. Lawrence Zitnick, Devi Parikh
Article, 2015, 2 citations
Video ReCap: Recursive Captioning of Hour-Long Videos
Md Mohaiminul Islam, Ngan Ho, Xitong Yang +3
Article, 2024, 2 citations