Awesome-Multimodal-Large-Language-Models https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs https://github.com/DAMO-NLP-SG/VideoLLaMA2
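As context for the Video-LLM entries above: these models typically consume a small, fixed number of uniformly sampled frames per clip. The sketch below is a generic preprocessing illustration, not code from the VideoLLaMA 2 repository; the function name and default frame count are assumptions.

```python
import cv2
import numpy as np

def sample_frames_uniform(video_path: str, num_frames: int = 8) -> np.ndarray:
    """Uniformly sample `num_frames` RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    if total <= 0:
        cap.release()
        raise ValueError(f"Could not read frame count from {video_path}")
    # Evenly spaced frame indices across the whole clip.
    indices = np.linspace(0, total - 1, num_frames, dtype=int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame_bgr = cap.read()
        if not ok:
            continue
        # OpenCV decodes to BGR; convert to RGB for most vision models.
        frames.append(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    cap.release()
    return np.stack(frames)  # (num_frames, H, W, 3)
```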
AskVideos-VideoCLIP https://github.com/AskYoutubeAI/AskVideos-VideoCLIP
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding https://github.com/OpenGVLab/InternVideo
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding https://github.com/OpenGVLab/InternVideo2
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVid
ViCLIP: a video-text representation learning model trained on InternVid https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Pretrain/ViCLIP
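ViCLIP, AskVideos-VideoCLIP, and similar video-text models score video-caption pairs CLIP-style: per-frame features are pooled into a clip embedding and compared to a text embedding by cosine similarity. The sketch below illustrates only that scoring step with stand-in random features; the function name and tensor shapes are illustrative and do not reflect either repository's actual API.

```python
import torch
import torch.nn.functional as F

def video_text_similarity(frame_embeddings: torch.Tensor,
                          text_embeddings: torch.Tensor) -> torch.Tensor:
    """CLIP-style similarity between videos and texts.

    frame_embeddings: (num_videos, num_frames, dim) per-frame features
    text_embeddings:  (num_texts, dim) text features
    Returns a (num_videos, num_texts) cosine-similarity matrix.
    """
    # Mean-pool frames into a single clip embedding, then L2-normalize.
    video_emb = F.normalize(frame_embeddings.mean(dim=1), dim=-1)
    text_emb = F.normalize(text_embeddings, dim=-1)
    return video_emb @ text_emb.T

# Toy usage: random features stand in for real encoder outputs.
frames = torch.randn(2, 8, 512)   # 2 videos, 8 frames each, 512-dim features
texts = torch.randn(3, 512)       # 3 candidate captions
print(video_text_similarity(frames, texts).shape)  # torch.Size([2, 3])
```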