Tiu Mo

Awesome-Multimodal-Large-Language-Models https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs https://github.com/DAMO-NLP-SG/VideoLLaMA2

AskVideos-VideoCLIP https://github.com/AskYoutubeAI/AskVideos-VideoCLIP

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding https://github.com/OpenGVLab/InternVideo

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding https://github.com/OpenGVLab/InternVideo2

InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation https://github.com/OpenGVLab/InternVideo/tree/main/Data/InternVid

ViCLIP: a video-text representation learning model trained on InternVid https://github.com/OpenGVLab/InternVideo/tree/main/InternVideo1/Pretrain/ViCLIP
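Several entries above (ViCLIP, AskVideos-VideoCLIP, InternVideo2) learn a shared embedding space for video and text. Below is a minimal sketch of how such CLIP-style embeddings are commonly used to match a video against candidate captions. It does not use any of these repositories' APIs. The embeddings are toy vectors, and mean-pooling per-frame embeddings into one video vector is an assumption chosen for illustration; `video_embedding` and `rank_captions` are hypothetical helpers.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def video_embedding(frame_embeddings: np.ndarray) -> np.ndarray:
    """Hypothetical pooling: average per-frame embeddings of shape (T, D)
    into a single unit-length video vector of shape (D,)."""
    return l2_normalize(frame_embeddings.mean(axis=0))


def rank_captions(video_vec: np.ndarray, caption_vecs: np.ndarray):
    """Return caption indices sorted from most to least similar to the video,
    plus the raw cosine similarities."""
    sims = l2_normalize(caption_vecs) @ video_vec
    return np.argsort(-sims), sims


# Toy data (hypothetical): three caption embeddings and two frame embeddings in R^4.
captions = np.eye(3, 4)
frames = np.array([[0.1, 1.0, 0.0, 0.0],
                   [0.0, 0.9, 0.1, 0.0]])

order, sims = rank_captions(video_embedding(frames), captions)
print(order[0])  # 1
```

In a real model the frame and caption vectors would come from the trained video and text encoders; the retrieval step itself stays a cosine-similarity ranking in the shared space.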