Video

[x] Masked Autoencoders As Spatiotemporal Learners
[ ] Unmasked Teacher: Towards Training-Efficient Video Foundation Models
[ ] Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
[ ] Flamingo: a Visual Language Model for Few-Shot Learning
[ ] Visual Instruction Tuning
[ ] InternVideo: General Video Foundation Models via Generative and Discriminative Learning
[x] VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training

[ ] BEVT: BERT Pretraining of Video Transformers (in progress)

[ ] FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models

[ ] Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors