[Paper Summary] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

[Paper Summary] Multimodal Humor Dataset: Predicting Laughter tracks for Sitcoms
