Vision-Language Pre-training Generalization: From Image-Text Pairs to Diverse Vision-Text Tasks
Studies how task-specific vision-language pre-training on image-text supervision can transfer to a broader range of image and video reasoning tasks.
Presented at WACV 2024.