Research Interests

My research broadly lies at the intersection of vision and language. Specifically, I am interested in grounding language in images and videos, which entails associating language phrases with visual concepts. Such visual-linguistic associations encompass objects, actions, and their relations, and are crucial to rich image and video understanding.

Publications

Google Scholar · Semantic Scholar

  1. Vision-Language Pre-training Generalization: From Image-Text Pairs to Diverse Vision-Text Tasks
    Arka Sadhu, Ram Nevatia
    WACV 2024
    Paper

  2. Unaligned Video-Text Pre-training using Iterative Alignment
    Arka Sadhu, Licheng Yu, Animesh Sinha, Yu Chen, Ram Nevatia, Ning Zhang
    Paper

  3. Gradient-based Memory Editing for Task-Free Continual Learning
    Xisen Jin, Arka Sadhu, Junyi Du, Xiang Ren
    NeurIPS 2021
    ArXiv Code

  4. Visual Semantic Role Labeling for Video Understanding
    Arka Sadhu, Tanmay Gupta, Mark Yatskar, Ram Nevatia, Aniruddha Kembhavi
    CVPR 2021
    ArXiv Code Website

  5. Video Question Answering with Phrases via Semantic Roles
    Arka Sadhu, Kan Chen, Ram Nevatia
    NAACL 2021
    ArXiv Code

  6. Improving Object Detection and Attribute Recognition by Feature Entanglement Reduction
    Zhaoheng Zheng, Arka Sadhu, Ram Nevatia
    ICIP 2021
    ArXiv

  7. Utilizing Every Image Object for Semi-supervised Phrase Grounding
    Haidong Zhu, Arka Sadhu, Zhaoheng Zheng, Ram Nevatia
    WACV 2021
    ArXiv

  8. Visually Grounded Continual Learning of Compositional Phrases
    Xisen Jin, Junyi Du, Arka Sadhu, Ram Nevatia, Xiang Ren
    EMNLP 2020
    ArXiv Code Website

  9. Video Object Grounding using Semantic Roles in Language Description
    Arka Sadhu, Kan Chen, Ram Nevatia
    CVPR 2020
    ArXiv Code

  10. Zero-Shot Grounding of Objects from Natural Language Queries
    Arka Sadhu, Kan Chen, Ram Nevatia
    ICCV 2019 (Oral)
    ArXiv Code