Research Interests

My research broadly lies at the intersection of vision and language. Specifically, I am interested in grounding language in images and videos, which entails associating language phrases with visual concepts. Such visual-linguistic associations encompass objects, actions, and their relations, and are crucial to rich image and video understanding.

Papers

  • Video Object Grounding using Semantic Roles in Language Description [Paper] [Code]
    Arka Sadhu, Kan Chen, Ram Nevatia
    CVPR 2020
    arXiv:2003.10606 [cs.CV]

  • Zero-Shot Grounding of Objects from Natural Language Queries [Paper] [Code]
    Arka Sadhu, Kan Chen, Ram Nevatia
    ICCV 2019 (Oral)
    arXiv:1908.07129 [cs.CV]

Research Experience

(Not updated.)

Research Internships

  • Image Matching for Media Forensic application
    Viterbi Scholar, University of Southern California (May’17-Jul’17)
    Prof. Ram Nevatia
    • Media forensics generally involves detecting tampered media, identifying the tampered portion, and trying to recover the original media. This work mainly aims at detecting the base image given a probe image. Some additional experiments augmenting the base detection were also carried out. Finally, an attempt was made to extend the ideas to the donor image as well.
      [Code] [Report]
  • Robust Loop Closures
    Research Assistant at Aalto University, Finland (May’16-Jul’16)
    Prof. Juho Kannala
    • Indoor environments were modeled in 3D using point cloud data from Google Tango. Loop closures were enforced using a new cost function to automatically refine and improve the geometry estimates. The role of the switch variable was also analyzed to understand the contributions of different parts of the loss function.
      [Report]

Key Course Projects

  • Difference-Based Image Noise Modeling Using Skellam Distribution Course Project : Advanced Image Processing (March’17-April’17)
    Prof. Ajit Rajwade
  • Visible Light based Communication using LEDs Course Project : Electronic Design Lab (Jan’17 - Apr’17)
    Prof. Kumar Appaiah
    • In this project we used a simple LED torch and a photodiode receiver to transmit and receive data using visible light communication. We successfully transferred 324 bits with 2 bit errors over a distance of 50 cm. We applied Manchester coding and designed a phase-locked loop circuit to automatically detect the incoming frequency.
      [Code] [Report]
  • Document Scanner using Image Stitching Course Project : Digital Image Processing (Nov’16-Nov’16)
    Prof. Ajit Rajwade
    • Implemented the paper: Mobile Page Scanner. The project aimed at stitching images of 2D documents to get a higher-resolution picture. Algorithms such as homography transformation, multi-band blending, and bundle adjustment were written from scratch in Python. We were able to achieve realistic stitching with very small artifacts.
      [Code] [Report]
  • Stereo Matching and Structure from Motion using Optical Flow Course Project : Computer Vision (Feb’17 - Mar’17)
    Prof. Subhasis Chaudhuri
    • Standard algorithms such as NCC, SAD, SSD, and the census transform were analyzed for computing the disparity map, and hence the 3D point cloud, from stereo images. Optical flow information from the Horn-Schunck method was further used to extract the structure from motion.
      [Code] [Report]
  • Multi-Cycle and Pipeline Implementation of IITB-RISC ISA Course Project : Microprocessors (Oct’16 - Nov’16)
    Prof. Virendra Singh
    • IITB-RISC is a small architecture for basic 16-bit operations. We implemented a 29-stage multi-cycle datapath and a 6-stage pipeline on an FPGA and showed the results using the Signal Tap Analyzer.
      [Code] [Report]
  • Animation of Bicycle in a Room Course Project : Computer Graphics (Sep’16-Oct’16)
    Prof. Parag Chaudhuri
    • A bicycle is hierarchically modeled to constrain its degrees of motion. The room is modeled with light sources, and the floor is texture mapped to give realistic effects. All code is written in OpenGL.
      [Code]