Research Interests
My research broadly lies at the intersection of vision and language. Specifically, I am interested in grounding language in images and videos, which entails associating language phrases with visual concepts. Such visual-linguistic associations encompass objects, actions, and their relations, and are crucial to rich image and video understanding.
Papers
- Video Object Grounding using Semantic Roles in Language Description [Paper] [Code]
Arka Sadhu, Kan Chen, Ram Nevatia
CVPR 2020
arXiv:2003.10606 [cs.CV]
- Zero-Shot Grounding of Objects from Natural Language Queries [Paper] [Code]
Arka Sadhu, Kan Chen, Ram Nevatia
ICCV 2019 (Oral)
arXiv:1908.07129 [cs.CV]
Research Experience
(Not updated.)
Research Internships
- Image Matching for Media Forensic application
Viterbi Scholar, University of Southern California (May’17-Jul’17)
Prof. Ram Nevatia
- Media forensics broadly involves detecting tampered media, identifying the tampered portion, and attempting to recover the original media. This work mainly aims at detecting the base image given a probe image. Some additional experiments augmenting base detection were also carried out. Finally, an attempt was made to extend the ideas to the donor image as well.
[Code] [Report]
- Robust Loop Closures
Research Assistant at Aalto University, Finland (May’16-Jul’16)
Prof. Juho Kannala
- Indoor environments were 3D-modeled using point cloud data from Google Tango. Loop closures were enforced using a new cost function to automatically refine and improve the geometry estimates. Also analyzed the role of the switch variable to understand the contributions of different parts of the loss function.
[Report]
Key Course Projects
- Difference-Based Image Noise Modeling Using Skellam Distribution
Course Project : Advanced Image Processing (March’17-April’17)
Prof. Ajit Rajwade
- Implemented the paper Difference-Based Image Noise Modeling Using Skellam Distribution. Modeled a DSLR camera using its noise characteristics and used the model to generate synthetic data, which was used to estimate the Skellam parameters. Further showed its application to edge detection and background subtraction.
[Code] [Report]
- Visible Light based Communication using LEDs
Course Project : Electronic Design Lab (Jan’17 - Apr’17)
Prof. Kumar Appaiah
- In this project we used a simple LED torch and a photodiode receiver to transmit and receive using visible light communication. Successfully transferred 324 bits with 2 bit errors over a distance of 50 cm. Applied Manchester coding and designed a phase-locked loop circuit to automatically detect the incoming frequency.
[Code] [Report]
- Document Scanner using Image Stitching
Course Project : Digital Image Processing (Nov’16-Nov’16)
Prof. Ajit Rajwade
- Implemented the paper Mobile Page Scanner. The project aimed at stitching images of 2D documents to obtain a higher-resolution picture. Algorithms such as homography transformation, multi-band blending, and bundle adjustment were written from scratch in Python. We achieved realistic stitching with very small artifacts.
[Code] [Report]
- Stereo Matching and Structure from Motion using Optical Flow
Course Project : Computer Vision (Feb’17 - Mar’17)
Prof. Subhasis Chaudhuri
- Multi-Cycle and Pipeline Implementation of IITB-RISC ISA
Course Project : Microprocessors (Oct’16 - Nov’16)
Prof. Virendra Singh
- Animation of Bicycle in a Room
Course Project : Computer Graphics (Sep’16-Oct’16)
Prof. Parag Chaudhuri
- A bicycle is hierarchically modeled to constrain the degrees of motion. The room is modeled with light sources, and the floor is texture-mapped for a realistic effect. All code is written using OpenGL.
[Code]