IIIT-H’s Computer Vision Lab In Focus

IIIT-H researchers from the Centre for Visual Information Technology made their mark in the international arena by presenting papers at three of the most distinguished conferences in computer vision. Here’s a brief look at the conferences and some of the path-breaking work presented there.

One of the most prestigious events on the computer vision calendar is the British Machine Vision Conference (BMVC), the British Machine Vision Association’s (BMVA) annual conference on machine vision, image processing, and pattern recognition. According to its homepage, BMVC received a record 862 research paper submissions this year from Europe, Asia and North America, with an acceptance rate of 29.5%. Besides researchers from around the world, the conference attracted a wide spectrum of international companies, such as Amazon, Microsoft, Nvidia, and Apple, which sponsored the event and demonstrated their products there. IIIT-H sent a sizeable delegation: six students from the Centre for Visual Information Technology (CVIT) participated in the conference and presented their papers.

Learning Human Poses

Fourth-year Ph.D. student Aditya Arun, who works under Prof. C. V. Jawahar and co-supervisor Prof. M. Pawan Kumar, presented a paper titled “Learning Human Poses From Actions”. Pose estimation means training a machine to accurately identify and detect human joints in images and videos; Aditya’s research aims to train a deep network for this task using only low-cost action annotations instead of expensive-to-obtain joint locations. “Indeed, a simple keyword search for an action such as ‘running’ on an image search engine results in hundreds of freely available images. Plus, it offers useful ‘pose’ information. For example, the action ‘running’ eliminates the possibility of a human lying on the ground or sitting, thereby narrowing down the number of putative poses,” says Aditya. He goes on to explain how such estimates of human pose can help in correctly modelling a person in a virtual world. Think avatars. The work also has fitness applications, where the model can guide users whenever they deviate from the correct posture while following online exercise routines.

Intrinsic Image Decomposition

Saurabh Saini, a sixth-year Ph.D. student under the guidance of Prof. P. J. Narayanan, presented a paper on what he terms “a classic and fundamental but not a mainstream problem in Computer Vision research called ‘Intrinsic Image Decomposition’ (IID)”. IID separates an image into its reflectance (the intrinsic colour of surfaces) and shading (the effect of illumination) components, which makes it useful in several computer vision and image editing applications such as image colorization, shadow removal, image re-texturing, and scene relighting. Saurabh says his work introduces a novel prior and a new method for IID. “As IID is an ill-posed and under-constrained problem, we have presented scene semantic information as a new prior in our paper. Additionally, we built a hierarchical framework to use this prior information and presented a single algorithm for direct IID computation without requiring additional steps,” he explains. Speaking of the keynote speakers at the conference, a line-up of prominent computer vision researchers, he says, “I especially liked the Symmetry Detection talk by Prof. Sven Dickinson from the University of Toronto”.

Textured 3D Reconstruction of Humans

Abbhinav Venkat’s paper dealt with the problem of obtaining a textured 3D reconstruction of a human from a single image. He says that it has applications in the entertainment industry, e-commerce, health care (physiotherapy), mobile-based AR/VR platforms, biomedical analysis and so on. According to him, “This is a severely ill-posed problem and is challenging due to issues such as self-occlusions caused by complex body poses and shapes, clothing obstructions, lack of surface texture, background clutter, limited views, sensor noise and so on”. His model not only avoids the prohibitive costs associated with traditional approaches but can also partially handle non-rigid deformations induced by free-form clothing, because no model constraints are imposed while training the volumetric reconstruction network. For Abbhinav, who was at a crossroads wondering whether to pursue a career in research or industry, the conference not only gave him the right exposure but also served as an affirmation “that we are no longer just a cog in the wheel, but, that we are changing the world, one experiment at a time.” He has also been invited to give a talk on the work presented at BMVC at the Max Planck Institute (MPI) for Informatics in Saarbrücken, Germany, in November.

Nagender, Kalpit Thakkar and Suriya Singh also presented their papers at the conference.

 

The European Conference on Computer Vision (ECCV) is a biennial research conference whose proceedings are published by Springer Science+Business Media. According to Wikipedia, it is considered an important conference in computer vision, with an ‘A’ rating from the Australian Ranking of ICT Conferences and an ‘A1’ rating from the Brazilian Ministry of Education. It is similar to the International Conference on Computer Vision (ICCV) in scope and quality, and is held in the years when ICCV is not.

Deep Networks

Ameya Prabhu, a dual-degree student who will soon enter his fifth year, presented his paper “Deep Expander Networks: Efficient Deep Networks From Graph Theory”. Explaining his work in simple terms, Ameya says that neural networks are designed to mimic our brain. “However, they are far too lacking in comparison to our brain and simply too inefficient currently. For example, each neuron in a layer is connected to every other neuron in the next layer.” His paper explores the problem of “how to intelligently connect neurons” so as to make neural networks significantly more efficient. Ameya, who has been attending such conferences since his first year as an undergraduate, says, “This is my 7th publication, however this conference is one of the most prestigious in our field and I had an oral presentation (which is top 1% of all papers submitted) here, so this was a high point for me.” His trip was fully sponsored by Google.
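To make the idea of “intelligently connecting neurons” a little more concrete, here is a minimal sketch contrasting a dense layer, where every input feeds every output, with a sparse layer in which each output neuron is wired to only a fixed number of randomly chosen inputs. Random sparse bipartite graphs of this kind are known to be good expanders with high probability, which is the graph-theoretic property the paper’s title refers to. The function names, the degree and the layer sizes below are hypothetical; this illustrates the connectivity pattern only and is not the paper’s implementation.

```python
import random


def dense_mask(n_in: int, n_out: int) -> list[list[int]]:
    """Fully connected layer: every output neuron sees every input neuron."""
    return [[1] * n_in for _ in range(n_out)]


def random_sparse_mask(n_in: int, n_out: int, degree: int, seed: int = 0) -> list[list[int]]:
    """Hypothetical expander-style layer: each output neuron connects to
    `degree` distinct inputs chosen uniformly at random."""
    if not 0 < degree <= n_in:
        raise ValueError("degree must be between 1 and n_in")
    rng = random.Random(seed)
    mask = []
    for _ in range(n_out):
        row = [0] * n_in
        for j in rng.sample(range(n_in), degree):
            row[j] = 1
        mask.append(row)
    return mask


def sparse_forward(x: list[float], weights: list[list[float]], mask: list[list[int]]) -> list[float]:
    """Compute y = (W masked by M) x, skipping the connections the mask removes."""
    return [
        sum(w * xi for w, m, xi in zip(w_row, m_row, x) if m)
        for w_row, m_row in zip(weights, mask)
    ]


def count_edges(mask: list[list[int]]) -> int:
    """Number of neuron-to-neuron connections in a layer."""
    return sum(map(sum, mask))


if __name__ == "__main__":
    n_in, n_out, degree = 512, 512, 32
    print(count_edges(dense_mask(n_in, n_out)))                  # 262144 connections
    print(count_edges(random_sparse_mask(n_in, n_out, degree)))  # 16384 connections, 16x fewer
```

The layer keeps the same number of inputs and outputs while cutting connections (and therefore multiply-adds) by a factor of n_in / degree; the expander property is what is meant to keep information flowing between all neurons across several such layers despite the sparsity.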

Scene Refocusing

Parikshit Sakurikar, who went on to pursue a Ph.D. after completing his B.Tech. at IIIT-H, presented a paper titled “RefocusGAN – Scene Refocusing using a Single Image”. Explaining his work, Parikshit says that with the help of deep neural networks, they have made it possible for a user to change the focus of a photograph after it has been captured. “If you took a picture that was incorrectly focused on the background instead of the foreground (or maybe halfway between the two), then you can change this to your desired focus position,” says Parikshit. Currently, the network is trained on natural images of flowers and plants, on which it works very well; with more training, it could be extended to human subjects and other types of images. Speaking about his experience at the conference, Parikshit says that in AI specifically, both industry and academia are advancing by leaps and bounds. “Thus, being able to witness so many inventions within the umbrella of a single conference is quite fascinating.”

Another researcher, Rajvi Shah, presented her work on “View graph selection framework for SfM” (Structure from Motion).

The International Conference on 3D Vision (3DV), held under that name since 2013, is another prestigious event. It provides a premier platform for disseminating research on a broad variety of 3D topics in computer vision and graphics, from novel optical sensors, signal processing, geometric modeling, representation and transmission, to visualization and interaction, and a variety of applications.

Monocular Depth Estimation

Dual-degree student Ishit Mehta, whose thesis advisor is Prof. P. J. Narayanan, presented a paper titled “Structured Adversarial Training For Unsupervised Monocular Depth Estimation”. Ishit says, “To give a high-level idea, the problem that we are trying to solve is estimating how far (depth) an object is in a given image. The problem is severely ill-posed since we want to give an extra “dimension” to a 2D image. Earlier methods use supervised techniques to solve this problem where a large number of images are captured with depth cameras to “train” a function to “predict” depth from an image.” To avoid the problems associated with depth cameras, such as their high cost, Ishit’s work proposes an unsupervised method that does not require a large collection of image-depth pairs, drawing inspiration from the human visual system. According to Ishit, “Humans have the remarkable ability to navigate in a scene with several obstacles even with one eye closed (single images). Our method uses a stereo-camera (two adjacent images) as a substitute for image-depth camera pairs.” Ishit, who says he collects stories the way others collect coins, found the conference very pleasant. He says it was an incredible opportunity to meet and interact with prominent figures and other graduate students from all over the world.
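Why can a stereo camera stand in for a depth camera? The standard pinhole-stereo relation gives the missing step: for a rectified pair with focal length f (in pixels) and baseline B (the distance between the two cameras), a point that shifts horizontally by a disparity of d pixels between the left and right images lies at depth Z = f·B / d. Unsupervised methods of this family typically have a network predict disparity from a single image and learn it by warping one image of the pair to reconstruct the other, so no ground-truth depth is needed. The sketch below shows only the disparity-to-depth conversion and a one-row photometric reconstruction error; the camera values and function names are hypothetical, and it does not reproduce the structured adversarial training described in the paper.

```python
def disparity_to_depth(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Depth Z = f * B / d for a rectified stereo pair (pinhole camera model)."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive; zero disparity means a point at infinity")
    return focal_px * baseline_m / disparity_px


def reconstruct_left_row(right_row: list[float], disparity_row: list[float]) -> list[float]:
    """Rebuild one row of the left image by sampling the right image at x - d(x).

    A left-image pixel at column x appears at column x - d in the right image.
    This sketch uses nearest-neighbour sampling and clamps at the image border;
    learned methods use differentiable (e.g. bilinear) sampling instead.
    """
    width = len(right_row)
    out = []
    for x, d in enumerate(disparity_row):
        src = min(max(round(x - d), 0), width - 1)
        out.append(right_row[src])
    return out


def photometric_l1(a: list[float], b: list[float]) -> float:
    """Mean absolute intensity difference: the signal that replaces ground-truth depth."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Hypothetical camera: 700 px focal length, 0.5 m baseline.
    print(disparity_to_depth(35.0, 700.0, 0.5))  # 10.0 metres
    print(disparity_to_depth(70.0, 700.0, 0.5))  # 5.0 metres: larger disparity means closer
```

If the predicted disparities are correct, the warped right image matches the left image and the photometric error is small, so minimising that error pushes the network towards disparities, and hence depths, consistent with the scene.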