Parallel Backpropagation for Inverse of a Convolution with Application to Normalizing Flows
Sandeep Kumar,Girish Varma
@inproceedings{bib_Para_2025, AUTHOR = {Sandeep Kumar and Girish Varma}, TITLE = {Parallel Backpropagation for Inverse of a Convolution with Application to Normalizing Flows}, BOOKTITLE = {International Conference on Artificial Intelligence and Statistics}, YEAR = {2025}}
Inverse of an invertible convolution is an important operation that comes up in Normalizing Flows, Image Deblurring, etc. The naive algorithm for backpropagation of this operation using Gaussian elimination has running time O(n³), where n is the number of pixels in the image. We give a fast parallel backpropagation algorithm with running time O(√n) for a square image and provide a GPU implementation of the same.
Inverse Convolutions are usually used in Normalizing Flows in the sampling pass, making them slow. We propose to use Inverse Convolutions in the forward (image to latent vector) pass of the Normalizing flow. Since the sampling pass is the inverse of the forward pass, it will use convolutions only, resulting in efficient sampling times. We use our parallel backpropagation algorithm for optimizing the inverse convolution layer resulting in fast training times also. We implement this approach in various Normalizing Flow backbones, resulting in our Inverse-Flow models. We benchmark Inverse-Flow on standard datasets and show significantly improved sampling times with similar bits per dimension compared to previous models.
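To make the linear-algebra framing concrete, here is a minimal toy sketch (illustrative, not the paper's algorithm): a zero-padded convolution is a linear map y = Ax, so inverting it amounts to solving a linear system, which naive Gaussian elimination does in cubic time.

```python
import numpy as np

# Toy illustration: a zero-padded 1-D convolution is a linear map y = A x,
# so the inverse convolution is a linear solve (O(n^3) by Gaussian elimination).
def conv_matrix(kernel, n):
    """Banded matrix of a zero-padded 1-D convolution on n samples."""
    k = len(kernel)
    c = k // 2  # center tap index
    A = np.zeros((n, n))
    for i in range(n):
        for t in range(k):
            j = i + t - c
            if 0 <= j < n:
                A[i, j] = kernel[t]
    return A

kernel = np.array([0.2, 1.0, -0.3])  # dominant center tap keeps A invertible
n = 8
A = conv_matrix(kernel, n)
x = np.arange(n, dtype=float)
y = A @ x                      # forward convolution
x_rec = np.linalg.solve(A, y)  # inverse convolution via Gaussian elimination
assert np.allclose(x_rec, x)
```

The paper's contribution is avoiding this generic cubic solve by exploiting the convolution's structure in parallel; the kernel and sizes above are only for illustration.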
Genome-wide Association Study of Foveal Hypoplasia
@inproceedings{bib_Geno_2024, AUTHOR = {Girish Varma}, TITLE = {Genome-wide Association Study of Foveal Hypoplasia}, BOOKTITLE = {Investigative Ophthalmology & Visual Science}, YEAR = {2024}}
Purpose : Foveal Hypoplasia (FH) is a macular abnormality characterised by incomplete development of the foveal region. This abnormality is most commonly observed alongside rare inherited disorders such as albinism, but the precise mechanisms behind abnormal foveal development remain unclear. Thus, we aimed to identify genetic loci associated with FH, to provide insight into the genetic architecture of normal and abnormal foveal development.
Methods : We developed a deep learning model to analyse retinal OCT scans from a subset of UK Biobank participants, classifying individuals of European descent into FH (n=4403) and control groups (n=30069). FH classification was then used to conduct a genome-wide association study (GWAS) using these 34,472 individuals, and associations were mapped to causative genes using single nuclei multi-omics data obtained from embryonic and foetal retinal tissue. Further prioritisation of candidate genes was sought by performing CRISPR-Cas9 mediated knockout of target genes in Zebrafish. Genes were prioritised based on the impact of gene knockouts on visual function in Zebrafish, assessed using the optokinetic response.
Results : Our GWAS identified 44 genetic variants independently associated with FH (P < 5×10⁻⁸), including 31 novel variants not previously linked to FH or related traits. These novel associations include variants in genes such as TYR and OCA2, known for their roles in FH in a Mendelian context, and 28 novel genes not previously associated with FH. The identified genes operate in various biological processes such as the melanin biosynthesis pathway, melanosome function, cell fate and temporal patterning. This aligns with the functional characteristics of causative FH genes.
Conclusions : In the first GWAS study of FH, we uncovered novel genetic associations critical to understanding foveal development. The identified associations offer new avenues for research, providing mechanistic insight into the genetic factors underpinning the foveal region's development and the pathogenesis of FH.
International Multi-Centre Validation of Unsupervised Domain Adaptation for Precise Discrimination between Normal and Abnormal Retinal Development
Zhanhan Tu,Prateek Pani,Nikhil Reddy Billa,Helen Kuht,Ha-Jun Yoon,Gail DE Maconachie,Girish Varma,Mervyn Thomas
@inproceedings{bib_Inte_2024, AUTHOR = {Zhanhan Tu and Prateek Pani and Nikhil Reddy Billa and Helen Kuht and Ha-Jun Yoon and Gail DE Maconachie and Girish Varma and Mervyn Thomas}, TITLE = {International Multi-Centre Validation of Unsupervised Domain Adaptation for Precise Discrimination between Normal and Abnormal Retinal Development}, BOOKTITLE = {Investigative Ophthalmology & Visual Science}, YEAR = {2024}}
Purpose : This study aims to develop an artificial intelligence (AI)-based system for accurately grading arrested retinal development using optical coherence tomography (OCT) across various manufacturers. We employ deep learning techniques, specifically Unsupervised Domain Adaptation (UDA), to create a device-agnostic classification model distinguishing normal and abnormal retinal development.
Methods : Foveal scans from OCT devices of three manufacturers (TM-OCT1, HH-OCT1, TM-OCT2, TM-OCT3) were collected and annotated from datasets exceeding 20,000 OCT scans. The dataset was divided into training (80%) and testing (20%) sets. We utilised Convolutional Neural Networks (CNNs) with a ResNet50 backbone, assessing each pair as the source (for supervised training) and target (for unsupervised training). The diagnostic accuracy of the AI models was compared to seven clinician graders with varying experience (1 to 10 years), evaluating sensitivity, specificity, and overall accuracy.
Results : The cross-domain binary classification demonstrated exceptional diagnostic accuracy ranging from 87.75% to 96.06%. Sensitivity and specificity metrics further validated the robustness of our AI system (sensitivity: 88.68% to 98.38%, specificity: 78.00% to 98.02%). Notably, the model trained on TM-OCT1 and HH-OCT1 achieved 96.06% accuracy, 98.38% sensitivity, and 89.11% specificity on the HH-OCT1 test set, with an Area Under the Curve (AUC) of 99.4 (95% CI). These outcomes highlight the system's ability to distinguish normal and abnormal retinal development across diverse OCT devices.
Conclusions : Our study demonstrates, for the first time, the feasibility of employing a device-agnostic AI system in paediatric OCT interpretation without additional labelled data. The AI system's diagnostic performance is comparable to a clinician with over 10 years of experience, showcasing its potential to reduce inter-examiner variability and enhance clinical care pathways. With integration of paediatric OCT into routine clinical assessment, our AI system serves as a robust foundation for the development of a real-time, frontline diagnostic tool for retinal developmental disorders.
IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather
Shaik Furqan Ahmed,Abhishek Reddy Malreddy,Nikhil Reddy Billa,Sunny Manchanda,Kunal Chaudhary,Girish Varma
@inproceedings{bib_IDD-_2024, AUTHOR = {Shaik Furqan Ahmed and Abhishek Reddy Malreddy and Nikhil Reddy Billa and Sunny Manchanda and Kunal Chaudhary and Girish Varma}, TITLE = {IDD-AW: A Benchmark for Safe and Robust Segmentation of Drive Scenes in Unstructured Traffic and Adverse Weather}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2024}}
Large-scale deployment of fully autonomous vehicles requires a very high degree of robustness to unstructured traffic and adverse weather conditions, and should prevent unsafe mispredictions. While there are several datasets and benchmarks focusing on segmentation for drive scenes, they are not specifically focused on safety and robustness issues. We introduce the IDD-AW dataset, which provides 5000 pairs of high-quality images with pixel-level annotations, captured under rain, fog, low light, and snow in unstructured driving conditions. Compared to other adverse weather datasets, we provide i.) more annotated images, ii.) a paired Near-Infrared (NIR) image for each frame, and iii.) a larger label set with a 4-level label hierarchy to capture unstructured traffic conditions. We benchmark state-of-the-art models for semantic segmentation on IDD-AW. We also propose a new metric called "Safe mean Intersection over Union (Safe mIoU)" for hierarchical datasets, which penalizes dangerous mispredictions that are not captured in the traditional definition of mean Intersection over Union (mIoU). The results show that IDD-AW is one of the most challenging datasets to date for these tasks. The dataset and code will be available here: http://iddaw.github.io.
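The idea behind hierarchy-aware penalization can be sketched as follows. This is a hedged toy variant, not the paper's exact Safe mIoU definition: standard mIoU treats all confusions equally, while here false negatives of a safety-critical class are penalized in proportion to an (assumed) hierarchy distance.

```python
import numpy as np

# Toy sketch: per-class IoU from a confusion matrix, plus a "safe" variant
# that adds a penalty term for dangerous confusions of critical classes.
def miou(conf):
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    return np.mean(tp / (tp + fp + fn))

def safe_miou(conf, critical, distance, alpha=1.0):
    """Penalize false negatives of critical classes by hierarchy distance."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp
    fn = conf.sum(axis=1) - tp
    penalty = np.zeros_like(tp)
    for c in critical:
        # extra weight on confusions of class c with hierarchically distant classes
        penalty[c] = alpha * sum(conf[c, j] * distance[c][j]
                                 for j in range(len(tp)) if j != c)
    return np.mean(tp / (tp + fp + fn + penalty))

# 3 classes: 0 = road, 1 = person (safety-critical), 2 = sky
conf = np.array([[50, 2, 1],
                 [3, 40, 5],   # 5 "person" pixels predicted as "sky": dangerous
                 [0, 1, 60]])
distance = {1: {0: 1, 2: 3}}  # illustrative hierarchy distances from "person"
assert safe_miou(conf, critical=[1], distance=distance) < miou(conf)
```

The class names, distances, and penalty form are assumptions for illustration; the paper defines its penalty over the IDD-AW 4-level label hierarchy.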
DeepSPInN - Deep reinforcement learning for molecular Structure Prediction from Infrared and 13C NMR spectra
Devata Sriram,S Bhuvanesh,Sarvesh Mehta,Yashaswi Pathak,Siddhartha Laghuvarapu,Girish Varma,Deva Priyakumar U
@inproceedings{bib_Deep_2024, AUTHOR = {Devata Sriram and S Bhuvanesh and Sarvesh Mehta and Yashaswi Pathak and Siddhartha Laghuvarapu and Girish Varma and Deva Priyakumar U}, TITLE = {DeepSPInN - Deep reinforcement learning for molecular Structure Prediction from Infrared and 13C NMR spectra}, BOOKTITLE = {Digital Discovery}, YEAR = {2024}}
Molecular spectroscopy studies the interaction of molecules with electromagnetic radiation, and interpreting the resultant spectra is invaluable for deducing the molecular structures. However, predicting the molecular structure from spectroscopic data is a strenuous task that requires highly specific domain knowledge. DeepSPInN is a deep reinforcement learning method that predicts the molecular structure when given Infrared and 13C Nuclear magnetic resonance spectra by formulating the molecular structure prediction problem as a Markov decision process (MDP) and employs Monte-Carlo tree search to explore and choose the actions in the formulated MDP. On the QM9 dataset, DeepSPInN is able to predict the correct molecular structure for 91.5% of the input spectra in an average time of 77 seconds for molecules with less than 10 heavy atoms. This study is the first of its kind that uses only infrared and 13C nuclear magnetic resonance spectra for molecular structure prediction without referring to any pre-existing spectral databases or molecular fragment knowledge bases, and is a leap forward in automated molecular spectral analysis.
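The MDP-plus-MCTS formulation can be sketched with a toy analogue (hedged: the real system builds molecules and scores them against IR/NMR spectra; here the "spectrum" is just a target string, but the selection / expansion / rollout / backpropagation skeleton has the same shape).

```python
import math, random

# Toy MDP: state = partial string, action = append a character,
# reward = similarity to a target once full length is reached.
TARGET = "CCO"
ALPHABET = "CON"

def reward(state):
    return sum(a == b for a, b in zip(state, TARGET)) / len(TARGET)

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = {}, 0, 0.0

def ucb(child, parent, c=1.4):
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(
        math.log(parent.visits) / child.visits)

def mcts(iterations=400):
    random.seed(0)
    root = Node("")
    for _ in range(iterations):
        node = root
        # selection: descend while the node is fully expanded
        while len(node.state) < len(TARGET) and len(node.children) == len(ALPHABET):
            node = max(node.children.values(), key=lambda ch: ucb(ch, node))
        # expansion: add one untried action
        if len(node.state) < len(TARGET):
            a = [c for c in ALPHABET if c not in node.children][0]
            node.children[a] = Node(node.state + a, node)
            node = node.children[a]
        # rollout: random completion of the state
        s = node.state
        while len(s) < len(TARGET):
            s += random.choice(ALPHABET)
        r = reward(s)
        # backpropagation of the rollout reward
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # greedy extraction by visit counts
    node, out = root, ""
    while node.children:
        a, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        out += a
    return out

best = mcts()
```

DeepSPInN's states, actions, and reward (spectral agreement of candidate molecules) are far richer; this only shows the search scaffold.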
Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines
Karthik Viswanathan,Manan Goel,Siddhartha Laghuvarapu,Girish Varma,Deva Priyakumar U
@inproceedings{bib_Stre_2023, AUTHOR = {Karthik Viswanathan and Manan Goel and Siddhartha Laghuvarapu and Girish Varma and Deva Priyakumar U}, TITLE = {Streamlining pipeline efficiency: a novel model-agnostic technique for accelerating conditional generative and virtual screening pipelines}, BOOKTITLE = {NPG Nature Scientific Reports}, YEAR = {2023}}
The discovery of potential therapeutic agents for life-threatening diseases has become a significant problem. There is a requirement for fast and accurate methods to identify drug-like molecules that can be used as potential candidates for novel targets. Existing techniques like high-throughput screening and virtual screening are time-consuming and inefficient. Traditional molecule generation pipelines are more efficient than virtual screening but use time-consuming docking software. Such docking functions can be emulated using Machine Learning models with comparable accuracy and faster execution times. However, we find that when pre-trained machine learning models are employed in generative pipelines as oracles, they suffer from model degradation in areas where data is scarce. In this study, we propose an active learning-based model that can be added as a supplement to enhanced molecule generation architectures. The proposed method uses uncertainty sampling on the molecules created by the generator model and dynamically learns as the generator samples molecules from different regions of the chemical space. The proposed framework can generate molecules with high binding affinity, with a ∼70% improvement in runtime compared to the baseline model while labeling only ∼30% of molecules compared to the baseline oracle.
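The uncertainty-sampling step can be sketched as follows (hedged: the surrogate model, ensemble, and budget here are illustrative stand-ins, not the paper's architecture): only the generated molecules the surrogate is least certain about are sent to the expensive docking oracle for labeling.

```python
import numpy as np

rng = np.random.default_rng(0)

def ensemble_predict(features, n_models=5):
    """Stand-in for an ensemble surrogate: mean and spread of predictions."""
    preds = np.stack([features.sum(axis=1)
                      + rng.normal(0, 0.1 + 0.5 * i, len(features))
                      for i in range(n_models)])
    return preds.mean(axis=0), preds.std(axis=0)

def select_for_labeling(features, budget):
    """Pick the `budget` samples the surrogate is least certain about."""
    _, uncertainty = ensemble_predict(features)
    return np.argsort(uncertainty)[-budget:]

batch = rng.normal(size=(100, 8))   # generated molecules, as feature vectors
to_label = select_for_labeling(batch, budget=30)
assert len(to_label) == 30          # only ~30% go to the costly oracle
```

In the actual pipeline the surrogate is retrained on the newly labeled molecules after each round, which is what keeps it accurate in regions of chemical space the generator moves into.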
City-scale Pollution Aware Traffic Routing by Sampling Multiple Max Flows using MCMC
@inproceedings{bib_City_2023, AUTHOR = {S Shreevignesh and Praveen Paruchuri and Girish Varma}, TITLE = {City-scale Pollution Aware Traffic Routing by Sampling Multiple Max Flows using MCMC}, BOOKTITLE = {International Conference on Intelligent Transportation Systems}, YEAR = {2023}}
Air pollution is a growing concern across the world. Road traffic is one of the major contributors to air pollution in urban areas. One of the approaches to solve this problem is to design a transportation policy that i) avoids extreme pollution in any area, ii) enables short transit times, and iii) makes effective use of the road capacities. Previous work on this problem, the MaxFlow-MCMC algorithm, proposed a novel sampling-based approach. In this work, we propose a significantly faster extension of the algorithm, without compromising performance, involving the following contributions: (a) we provide the first construction of a Markov Chain to sample a set of k-optimal max flow solutions directly from a planar graph; (b) we simulate traffic on large-scale real-world roadmaps using the SUMO traffic simulator. We observe a significant speed improvement in the range of 22 to 242 times in our experiments while obtaining lower average pollution.
@inproceedings{bib_Towa_2023, AUTHOR = {Ashutosh Mishra and Shyam Nandan Rai and Girish Varma and Jawahar C V}, TITLE = {Towards Efficient Semantic Segmentation via Meta Pruning}, BOOKTITLE = {International Conference on Computer Vision and Image Processing}, YEAR = {2023}}
Semantic segmentation provides a pixel-level understanding of an image, essential for various scene-understanding vision tasks. However, semantic segmentation models demand significant computational resources during training and inference. These requirements pose a challenge in resource-constrained scenarios. To address this issue, we present a compression algorithm based on differentiable meta-pruning through a hypernetwork: MPHyp. Our proposed method MPHyp utilizes hypernetworks that take latent vectors as input and output weight matrices for the segmentation model. L1 sparsification, applied via a proximal gradient optimizer, updates the latent vectors and introduces sparsity, leading to automatic model pruning. The proposed method offers the benefit of achieving controllable compression during training and significantly reduces the training time. We compare our methodology with a popular pruning approach and demonstrate its efficacy by reducing the number of parameters and floating point operations while maintaining the mean Intersection over Union (mIoU) metric. We conduct experiments on two widely accepted semantic segmentation architectures: UNet and ERFNet. Our experiments and ablation study demonstrate the effectiveness of our proposed methodology by achieving efficient and reasonable segmentation results.
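The proximal-gradient L1 step that drives the pruning can be sketched generically (hedged: this is a standard ISTA-style soft-thresholding update on a toy problem, not the paper's exact training loop): after each gradient step on the latent vectors, soft-thresholding drives some latents exactly to zero, which prunes the corresponding structures.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||z||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def proximal_step(z, grad, lr=0.1, lam=0.5):
    """One proximal gradient step: gradient step, then L1 shrinkage."""
    return soft_threshold(z - lr * grad, lr * lam)

# Toy smooth loss pulling z toward a target; small components get zeroed out.
z = np.array([0.9, 0.04, -0.7, 0.02])
target = np.array([1.0, 0.0, -0.8, 0.0])
for _ in range(100):
    grad = 2 * (z - target)
    z = proximal_step(z, grad)
num_zero = int(np.sum(z == 0.0))  # exact zeros, not merely small values
```

The exact zeros are the point: unlike plain gradient descent on an L1 penalty, the proximal update makes latents land on zero in finite time, giving a crisp pruning signal.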
City-Scale Pollution Aware Traffic Routing by Sampling Max Flows Using MCMC
S Shreevignesh,Praveen Paruchuri,Girish Varma
Association for the Advancement of Artificial Intelligence, AAAI, 2023
@inproceedings{bib_City_2023, AUTHOR = {S Shreevignesh and Praveen Paruchuri and Girish Varma}, TITLE = {City-Scale Pollution Aware Traffic Routing by Sampling Max Flows Using MCMC}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence}, YEAR = {2023}}
A significant cause of air pollution in urban areas worldwide is the high volume of road traffic. Long-term exposure to severe pollution can cause serious health issues. One approach towards tackling this problem is to design a pollution-aware traffic routing policy that balances multiple objectives of i) avoiding extreme pollution in any area, ii) enabling short transit times, and iii) making effective use of the road capacities. We propose a novel sampling-based approach for this problem. We provide the first construction of a Markov Chain that can sample integer max flow solutions of a planar graph, with theoretical guarantees that the probabilities depend on the aggregate transit length. We designed a traffic policy using diverse samples and simulated traffic on real-world road maps using the SUMO traffic simulator. We observe a considerable decrease in areas with severe pollution in experiments with maps of large cities across the world, compared to other approaches.
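The sampling idea can be illustrated on a toy instance (hedged: the paper constructs Markov chain moves directly on the planar graph; here we brute-force the integer max flows of a tiny graph and run an independence Metropolis chain over them, with stationary weights favoring max flows of smaller aggregate transit length).

```python
import itertools, math, random

# Tiny diamond network: two integer max flows exist, one using a long
# "polluted" detour edge (a, d). Edge names and lengths are illustrative.
edges = [("s", "a"), ("s", "b"), ("a", "c"), ("a", "d"),
         ("b", "c"), ("b", "d"), ("c", "t"), ("d", "t")]
cap = {e: 1 for e in edges}
length = {e: 1 for e in edges}
length[("a", "d")] = 5          # long detour
internal = {"a", "b", "c", "d"}  # nodes with flow conservation

def feasible(f):
    return all(sum(f[e] for e in edges if e[1] == v)
               == sum(f[e] for e in edges if e[0] == v) for v in internal)

def value(f):
    return sum(f[e] for e in edges if e[0] == "s")

def cost(f):
    return sum(f[e] * length[e] for e in edges)

# Brute-force enumeration of integer max flows (fine at this toy scale).
all_flows = [dict(zip(edges, fv)) for fv in
             itertools.product(*(range(cap[e] + 1) for e in edges))]
feas = [f for f in all_flows if feasible(f)]
maxval = max(value(f) for f in feas)
maxflows = [f for f in feas if value(f) == maxval]

# Independence Metropolis chain with stationary weight exp(-beta * cost).
random.seed(0)
beta = 1.0
current = maxflows[0]
counts = [0] * len(maxflows)
for _ in range(5000):
    proposal = random.choice(maxflows)
    if random.random() < min(1.0, math.exp(-beta * (cost(proposal) - cost(current)))):
        current = proposal
    counts[maxflows.index(current)] += 1
cheapest = min(maxflows, key=cost)
```

The chain spends most of its time on the shorter-transit max flow, which is the qualitative behavior the paper's guarantees formalize on real planar road networks.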
Ramanujan bipartite graph products for efficient block sparse neural networks
V Dharma Teja,Girish Varma,Kishore Kothapalli
Concurrency and Computation: Practice and Experience, CCPE, 2023
@inproceedings{bib_Rama_2023, AUTHOR = {V Dharma Teja and Girish Varma and Kishore Kothapalli}, TITLE = {Ramanujan bipartite graph products for efficient block sparse neural networks}, BOOKTITLE = {Concurrency and Computation: Practice and Experience}, YEAR = {2023}}
Sparse neural networks are shown to give accurate predictions competitive with denser versions, while also minimizing the number of arithmetic operations performed. However, current GPU hardware can only exploit structured sparsity patterns for better efficiency. We propose a framework for generating structured multilevel block sparse neural networks by using the theory of graph products. Our Ramanujan bipartite graph product (RBGP) framework uses products of Ramanujan graphs to obtain the best connectivity for a given level of sparsity. This essentially ensures that i.) the network has the structured block sparsity for which runtime-efficient algorithms exist, ii.) the model gives high prediction accuracy, due to the better expressive power derived from the connectivity of the graph, and iii.) the graph data structure has a succinct representation that can be stored efficiently in memory. We use our framework to design a specific connectivity pattern called RBGP4 which makes efficient use of the memory hierarchy available on GPU. We benchmark our approach on image classification and machine translation tasks with an edge (Jetson Nano 2GB) as well as server (V100) GPUs. When compared with commonly used sparsity patterns like unstructured and block, we obtain significant speedups while achieving the same level of accuracy.
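The block-sparsity construction can be sketched as follows (hedged: a random near-regular bipartite graph stands in for the Ramanujan constructions used in the paper): the biadjacency matrix of a bipartite graph decides which blocks of a weight matrix are kept, giving structured sparsity that GPU kernels can exploit.

```python
import numpy as np

rng = np.random.default_rng(0)

def regular_bipartite_mask(rows, cols, degree):
    """Each left vertex connects to `degree` right vertices, handed out
    round-robin so right degrees stay balanced (duplicates possible in
    this toy version; Ramanujan constructions avoid that)."""
    mask = np.zeros((rows, cols), dtype=bool)
    perm = np.concatenate([rng.permutation(cols) for _ in range(degree)])
    for i, j in enumerate(perm[: rows * degree]):
        mask[i % rows, j] = True
    return mask

def block_sparse_weight(block_mask, block_size):
    """Expand a block-level mask into a full weight matrix with zero blocks."""
    full = np.kron(block_mask, np.ones((block_size, block_size)))
    return full * rng.normal(size=full.shape)

block_mask = regular_bipartite_mask(rows=4, cols=4, degree=2)
W = block_sparse_weight(block_mask, block_size=8)
density = (W != 0).mean()  # at most degree / cols of the blocks are nonzero
```

Storing only the nonzero blocks plus the (small) biadjacency matrix is what gives the succinct representation and the regular memory-access pattern mentioned in the abstract.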
FInC Flow: Fast and Invertible k×k Convolutions for Normalizing Flows
Aditya V Kallappa,Sandeep Kumar,Girish Varma
International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP, 2023
@inproceedings{bib_FInC_2023, AUTHOR = {Aditya V Kallappa and Sandeep Kumar and Girish Varma}, TITLE = {FInC Flow: Fast and Invertible k×k Convolutions for Normalizing Flows}, BOOKTITLE = {International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications}, YEAR = {2023}}
Invertible convolutions have been an essential element for building expressive normalizing flow-based generative models since their introduction in Glow. Several attempts have been made to design invertible k×k convolutions that are efficient in the training and sampling passes. Though these attempts have improved the expressivity and sampling efficiency, they severely lagged behind Glow, which uses only 1×1 convolutions, in terms of sampling time. Also, many of the approaches mask a large number of parameters of the underlying convolution, resulting in lower expressivity on a fixed run-time budget. We propose a k×k convolutional layer and a Deep Normalizing Flow architecture which i.) has a fast parallel inversion algorithm with running time O(nk²) (n is the height and width of the input image and k is the kernel size), ii.) masks the minimal amount of learnable parameters in a layer, and iii.) gives forward pass and sampling times comparable to other k×k convolution-based models on real-world benchmarks. We provide an implementation of the proposed parallel algorithm for sampling using our invertible convolutions on GPUs. Benchmarks on CIFAR-10, ImageNet, and CelebA datasets show comparable performance to previous works regarding bits per dimension while significantly improving the sampling time.
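The inversion idea can be sketched in a simplified form (hedged: one top-left-masked 3×3 convolution inverted by a sequential raster scan; the paper additionally parallelizes the recovery across diagonals on the GPU to reach the stated running time). Because the kernel only looks at pixels strictly earlier in raster order and the center tap is fixed to 1, the convolution is a triangular linear map and can be inverted by substitution.

```python
import numpy as np

TAPS = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]  # strictly earlier in raster order

def masked_conv(x, k):
    """y[i,j] = x[i,j] + sum of masked taps; unit center tap is implicit."""
    n = x.shape[0]
    y = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            acc = x[i, j]
            for di, dj in TAPS:
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    acc += k[1 + di, 1 + dj] * x[ii, jj]
            y[i, j] = acc
    return y

def invert_masked_conv(y, k):
    """Raster-scan substitution: every x the sum needs is already recovered."""
    n = y.shape[0]
    x = np.zeros_like(y)
    for i in range(n):
        for j in range(n):
            acc = y[i, j]
            for di, dj in TAPS:
                ii, jj = i + di, j + dj
                if 0 <= ii < n and 0 <= jj < n:
                    acc -= k[1 + di, 1 + dj] * x[ii, jj]
            x[i, j] = acc
    return x

k = np.array([[0.5, -0.2, 0.3],
              [0.1,  1.0, 0.0],   # center fixed to 1 in the code; later taps unused
              [0.0,  0.0, 0.0]])
x = np.arange(16, dtype=float).reshape(4, 4)
assert np.allclose(invert_masked_conv(masked_conv(x, k), k), x)
```

The key observation behind the parallel version is that pixels on the same anti-diagonal do not depend on each other, so they can be recovered simultaneously.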
Accelerating Computer Vision Tasks on GPUs using Ramanujan Graph Product Framework
Shaik Furqan Ahmed,Konduru Thejasvi,Girish Varma,Kishore Kothapalli
Joint International Conference on Data Science & Management of Data, CODS-COMAD, 2023
@inproceedings{bib_Acce_2023, AUTHOR = {Shaik Furqan Ahmed and Konduru Thejasvi and Girish Varma and Kishore Kothapalli}, TITLE = {Accelerating Computer Vision Tasks on GPUs using Ramanujan Graph Product Framework}, BOOKTITLE = {Joint International Conference on Data Science & Management of Data}, YEAR = {2023}}
Sparse neural networks have been proven to generate efficient and better runtimes when compared to dense neural networks. Acceleration in runtime is better achieved with structured sparsity. However, generating an efficient sparsity structure that maintains both runtime and accuracy is a challenging task. In this paper, we implement the RBGP4 sparsity pattern derived from the Ramanujan Bipartite Graph Product (RBGP) framework on various Computer Vision tasks and test how well it performs w.r.t. accuracy and runtime. Using this approach, we generate structured sparse neural networks which have multiple levels of block sparsity and good connectivity due to the presence of Ramanujan bipartite graphs. We benchmark our approach on Semantic Segmentation and Pose Estimation tasks on an edge device (Jetson Nano 2GB) as well as server (V100) GPUs. We compare the results obtained for the RBGP4 sparsity pattern with the unstructured and block sparsity patterns. When compared to sparsity patterns like unstructured and block, we obtained significant speedups while maintaining accuracy.
Variability of outer retinal hyper-reflective bands detected using optical coherence tomography
Ayesha Girach,Helen Kuht,Jinu Han,Garima Nishad,Prateek Pani,Yu-Dong Zhang,Zhanhan Tu,Girish Varma,Mervyn Thomas
Investigative Ophthalmology & Visual Science, IOVS, 2022
@inproceedings{bib_Vari_2022, AUTHOR = {Ayesha Girach and Helen Kuht and Jinu Han and Garima Nishad and Prateek Pani and Yu-Dong Zhang and Zhanhan Tu and Girish Varma and Mervyn Thomas}, TITLE = {Variability of outer retinal hyper-reflective bands detected using optical coherence tomography}, BOOKTITLE = {Investigative Ophthalmology & Visual Science}, YEAR = {2022}}
Purpose : Melanin is a major contributor to the intensity and width of the retinal pigment epithelium (RPE) on optical coherence tomography (OCT). It is hypothesized that high melanin concentrations cause scattering, making the RPE appear as a thick band and obscuring adjacent layers. A split-band appearance of the outermost hyperreflective band has been demonstrated in albinism, meaning the RPE and Bruch's Membrane appear as two separate bands. We performed a cross-sectional observational study to test the hypothesis that a reduction in melanin causes an increased ability to differentiate the hyperreflective bands seen on OCT images of the outer retina.
Methods : Using the UK Biobank dataset, we generated a randomised sample of 300 participants with specific inclusion and exclusion criteria. Inclusion criteria were an age between 40 and 65, visual acuity better than 0.60 logMAR, no ocular pathology, and
CInC Flow: Characterizable Invertible 3x3 Convolution
Sandeep Kumar,Marius Dufraisse,Girish Varma
Technical Report, arXiv, 2021
@inproceedings{bib_CInC_2021, AUTHOR = {Sandeep Kumar and Marius Dufraisse and Girish Varma}, TITLE = {CInC Flow: Characterizable Invertible 3x3 Convolution}, BOOKTITLE = {Technical Report}, YEAR = {2021}}
Normalizing flows are an essential alternative to GANs for generative modelling, which can be optimized directly on the maximum likelihood of the dataset. They also allow computation of the exact latent vector corresponding to an image, since they are composed of invertible transformations. However, the requirement of invertibility prevents standard and expressive neural network models such as CNNs from being directly used. Emergent convolutions were proposed to construct an invertible 3×3 CNN layer using a pair of masked CNN layers, making them inefficient. We study conditions under which 3×3 CNNs are invertible, allowing them to be used to construct expressive normalizing flows. We derive necessary and sufficient conditions on a padded CNN for it to be invertible. Our conditions for invertibility are simple and can easily be maintained during the training process. Since we require only a single CNN layer for every effective invertible CNN layer, our approach is more efficient than emergent convolutions. We also propose a coupling method, Quad-coupling. We benchmark our approach and show performance similar to emergent convolutions while improving the model's efficiency.
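A brute-force companion to the invertibility question (hedged: this is not the paper's characterization, just a sanity check): materialize a zero-padded 3×3 convolution on an n×n image as an n²×n² matrix and test whether that matrix is singular. The two kernels below are illustrative; the second one happens to be singular at this image size.

```python
import numpy as np

def conv2d_same(x, k):
    """Zero-padded 3x3 cross-correlation, output same size as input."""
    n = x.shape[0]
    pad = np.pad(x, 1)
    y = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            y[i, j] = np.sum(pad[i:i + 3, j:j + 3] * k)
    return y

def conv_operator(k, n):
    """Matrix of the linear map x -> conv2d_same(x, k) on n x n images."""
    A = np.zeros((n * n, n * n))
    for idx in range(n * n):
        basis = np.zeros(n * n)
        basis[idx] = 1.0
        A[:, idx] = conv2d_same(basis.reshape(n, n), k).ravel()
    return A

n = 4
k_good = np.array([[0.0, 0.1, 0.0],
                   [0.1, 1.0, 0.1],
                   [0.0, 0.1, 0.0]])  # dominant center tap: invertible
k_bad = np.array([[0.0, 1.0, 0.0],
                  [1.0, 0.0, 1.0],
                  [0.0, 1.0, 0.0]])   # zero center: grid adjacency, singular here
inv_good = abs(np.linalg.det(conv_operator(k_good, n))) > 1e-9
inv_bad = abs(np.linalg.det(conv_operator(k_bad, n))) > 1e-9
```

This check costs O(n⁶) and only answers yes/no for one kernel; the value of the paper's characterization is that it replaces such brute force with simple conditions that can be maintained throughout training.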
Automated Seed Quality Testing System using GAN & Active Learning
Sandeep Kumar,Prateek Pani,Raj Nair,Girish Varma
Technical Report, arXiv, 2021
@inproceedings{bib_Auto_2021, AUTHOR = {Sandeep Kumar and Prateek Pani and Raj Nair and Girish Varma}, TITLE = {Automated Seed Quality Testing System using GAN & Active Learning}, BOOKTITLE = {Technical Report}, YEAR = {2021}}
Quality assessment of agricultural produce is a crucial step in minimizing food stock wastage. However, this is currently done manually and often requires expert supervision, especially for smaller seeds like corn. We propose a novel computer vision-based system for automating this process. We build a novel seed image acquisition setup, which captures both the top and bottom views. Dataset collection for this problem has challenges of data annotation costs/time and class imbalance. We address these challenges by i.) using a Conditional Generative Adversarial Network (CGAN) to generate real-looking images for the classes with fewer images and ii.) annotating a large dataset with minimal expert human intervention using a Batch Active Learning (BAL) based annotation tool. We benchmark different image classification models on the dataset obtained. We are able to get accuracies of up to 91.6% for testing the physical purity of seed samples.
Generalized Parametric Path Problems
Kshitij Gajjar,Girish Varma,Prerona Chatterjee,Jaikumar Radhakrishnan
Uncertainty in Artificial Intelligence, UAI, 2021
@inproceedings{bib_Gene_2021, AUTHOR = {Kshitij Gajjar and Girish Varma and Prerona Chatterjee and Jaikumar Radhakrishnan}, TITLE = {Generalized Parametric Path Problems}, BOOKTITLE = {Uncertainty in Artificial Intelligence}, YEAR = {2021}}
Parametric path problems arise independently in diverse domains, ranging from transportation to finance, where they are studied under various assumptions. We formulate a general path problem with relaxed assumptions, and describe how this formulation is applicable in these domains. We study the complexity of the general problem, and a variant of it where preprocessing is allowed. We show that when the parametric weights are linear functions, algorithms remain tractable even under our relaxed assumptions. Furthermore, we show that if the weights are allowed to be non-linear, the problem becomes NP-hard. We also study the multi-dimensional version of the problem, where the weight functions are parameterized by multiple parameters. We show that even with two parameters, the problem is NP-hard.
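The tractable linear-weight setting can be sketched on a toy graph (hedged: the graph and weight functions are illustrative, not from the paper): each edge cost is an affine function a + b·lam of a parameter lam, so for any fixed lam a plain Dijkstra run suffices, and the optimal path can switch as lam varies.

```python
import heapq

# Each edge carries (a, b): cost at parameter lam is a + b * lam.
graph = {
    "s": [("a", (1.0, 0.0)), ("b", (2.0, 0.0))],
    "a": [("t", (1.0, 4.0))],   # cheap at lam = 0, expensive for large lam
    "b": [("t", (2.0, 0.0))],
    "t": [],
}

def dijkstra(lam, source="s", target="t"):
    """Shortest s-t distance after fixing the parameter lam."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, (a, b) in graph[u]:
            nd = d + a + b * lam
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

low, high = dijkstra(0.0), dijkstra(1.0)  # s->a->t wins at 0, s->b->t at 1
```

The hardness results in the paper concern what happens beyond this easy case: non-linear weight functions, and two or more parameters.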
A Characterization of Hard-to-Cover CSPs
Amey Bhangale,Prahladh Harsha,Girish Varma
Theory of Computing, TOC, 2020
@inproceedings{bib_A_Ch_2020, AUTHOR = {Amey Bhangale, Prahladh Harsha, Girish Varma}, TITLE = {A Characterization of Hard-to-Cover CSPs}, BOOKTITLE = {Theory of Computing}, YEAR = {2020}}
We continue the study of the covering complexity of constraint satisfaction problems (CSPs) initiated by Guruswami, Håstad and Sudan [SIAM J. Comp. 2002] and Dinur and Kol [CCC’13]. The covering number of a CSP instance Φ is the smallest number of assignments to the variables of Φ, such that each constraint of Φ is satisfied by at least one of the assignments. We show the following results: 1. Assuming a covering variant of the Unique Games Conjecture, introduced by Dinur and Kol, we show that for every non-odd predicate P over any constant-size alphabet and every integer K, it is NP-hard to approximate the covering number within a factor of K. This yields a complete characterization of CSPs over constant-size alphabets that are hard to cover.
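For intuition, the covering number of a tiny instance can be computed by brute force. The sketch below uses hypothetical NOT-EQUAL constraints over a boolean alphabet: an odd cycle of such constraints cannot be satisfied by a single assignment, but two assignments together cover it.

```python
from itertools import product, combinations

def covering_number(n, constraints):
    """Smallest number of boolean assignments such that every
    NOT-EQUAL constraint (u, v) is satisfied by at least one of them.
    Brute force; only feasible for toy instances."""
    assigns = list(product([0, 1], repeat=n))
    for k in range(1, len(assigns) + 1):
        for subset in combinations(assigns, k):
            if all(any(a[u] != a[v] for a in subset) for u, v in constraints):
                return k

# Odd cycle of NOT-EQUAL constraints: unsatisfiable by one assignment,
# but coverable by two.
triangle = [(0, 1), (1, 2), (0, 2)]
print(covering_number(3, triangle))  # 2
```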
Deep Learning Enabled Inorganic Material Generator
Yashaswi Pathak,Karandeep Singh Juneja,Girish Varma,Masahiro Ehara,Deva Priyakumar U
Physical Chemistry Chemical Physics, PCCP, 2020
@inproceedings{bib_Deep_2020, AUTHOR = {Yashaswi Pathak, Karandeep Singh Juneja, Girish Varma, Masahiro Ehara, Deva Priyakumar U}, TITLE = {Deep Learning Enabled Inorganic Material Generator}, BOOKTITLE = {Physical Chemistry Chemical Physics}, YEAR = {2020}}
Recent years have witnessed utilization of modern machine learning approaches for predicting properties of materials using available datasets. However, to identify potential candidates for material discovery, one has to systematically scan through a large chemical space and subsequently calculate the properties of all such samples. On the other hand, generative methods are capable of efficiently sampling the chemical space and can generate molecules/materials with desired properties. In this study, we report a deep learning based inorganic material generator (DING) framework consisting of a generator module and a predictor module. The generator module is developed based upon conditional variational autoencoders (CVAE) and the predictor module consists of three deep neural networks trained for predicting enthalpy of formation, volume per atom and energy per atom, chosen to demonstrate the proposed method. The predictor and generator modules have been developed using a one hot key representation of the material composition. A series of tests were done to examine the robustness of the predictor models, to demonstrate the continuity of the latent material space, and its ability to generate materials exhibiting target property values. The DING architecture proposed in this paper can be extended to other properties based on which the chemical space can be efficiently explored for interesting materials/molecules.
Dynamic Block Sparse Reparameterization of Convolutional Neural Networks
V DHARMA TEJA,Girish Varma,Kishore Kothapalli
International Conference on Computer Vision Workshops, ICCV-W, 2019
@inproceedings{bib_Dyna_2019, AUTHOR = {V DHARMA TEJA, Girish Varma, Kishore Kothapalli}, TITLE = {Dynamic Block Sparse Reparameterization of Convolutional Neural Networks}, BOOKTITLE = {International Conference on Computer Vision Workshops}, YEAR = {2019}}
Sparse neural networks are efficient in both memory and compute when compared to dense neural networks. But on parallel hardware such as GPUs, sparse neural networks result in small or no runtime performance gains. On the other hand, structured sparsity patterns like filter, channel and block sparsity result in large performance gains due to the regularity induced by structure. Among structured sparsities, block sparsity is a generic structured sparsity pattern, with filter and channel sparsity being sub-cases of block sparsity. In this work, we focus on block sparsity and generate efficient block sparse convolutional neural networks using our approach DBSR (Dynamic Block Sparse Reparameterization). Our DBSR approach, when applied on the image classification task over the Imagenet dataset, decreases parameters and FLOPS of ResneXt50 by a factor of 2x with only an increase of 0.48 in Top-1 error. And when extended to the task of semantic segmentation, our approach reduces parameters and FLOPS by 30% and 20% respectively, with only a 1% decrease in mIoU for ERFNet over the Cityscapes dataset.
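A minimal sketch of what block sparsity means at the weight level, assuming a simple L1-magnitude criterion (DBSR's actual dynamic reparameterization is more involved): the weight matrix is partitioned into fixed-size blocks, and the blocks with the smallest norm are zeroed, leaving a regular pattern that GPU kernels can exploit.

```python
import numpy as np

def block_sparse_mask(w, block=4, keep_ratio=0.5):
    """Zero out the weight blocks with the smallest L1 norm, keeping
    the top keep_ratio fraction of blocks. Assumes shape divisible by block."""
    rows, cols = w.shape
    br, bc = rows // block, cols // block
    blocks = w.reshape(br, block, bc, block)
    norms = np.abs(blocks).sum(axis=(1, 3))       # one L1 norm per block
    k = max(1, int(keep_ratio * norms.size))
    thresh = np.sort(norms.ravel())[-k]
    mask = (norms >= thresh).astype(w.dtype)      # keep the k largest blocks
    return (blocks * mask[:, None, :, None]).reshape(rows, cols)

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8))                       # hypothetical dense weights
ws = block_sparse_mask(w, block=4, keep_ratio=0.5)
print((ws == 0).mean())                           # half the entries are zeroed
```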
Universal semi-supervised semantic segmentation
Tarun Kalluri,Girish Varma,Manmohan Chandraker,Jawahar C V
International Conference on Computer Vision, ICCV, 2019
@inproceedings{bib_Univ_2019, AUTHOR = {Tarun Kalluri, Girish Varma, Manmohan Chandraker, Jawahar C V}, TITLE = {Universal semi-supervised semantic segmentation}, BOOKTITLE = {International Conference on Computer Vision}, YEAR = {2019}}
In recent years, the need for semantic segmentation has arisen across several different applications and environments. However, the expense and redundancy of annotation often limits the quantity of labels available for training in any domain, while deployment is easier if a single model works well across domains. In this paper, we pose the novel problem of universal semi-supervised semantic segmentation and propose a solution framework, to meet the dual needs of lower annotation and deployment costs. In contrast to counterpoints such as fine tuning, joint training or unsupervised domain adaptation, universal semi-supervised segmentation ensures that across all domains: (i) a single model is deployed, (ii) unlabeled data is used, (iii) performance is improved, (iv) only a few labels are needed and (v) label spaces may differ. To address this, we minimize supervised as well as within- and cross-domain unsupervised losses, introducing a novel feature alignment objective based on pixel-aware entropy regularization for the latter. We demonstrate quantitative advantages over other approaches on several combinations of segmentation datasets across different geographies (Germany, England, India) and environments (outdoors, indoors), as well as qualitative insights on the aligned representations.
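The entropy term at the heart of such a regularizer is easy to write down. The sketch below only computes per-pixel softmax entropy on hypothetical logits; the paper's full pixel-aware alignment objective builds further structure on top of this quantity.

```python
import numpy as np

def pixel_entropy(logits):
    """Per-pixel entropy of softmax predictions.
    logits: H x W x C array of class scores."""
    z = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=-1)

# Uniform logits -> maximum entropy log(C) at every pixel.
logits = np.zeros((2, 2, 4))
print(pixel_entropy(logits).mean())  # approx log(4) = 1.386
```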
Semantic Segmentation Datasets for Resource Constrained Training
Ashutosh Mishra,Sudhir Kumar,Tarun Kalluri,Girish Varma,Anbumani Subramaian,Manmohan Chandraker,Jawahar C V
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2019
@inproceedings{bib_Sema_2019, AUTHOR = {Ashutosh Mishra, Sudhir Kumar, Tarun Kalluri, Girish Varma, Anbumani Subramaian, Manmohan Chandraker, Jawahar C V}, TITLE = {Semantic Segmentation Datasets for Resource Constrained Training}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2019}}
Several large scale datasets, coupled with advances in deep neural network architectures, have been greatly successful in pushing the boundaries of performance in semantic segmentation in recent years. However, the scale and magnitude of such datasets prohibits ubiquitous use and widespread adoption of such models, especially in settings with serious hardware and software resource constraints. Through this work, we propose two simple variants of the recently proposed IDD dataset, namely IDD-mini and IDD-lite, for scene understanding in unstructured environments. Our main objective is to enable research and benchmarking in training segmentation models. We believe that this will enable quick prototyping useful in applications like optimum parameter and architecture search, and encourage deployment on low resource hardware such as Raspberry Pi. We show qualitatively and quantitatively that with only 1 hour of training on 4GB GPU memory, we can achieve satisfactory semantic segmentation performance on the proposed datasets.
IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments
Girish Varma,Anbumani Subramanian,Manmohan Chandraker,Anoop Namboodiri,Jawahar C V
Winter Conference on Applications of Computer Vision, WACV, 2019
@inproceedings{bib_IDD:_2019, AUTHOR = {Girish Varma, Anbumani Subramanian, Manmohan Chandraker, Anoop Namboodiri, Jawahar C V}, TITLE = {IDD: A dataset for exploring problems of autonomous navigation in unconstrained environments}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2019}}
While several datasets for autonomous navigation have become available in recent years, they have tended to focus on structured driving environments. This usually corresponds to well-delineated infrastructure such as lanes, a small number of well-defined categories for traffic participants, low variation in object or background appearance and strong adherence to traffic rules. We propose IDD, a novel dataset for road scene understanding in unstructured environments where the above assumptions are largely not satisfied. It consists of 10,004 images, finely annotated with 34 classes collected from 182 drive sequences on Indian roads. The label set is expanded in comparison to popular benchmarks such as Cityscapes, to account for new classes. It also reflects label distributions of road scenes significantly different from existing datasets, with most classes displaying greater within-class diversity. Consistent with …
Efficient semantic segmentation using gradual grouping
V NIKITHA,A SAI KRISHANA SRIHARSHA,Girish Varma,Jawahar C V,Manu Mathew,Soyeb Nagori
Computer Vision and Pattern Recognition Conference workshops, CVPR-W, 2018
@inproceedings{bib_Effi_2018, AUTHOR = {V NIKITHA, A SAI KRISHANA SRIHARSHA, Girish Varma, Jawahar C V, Manu Mathew, Soyeb Nagori}, TITLE = {Efficient semantic segmentation using gradual grouping}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference workshops}, YEAR = {2018}}
Deep CNNs for semantic segmentation have high memory and run time requirements. Various approaches have been proposed to make CNNs efficient, like grouped, shuffled and depth-wise separable convolutions. We study the effectiveness of these techniques on a real-time semantic segmentation architecture like ERFNet for improving runtime by over 5X. We apply these techniques to CNN layers partially or fully and evaluate the testing accuracies on the Cityscapes dataset. We obtain accuracy vs parameters/FLOPs trade-offs, giving accuracy scores for models that can run under specified runtime budgets. We further propose a novel training procedure which starts out with a dense convolution but gradually evolves towards a grouped convolution. We show that our proposed training method and efficient architecture design can improve accuracies by over 8% with depthwise separable convolutions applied on the encoder of ERFNet and attaching a lightweight decoder. This results in a model which has a 5X improvement in FLOPs while only suffering a 4% degradation in accuracy with respect to ERFNet.
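The FLOP savings from depthwise separable convolutions follow from simple counting: a k×k depthwise pass plus a 1×1 pointwise pass replaces one dense k×k convolution. The layer dimensions below are hypothetical, chosen only to illustrate the arithmetic.

```python
def conv_flops(h, w, cin, cout, k):
    """Multiply-accumulates for a standard k x k convolution
    on an h x w feature map."""
    return h * w * cout * cin * k * k

def dw_sep_flops(h, w, cin, cout, k):
    """Depthwise k x k conv followed by a 1 x 1 pointwise conv."""
    return h * w * cin * k * k + h * w * cin * cout

# Hypothetical layer: 128 x 64 feature map, 128 -> 128 channels, 3 x 3 kernel.
std = conv_flops(128, 64, 128, 128, 3)
sep = dw_sep_flops(128, 64, 128, 128, 3)
# Ratio is cout*k^2 / (k^2 + cout) = 128*9 / 137, about 8.4x fewer FLOPs.
print(round(std / sep, 1))  # 8.4
```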
Class2str: End to end latent hierarchy learning
Soham Saha,Girish Varma,Jawahar C V
International conference on Pattern Recognition, ICPR, 2018
@inproceedings{bib_Clas_2018, AUTHOR = {Soham Saha, Girish Varma, Jawahar C V}, TITLE = {Class2str: End to end latent hierarchy learning}, BOOKTITLE = {International conference on Pattern Recognition}, YEAR = {2018}}
Deep neural networks for image classification typically consist of a convolutional feature extractor followed by a fully connected classifier network. The predicted and the ground truth labels are represented as one hot vectors. Such a representation assumes that all classes are equally dissimilar. However, classes have visual similarities and often form a hierarchy. Learning this latent hierarchy explicitly in the architecture could provide invaluable insights. We propose an alternate architecture to the classifier network called the Latent Hierarchy (LH) Classifier and an end to end learned Class2Str mapping which discovers a latent hierarchy of the classes. We show that for some of the best performing architectures on CIFAR and Imagenet datasets, the proposed replacement and training by the LH classifier recovers the accuracy, with a fraction of the number of parameters in the classifier part. Compared to the previous work of HD-CNN, which also learns a 2 level hierarchy, we are able to learn a hierarchy at an arbitrary number of levels as well as obtain an accuracy improvement on the Imagenet classification task over them. We also verify that many visually similar classes are grouped together under the learnt hierarchy.
City-scale road audit system using deep learning
YARRAM SUDHIR KUMAR REDDY,Girish Varma,Jawahar C V
International Conference on Intelligent Robots and Systems, IROS, 2018
@inproceedings{bib_City_2018, AUTHOR = {YARRAM SUDHIR KUMAR REDDY, Girish Varma, Jawahar C V}, TITLE = {City-scale road audit system using deep learning}, BOOKTITLE = {International Conference on Intelligent Robots and Systems}, YEAR = {2018}}
Road networks in cities are massive and are a critical component of mobility. Fast response to defects, which can occur not only due to regular wear and tear but also because of extreme events like storms, is essential. Hence there is a need for an automated system that is quick, scalable and cost-effective for gathering information about defects. We propose a system for city-scale road audit, using some of the most recent developments in deep learning and semantic segmentation. For building and benchmarking the system, we curated a dataset which has annotations required for road defects. However, many of the labels required for road audit have high ambiguity, which we overcome by proposing a label hierarchy. We also propose a multi-step deep learning model that segments the road, subdivides the road further into defects, tags the frame for each defect and finally localizes the defects on a map gathered using GPS. We analyze and evaluate the models on image tagging as well as segmentation at different levels of the label hierarchy.
Improved visual relocalization by discovering anchor points
SOHAM SAHA,Girish Varma,Jawahar C V
British Machine Vision Conference, BMVC, 2018
@inproceedings{bib_Impr_2018, AUTHOR = {SOHAM SAHA, Girish Varma, Jawahar C V}, TITLE = {Improved visual relocalization by discovering anchor points}, BOOKTITLE = {British Machine Vision Conference}, YEAR = {2018}}
We address the visual relocalization problem of predicting the location and camera orientation or pose (6DOF) of the given input scene. We propose a method based on how humans determine their location using the visible landmarks. We define anchor points uniformly across the route map and propose a deep learning architecture which predicts the most relevant anchor point present in the scene as well as the relative offsets with respect to it. The relevant anchor point need not be the nearest anchor point to the ground truth location, as it might not be visible due to the pose. Hence we propose a multi-task loss function, which discovers the relevant anchor point without needing the ground truth for it. We validate the effectiveness of our approach by experimenting on Cambridge Landmarks (large scale outdoor scenes) as well as 7 Scenes (indoor scenes) using various CNN feature extractors. Our method improves the median error in indoor as well as outdoor localization datasets compared to the previous best deep learning model known as PoseNet (with geometric re-projection loss) using the same feature extractor. We improve the median error in localization in the specific case of Street scene by over 8m.
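The anchor-plus-offset decoding can be sketched in a few lines. The anchors, probabilities and offsets below are hypothetical, shown in 2D for simplicity; in the real model these quantities are predicted by a CNN from the input scene, and the pose is 6DOF.

```python
import numpy as np

def decode_location(anchor_probs, offsets, anchors):
    """Recover a 2D location from a predicted distribution over anchor
    points plus per-anchor relative offsets."""
    i = int(np.argmax(anchor_probs))   # most relevant anchor
    return anchors[i] + offsets[i]

# Hypothetical route with 3 anchor points placed uniformly along the map.
anchors = np.array([[0.0, 0.0], [10.0, 0.0], [20.0, 0.0]])
probs = np.array([0.1, 0.7, 0.2])                      # anchor 1 most relevant
offsets = np.array([[0.0, 0.0], [1.5, -0.5], [0.0, 0.0]])
print(decode_location(probs, offsets, anchors))        # [11.5 -0.5]
```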
Deep expander networks: Efficient deep networks from graph theory
PRABHU AMEYA PANDURANG,Girish Varma,Anoop Namboodiri
European Conference on Computer Vision, ECCV, 2018
@inproceedings{bib_Deep_2018, AUTHOR = {PRABHU AMEYA PANDURANG, Girish Varma, Anoop Namboodiri}, TITLE = {Deep expander networks: Efficient deep networks from graph theory}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2018}}
Efficient CNN designs like ResNets and DenseNet were proposed to improve accuracy vs efficiency trade-offs. They essentially increased the connectivity, allowing efficient information flow across layers. Inspired by these techniques, we propose to model connections between filters of a CNN using graphs which are simultaneously sparse and well connected. Sparsity results in efficiency while well connectedness can preserve the expressive power of the CNNs. We use a well-studied class of graphs from theoretical computer science that satisfies these properties known as Expander graphs. Expander graphs are used to model connections between filters in CNNs to design networks called X-Nets. We present two guarantees on the connectivity of X-Nets: Each node influences every node in a layer in logarithmic steps, and the number of paths between two sets of nodes is proportional to the product of their sizes. We also propose efficient training and inference algorithms, making it possible to train deeper and wider X-Nets effectively. Expander based models give a 4% improvement in accuracy on MobileNet over grouped convolutions, a popular technique, which has the same sparsity but worse connectivity. X-Nets give better performance trade-offs than the original ResNet and DenseNet-BC architectures. We achieve model sizes comparable to state-of-the-art pruning techniques using our simple architecture design, without any pruning. We hope that this work motivates other approaches to utilize results from graph theory to develop efficient network architectures.
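One simple way to realize such a sparse, well-connected pattern between filters is a random d-regular bipartite mask, which with high probability behaves like an expander. This sketch only builds the connectivity mask (the construction and parameters are illustrative, not the paper's exact recipe); applying it to a layer's weights yields the sparsity without pruning.

```python
import numpy as np

def expander_mask(n_out, n_in, d, seed=0):
    """Bipartite mask where every output filter connects to d random
    input filters, approximating a sparse, well-connected expander graph."""
    rng = np.random.default_rng(seed)
    mask = np.zeros((n_out, n_in), dtype=np.float32)
    for i in range(n_out):
        mask[i, rng.choice(n_in, size=d, replace=False)] = 1.0
    return mask

m = expander_mask(64, 64, d=8)          # hypothetical 64 -> 64 filter layer
print(m.sum(axis=1).min(), m.mean())    # every row has d links; density d/n_in
```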
Compressing Deep Neural Networks for Recognizing Places
SOHAM SAHA,Girish Varma,Jawahar C V
Asian Conference on Pattern Recognition, ACPR, 2017
@inproceedings{bib_Comp_2017, AUTHOR = {SOHAM SAHA, Girish Varma, Jawahar C V}, TITLE = {Compressing Deep Neural Networks for Recognizing Places}, BOOKTITLE = {Asian Conference on Pattern Recognition}, YEAR = {2017}}
Visual place recognition on low memory devices such as mobile phones and robotics systems is a challenging problem. The state of the art models for this task use deep learning architectures having close to 100 million parameters, which take over 400MB of memory. This makes these models infeasible to deploy on low memory devices and gives rise to the need to compress them. Hence we study the effectiveness of model compression techniques like trained quantization and pruning for reducing the number of parameters of one of the best performing image retrieval models, NetVLAD. We show that a compressed network can be created by starting with a model pre-trained for the task of visual place recognition and then fine-tuning it via trained pruning and quantization. The compressed model is able to produce the same mAP as the original uncompressed network. We achieve almost 50% parameter pruning with no loss in mAP and 70% pruning with close to 2% mAP reduction, while also performing 8-bit quantization. Furthermore, together with 5-bit quantization, we perform about 50% parameter reduction by pruning and get only about 3% reduction in mAP. The resulting compressed networks have sizes of around 30MB and 65MB, which makes them easily usable in memory constrained devices.
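A rough sketch of the prune-then-quantize idea, assuming plain magnitude pruning and uniform symmetric quantization (the trained pruning and quantization in the paper interleave these steps with fine-tuning, which this sketch omits):

```python
import numpy as np

def prune_and_quantize(w, prune_frac=0.5, bits=8):
    """Magnitude-prune the smallest prune_frac of weights, then
    uniformly quantize the survivors to 2**bits symmetric levels."""
    flat = np.abs(w).ravel()
    thresh = np.sort(flat)[int(prune_frac * flat.size)]
    pruned = np.where(np.abs(w) < thresh, 0.0, w)
    scale = np.abs(pruned).max() / (2 ** (bits - 1) - 1)
    q = np.round(pruned / scale).astype(np.int8)   # stored as int8 + one scale
    return q, scale

rng = np.random.default_rng(1)
w = rng.normal(size=(100,))                        # hypothetical weight vector
q, scale = prune_and_quantize(w)
print((q == 0).mean())                             # about half the weights pruned
```

Reconstructing the weights is just `q * scale`, so the storage cost drops to one int8 per weight (plus the zeros, which a sparse format can skip) and a single float scale.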