System and method for retrieving a three-dimensional (3D) object using a self-supervised model
@inproceedings{bib_Syst_2025, AUTHOR = {Narayanan P J and Kajal Mohan Sanklecha and Prayushi Mathur}, TITLE = {System and method for retrieving a three-dimensional (3D) object using a self-supervised model}, BOOKTITLE = {United States Patent}, YEAR = {2025}}
A system and processor-implemented method for three-dimensional (3D) object retrieval using a self-supervised model is provided. The system learns an embedding space of 3D mesh objects in a self-supervised manner, without requiring objects annotated with their class or other properties. The self-supervised method learns effective embeddings of 3D mesh objects for ranked retrieval from a large collection of 3D objects. A simple representation of mesh objects and a standard neural network model are used to learn the embedding. Using the embeddings generated by the self-supervised model, results are retrieved on the basis of shape: a retrieved object may belong to a different category yet look similar in shape. The system is independent of class labels and uses the entire 3D model for better information extraction.
@inproceedings{bib_Fast_2025, AUTHOR = {Kajal Mohan Sanklecha and Prayushi Mathur and Narayanan P J}, TITLE = {Fast self-supervised 3D mesh object retrieval for geometric similarity}, BOOKTITLE = {Computer Vision and Image Understanding}, YEAR = {2025}}
Digital 3D models play a pivotal role in engineering, entertainment, education, and many other domains. However, the search and retrieval of these models have not received adequate attention compared to other digital assets like documents and images. Traditional supervised methods face challenges in scalability due to the impracticality of creating large, labeled collections of 3D objects. In response, this paper introduces a self-supervised approach to generate efficient embeddings for 3D mesh objects, facilitating ranked retrieval of similar objects. The proposed method employs a straightforward representation of mesh objects and utilizes an encoder–decoder architecture to learn the embedding. Extensive experiments demonstrate the competitiveness of our approach compared to supervised methods, showcasing its scalability across diverse object collections. Notably, the method exhibits transferability across datasets, implying its potential for broader applicability beyond the training dataset. The robustness and generalization capabilities of the proposed method are substantiated through experiments conducted on varied datasets. These findings underscore the efficacy of the approach in capturing underlying patterns and features, independent of dataset-specific nuances. This self-supervised framework offers a promising solution for enhancing the search and retrieval of 3D models, addressing key challenges in scalability and dataset transferability.
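The ranked-retrieval step itself can be sketched independently of how the embeddings are learned. The Python below is a minimal illustration, assuming each mesh has already been mapped to a fixed-length embedding by the trained encoder and that cosine similarity is the ranking score; the abstract does not name the metric, and the object ids and vectors are made up.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rank_by_similarity(query, gallery):
    """Return gallery ids sorted from most to least similar to the query embedding.

    `gallery` maps a (hypothetical) object id to its embedding vector.
    """
    scores = {oid: cosine(query, emb) for oid, emb in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


gallery = {
    "chair_01": [0.9, 0.1, 0.0],
    "stool_07": [0.8, 0.3, 0.1],
    "lamp_03": [0.0, 0.2, 0.9],
}
print(rank_by_similarity([1.0, 0.2, 0.0], gallery))  # ['chair_01', 'stool_07', 'lamp_03']
```

Because ranking only needs similarity scores, the same loop works for any embedding size; for large collections an approximate nearest-neighbour index would replace the linear scan.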
@inproceedings{bib_Prot_2025, AUTHOR = {Narayanan P J and Amula Venkat Adithya and Sunayana Samavedam and SAURABH SAINI and Avani Gupta}, TITLE = {Prototype Guided Backdoor Defense}, BOOKTITLE = {International Conference on Computer Vision}, YEAR = {2025}}
Deep learning models are susceptible to backdoor attacks, in which malicious attackers perturb a small subset of training data with a trigger to cause misclassifications. Various triggers have been used, including semantic triggers that are easily realizable without requiring the attacker to manipulate the image. The emergence of generative AI has eased the generation of varied poisoned samples. Robustness across trigger types is crucial to an effective defense. We propose Prototype Guided Backdoor Defense (PGBD), a robust post-hoc defense that scales across different trigger types, including previously unsolved semantic triggers. PGBD exploits displacements in the geometric spaces of activations to penalize movements towards the trigger. This is done using a novel sanitization loss in a post-hoc fine-tuning step. The geometric approach scales easily to all types of attacks. PGBD achieves better performance than prior defenses across all settings. We also present the first defense against a new semantic attack on celebrity face images.
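For concreteness, the threat model can be illustrated with a classic patch-trigger (BadNets-style) poisoning step: a small fraction of training images receives a fixed pixel pattern and is relabelled to the attacker's target class. This sketch shows the kind of attack a defense such as PGBD must withstand, not PGBD itself. The patch location, size, poisoning rate, and function names are illustrative assumptions; a semantic trigger would replace the pixel patch with a naturally occurring feature.

```python
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2D image (list of rows) with a square patch in the bottom-right corner."""
    out = [row[:] for row in image]
    h, w = len(out), len(out[0])
    for r in range(h - size, h):
        for c in range(w - size, w):
            out[r][c] = value
    return out


def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction `rate` of samples and relabel them.

    Returns (new_images, new_labels, poisoned_indices); clean samples are left untouched.
    """
    rng = random.Random(seed)
    n_poison = int(round(rate * len(images)))
    idx = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in idx:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)
        else:
            new_images.append(img)
            new_labels.append(lab)
    return new_images, new_labels, sorted(idx)
```

A model trained on the output learns to associate the patch with `target_label` while behaving normally on clean inputs, which is why defenses look for the trigger's effect in the model's activations rather than in its clean accuracy.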
Pranav Manu, Astitva Srivastava, Amit Raj, VARUN JAMPANI, Avinash Sharma, Narayanan P J
@inproceedings{bib_Ligh_2025, AUTHOR = {Pranav Manu and Astitva Srivastava and Amit Raj and VARUN JAMPANI and Avinash Sharma and Narayanan P J}, TITLE = {LightHeadEd: Relightable \& Editable Head Avatars from a Smartphone}, BOOKTITLE = {Technical Report}, YEAR = {2025}}
Creating photorealistic, animatable, and relightable 3D head avatars traditionally requires an expensive Light Stage with multiple calibrated cameras, making it inaccessible for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polarizing filters. Our approach simultaneously captures cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, separating the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, enabling efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose- and expression-dependent geometric offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with environment maps. We also collect a unique dataset of various subjects performing diverse facial expressions and head movements.
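The diffuse/specular separation can be written per pixel under a common idealized polarization model. This model is an assumption here, not the paper's stated formulation: with a polarized light, specular reflection keeps its polarization while diffuse reflection is depolarized, so the cross-polarized image sees half the diffuse light and the parallel-polarized image sees half the diffuse light plus all of the specular light. The sketch assumes pixel-aligned, linear (not gamma-encoded) intensities and ignores global scale factors; the function name is hypothetical.

```python
def separate_diffuse_specular(parallel, cross):
    """Per-pixel separation from parallel- and cross-polarized intensities.

    Idealized model: cross = D / 2 and parallel = D / 2 + S, hence
    D = 2 * cross and S = parallel - cross (clamped at zero to absorb noise).
    """
    diffuse = [2.0 * c for c in cross]
    specular = [max(p - c, 0.0) for p, c in zip(parallel, cross)]
    return diffuse, specular


d, s = separate_diffuse_specular(parallel=[0.5, 0.9], cross=[0.3, 0.3])
print(d, s)
```

The clamp matters in practice: sensor noise and imperfect filters can make the cross-polarized value slightly exceed the parallel one, which would otherwise yield negative specular energy.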
Linearly Transformed Spherical Distributions for Interactive Single Scattering with Area Lights
@inproceedings{bib_Line_2025, AUTHOR = {K T Aakash Ajit and Shah Ishaan Nikhil and Narayanan P J}, TITLE = {Linearly Transformed Spherical Distributions for Interactive Single Scattering with Area Lights}, BOOKTITLE = {European Association for Computer Graphics}, YEAR = {2025}}
We present Linearly Transformed Spherical Distributions (LTSDs), a superset of the commonly known Linearly Transformed Cosines (LTCs), and apply them to analytic area light rendering of single scattering in participating media.
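In the standard LTC construction, a distribution D is obtained by transforming the directions of a base distribution D_o with a 3×3 matrix M: D(w) = D_o(w_o) * |det(M^-1)| / ||M^-1 w||^3, where w_o = M^-1 w / ||M^-1 w||. The Python sketch below evaluates this for a pluggable base distribution (the clamped cosine by default, which gives an LTC; another spherical distribution gives the more general case described above) and includes a Monte Carlo check that the transformed density still integrates to one. It illustrates the transform only, not the paper's single-scattering integration, and the helper names are hypothetical.

```python
import math
import random


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def inv3(m):
    """Inverse of a 3x3 matrix via the adjugate divided by the determinant."""
    d = det3(m)
    return [[(m[1][1] * m[2][2] - m[1][2] * m[2][1]) / d,
             (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / d,
             (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / d],
            [(m[1][2] * m[2][0] - m[1][0] * m[2][2]) / d,
             (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / d,
             (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / d],
            [(m[1][0] * m[2][1] - m[1][1] * m[2][0]) / d,
             (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / d,
             (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / d]]


def clamped_cosine(w):
    """Base distribution D_o: normalized clamped cosine around +z."""
    return max(w[2], 0.0) / math.pi


def linearly_transformed_density(M, w, base=clamped_cosine):
    """Density of the base distribution after transforming directions by M (w must be unit length)."""
    Minv = inv3(M)
    v = [sum(Minv[i][j] * w[j] for j in range(3)) for i in range(3)]
    n = math.sqrt(sum(x * x for x in v))
    wo = [x / n for x in v]
    jacobian = abs(det3(Minv)) / n ** 3
    return base(wo) * jacobian


def sphere_integral(M, n=100000, seed=0, base=clamped_cosine):
    """Monte Carlo estimate of the integral of D over the sphere (should be close to 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = 1.0 - 2.0 * rng.random()          # uniform direction on the unit sphere
        phi = 2.0 * math.pi * rng.random()
        r = math.sqrt(max(0.0, 1.0 - z * z))
        total += linearly_transformed_density(M, [r * math.cos(phi), r * math.sin(phi), z], base)
    return total * 4.0 * math.pi / n          # divide by the uniform pdf 1 / (4 pi)
```

The Jacobian term is what keeps the transformed distribution normalized, which is the property that makes closed-form integration over polygonal lights possible in LTC-style methods.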
@inproceedings{bib_Real_2024, AUTHOR = {Rahul Goel and MARKUS SCHÜTZ and Narayanan P J and BERNHARD KERBL}, TITLE = {Real-Time Decompression and Rasterization of Massive Point Clouds}, BOOKTITLE = {Proceedings of the ACM on Computer Graphics and Interactive Techniques}, YEAR = {2024}}
Large-scale capturing of real-world scenes as 3D point clouds (e.g., using LIDAR scanning) generates billions of points that are challenging to visualize. High storage requirements prevent the quick and easy inspection of captured datasets on user-grade hardware. The fastest real-time rendering methods are limited by the available GPU memory and render only around 1 billion points interactively. We show that we can achieve state-of-the-art results in both respects, storage and rendering performance, while simultaneously supporting datasets that surpass the capabilities of other methods. We present an on-the-fly point cloud decompression scheme that tightly integrates with software rasterization to reduce on-chip memory requirements by more than 4×. Our method compresses geometry losslessly and provides high visual quality at real-time framerates. We use a GPU-friendly, clipped Huffman encoding for compression. Point clouds are divided into equal-sized batches, which are Huffman-encoded independently. Batches are further subdivided to form easy-to-consume streams of data for massively parallel execution. The compressed point clouds are stored in an access-aware manner to achieve coherent GPU memory access.
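The batching idea, where each fixed-size batch gets its own code so that any batch can be decoded without touching the others, can be sketched in Python. This is a plain CPU illustration under stated assumptions: the input values stand in for quantized point attributes, codes come from the textbook heap-based Huffman construction, bit strings are used for readability instead of packed words, and the code-length clipping mentioned above is not implemented. The function names are hypothetical.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from the symbol frequencies of one batch."""
    freq = Counter(symbols)
    if len(freq) == 1:                        # degenerate batch with a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries are (count, unique tiebreak, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)       # two least frequent subtrees
        n2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]


def encode_batches(values, batch_size):
    """Split values into equal-sized batches and Huffman-encode each batch independently."""
    out = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        code = huffman_code(batch)
        bits = "".join(code[v] for v in batch)
        out.append((code, bits, len(batch)))
    return out


def decode_batch(code, bits, count):
    """Decode one batch using only its own code table."""
    inverse = {c: s for s, c in code.items()}
    result, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:                    # prefix property: first match is a full symbol
            result.append(inverse[cur])
            cur = ""
            if len(result) == count:
                break
    return result
```

Independent per-batch tables cost some compression compared with one global table, but they let many GPU threads start decoding different batches at once, which is the property the abstract relies on.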
Neural Histogram-Based Glint Rendering of Surfaces With Spatially Varying Roughness
@inproceedings{bib_Neur_2024, AUTHOR = {Shah Ishaan Nikhil and Gamboa and Gruson and Narayanan P J}, TITLE = {Neural Histogram-Based Glint Rendering of Surfaces With Spatially Varying Roughness}, BOOKTITLE = {Computer Graphics Forum}, YEAR = {2024}}
The complex, glinty appearance of detailed normal-mapped surfaces at different scales requires expensive per-pixel Normal Distribution Function (NDF) computations. Moreover, large light sources further complicate this integration and increase the noise in Monte Carlo renderers. Specialized rendering techniques that explicitly express the underlying normal distribution have been developed to improve performance for glinty surfaces controlled by a fixed material roughness. We present a new method that supports spatially varying roughness, based on a neural histogram that computes per-pixel NDFs over footprints with arbitrary positions and sizes. Our representation is both memory and compute efficient. Additionally, we fully integrate direct illumination for all light directions in constant time. Our approach decouples roughness and normal distribution, allowing live editing of the spatially varying roughness of complex normal-mapped objects. We demonstrate that our approach improves on previous work by achieving smaller memory footprints while offering GPU-friendly computation and a compact representation.
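For reference, the quantity that a per-pixel NDF over a footprint describes can be computed by brute force: bin the slopes of all normal-map texels that fall inside the pixel's footprint. The Python sketch below does exactly that for an axis-aligned rectangular footprint. It is not the neural histogram, and its cost grows with footprint area, which is the expense that compact representations like the one described here aim to avoid. The slope-space parameterization, bin count, and function name are assumptions.

```python
def footprint_ndf_histogram(normals, x0, y0, w, h, bins=8, max_slope=1.0):
    """Normalized histogram of normal slopes inside a rectangular texel footprint.

    normals: 2D grid (rows) of unit normals (nx, ny, nz) with nz > 0.
    Each normal maps to slope space (-nx/nz, -ny/nz); slopes outside
    [-max_slope, max_slope] are clamped into the border bins.
    Returns a bins x bins grid whose entries sum to 1.
    """
    hist = [[0.0] * bins for _ in range(bins)]
    count = 0
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            nx, ny, nz = normals[y][x]
            sx, sy = -nx / nz, -ny / nz
            i = min(bins - 1, max(0, int((sx + max_slope) / (2 * max_slope) * bins)))
            j = min(bins - 1, max(0, int((sy + max_slope) / (2 * max_slope) * bins)))
            hist[j][i] += 1.0
            count += 1
    return [[v / count for v in row] for row in hist]
```

A flat normal map puts all the mass in the central bin, while a bumpy one spreads it out; the spikiness of this histogram at small footprints is what produces visible glints.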
@inproceedings{bib_A_su_2024, AUTHOR = {Avani Gupta and Narayanan P J}, TITLE = {A survey on Concept-based Approaches For Model Improvement}, BOOKTITLE = {Technical Report}, YEAR = {2024}}
Recent research has shifted focus from merely improving the metric-based performance of Deep Neural Networks (DNNs) to making DNNs more interpretable to humans. The field of eXplainable Artificial Intelligence (XAI) has produced various techniques, including saliency-based and concept-based approaches. Concept-based approaches explain the model's decisions in simple, human-understandable terms called Concepts. Concepts are known to be the thinking ground of humans. Explanations in terms of concepts enable detecting spurious correlations, inherent biases, or Clever Hans behavior. With the advent of concept-based explanations, a range of concept representation methods and automatic concept discovery algorithms have been introduced. Some recent works also use concepts for model improvement in terms of interpretability and generalization. We provide a systematic review and taxonomy of various concept representations and their discovery algorithms in DNNs, specifically in vision. We also detail the concept-based model improvement literature, marking the first comprehensive survey of these methods.
@inproceedings{bib_Spec_2024, AUTHOR = {SAURABH SAINI and Narayanan P J}, TITLE = {Specularity Factorization for Low-Light Enhancement}, BOOKTITLE = {Computer Vision and Pattern Recognition}, YEAR = {2024}}
We present a new additive image factorization technique that treats images as composed of multiple latent specular components, which can be estimated recursively by modulating the sparsity during decomposition. Our model-driven RSFNet estimates these factors by unrolling the optimization into network layers, requiring only a few scalars to be learned. The resulting factors are interpretable by design and can be fused for different image enhancement tasks via a network, or combined directly by the user in a controllable fashion. Based on RSFNet, we detail a zero-reference Low Light Enhancement (LLE) application trained without paired or unpaired supervision. Our system improves the state-of-the-art performance on standard benchmarks and achieves better generalization on multiple other datasets. We also integrate our factors with other task-specific fusion networks for applications like deraining, deblurring, and dehazing with negligible overhead, thereby highlighting the multi-domain and multi-task generalizability of our proposed RSFNet. The code and data are released for reproducibility on the project homepage.
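As a toy analogue of recursive factorization by sparsity modulation (not RSFNet's unrolled network), the sketch below repeatedly soft-thresholds a residual with decreasing thresholds. Early, high thresholds keep only the brightest, sparsest content; later, lower ones capture progressively denser content; the components plus the final residual reconstruct the input exactly. The threshold schedule and function names are illustrative assumptions.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t (the proximal operator of the L1 norm)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def recursive_sparse_factors(image, thresholds):
    """Peel off one sparse component per threshold from the running residual.

    image: flat list of pixel intensities.
    thresholds: decreasing list of thresholds (one per component).
    Returns (components, residual) with sum(components) + residual == image.
    """
    residual = list(image)
    components = []
    for t in thresholds:
        comp = [soft_threshold(r, t) for r in residual]
        residual = [r - c for r, c in zip(residual, comp)]
        components.append(comp)
    return components, residual


comps, res = recursive_sparse_factors([0.9, 0.2, 0.5, 0.05], [0.6, 0.3, 0.1])
```

Because the decomposition is additive, any weighted recombination of the components (for example boosting the denser, darker ones) gives a controllable enhancement, which mirrors the user-controllable fusion described in the abstract.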
@inproceedings{bib_GSN:_2024, AUTHOR = {Vinayak Gupta and Rahul Goel and Dhawal Sirikonda and Narayanan P J}, TITLE = {GSN: Generalisable Segmentation in Neural Radiance Fields}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence}, YEAR = {2024}}
Radiance Fields are being widely explored for 3D scene reconstruction and several downstream tasks, such as segmentation. Prior radiance field segmentation methods require scene-specific training to enable segmentation. We propose distilling semantic features into Radiance Fields in a generalisable fashion using GNT, a transformer-based architecture, enabling 3D reconstruction and multi-view segmentation on arbitrarily new scenes. By fine-tuning our method, any set of 2D features can be distilled into a radiance field, providing better multi-view consistency than the original features. We show multi-view segmentation results on standard datasets and compare our method against existing NeRF-based segmentation methods. We perform on par with the state-of-the-art scene-specific segmentation methods. Our approach and experiments bring generalisable NeRF methods one step closer to the contemporary NeRF literature.
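Distilled features are usually projected to pixels the same way colour is: by compositing per-sample values along a camera ray. The Python sketch below shows the classic NeRF volume-rendering quadrature applied to arbitrary feature vectors. GNT itself aggregates samples along a ray with a learned transformer rather than this fixed formula, so treat the sketch as a generic illustration; the function name is hypothetical.

```python
import math


def render_features(sigmas, deltas, features):
    """Alpha-composite per-sample feature vectors along one ray (NeRF quadrature).

    sigmas: densities at the samples; deltas: distances between consecutive samples;
    features: one vector per sample (RGB colour or distilled semantic features).
    """
    transmittance = 1.0
    out = [0.0] * len(features[0])
    for sigma, delta, feat in zip(sigmas, deltas, features):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this ray segment
        weight = transmittance * alpha           # contribution reaching the camera
        out = [o + weight * f for o, f in zip(out, feat)]
        transmittance *= 1.0 - alpha             # light left after this segment
    return out
```

Rendering features with the same weights as colour is what gives distilled features their multi-view consistency: every view reads the same 3D quantity through the same geometry.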
Self-Supervised 3D Mesh Object Retrieval
Kajal Mohan Sanklecha, Prayushi Mathur, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2023
@inproceedings{bib_Self_2023, AUTHOR = {Kajal Mohan Sanklecha and Prayushi Mathur and Narayanan P J}, TITLE = {Self-Supervised 3D Mesh Object Retrieval}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2023}}
Digital representations of 3D objects are increasingly being used for engineering, entertainment, education, etc. Efforts to search and retrieve digital 3D models from a collection have not attracted sufficient attention, unlike digital representations of documents, images, etc. Supervised methods are not feasible to solve this problem as a large collection of labelled 3D objects is difficult to create. This paper presents a self-supervised method to learn efficient embeddings of 3D mesh objects for ranked retrieval of similar objects. We propose a simple representation of mesh objects and an encoder-decoder architecture to learn the embedding. Extensive experiments show that our method is competitive with methods that need supervision while being more scalable to different object collections.
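Ranked retrieval is commonly scored with mean average precision (mAP). The abstract does not say which metrics the experiments use, so the Python below is only the generic definition, assuming that for each query a set of relevant objects is known (for example, objects of the same category); the function names are hypothetical.

```python
def average_precision(ranked_ids, relevant):
    """AP of one ranked list: mean of precision@k taken at the rank k of each relevant hit.

    Relevant items that never appear in the list count as zero precision.
    """
    hits, precisions = 0, []
    for k, oid in enumerate(ranked_ids, start=1):
        if oid in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0


def mean_average_precision(results):
    """results: list of (ranked_ids, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)


print(average_precision(["a", "x", "b", "y"], {"a", "b"}))
```

AP rewards placing relevant objects near the top of the list, which matches the goal of ranked retrieval better than a single top-1 accuracy.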
Interactive Segmentation of Radiance Fields
Rahul Goel, Dhawal Sirikonda, SAURABH SAINI, Narayanan P J
Computer Vision and Pattern Recognition, CVPR, 2023
@inproceedings{bib_Inte_2023, AUTHOR = {Rahul Goel and Dhawal Sirikonda and SAURABH SAINI and Narayanan P J}, TITLE = {Interactive Segmentation of Radiance Fields}, BOOKTITLE = {Computer Vision and Pattern Recognition}, YEAR = {2023}}
Radiance Fields (RFs) are a popular representation of casually captured scenes for novel view synthesis and several applications beyond it. Mixed reality in personal spaces requires understanding and manipulating scenes represented as RFs, with semantic segmentation of objects as an important step. Prior segmentation efforts show promise but do not scale to complex objects with diverse appearance. We present the ISRF method to interactively segment objects with fine structure and appearance. Nearest-neighbor feature matching using distilled semantic features identifies high-confidence seed regions. Bilateral search in a joint spatio-semantic space grows the region to recover an accurate segmentation. We show state-of-the-art results for segmenting objects from RFs and compositing them into another scene, changing their appearance, etc., and provide an interactive segmentation tool that others can use.
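The seed-and-grow idea can be illustrated with a simplified stand-in for the bilateral search: breadth-first region growing on a voxel grid, where spatial proximity is enforced by 6-connectivity and semantic similarity by a feature-distance threshold. This is an assumption-laden sketch, not ISRF's actual search; the grid layout, threshold rule, and function name are hypothetical.

```python
from collections import deque


def grow_region(features, seeds, feat_thresh):
    """Grow a segmentation over a sparse 3D voxel grid from high-confidence seeds.

    features: dict mapping (x, y, z) voxel coordinates to feature vectors.
    seeds: voxel coordinates to start from (e.g. nearest-neighbour feature matches).
    A 6-connected neighbour joins if its feature lies within `feat_thresh`
    (Euclidean distance) of the voxel it was reached from.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for dx, dy, dz in steps:
            n = (v[0] + dx, v[1] + dy, v[2] + dz)
            if n in features and n not in region and dist(features[n], features[v]) <= feat_thresh:
                region.add(n)
                queue.append(n)
    return region
```

Comparing each voxel with its immediate neighbour lets the region follow gradual appearance changes across an object while still stopping at sharp semantic boundaries.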
Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement
Avani Gupta, SAURABH SAINI, Narayanan P J
Neural Information Processing Systems, NeurIPS, 2023
@inproceedings{bib_Conc_2023, AUTHOR = {Avani Gupta and SAURABH SAINI and Narayanan P J}, TITLE = {Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement}, BOOKTITLE = {Neural Information Processing Systems}, YEAR = {2023}}
Humans use abstract concepts rather than hard features for understanding. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's sensitivity and possible biases to a given concept. In this paper, we extend CAVs from post-hoc analysis to ante-hoc training in order to reduce model bias through fine-tuning using an additional Concept Loss. In past work, concepts were defined on the final layer of the network. We generalize them to intermediate layers using class prototypes. This facilitates class learning in the last convolution layer, which is known to be the most informative. We also introduce Concept Distillation to create richer concepts using a pre-trained knowledgeable model as the teacher. Our method can sensitize or desensitize a model towards concepts. We show applications of concept-sensitive training to debias several classification problems. We also use concepts to induce prior knowledge into Intrinsic Image Decomposition (IID), a reconstruction problem. Concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge. Please visit https://avani17101.github.io/Concept-Distilllation/ for code and more details.
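The sensitivity that CAVs measure can be sketched in a few lines. In TCAV, the CAV is the normal of a linear classifier separating a layer's activations for concept examples from those for random examples, and the score is the fraction of inputs whose class logit increases along that direction. The Python below substitutes the normalized difference of mean activations for the trained classifier (a simplification) and takes the logit gradients as given; the Concept Loss used for training is not shown, and the function names are hypothetical.

```python
def cav_mean_difference(concept_acts, random_acts):
    """Concept direction as the normalized difference of mean layer activations.

    (TCAV proper uses the normal of a trained linear classifier; this is a simpler stand-in.)
    """
    dim = len(concept_acts[0])
    mc = [sum(a[i] for a in concept_acts) / len(concept_acts) for i in range(dim)]
    mr = [sum(a[i] for a in random_acts) / len(random_acts) for i in range(dim)]
    d = [c - r for c, r in zip(mc, mr)]
    norm = sum(x * x for x in d) ** 0.5
    return [x / norm for x in d]


def tcav_score(gradients, cav):
    """Fraction of inputs whose class logit increases along the concept direction.

    gradients: d(logit)/d(activation) at the chosen layer, one vector per input.
    """
    positive = sum(1 for g in gradients if sum(gi * ci for gi, ci in zip(g, cav)) > 0)
    return positive / len(gradients)


cav = cav_mean_difference([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]])
print(cav, tcav_score([[0.5, 1.0], [-0.2, 0.0], [1.0, -1.0]], cav))
```

Turning this post-hoc measurement into a training signal amounts to penalizing (or encouraging) the directional derivative along the CAV, which is the ante-hoc extension the abstract describes.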
Combining Resampled Importance and Projected Solid Angle Samplings for Many Area Light Rendering
Shah Ishaan Nikhil, K T Aakash Ajit, Narayanan P J
SIGGRAPH ASIA Technical Briefs, SATB, 2023
@inproceedings{bib_Comb_2023, AUTHOR = {Shah Ishaan Nikhil and K T Aakash Ajit and Narayanan P J}, TITLE = {Combining Resampled Importance and Projected Solid Angle Samplings for Many Area Light Rendering}, BOOKTITLE = {SIGGRAPH ASIA Technical Briefs}, YEAR = {2023}}
Direct lighting from many area light sources is challenging due to the variance from both choosing an important light and then choosing a point on it. Resampled Importance Sampling (RIS) achieves low variance in such situations. However, it is limited to simple sampling strategies for its candidates. Specifically for area lights, we can improve the convergence of RIS by incorporating a better sampling strategy: Projected Solid Angle Sampling (ProjLTC). Naively combining RIS and ProjLTC improves equal-sample convergence. However, it achieves little to no gain in equal time. We identify the core cause of the high run times and reformulate RIS for better integration with ProjLTC. Our method achieves better convergence in both equal-sample and equal-time comparisons. We evaluate our method on challenging scenes with varying numbers of area light sources and compare it to uniform sampling, RIS, and ProjLTC. In all cases, our method seldom performs worse than RIS and often performs better.
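The basic RIS estimator referred to above can be sketched on a 1D integral. Candidates are drawn from a cheap source pdf p, one is resampled in proportion to the weight w = p_hat(x) / p(x) for a better-shaped target p_hat, and the estimate is f(y) / p_hat(y) times the mean candidate weight. In light sampling the candidates would be (light, point) pairs; here the integrand, the target, and the uniform source pdf are made-up assumptions, and the paper's ProjLTC integration is not shown.

```python
import random


def ris_estimate(f, target, n_candidates, n_samples, seed=0):
    """Resampled importance sampling estimate of the integral of f over [0, 1].

    Candidates come from the uniform source pdf p(x) = 1. One candidate y is
    resampled with probability proportional to w = target(x) / p(x), and the
    estimator f(y) / target(y) * mean(w) is averaged over n_samples runs.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        xs = [rng.random() for _ in range(n_candidates)]
        ws = [target(x) for x in xs]              # p(x) = 1, so w = target(x)
        w_sum = sum(ws)
        if w_sum == 0.0:
            continue                              # no usable candidate: this run contributes 0
        y = rng.choices(xs, weights=ws)[0]        # resample proportional to the weights
        total += f(y) / target(y) * (w_sum / n_candidates)
    return total / n_samples


print(ris_estimate(lambda x: x * x, lambda x: x, n_candidates=8, n_samples=20000, seed=1))
```

The estimator stays unbiased as long as the target is positive wherever f is nonzero; a better candidate distribution than uniform, such as the projected solid angle sampling discussed above, reduces how many candidates are needed.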
Accelerating Hair Rendering by Learning High-Order Scattered Radiance
K T Aakash Ajit, Adrian Jarabo, Carlos Aliaga, Matt Jen-Yuan Chiang, Olivier Maury, Christophe Hery, Narayanan P J, Giljoo Nam
Eurographics Symposium on Rendering, EGSR, 2023
@inproceedings{bib_Acce_2023, AUTHOR = {K T Aakash Ajit and Adrian Jarabo and Carlos Aliaga and Matt Jen-Yuan Chiang and Olivier Maury and Christophe Hery and Narayanan P J and Giljoo Nam}, TITLE = {Accelerating Hair Rendering by Learning High-Order Scattered Radiance}, BOOKTITLE = {Eurographics Symposium on Rendering}, YEAR = {2023}}
Efficiently and accurately rendering hair while accounting for multiple scattering is a challenging open problem. Path tracing in hair converges slowly, while other techniques are either too approximate yet still computationally expensive, or make assumptions about the scene. We present a technique to infer the higher-order scattering in hair in constant time within the path tracing framework, achieving better computational efficiency. Our method makes no assumptions about the scene and provides control over the renderer's bias and speedup. We achieve this by training a small multilayer perceptron (MLP) online to learn the higher-order radiance while rendering progresses. We describe how to robustly train this network and thoroughly analyze the characteristics of the resulting renderer. We evaluate our method on various hairstyles and lighting conditions. We also compare our method against a recent learning-based and a traditional real-time hair rendering method, and demonstrate better quantitative and qualitative results. Our method achieves a significant speedup with respect to path tracing, reducing run time by 40% to 70% while introducing only a small amount of bias.
A survey on Concept-based Approaches For Model Improvement
Avani Gupta, SAURABH SAINI, Narayanan P J
Technical Report, arXiv, 2023
@inproceedings{bib_A_su_2023, AUTHOR = {Avani Gupta and SAURABH SAINI and Narayanan P J}, TITLE = {A survey on Concept-based Approaches For Model Improvement}, BOOKTITLE = {Technical Report}, YEAR = {2023}}
The focus of recent research has shifted from merely increasing the performance of Deep Neural Networks (DNNs) in various tasks to DNNs which are more interpretable to humans. The field of eXplainable Artificial Intelligence (XAI) has observed various techniques, including saliency-based and concept-based approaches. Concept-based approaches explain the model's decisions in simple, human-understandable terms called Concepts. Concepts are human-interpretable units of data and the thinking ground of humans. Explanations in terms of concepts enable detecting spurious correlations, inherent biases, or Clever Hans behavior. With the advent of concept-based explanations, various concept representation methods and automatic concept discovery algorithms have been proposed. With automatic concept discovery, a variety of metrics for evaluating the discovered concepts have emerged. Additionally, some approaches use concepts to improve model explainability and generalization (Concept Oriented Deep Learning, CODL). Concept-based approaches are fairly new: many new representations are coming up, while work on CODL remains very limited. We thus provide a systematic review and taxonomy of various concept representations and their discovery algorithms in DNNs, specifically in vision. We also provide details on the concept-based model improvement literature, being the first to survey CODL methods.
Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement
Avani Gupta, SAURABH SAINI, Narayanan P J
Technical Report, arXiv, 2023
@inproceedings{bib_Conc_2023_arXiv, AUTHOR = {Avani Gupta and SAURABH SAINI and Narayanan P J}, TITLE = {Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement}, BOOKTITLE = {Technical Report}, YEAR = {2023}}
Humans use abstract concepts instead of hard features for generalization. Recent interpretability research has focused on human-centered concept explanations of neural networks. We present Concept Distillation, a novel method and framework for concept-sensitive training to induce human-centered knowledge into the model. We use Concept Activation Vectors (CAVs) to estimate the model's sensitivity and possible biases to a given concept. We extend CAVs from post-hoc analysis to ante-hoc training. We distill the conceptual knowledge from a pretrained knowledgeable teacher to a student model focused on a single downstream task. Our method can sensitize or desensitize the student model towards concepts. We show applications of concept-sensitive training to debias classification and to induce prior knowledge into a reconstruction problem. We also introduce the TextureMNIST dataset to evaluate the presence of complex texture biases. We show that concept-sensitive training can improve model interpretability, reduce biases, and induce prior knowledge.
FusedRF: Fusing Multiple Radiance Fields
Rahul Goel, Dhawal Sirikonda, Rajvi Shah, Narayanan P J
Computer Vision and Pattern Recognition Conference workshops, CVPR-W, 2023
@inproceedings{bib_Fuse_2023, AUTHOR = {Rahul Goel and Dhawal Sirikonda and Rajvi Shah and Narayanan P J}, TITLE = {FusedRF: Fusing Multiple Radiance Fields}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference workshops}, YEAR = {2023}}
Radiance Fields (RFs) have shown great potential to represent scenes from casually captured discrete views. Compositing parts or all of multiple captured scenes could benefit several XR applications. Prior works can generate new views of such composites by tracing each scene in parallel, so render times and memory requirements grow with the number of components. In this work, we provide a method to create a single, compact, fused RF representation for a scene composited from multiple RFs. The fused RF has the same render time and memory utilization as a single RF. Our method distills information from multiple teacher RFs into a single student RF, while also facilitating further manipulations, such as adding components to or deleting them from the fused representation.
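When several radiance fields overlap, the usual rule for rendering their union at a 3D point is that densities add and colour becomes the density-weighted mean, which is one plausible per-point target a fused student could be trained to match. The abstract does not state how FusedRF forms its distillation targets, so this Python sketch is an assumption, and the function name is hypothetical.

```python
def composite_fields(samples):
    """Combine (density, rgb) queries from several radiance fields at one 3D point.

    Densities add; colour is the density-weighted mean of the fields' colours.
    Returns (0.0, black) where no field has any density.
    """
    sigma = sum(s for s, _ in samples)
    if sigma == 0.0:
        return 0.0, (0.0, 0.0, 0.0)
    rgb = tuple(sum(s * c[k] for s, c in samples) / sigma for k in range(3))
    return sigma, rgb


print(composite_fields([(1.0, (1.0, 0.0, 0.0)), (1.0, (0.0, 0.0, 1.0))]))
```

Under this rule, deleting a component simply means dropping its teacher from the list before recomputing the target, which fits the addition and deletion manipulations mentioned above.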
Quaternion Factorized Simulated Exposure Fusion
SAURABH SAINI, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2022
@inproceedings{bib_Quat_2022, AUTHOR = {SAURABH SAINI and Narayanan P J}, TITLE = {Quaternion Factorized Simulated Exposure Fusion}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2022}}
Image Fusion maximizes the visual information at each pixel location by merging content from multiple images to produce an enhanced image. Exposure Fusion, specifically, fuses a bracketed exposure stack of poorly lit images to generate a properly illuminated image. Given a single input image, exposure fusion can still be employed on a ‘simulated’ exposure stack, leading to direct single-image contrast and low-light enhancement. In this work, we present a novel ‘Quaternion Factorized Simulated Exposure Fusion’ (QFSEF) method that factorizes an input image into multiple illumination-consistent layers. To this end, we use an iterative sparse matrix factorization scheme, representing the image as a two-dimensional pure quaternion matrix. Theoretically, our representation is based on the dichromatic reflection model and accounts for the two scene illumination characteristics by factorizing each progressively generated image into separate specular and diffuse components. We empirically demonstrate the advantages of our factorization scheme over other exposure simulation methods by using it for low-light image enhancement. Furthermore, we provide three exposure fusion strategies that can be used with our simulated stack, along with a comprehensive performance analysis. Finally, to validate our claims, we show extensive qualitative and quantitative comparisons against relevant state-of-the-art solutions on multiple standard datasets, along with ablation analyses that support our proposition. Our code and data are publicly available for easy reproducibility.
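The simulate-then-fuse pipeline can be illustrated with much simpler ingredients than QFSEF. In the Python sketch below, the exposure stack is simulated with gamma curves and fused per pixel with a Gaussian well-exposedness weight in the style of Mertens et al.; QFSEF instead builds its stack by quaternion factorization, and full Mertens fusion also uses contrast and saturation weights with pyramid blending. The gamma values, sigma, and function names are assumptions.

```python
import math


def simulate_stack(image, gammas=(0.5, 1.0, 2.0)):
    """Simulated exposure stack from one image via gamma curves (intensities in [0, 1])."""
    return [[p ** g for p in image] for g in gammas]


def well_exposedness(p, sigma=0.2):
    """Gaussian weight favouring mid-grey intensities."""
    return math.exp(-((p - 0.5) ** 2) / (2 * sigma ** 2))


def fuse(stack, eps=1e-12):
    """Per-pixel weighted average of the stack using normalized well-exposedness weights."""
    fused = []
    for pixels in zip(*stack):                    # the same pixel across all exposures
        ws = [well_exposedness(p) + eps for p in pixels]
        fused.append(sum(w * p for w, p in zip(ws, pixels)) / sum(ws))
    return fused


print(fuse(simulate_stack([0.1, 0.5, 0.9])))
```

Dark pixels end up dominated by the brightened (gamma 0.5) layer and bright pixels by the darkened one, which is how a single image gains contrast without a real bracketed capture.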
Interpreting Intrinsic Image Decomposition using Concept Activations
Avani Gupta, SAURABH SAINI, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2022
@inproceedings{bib_Inte_2022, AUTHOR = {Avani Gupta and SAURABH SAINI and Narayanan P J}, TITLE = {Interpreting Intrinsic Image Decomposition using Concept Activations}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2022}}
Evaluation of ill-posed problems like Intrinsic Image Decomposition (IID) is challenging. IID involves decomposing an image into its constituent illumination-invariant Reflectance (R) and albedo-invariant Shading (S) components. Contemporary IID methods use Deep Learning models and require large datasets for training. The evaluation of IID is carried out on either synthetic Ground Truth images or sparsely annotated natural images. A scene can be split into reflectance and shading in multiple valid ways, so comparing against one specific decomposition in the ground-truth images, as current IID evaluation metrics like LMSE, MSE, DSSIM, WHDR, SAW AP%, etc. do, is inadequate. Measuring R-S disentanglement is a better way to evaluate the quality of IID. Inspired by ML interpretability methods, we propose Concept Sensitivity Metrics (CSM) that directly measure disentanglement using sensitivity to relevant concepts. Activation vectors for albedo-invariance and illumination-invariance concepts are used for the IID problem. We evaluate and interpret three recent IID methods on our synthetic benchmark of controlled albedo and illumination invariance sets. We also compare our disentanglement score with existing IID evaluation metrics on both natural and synthetic scenes and report our observations. Our code and data are publicly available for reproducibility.
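The ambiguity that makes ground-truth comparison inadequate is easy to see under the usual intrinsic image model I = R * S (per pixel): scaling reflectance by any k and dividing shading by the same k leaves the image unchanged. The Python below is a minimal illustration with made-up values and hypothetical function names; real ambiguities can also vary spatially.

```python
def compose(reflectance, shading):
    """Image formation under the intrinsic model I = R * S, per pixel."""
    return [r * s for r, s in zip(reflectance, shading)]


def rescale(reflectance, shading, k):
    """An alternative decomposition of the same image: R' = k * R, S' = S / k."""
    return [k * r for r in reflectance], [s / k for s in shading]


R, S = [0.2, 0.8], [1.0, 0.5]
R2, S2 = rescale(R, S, 2.0)
print(compose(R, S), compose(R2, S2))
```

Both decompositions reproduce the input exactly, yet a pixel-wise metric against one fixed ground truth would penalize the second, which is the motivation for measuring disentanglement instead.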
Transfer Textures for Fast Precomputed Radiance Transfer
Dhawal Sirikonda, K T Aakash Ajit, Narayanan P J
European Association for Computer Graphics, Eurographics, 2022
@inproceedings{bib_Tran_2022, AUTHOR = {Dhawal Sirikonda and K T Aakash Ajit and Narayanan P J}, TITLE = {Transfer Textures for Fast Precomputed Radiance Transfer}, BOOKTITLE = {European Association for Computer Graphics}, YEAR = {2022}}
Precomputed Radiance Transfer (PRT) can achieve high-quality renders of glossy materials at real-time framerates. PRT involves precomputing, at specific points in a scene, a k-dimensional transfer vector or a k × k matrix of Spherical Harmonic (SH) coefficients, depending on whether the material is diffuse or glossy, respectively. Most prior art precomputes values at the vertices of the mesh and interpolates color for interior points, which requires finer mesh tessellations for high-quality renders. In this work, we introduce transfer textures, which decouple mesh resolution from transfer storage and sampling, particularly benefiting glossy renders. Transfer textures allow dense sampling of the transfer in the fragment shader while rendering, for both diffuse and glossy materials, even with a low tessellation. This simultaneously provides high render quality and high frame rates.
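Whether the transfer is fetched per vertex or, with transfer textures, per fragment, the run-time shading step is the same small piece of linear algebra. The Python sketch below shows it with made-up coefficients; k = 9 (three SH bands) is a common choice but is an assumption here, and the function names are hypothetical.

```python
def shade_diffuse(transfer, light):
    """Diffuse PRT: outgoing radiance is the dot product of k transfer and k lighting SH coefficients."""
    return sum(t * l for t, l in zip(transfer, light))


def shade_glossy(transfer_matrix, light):
    """Glossy PRT: a k x k transfer matrix maps lighting SH coefficients to transferred-radiance coefficients.

    The result would still be combined with the BRDF and view direction; that step is omitted here.
    """
    return [sum(m * l for m, l in zip(row, light)) for row in transfer_matrix]


print(shade_diffuse([1.0, 2.0, 3.0], [0.5, 0.0, 1.0]))
```

The cost asymmetry is visible here: diffuse shading is k multiply-adds per point, while glossy shading is k × k, which is why denser sampling of the glossy transfer benefits most from decoupling it from mesh resolution.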
Casual Indoor HDR Radiance Capture from Omnidirectional Images
PULKIT GERA, Mohammad Reza Karimi Dastjerdi, Charles Renaud, Narayanan P J, Jean-François Lalonde
Technical Report, arXiv, 2022
@inproceedings{bib_Casu_2022, AUTHOR = {PULKIT GERA and Mohammad Reza Karimi Dastjerdi and Charles Renaud and Narayanan P J and Jean-François Lalonde}, TITLE = {Casual Indoor {HDR} Radiance Capture from Omnidirectional Images}, BOOKTITLE = {Technical Report}, YEAR = {2022}}
We present PanoHDR-NeRF, a neural representation of the full HDR radiance field of an indoor scene, and a pipeline to capture it casually, without elaborate setups or complex capture protocols. First, a user captures a low dynamic range (LDR) omnidirectional video of the scene by freely waving an off-the-shelf camera around the scene. Then, an LDR2HDR network uplifts the captured LDR frames to HDR, which are used to train a tailored NeRF++ model. The resulting PanoHDR-NeRF can render full HDR images from any location of the scene. Through experiments on a novel test dataset of real scenes with the ground truth HDR radiance captured at locations not seen during training, we show that PanoHDR-NeRF predicts plausible HDR radiance from any scene point. We also show that the predicted radiance can synthesize correct lighting effects, enabling the augmentation of indoor scenes with synthetic objects that are lit correctly. Datasets and code are available at https://lvsn.github.io/PanoHDR-NeRF/.
Real-Time Rendering of Arbitrary Surface Geometries using Learnt Transfer
Dhawal Sirikonda, K T Aakash Ajit, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2022
@inproceedings{bib_Real_2022, AUTHOR = {Dhawal Sirikonda and K T Aakash Ajit and Narayanan P J}, TITLE = {Real-Time Rendering of Arbitrary Surface Geometries using Learnt Transfer}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2022}}
Precomputed Radiance Transfer (PRT) is widely used for real-time photorealistic effects. PRT disentangles the rendering equation into transfer and lighting, enabling their precomputation. Transfer accounts for the cosine-weighted visibility of points in the scene while lighting for emitted radiance from the environment. Prior art stored precomputed transfer in a tabulated manner, either in vertex or texture space. These values are fetched with interpolation at each point for shading. Vertex space methods require densely tessellated mesh vertices for high quality images. Texture space
StyleTRF: Stylizing Tensorial Radiance Fields
Rahul Goel,Dhawal Sirikonda,SAURABH SAINI,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2022
@inproceedings{bib_Styl_2022, AUTHOR = {Rahul Goel, Dhawal Sirikonda, SAURABH SAINI, Narayanan P J}, TITLE = {StyleTRF: Stylizing Tensorial Radiance Fields}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2022}}
Stylized view generation of scenes captured casually using a camera has received much attention recently. In previous work, the geometry and appearance of the scene are typically captured as neural point sets or neural radiance fields. An image stylization method is then used to stylize the captured appearance by training its network jointly or iteratively with the structure capture network. The state-of-the-art SNeRF method trains the NeRF and stylization networks in an alternating manner. These methods have long training times and require joint optimization. In this work, we present StyleTRF, a compact, quick-to-optimize strategy for stylized view generation using TensoRF. The appearance part is fine-tuned for a few iterations using sparse stylized priors of a few views rendered with the TensoRF representation. Our method thus effectively decouples style adaptation from view capture and is much faster than previous methods. We show state-of-the-art results on several scenes used for this purpose.
SYSTEM AND METHOD FOR AUTOMATICALLY RECONSTRUCTING 3D MODEL OF AN OBJECT USING MACHINE LEARNING MODEL
Avinash Sharma,Narayanan P J,Jinka Sai Sagar,Teja Sai Dhondu,Rohan Chacko
United States Patent, US Patent, 2022
@inproceedings{bib_SYST_2022, AUTHOR = {Avinash Sharma, Narayanan P J, Jinka Sai Sagar, Teja Sai Dhondu, Rohan Chacko}, TITLE = {SYSTEM AND METHOD FOR AUTOMATICALLY RECONSTRUCTING 3D MODEL OF AN OBJECT USING MACHINE LEARNING MODEL}, BOOKTITLE = {United States Patent}, YEAR = {2022}}
A system and method of automatically reconstructing a three-dimensional (3D) model of an object using a machine learning model is provided. The method includes (i) obtaining, using an image capturing device, a color image of an object, (ii) generating, using an encoder, a feature map by converting the color image, which is represented as a 3D array, into an n-dimensional array, (iii) generating, using the machine learning model, a set of peeled depth maps and a set of RGB maps from the feature map, (iv) determining one or more 3D surface points of the object by back-projecting the set of peeled depth maps and the set of RGB maps to 3D space, and (v) reconstructing, using the machine learning model, a 3D model of the object by performing surface reconstruction using the one or more 3D surface points of the object.
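Step (iv) back-projects every peeled depth layer into 3D space. Below is a minimal sketch of that step. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` and depth measured along the optical axis. A depth of 0 is taken to mean "no surface at this layer". The patent text does not specify the camera model, so these details and the function name are assumptions.

```python
def back_project(peeled_depths, rgb_maps, fx, fy, cx, cy):
    """Turn peeled depth/RGB maps into a colored 3D point list.

    peeled_depths: list of layers, each a 2D list depth[v][u] (0 = empty)
    rgb_maps:      matching list of layers, each rgb[v][u] = (r, g, b)
    Returns ((x, y, z), (r, g, b)) tuples in camera coordinates.
    """
    points = []
    for depth, rgb in zip(peeled_depths, rgb_maps):
        for v, row in enumerate(depth):
            for u, z in enumerate(row):
                if z <= 0:
                    continue  # this ray has no intersection at this layer
                x = (u - cx) * z / fx  # invert the pinhole projection
                y = (v - cy) * z / fy
                points.append(((x, y, z), rgb[v][u]))
    return points
```

The resulting point cloud would then feed a surface reconstruction step, as in step (v).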
SHARP: Shape-Aware Reconstruction of People in Loose Clothing
Jinka Sai Sagar,Astitva Srivastava,Chandradeep Pokhariya,Avinash Sharma,Narayanan P J
International Journal of Computer Vision, IJCV, 2022
@inproceedings{bib_SHAR_2022, AUTHOR = {Jinka Sai Sagar, Astitva Srivastava, Chandradeep Pokhariya, Avinash Sharma, Narayanan P J}, TITLE = {SHARP: Shape-Aware Reconstruction of People in Loose Clothing}, BOOKTITLE = {International Journal of Computer Vision}, YEAR = {2022}}
Recent advancements in deep learning have enabled 3D human body reconstruction from a monocular image, which has broad applications in multiple domains. In this paper, we propose SHARP (SHape Aware Reconstruction of People in loose clothing), a novel end-to-end trainable network that accurately recovers the 3D geometry and appearance of humans in loose clothing from a monocular image. SHARP uses a sparse and efficient fusion strategy to combine a parametric body prior with a non-parametric 2D representation of clothed humans. The parametric body prior enforces geometrical consistency on the body shape and pose, while the non-parametric representation models loose clothing and also handles self-occlusions. We also leverage the sparseness of the non-parametric representation for faster training of our network while using losses on 2D maps. Another key contribution is 3DHumans, our new life-like dataset of 3D human body scans with rich geometrical and textural details. We evaluate SHARP on 3DHumans and other publicly available datasets, and show superior qualitative and quantitative performance compared to existing state-of-the-art methods.
Bringing Linearly Transformed Cosines to Anisotropic GGX
K T Aakash Ajit,Eric Heitz,Jonathan Dupuy,Narayanan P J
Proceedings of the ACM on Computer Graphics and Interactive Techniques, PACMCGIT, 2022
@inproceedings{bib_Brin_2022, AUTHOR = {K T Aakash Ajit, Eric Heitz, Jonathan Dupuy, Narayanan P J}, TITLE = {Bringing Linearly Transformed Cosines to Anisotropic GGX}, BOOKTITLE = {Proceedings of the ACM on Computer Graphics and Interactive Techniques}, YEAR = {2022}}
Linearly Transformed Cosines (LTCs) are a family of distributions that are used for real-time area-light shading thanks to their analytic integration properties. Modern game engines use an LTC approximation of the ubiquitous GGX model, but currently this approximation only exists for isotropic GGX, so anisotropic GGX is not supported. While the higher dimensionality presents a challenge in itself, we show that several additional problems arise when fitting, post-processing, storing, and interpolating LTCs in the anisotropic case. Each of these operations must be done carefully to avoid rendering artifacts. We find robust solutions for each operation by introducing and exploiting invariance properties of LTCs. As a result, we obtain a small 8⁴ look-up table that provides a plausible and artifact-free LTC approximation to anisotropic GGX and brings it to real-time area-light shading.
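An LTC is a clamped cosine distribution D_o transformed by a 3×3 matrix M. Its density follows from a change of variables: D(ω) = D_o(M⁻¹ω/‖M⁻¹ω‖) · |det M⁻¹| / ‖M⁻¹ω‖³. Below is a minimal sketch that evaluates this density from a given inverse matrix. Fitting M to anisotropic GGX, which is what the paper contributes, is not shown. The helper names are hypothetical.

```python
import math

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def clamped_cosine(w):
    """D_o: normalized clamped cosine about +z (w is a unit vector)."""
    return max(w[2], 0.0) / math.pi

def ltc_density(m_inv, w):
    """Density of the LTC defined by M at unit direction w, given M^-1."""
    wo = matvec(m_inv, w)
    length = math.sqrt(sum(c * c for c in wo))
    wo = [c / length for c in wo]
    jacobian = abs(det3(m_inv)) / length ** 3  # change-of-variables factor
    return clamped_cosine(wo) * jacobian
```

With M equal to the identity, the density reduces to cos θ / π. Scaling the tangent axes spreads or sharpens the lobe while keeping it normalized.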
PRTT: Precomputed Radiance Transfer Textures
Dhawal Sirikonda,K T Aakash Ajit,Narayanan P J
Technical Report, arXiv, 2022
@inproceedings{bib_PRTT_2022, AUTHOR = {Dhawal Sirikonda, K T Aakash Ajit, Narayanan P J}, TITLE = {PRTT: Precomputed Radiance Transfer Textures}, BOOKTITLE = {Technical Report}, YEAR = {2022}}
Precomputed Radiance Transfer (PRT) can achieve high quality renders of glossy materials at real-time framerates. PRT involves precomputing a 𝑘-dimensional transfer vector of Spherical Harmonic (SH) coefficients at specific points for a scene. Most prior art precomputes transfer at the vertices of the mesh and interpolates color for interior points, which requires finer mesh tessellations for high quality renderings. In this paper, we explore and present the use of textures for storing transfer. Using transfer textures decouples mesh resolution from transfer storage and sampling, which is especially useful for glossy renders. We further demonstrate glossy interreflections by precomputing additional textures. We thoroughly discuss practical aspects of transfer textures and analyze their performance in real-time rendering applications. We show equivalent or higher render quality and FPS and demonstrate results on several challenging scenes.
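With transfer stored in a texture, each shading point fetches its k coefficients through a filtered texture lookup instead of interpolating vertex values. Below is a minimal sketch of bilinear filtering over a texture whose texels are k-dimensional coefficient vectors. It assumes UVs in [0, 1], texel centers at half-integer positions and clamp-to-edge addressing. This layout is an assumption, not the paper's storage format.

```python
def sample_transfer(texture, u, v):
    """Bilinearly sample a texture of k-dimensional transfer vectors.

    texture[y][x] is a list of k coefficients; (u, v) are in [0, 1].
    """
    h, w = len(texture), len(texture[0])
    # Continuous texel coordinates with texel centers at i + 0.5, clamped.
    x = min(max(u * w - 0.5, 0.0), w - 1.0)
    y = min(max(v * h - 0.5, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0

    def lerp(a, b, t):
        return [p + (q - p) * t for p, q in zip(a, b)]

    top = lerp(texture[y0][x0], texture[y0][x1], fx)
    bottom = lerp(texture[y1][x0], texture[y1][x1], fx)
    return lerp(top, bottom, fy)
```

The sampled vector is then dotted with the lighting coefficients, as in any PRT renderer. The texture resolution, not the mesh tessellation, now controls how finely transfer varies across a surface.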
Appearance Editing with Free-viewpoint Neural Rendering
PULKIT GERA,K T Aakash Ajit,Dhawal Sirikonda,Parikshit Sakurikar,Narayanan P J
Technical Report, arXiv, 2021
@inproceedings{bib_Appe_2021, AUTHOR = {PULKIT GERA, K T Aakash Ajit, Dhawal Sirikonda, Parikshit Sakurikar, Narayanan P J}, TITLE = {Appearance Editing with Free-viewpoint Neural Rendering}, BOOKTITLE = {Technical Report}, YEAR = {2021}}
We present a neural rendering framework for simultaneous view synthesis and appearance editing of a scene from multi-view images captured under known environment illumination. Existing approaches either achieve view synthesis alone or view synthesis along with relighting, without direct control over the scene’s appearance. Our approach explicitly disentangles the appearance and learns a lighting representation that is independent of it. Specifically, we independently estimate the BRDF and use it to learn a lighting-only representation of the scene. Such disentanglement allows our approach to generalize to arbitrary changes in appearance while performing view synthesis. We show results of editing the appearance of a real scene, demonstrating that our approach produces plausible appearance editing. The performance of our view synthesis approach is demonstrated to be on par with state-of-the-art approaches on both real and synthetic data.
Neural view synthesis with appearance editing from unstructured images
PULKIT GERA,K T Aakash Ajit,Dhawal Sirikonda,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2021
@inproceedings{bib_Neur_2021, AUTHOR = {PULKIT GERA, K T Aakash Ajit, Dhawal Sirikonda, Narayanan P J}, TITLE = {Neural view synthesis with appearance editing from unstructured images}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2021}}
We present a neural rendering framework for simultaneous view synthesis and appearance editing of a scene with known environmental illumination captured using a mobile camera. Existing approaches either achieve view synthesis alone or view synthesis along with relighting, without control over the scene’s appearance. Our approach explicitly disentangles the appearance and learns a lighting representation that is independent of it. Specifically, we jointly learn the scene appearance and a lighting-only representation of the scene. Such disentanglement allows our approach to generalize to arbitrary changes in appearance while performing view synthesis. We show results of editing the appearance of real scenes in interesting and non-trivial ways. The performance of our view synthesis approach is on par with state-of-the-art approaches on both real and synthetic data.
Fast Analytic Soft Shadows from Area Lights
K T Aakash Ajit,Parikshit Sakurikar,Narayanan P J
Eurographics Symposium on Rendering, EGSR, 2021
@inproceedings{bib_Fast_2021, AUTHOR = {K T Aakash Ajit, Parikshit Sakurikar, Narayanan P J}, TITLE = {Fast Analytic Soft Shadows from Area Lights}, BOOKTITLE = {Eurographics Symposium on Rendering}, YEAR = {2021}}
In this paper, we present the first method to analytically compute shading and soft shadows for physically based BRDFs from arbitrary area lights. We observe that for a given shading point, shadowed radiance can be computed by analytically integrating over the visible portion of the light source using Linearly Transformed Cosines (LTCs). We present a structured approach to project, re-order and horizon-clip spherical polygons of arbitrary lights and occluders. The visible portion is then computed by multiple repetitive set difference operations. Our method produces noise-free shading and soft shadows and outperforms raytracing within the same compute budget. We further optimize our algorithm for convex light and occluder meshes by projecting the silhouette edges as viewed from the shading point to a spherical polygon, and performing one set difference operation thereby achieving a speedup of more than 2× …
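Analytic LTC shading reduces to integrating a clamped cosine over a spherical polygon, which has a closed form as a sum over the polygon's edges. Below is a minimal sketch of that edge sum. It assumes the polygon has already been clipped to the upper hemisphere of a shading point with normal +z. The projection, horizon clipping and occluder set-difference steps described above are not shown. The function names are hypothetical.

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def polygon_form_factor(vertices):
    """Cosine-weighted solid angle / pi of a polygon seen from the origin.

    vertices: polygon corners relative to the shading point, assumed to lie
    above the z = 0 horizon (normal +z). Result is in [0, 1]: the fraction
    of a clamped-cosine distribution covered by the polygon.
    """
    vs = [normalize(v) for v in vertices]
    total = 0.0
    for i in range(len(vs)):
        a, b = vs[i], vs[(i + 1) % len(vs)]
        cosine = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
        theta = math.acos(cosine)  # arc length of this edge on the sphere
        total += theta * normalize(cross(a, b))[2]
    return abs(total) / (2.0 * math.pi)  # abs() makes winding irrelevant
```

In LTC shading, the polygon is first transformed by M⁻¹, so the same edge sum then integrates the full BRDF lobe. Passing only the visible (unoccluded) polygon to this function yields shadowed shading.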
SHARP: Shape-Aware Reconstruction of People In Loose Clothing
Jinka Sai Sagar,Rohan Chacko,Astitva Srivastava,Avinash Sharma,Narayanan P J
Technical Report, arXiv, 2021
@inproceedings{bib_SHAR_2021, AUTHOR = {Jinka Sai Sagar, Rohan Chacko, Astitva Srivastava, Avinash Sharma, Narayanan P J}, TITLE = {SHARP: Shape-Aware Reconstruction of People In Loose Clothing}, BOOKTITLE = {Technical Report}, YEAR = {2021}}
3D human body reconstruction from monocular images is an interesting and ill-posed problem in computer vision with wide applications in multiple domains. In this paper, we propose SHARP, a novel end-to-end trainable network that accurately recovers the detailed geometry and appearance of 3D people in loose clothing from a monocular image. We propose a sparse and efficient fusion of a parametric body prior with a non-parametric peeled depth map representation of clothed models. The parametric body prior constrains our model in two ways: first, the network retains geometrically consistent body parts that are not occluded by clothing, and second, it provides a body shape context that improves prediction of the peeled depth maps. This enables SHARP to recover fine-grained 3D geometrical details with just L1 losses on the 2D maps, given an input image. We evaluate SHARP on publicly available Cloth3D and THuman datasets and report superior performance to …
PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction
Jinka Sai Sagar,Rohan Chacko,Avinash Sharma,Narayanan P J
International Conference on 3D Vision, 3DV, 2020
@inproceedings{bib_Peel_2020, AUTHOR = {Jinka Sai Sagar, Rohan Chacko, Avinash Sharma, Narayanan P J}, TITLE = {PeeledHuman: Robust Shape Representation for Textured 3D Human Body Reconstruction}, BOOKTITLE = {International Conference on 3D Vision}, YEAR = {2020}}
We introduce PeeledHuman, a novel shape representation of the human body that is robust to self-occlusions. PeeledHuman encodes the human body as a set of Peeled Depth and RGB maps in 2D, obtained by performing raytracing on the 3D body model and extending each ray beyond its first intersection. This formulation allows us to handle self-occlusions more efficiently than other representations. Given a monocular RGB image, we learn these Peeled maps in an end-to-end generative adversarial fashion using our novel framework, PeelGAN. We train PeelGAN using a 3D Chamfer loss and other 2D losses to generate multiple depth values per pixel and a corresponding RGB field per vertex in a dual-branch setup. In our simple non-parametric solution, the generated Peeled Depth maps are back-projected to 3D space to obtain a complete textured 3D shape. The corresponding RGB maps provide vertex-level texture details. We compare our method with current parametric and non-parametric methods in 3D reconstruction and find that we achieve state-of-the-art results. We demonstrate the effectiveness of our representation on publicly available BUFF and MonoPerfCap datasets as well as loose clothing data collected by our calibrated multi-Kinect setup.
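The peeling idea can be illustrated with a ray that does not stop at its first hit: every intersection along the ray becomes one depth layer. Below is a minimal sketch. It assumes the scene is a list of spheres standing in for a body mesh and a fixed number of layers, with 0 marking "no hit". It illustrates the representation only and is not PeelGAN. The function names are hypothetical.

```python
import math

def ray_sphere_hits(origin, direction, center, radius):
    """Distances t > 0 where a ray meets a sphere (direction is unit length)."""
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(d * x for d, x in zip(direction, oc))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c  # discriminant of t^2 + 2bt + c = 0
    if disc < 0:
        return []
    s = math.sqrt(disc)
    return [t for t in (-b - s, -b + s) if t > 0]

def peeled_depths(origin, direction, spheres, num_layers=4):
    """All hits along the ray, sorted near to far, padded with 0 to num_layers."""
    hits = sorted(t for center, radius in spheres
                  for t in ray_sphere_hits(origin, direction, center, radius))
    hits = hits[:num_layers]
    return hits + [0.0] * (num_layers - len(hits))
```

Repeating this for every pixel's ray yields one depth map per layer. Occluded surfaces, such as an arm behind the torso, survive in the deeper layers instead of being lost.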
Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks
SIDDHARTHA GAIROLA,SHAH RAJVI AJAYBHAI,Narayanan P J
Winter Conference on Applications of Computer Vision, WACV, 2020
@inproceedings{bib_Unsu_2020, AUTHOR = {SIDDHARTHA GAIROLA, SHAH RAJVI AJAYBHAI, Narayanan P J}, TITLE = {Unsupervised Image Style Embeddings for Retrieval and Recognition Tasks}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2020}}
We propose an unsupervised protocol for learning a neural embedding of the visual style of images. Style similarity is an important measure for many applications such as style transfer, fashion search, and art exploration. However, computational modeling of style is difficult owing to its vague and subjective nature. Most methods for style-based retrieval use supervised training with a pre-defined categorization of images according to style. While this paradigm is suitable for applications where style categories are well-defined and curating large datasets according to such a categorization is feasible, in several other cases such a categorization is either ill-defined or does not exist. Our protocol for learning style-based representations does not leverage categorical labels but uses a proxy measure for forming triplets of anchor, similar, and dissimilar images. Using these triplets, we learn a compact style embedding that is useful for style-based search and retrieval. The learned embeddings outperform other unsupervised representations on the style-based image retrieval task on six datasets that capture different meanings of style. We also show that by finetuning the learned features with dataset-specific style labels, we obtain the best results on the image style recognition task on five of the six datasets.
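Anchor, similar and dissimilar triplets are commonly consumed by a margin-based triplet loss on the embeddings. Below is a minimal sketch of the standard hinge form with squared Euclidean distances. The abstract does not state the exact loss, so the distance choice, the default `margin` and the function names are assumptions.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(0, d(a, p) - d(a, n) + margin): pull similar styles together and
    push dissimilar ones at least `margin` further away than similar ones."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)
```

A triplet that already satisfies the margin contributes zero loss. Training signal therefore comes only from triplets the embedding still orders incorrectly.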
PeelNet: Textured 3D reconstruction of human body using single view RGB image
Jinka Sai Sagar,Rohan Chacko,Avinash Sharma,Narayanan P J
Technical Report, arXiv, 2020
@inproceedings{bib_Peel_2020, AUTHOR = {Jinka Sai Sagar, Rohan Chacko, Avinash Sharma, Narayanan P J}, TITLE = {PeelNet: Textured 3D reconstruction of human body using single view RGB image}, BOOKTITLE = {Technical Report}, YEAR = {2020}}
Reconstructing human shape and pose from a single image is a challenging problem due to issues like severe self-occlusions, clothing variations, and changes in lighting, to name a few. Many applications in the entertainment industry, e-commerce, health-care (physiotherapy), and mobile-based AR/VR platforms can benefit from recovering the 3D human shape, pose, and texture. In this paper, we present PeelNet, an end-to-end generative adversarial framework to tackle the problem of textured 3D reconstruction of the human body from a single RGB image. Motivated by ray tracing for generating realistic images of a 3D scene, we tackle this problem by representing the human body as a set of peeled depth and RGB maps, which are obtained by extending rays beyond the first intersection with the 3D object. This formulation allows us to handle self-occlusions efficiently. Current parametric model-based approaches fail to model loose clothing and surface-level details, as they are designed for the underlying naked human body. The majority of non-parametric approaches are either computationally expensive or provide unsatisfactory results. We present a simple non-parametric solution where the peeled maps are generated from a single RGB image as input. Our proposed peeled depth maps are back-projected to 3D volume to obtain a complete 3D shape. The corresponding RGB maps provide vertex-level texture details. We compare our method against current state-of-the-art methods in 3D reconstruction and demonstrate the effectiveness of our method on BUFF and MonoPerfCap datasets.
Defocus Magnification Using Conditional Adversarial Networks
Parikshit Sakurikar,MEHTA ISHIT BHADRESH,Narayanan P J
Winter Conference on Applications of Computer Vision, WACV, 2019
@inproceedings{bib_Defo_2019, AUTHOR = {Parikshit Sakurikar, MEHTA ISHIT BHADRESH, Narayanan P J}, TITLE = {Defocus Magnification Using Conditional Adversarial Networks}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2019}}
Defocus magnification is the process of rendering a shallow depth-of-field in an image captured using a camera with a narrow aperture. Defocus magnification is a useful photographic tool for emphasizing the subject and highlighting background bokeh. Estimating the per-pixel blur kernel or the depth-map of the scene followed by spatially-varying re-blurring is the standard approach to defocus magnification. We propose a single-step approach that directly converts a narrow-aperture image to a wide-aperture image. We use a conditional adversarial network trained on multi-aperture images created from light-fields. We use a novel loss term based on a composite focus measure to improve generalization and show high quality defocus magnification.
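In the standard two-step pipeline, the spatially varying blur comes from the thin-lens circle of confusion. This blur diameter grows linearly with aperture diameter, so magnifying defocus amounts to re-rendering with a larger aperture. Below is a minimal sketch of the thin-lens formula, with all lengths in the same unit. It illustrates the conventional depth-based approach that the paper replaces, not the single-step network. The function name is hypothetical.

```python
def circle_of_confusion(aperture, focal_length, focus_dist, depth):
    """Thin-lens blur-circle diameter on the sensor for a point at `depth`
    when a lens of diameter `aperture` is focused at `focus_dist`."""
    return (aperture * abs(depth - focus_dist) / depth
            * focal_length / (focus_dist - focal_length))
```

Points at the focus distance stay sharp for any aperture, while doubling the aperture doubles the blur of every out-of-focus point. A depth-based method needs an accurate depth map to apply this per pixel, which is the step the single-image network avoids.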
Semantic Hierarchical Priors for Intrinsic Image Decomposition
SAURABH SAINI,Narayanan P J
Technical Report, arXiv, 2019
@inproceedings{bib_Sema_2019, AUTHOR = {SAURABH SAINI, Narayanan P J}, TITLE = {Semantic Hierarchical Priors for Intrinsic Image Decomposition}, BOOKTITLE = {Technical Report}, YEAR = {2019}}
Intrinsic Image Decomposition (IID) is a challenging and interesting computer vision problem with various applications in several fields. We present novel semantic priors and an integrated approach for single image IID that involves analyzing the image at three hierarchical context levels. Local context priors capture scene properties at each pixel within a small neighbourhood. Mid-level context priors encode object level semantics. Global context priors establish correspondences at the scene level. Our semantic priors are designed on both fixed and flexible regions, using the selective search method and Convolutional Neural Network features. Our IID method is an iterative multistage optimization scheme and consists of two complementary formulations: L2 smoothing for shading and L1 sparsity for reflectance. Experiments and analysis of our method indicate the utility of our semantic priors and structured hierarchical analysis in an IID framework. We compare our method with other contemporary IID solutions and show results with fewer artifacts. Finally, we highlight that proper choice and encoding of prior knowledge can produce competitive results even when compared to end-to-end deep learning IID methods, signifying the importance of such priors. We believe that the insights and techniques presented in this paper will be useful in future IID research.
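The two complementary terms can be combined into one energy over a candidate split of an image into reflectance and shading. Below is a minimal 1D sketch. It assumes the decomposition is done in the log domain, where image = reflectance × shading becomes a sum, and uses a single weight `lam`. The semantic priors and the multistage optimization of the paper are not shown. The function name is hypothetical.

```python
def iid_energy(log_reflectance, log_shading, lam=1.0):
    """L2 smoothness on shading gradients + L1 sparsity on reflectance
    gradients, for a 1D signal split as log_image = log_R + log_S."""
    def diffs(x):
        return [b - a for a, b in zip(x, x[1:])]
    smooth = sum(d * d for d in diffs(log_shading))
    sparse = sum(abs(d) for d in diffs(log_reflectance))
    return smooth + lam * sparse
```

For an image made of a reflectance step plus a smooth shading ramp, the correct split scores lower than assigning everything to reflectance or everything to shading. The L2 term penalizes the sharp step when it is placed in shading. The L1 term penalizes the many small ramp changes when they are placed in reflectance.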
Welcome to the India Region Special Section
Narayanan P J,Pankaj Jalote,Anand Deshpande
Communications of the ACM, CACM, 2019
@inproceedings{bib_Welc_2019, AUTHOR = {Narayanan P J, Pankaj Jalote, Anand Deshpande}, TITLE = {Welcome to the India Region Special Section}, BOOKTITLE = {Communications of the ACM}, YEAR = {2019}}
A Flexible Neural Renderer for Material Visualization
K T Aakash Ajit,Parikshit Sakurikar,SAURABH SAINI,Narayanan P J
SIGGRAPH ASIA Technical Briefs, SATB, 2019
@inproceedings{bib_A_Fl_2019, AUTHOR = {K T Aakash Ajit, Parikshit Sakurikar, SAURABH SAINI, Narayanan P J}, TITLE = {A Flexible Neural Renderer for Material Visualization}, BOOKTITLE = {SIGGRAPH ASIA Technical Briefs}, YEAR = {2019}}
Photorealism in computer generated imagery is crucially dependent on how well an artist is able to recreate real-world materials in the scene. The workflow for material modeling and editing typically involves manual tweaking of material parameters and uses a standard path tracing engine for visual feedback. A lot of time may be spent in iterative selection and rendering of materials at an appropriate quality. In this work, we propose a convolutional neural network based workflow which quickly generates high-quality ray traced material visualizations on a shaderball. Our novel architecture assists material selection, can render spatially-varying materials, and enables control over environment lighting, which gives an artist more freedom and provides better visualization of the rendered material. Comparison with state-of-the-art denoising and neural rendering techniques suggests that our neural renderer is both faster and more accurate. We provide an interactive visualization tool and release our training dataset to foster further research in this area.
Nose, eyes and ears: Head pose estimation by locating facial keypoints
ARYAMAN GUPTA,KALPIT THAKKAR,Vineet Gandhi,Narayanan P J
International Conference on Acoustics, Speech, and Signal Processing, ICASSP, 2019
@inproceedings{bib_Nose_2019, AUTHOR = {ARYAMAN GUPTA, KALPIT THAKKAR, Vineet Gandhi, Narayanan P J}, TITLE = {Nose, eyes and ears: Head pose estimation by locating facial keypoints}, BOOKTITLE = {International Conference on Acoustics, Speech, and Signal Processing}, YEAR = {2019}}
Monocular head pose estimation requires learning a model that computes the intrinsic Euler angles for pose (yaw, pitch, roll) from an input image of a human face. Annotating ground truth head pose angles for images in the wild is difficult and requires ad-hoc fitting procedures (which provide only coarse and approximate annotations). This highlights the need for approaches that can train on data captured in a controlled environment and generalize to images in the wild (with varying appearance and illumination of the face). Most present-day deep learning approaches, which learn a regression function directly on the input images, fail to do so. To this end, we propose to use a higher level representation to regress the head pose while using deep learning architectures. More specifically, we use uncertainty maps in the form of 2D soft localization heatmap images over five facial keypoints, namely left ear, right ear, left eye, right eye and nose, and pass them through a convolutional neural network to regress the head pose. We show head pose estimation results on two challenging benchmarks, BIWI and AFLW, and our approach surpasses the state of the art on both datasets.
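Soft localization heatmaps of this kind are commonly rendered as 2D Gaussians centered on each detected keypoint. Below is a minimal sketch that builds the five-channel input. It assumes a fixed standard deviation `sigma` in pixels, a peak value of 1 and one heatmap per keypoint. The abstract does not give the exact heatmap parameters, and the keypoint names and function names are hypothetical.

```python
import math

KEYPOINTS = ("left_ear", "right_ear", "left_eye", "right_eye", "nose")

def gaussian_heatmap(width, height, cx, cy, sigma=2.0):
    """heat[y][x] = exp(-((x-cx)^2 + (y-cy)^2) / (2 sigma^2)), peak 1 at (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)] for y in range(height)]

def keypoint_stack(locations, width, height, sigma=2.0):
    """One heatmap per keypoint, in KEYPOINTS order; this stack would be
    the CNN input for regressing yaw, pitch, and roll."""
    return [gaussian_heatmap(width, height, *locations[name], sigma=sigma)
            for name in KEYPOINTS]
```

Because the network sees only keypoint heatmaps and not raw pixels, the input does not change with skin tone, lighting or background. This is the property the paper relies on to generalize from controlled captures to images in the wild.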
Structured Adversarial Training for Unsupervised Monocular Depth Estimation
Ishit Mehta,Parikshit Sakurikar,Narayanan P J
International Conference on 3D Vision, 3DV, 2018
@inproceedings{bib_Stru_2018, AUTHOR = {Ishit Mehta, Parikshit Sakurikar, Narayanan P J}, TITLE = {Structured Adversarial Training for Unsupervised Monocular Depth Estimation}, BOOKTITLE = {International Conference on 3D Vision}, YEAR = {2018}}
The problem of estimating scene depth from a single image has seen great progress lately. Recent unsupervised methods are based on view synthesis and learn depth by minimizing photometric reconstruction error. In this paper, we introduce Structured Adversarial Training (StrAT) to this problem. We generate multiple novel views using depth (or disparity), with the stereo baseline changing in an increasing order. Adversarial training that goes from easy examples to harder ones produces richer losses and better models. The impact of StrAT is shown to exceed traditional data augmentation using random new views. The combination of an adversarial framework, multiview learning, and structured adversarial training produces state-of-the-art performance on unsupervised depth estimation for monocular images. The StrAT framework can benefit several problems that use adversarial training.
RefocusGAN: Scene Refocusing using a Single Image
Parikshit Sakurikar,MEHTA ISHIT BHADRESH,Vineeth N. Balasubramanian,Narayanan P J
European Conference on Computer Vision, ECCV, 2018
@inproceedings{bib_Refo_2018, AUTHOR = {Parikshit Sakurikar, MEHTA ISHIT BHADRESH, Vineeth N. Balasubramanian, Narayanan P J}, TITLE = {RefocusGAN: Scene Refocusing using a Single Image}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2018}}
Post-capture control of the focus position of an image is a useful photographic tool. Changing the focus of a single image involves the complex task of simultaneously estimating the radiance and the defocus radius of all scene points. We introduce RefocusGAN, a deblur-then-reblur approach to single image refocusing. We train conditional adversarial networks for deblurring and refocusing using wide-aperture images created from light-fields. By appropriately conditioning our networks with a focus measure, an in-focus image and a refocus control parameter δ, we are able to achieve generic free-form refocusing over a single image.
Semantic Priors for Intrinsic Image Decomposition
SAURABH SAINI,Narayanan P J
British Machine Vision Conference, BMVC, 2018
@inproceedings{bib_Sema_2018, AUTHOR = {SAURABH SAINI, Narayanan P J}, TITLE = {Semantic Priors for Intrinsic Image Decomposition}, BOOKTITLE = {British Machine Vision Conference}, YEAR = {2018}}
Intrinsic Image Decomposition (IID) is a challenging and interesting computer vision problem with various applications in several fields. We present novel semantic priors and an integrated approach for single image IID that involves analyzing the image at three hierarchical context levels. Local context priors capture scene properties at each pixel within a small neighborhood. Mid-level context priors encode object level semantics. Global context priors establish correspondences at the scene level. Our semantic priors are designed on both fixed and flexible regions, using the selective search method and Convolutional Neural Network features. Experiments and analysis of our method indicate the utility of our weak semantic priors and structured hierarchical analysis in an IID framework. We compare our method with the current state-of-the-art and show results with fewer artifacts. Finally, we highlight that proper choice and encoding of prior knowledge can produce competitive results compared to end-to-end deep learning IID methods, signifying the importance of such priors. We believe that the insights and techniques presented in this paper will be useful in future IID research.
Find Me a Sky: A Data-Driven Method for Color Consistent Sky Search and Replacement
SAUMYA RAWAT,SIDDHARTHA GAIROLA,SHAH RAJVI AJAYBHAI,Narayanan P J
International Conference on MultiMedia Modeling, MMM, 2018
@inproceedings{bib_Find_2018, AUTHOR = {SAUMYA RAWAT, SIDDHARTHA GAIROLA, SHAH RAJVI AJAYBHAI, Narayanan P J}, TITLE = {Find Me a Sky: A Data-Driven Method for Color Consistent Sky Search and Replacement}, BOOKTITLE = {International Conference on MultiMedia Modeling}, YEAR = {2018}}
Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. We propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the query image foreground. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 similarity in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, but observed color statistics to be the most effective for this task. We share our findings on feature selection and show qualitative results and a user-study based evaluation to show the effectiveness of the proposed method.
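The retrieval step ranks database images by L2 distance in a feature space, and color statistics were reported as the most effective features. Below is a minimal sketch of such a ranking. It assumes the feature is the per-channel mean and standard deviation of foreground RGB pixels and that database features are precomputed. The realism and diversity re-ranking stage is not shown, and the function names are hypothetical.

```python
import math

def color_stats(pixels):
    """Per-channel mean and standard deviation of a list of (r, g, b) pixels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    stds = [math.sqrt(sum((p[c] - means[c]) ** 2 for p in pixels) / n)
            for c in range(3)]
    return means + stds

def rank_by_l2(query_feature, database, top_k=3):
    """Names of the top_k database entries closest to the query in feature space."""
    def dist(item):
        _, feature = item
        return math.dist(query_feature, feature)  # Euclidean (L2) distance
    return [name for name, _ in sorted(database.items(), key=dist)[:top_k]]
```

Here `database` maps an image name to its feature vector. The top-ranked entries would become the candidate composites passed on to re-ranking.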
View-graph Selection Framework for SfM
SHAH RAJVI AJAYBHAI,Visesh Chari,Narayanan P J
European Conference on Computer Vision, ECCV, 2018
@inproceedings{bib_View_2018, AUTHOR = {SHAH RAJVI AJAYBHAI, Visesh Chari, Narayanan P J}, TITLE = {View-graph Selection Framework for SfM}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2018}}
View-graph selection is a crucial step for accurate and efficient large-scale structure from motion (SfM). Most SfM methods remove undesirable images and pairs using several fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose an optimization-based formulation that can be used to achieve these different reconstruction objectives with task-specific cost modeling, and we construct a very efficient network-flow based formulation for its approximate solution. The abstraction brought on by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper mainly focuses on the application of this framework with a standard SfM pipeline for accurate and ghost-free reconstructions of highly ambiguous datasets. To model selection costs for this task, we introduce new disambiguation priors based on local geometry. We further demonstrate the versatility of the method by using it for the general objective of accurate and efficient reconstruction of large-scale Internet datasets using costs based on well-known SfM priors.
Refocus GAN: Scene Refocusing using a Single Image
Parikshit Sakurikar,MEHTA ISHIT BHADRESH,Vineeth N. Balasubramanian,Narayanan P J
European Conference on Computer Vision, ECCV, 2018
@inproceedings{bib_Refo_2018, AUTHOR = {Parikshit Sakurikar and MEHTA ISHIT BHADRESH and Vineeth N. Balasubramanian and Narayanan P J}, TITLE = {Refocus GAN: Scene Refocusing using a Single Image}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2018}}
Post-capture control of the focus position of an image is a useful photographic tool. Changing the focus of a single image involves the complex task of simultaneously estimating the radiance and the defocus radius of all scene points. We introduce RefocusGAN, a deblur-then-reblur approach to single-image refocusing. We train conditional adversarial networks for deblurring and refocusing using wide-aperture images created from light fields. By appropriately conditioning our networks with a focus measure, an in-focus image, and a refocus control parameter δ, we achieve generic free-form refocusing of a single image.
Find me a sky: a data-driven method for color-consistent sky search replacement
SAUMYA RAWAT,SIDDHARTHA GAIROLA,SHAH RAJVI AJAYBHAI,Narayanan P J
International Conference on MultiMedia Modeling, MMM, 2018
@inproceedings{bib_Find_2018, AUTHOR = {SAUMYA RAWAT and SIDDHARTHA GAIROLA and SHAH RAJVI AJAYBHAI and Narayanan P J}, TITLE = {Find me a sky: a data-driven method for color-consistent sky search replacement}, BOOKTITLE = {International Conference on MultiMedia Modeling}, YEAR = {2018}}
Replacing overexposed or dull skies in outdoor photographs is a desirable photo manipulation. It is often necessary to color correct the foreground after replacement to make it consistent with the new sky. Methods have been proposed to automate the process of sky replacement and color correction. However, color correction is often unwanted by the artist or may produce unrealistic results. We propose a data-driven approach to sky replacement that avoids color correction by finding a diverse set of skies that are consistent in color and natural illumination with the foreground of the query image. Our database consists of ∼1200 natural images spanning many outdoor categories. Given a query image, we retrieve the most consistent images from the database according to L2 similarity in feature space and produce candidate composites. The candidates are re-ranked based on realism and diversity. We used pre-trained CNN features and a rich set of hand-crafted features that encode color statistics, structural layout, and natural illumination statistics, but observed color statistics to be the most effective for this task. We share our findings on feature selection and present qualitative results and a user-study-based evaluation to show the effectiveness of the proposed method.
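The retrieve-then-re-rank step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a coarse joint RGB histogram stands in for their color-statistics features, and the re-ranking is an assumed greedy trade-off between L2 distance to the query and distance to skies already chosen (the abstract does not give its realism or diversity criteria). All function names are hypothetical.

```python
import math


def color_histogram(pixels, bins=4):
    """Normalized joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        # c * bins // 256 maps 0..255 onto bin indices 0..bins-1
        idx = (r * bins // 256) * bins * bins + (g * bins // 256) * bins + (b * bins // 256)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]


def l2(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve_and_rerank(query_feat, db_feats, k=3, lam=0.5):
    """Rank database skies by L2 distance to the query, then pick k greedily.

    lam = 1.0 is pure relevance; smaller values reward candidates that are far
    (in feature space) from the skies already selected, i.e. more diverse ones.
    """
    dist = {i: l2(query_feat, f) for i, f in enumerate(db_feats)}
    candidates = sorted(dist, key=dist.get)
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            # Distance to the closest already-selected sky (0 for the first pick).
            spread = min((l2(db_feats[i], db_feats[j]) for j in selected), default=0.0)
            return -lam * dist[i] + (1 - lam) * spread
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With `lam=1.0` the list is ordered by L2 similarity alone; lowering `lam` lets a less similar but visually different sky displace a near-duplicate.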
Human Shape Capture and Tracking at Home
GAURAV MISHRA,SAURABH SAINI,Kiran Varanasi,Narayanan P J
Winter Conference on Applications of Computer Vision, WACV, 2018
@inproceedings{bib_Huma_2018, AUTHOR = {GAURAV MISHRA and SAURABH SAINI and Kiran Varanasi and Narayanan P J}, TITLE = {Human Shape Capture and Tracking at Home}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2018}}
Human body tracking typically requires specialized capture setups. Although pose tracking is available in consumer devices like the Microsoft Kinect, it is restricted to stick figures visualizing body-part detection. In this paper, we propose a method for full 3D human body shape and motion capture of arbitrary movements from the depth channel of a single Kinect, when the subject wears casual clothes. We use neither the RGB channel nor an initialization procedure that requires the subject to move around in front of the camera. This makes our method applicable to arbitrary clothing textures and lighting environments, with minimal subject intervention. Our method consists of 3D surface feature detection and articulated motion tracking, which is regularized by a statistical human body model [26]. We also propose the idea of a Consensus Mesh (CMesh), a 3D template of a person created from a single viewpoint. We demonstrate tracking results on challenging poses and argue that using CMesh along with statistical body models can improve tracking accuracy. Quantitative evaluation of our dense body tracking shows that our method has very little drift, which is further reduced by using CMesh.
Part-based Graph Convolutional Network for Action Recognition
KALPIT THAKKAR,Narayanan P J
Computer Vision and Pattern Recognition, CVPR, 2018
@inproceedings{bib_Part_2018, AUTHOR = {KALPIT THAKKAR and Narayanan P J}, TITLE = {Part-based Graph Convolutional Network for Action Recognition}, BOOKTITLE = {Computer Vision and Pattern Recognition}, YEAR = {2018}}
Human actions comprise the joint motion of articulated body parts, or “gestures”. The human skeleton is intuitively represented as a sparse graph with joints as nodes and natural connections between them as edges. Graph convolutional networks have been used to recognize actions from skeletal videos. We introduce a part-based graph convolutional network (PB-GCN) for this task, inspired by Deformable Part-based Models (DPMs). We divide the skeleton graph into four subgraphs with joints shared across them and learn a recognition model using a part-based graph convolutional network. We show that such a model improves recognition performance compared to a model using the entire skeleton graph. Instead of using 3D joint coordinates as node features, we show that using relative coordinates and temporal displacements boosts performance. Our model achieves state-of-the-art performance on two challenging benchmark datasets for skeletal action recognition, NTU RGB+D and HDM05.
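A minimal sketch of the node features mentioned above, assuming each frame is a list of (x, y, z) joint positions and that relative coordinates are offsets from a single reference joint (joint 0 here, a hypothetical choice; the abstract specifies neither the reference joint nor how the four parts are drawn). `part_subgraphs` only shows how overlapping joint sets yield subgraphs that share joints; the graph convolutions themselves are not modelled.

```python
def relative_coords(frame, ref_joint):
    """Express every joint of one frame relative to a reference joint."""
    rx, ry, rz = frame[ref_joint]
    return [(x - rx, y - ry, z - rz) for x, y, z in frame]


def temporal_displacements(frames):
    """Per-joint displacement from the previous frame; the first frame gets zeros."""
    out = [[(0.0, 0.0, 0.0)] * len(frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        out.append([(c[0] - p[0], c[1] - p[1], c[2] - p[2]) for p, c in zip(prev, cur)])
    return out


def node_features(frames, ref_joint=0):
    """6-D feature per joint and frame: relative coords followed by displacement."""
    disp = temporal_displacements(frames)
    return [[rel + d for rel, d in zip(relative_coords(frame, ref_joint), disp[t])]
            for t, frame in enumerate(frames)]


def part_subgraphs(edges, parts):
    """Restrict the skeleton's edge list to each part; a joint may belong to several parts."""
    return {name: [(a, b) for a, b in edges if a in joints and b in joints]
            for name, joints in parts.items()}
```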
SLFT: A physically accurate framework for Tracing Synthetic Light Fields
UDYAN KHURANA,Parikshit Sakurikar,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2018
@inproceedings{bib_SLFT_2018, AUTHOR = {UDYAN KHURANA and Parikshit Sakurikar and Narayanan P J}, TITLE = {SLFT: A physically accurate framework for Tracing Synthetic Light Fields}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2018}}
A light field is a 4D function that captures all the radiance information of a scene. Image-based creation of light fields reconstructs the 4D space using pre-captured imagery from various views and employs refocusing to generate output images. Handheld cameras can also capture light fields using a microlens array between the sensor and the main lens, but physical constraints limit their spatial and angular resolutions. In this paper, we present a GPU-based synthetic light field rendering framework that is robust and physically accurate. We demonstrate the equivalence of the standard light field camera representation with the light slab representation for synthetic light fields, and show that our framework can trace light fields at resolutions much higher than those available in commercial plenoptic cameras. The light slab is rich in quality but bulky to store. Our system provides parameters to balance quality and storage requirements. We also present a compact representation of the 4D light slabs using a video compression codec and demonstrate different quality-size combinations using these representations.
Focal Stack Representation and Focus Manipulation
Parikshit Sakurikar,Narayanan P J
Asian Conference on Pattern Recognition, ACPR, 2017
@inproceedings{bib_Foca_2017, AUTHOR = {Parikshit Sakurikar and Narayanan P J}, TITLE = {Focal Stack Representation and Focus Manipulation}, BOOKTITLE = {Asian Conference on Pattern Recognition}, YEAR = {2017}}
Focus, depth-of-field, and defocus are important elements that portray the aesthetic emphasis in a good photograph. The ability to manipulate the focus after capture provides useful creative control to photographers. Capturing focal stacks (multiple images with small changes in focus setting) of static scenes is relatively easy with modern cameras. We propose a compact representation for focal stacks using an all-in-focus image, a focal-slice index map, and pairwise defocus blur parameters. Using our representation, we show reconstruction of images with different focus effects, including extended focus, multiple focus, and scene synthesis with natural focus effects. A user study shows high acceptability of the synthesized images compared to real ones. The compact and powerful representation of focal stacks makes them suitable for handling by image editing tools in order to provide flexible focus manipulation.
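The all-in-focus image and focal-slice index map named above can be derived from a stack roughly as follows. This sketch assumes grayscale slices stored as nested lists and uses a squared Laplacian as the focus measure purely for illustration; the pairwise defocus blur parameters, the third part of the representation, are not estimated here, and the function names are hypothetical.

```python
def squared_laplacian(img):
    """Per-pixel focus measure: squared 4-neighbour Laplacian, borders clamped."""
    h, w = len(img), len(img[0])

    def at(y, x):
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    return [[(4 * at(y, x) - at(y - 1, x) - at(y + 1, x) - at(y, x - 1) - at(y, x + 1)) ** 2
             for x in range(w)] for y in range(h)]


def index_map_and_all_in_focus(stack):
    """stack[k][y][x] is the intensity of focal slice k.

    Returns (index_map, all_in_focus): per pixel, the sharpest slice and its value.
    """
    focus = [squared_laplacian(s) for s in stack]
    h, w = len(stack[0]), len(stack[0][0])
    index_map = [[0] * w for _ in range(h)]
    all_in_focus = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            k = max(range(len(stack)), key=lambda s: focus[s][y][x])
            index_map[y][x] = k
            all_in_focus[y][x] = stack[k][y][x]
    return index_map, all_in_focus
```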
Composite Focus Measure for High Quality Depth Maps
PARIKSHIT VISHWAS SAKURIKAR,Narayanan P J
International Conference on Computer Vision, ICCV, 2017
@inproceedings{bib_Comp_2017, AUTHOR = {PARIKSHIT VISHWAS SAKURIKAR and Narayanan P J}, TITLE = {Composite Focus Measure for High Quality Depth Maps}, BOOKTITLE = {International Conference on Computer Vision}, YEAR = {2017}}
Depth from focus is a highly accessible method to estimate the 3D structure of everyday scenes. Today’s DSLR and mobile cameras facilitate the easy capture of multiple focused images of a scene. Focus measures (FMs) that estimate the amount of focus at each pixel form the basis of depth-from-focus methods. Several FMs have been proposed in the past and new ones will emerge in the future, each with their own strengths. We estimate a weighted combination of standard FMs that outperforms others on a wide range of scene types. The resulting composite focus measure consists of FMs that are in consensus with one another but not in chorus. Our two-stage pipeline first estimates fine depth at each pixel using the composite focus measure. A cost-volume propagation step then assigns depths from confident pixels to others. We can generate high quality depth maps using just the top five FMs from our composite focus measure. This is a positive step towards depth estimation of everyday scenes with no special equipment.
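A per-pixel sketch of combining several focus-measure responses into one depth estimate. The min-max normalization and the fixed weights passed in are assumptions for illustration: the paper estimates its weights, and its cost-volume propagation stage is not modelled. Here `fm_curves` (a hypothetical name) holds each focus measure's response across the slices of the stack at one pixel.

```python
def normalize(values):
    """Min-max normalize one focus-measure curve so different FMs are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def composite_depth(fm_curves, weights):
    """fm_curves[m][k]: response of focus measure m at slice k for one pixel.

    Returns the slice index maximizing the weighted sum of normalized responses.
    """
    n_slices = len(fm_curves[0])
    combined = [0.0] * n_slices
    for curve, w in zip(fm_curves, weights):
        for k, v in enumerate(normalize(curve)):
            combined[k] += w * v
    return max(range(n_slices), key=combined.__getitem__)
```

Normalizing first keeps a measure with a large numeric range from dominating the combination regardless of its weight.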
SynCam: Capturing sub-frame synchronous media using smartphones
MEHTA ISHIT BHADRESH,PARIKSHIT VISHWAS SAKURIKAR,RAJVI SHAH,Narayanan P J
International Conference on Multimedia and Expo, ICME, 2017
@inproceedings{bib_SynC_2017, AUTHOR = {MEHTA ISHIT BHADRESH and PARIKSHIT VISHWAS SAKURIKAR and RAJVI SHAH and Narayanan P J}, TITLE = {SynCam: Capturing sub-frame synchronous media using smartphones}, BOOKTITLE = {International Conference on Multimedia and Expo}, YEAR = {2017}}
Smartphones have become the de facto capture devices for everyday photography. Unlike traditional digital cameras, smartphones are versatile devices with auxiliary sensors, processing power, and networking capabilities. In this work, we harness the communication capabilities of smartphones and present a synchronous, coordinated multi-camera capture system. Synchronous capture is important for many image/video fusion and 3D reconstruction applications. The proposed system provides an inexpensive and effective means to capture multi-camera media for such applications. Our coordinated capture system is based on a wireless protocol that uses NTP-based synchronization and device-specific lag compensation. It achieves sub-frame synchronization across all participating smartphones, even those of heterogeneous make and model. We propose a new method, based on fiducial markers displayed on an LCD screen, to temporally calibrate smartphone cameras. We demonstrate the utility and versatility of this system to enhance traditional videography and to create novel visual representations such as panoramic videos, HDR videos, multi-view 3D reconstruction, multi-flash imaging, and multi-camera social media.
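NTP-based synchronization rests on the standard four-timestamp clock-offset estimate, sketched below. Picking the lowest-delay exchange and subtracting a per-device capture lag from the trigger time are assumptions about how a protocol could use the estimate, not a description of SynCam's actual protocol; `device_lag` is a hypothetical calibrated value, for instance one obtained through a temporal calibration such as the fiducial-marker method described in the abstract.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Classic NTP estimate from one request/response exchange.

    t0: client send (client clock)   t1: server receive (server clock)
    t2: server send (server clock)   t3: client receive (client clock)
    Returns (offset, delay) with offset = server_clock - client_clock.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


def best_offset(samples):
    """Keep the exchange with the smallest round-trip delay (least queueing noise)."""
    return min((ntp_offset_and_delay(*s) for s in samples), key=lambda od: od[1])[0]


def local_trigger_time(server_capture_time, offset, device_lag):
    """Convert a shared capture time to this device's clock and fire early by the
    device's (hypothetical, pre-calibrated) capture lag."""
    return server_capture_time - offset - device_lag
```

The offset formula is exact only when the network delay is symmetric, which is why low-delay exchanges are usually preferred.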
A Unified View-Graph Selection Framework for Structure from Motion
RAJVI SHAH, Visesh Chari,Narayanan P J
European Conference on Computer Vision, ECCV, 2017
@inproceedings{bib_A_Un_2017, AUTHOR = {RAJVI SHAH and Visesh Chari and Narayanan P J}, TITLE = {A Unified View-Graph Selection Framework for Structure from Motion}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2017}}
The view-graph is an essential input to large-scale structure from motion (SfM) pipelines. The accuracy and efficiency of large-scale SfM crucially depend on the input view-graph. Inconsistent or inaccurate edges can lead to inferior or wrong reconstructions. Most SfM methods remove ‘undesirable’ images and pairs using several fixed heuristic criteria, and propose tailor-made solutions to achieve specific reconstruction objectives such as efficiency, accuracy, or disambiguation. In contrast to these disparate solutions, we propose a single optimization framework that can be used to achieve these different reconstruction objectives with task-specific cost modeling. We also construct a very efficient network-flow-based formulation for its approximate solution. The abstraction brought by this selection mechanism separates the challenges specific to datasets and reconstruction objectives from the standard SfM pipeline and improves its generalization. This paper demonstrates the application of the proposed view-graph framework with the standard SfM pipeline for two particular use cases: (i) accurate and ghost-free reconstructions of highly ambiguous datasets using costs based on disambiguation priors, and (ii) accurate and efficient reconstruction of large-scale Internet datasets using costs based on commonly used priors.
Multistage SfM: A coarse-to-fine approach for 3D reconstruction
RAJVI SHAH,Deshpande Aditya Rajiv,Narayanan P J
Technical Report, arXiv, 2016
@misc{bib_Mult_2016, AUTHOR = {RAJVI SHAH and Deshpande Aditya Rajiv and Narayanan P J}, TITLE = {Multistage SfM: A coarse-to-fine approach for 3D reconstruction}, HOWPUBLISHED = {Technical Report, arXiv}, YEAR = {2016}}
Several methods have been proposed for large-scale 3D reconstruction from large, unorganized image collections. A large reconstruction problem is typically divided into multiple components, which are reconstructed independently using structure from motion (SfM) and later merged. Incremental SfM methods are the most popular for basic structure recovery of a single component. They are robust and effective but strictly sequential in nature. We present a multistage approach for SfM reconstruction of a single component that breaks the sequential nature of incremental SfM methods. Our approach begins by quickly building a coarse 3D model using only a fraction of the features from the given images. The coarse model is then enriched in subsequent stages by localizing the remaining images and matching and triangulating the remaining features. The geometric information available in the form of the coarse model allows us to make these stages effective, efficient, and highly parallel. We show that our method produces models of similar quality to those of standard SfM methods while being notably faster and highly parallel.
From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips
ADITYA SINGH,SAURABH SAINI,RAJVI SHAH,Narayanan P J
German Conference on Pattern Recognition, GCPR, 2016
@inproceedings{bib_From_2016, AUTHOR = {ADITYA SINGH and SAURABH SAINI and RAJVI SHAH and Narayanan P J}, TITLE = {From Traditional to Modern: Domain Adaptation for Action Classification in Short Social Video Clips}, BOOKTITLE = {German Conference on Pattern Recognition}, YEAR = {2016}}
Short Internet video clips like Vines present a significantly wilder distribution than traditional video datasets. In this paper, we focus on the problem of unsupervised action classification in wild Vines using traditional labeled datasets. To this end, we use a simple domain adaptation strategy based on data augmentation. We utilize the semantic word2vec space as a common subspace to embed video features from both the labeled source domain and the unlabeled target domain. Our method incrementally augments the labeled source with target samples and iteratively modifies the embedding function to bring the source and target distributions together. Additionally, we utilize a multi-modal representation that incorporates the noisy semantic information available in the form of hashtags. We show the effectiveness of this simple adaptation technique on a test set of Vines and achieve notable improvements in performance.
Learning to Hash-tag Videos with Tag2Vec
ADITYA SINGH,SAURABH SAINI,RAJVI SHAH,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2016
@inproceedings{bib_Lear_2016, AUTHOR = {ADITYA SINGH and SAURABH SAINI and RAJVI SHAH and Narayanan P J}, TITLE = {Learning to Hash-tag Videos with Tag2Vec}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2016}}
User-given tags or labels are valuable resources for the semantic understanding of visual media such as images and videos. Recently, a new type of labeling mechanism known as hashtags has become increasingly popular on social media sites. In this paper, we study the problem of generating relevant and useful hashtags for short video clips. Traditional data-driven approaches for tag enrichment and recommendation use direct visual similarity for label transfer and propagation. We attempt to learn a direct, low-cost mapping from video to hashtags using a two-step training process. We first employ a natural language processing (NLP) technique, skip-gram models with neural network training, to learn a low-dimensional vector representation of hashtags (Tag2Vec) using a corpus of ∼10 million hashtags. We then train an embedding function to map video features to the low-dimensional Tag2Vec space. We learn this embedding for 29 categories of short video clips with hashtags. A query video without any tag information can then be mapped directly to the vector space of tags using the learned embedding, and relevant tags can be found by performing a simple nearest-neighbor retrieval in the Tag2Vec space. We validate the relevance of the tags suggested by our system qualitatively and quantitatively with a user study.
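Once a video has been mapped into the Tag2Vec space, tag suggestion reduces to nearest-neighbor search. The sketch below assumes cosine similarity as the metric (the abstract only says "simple nearest-neighbor retrieval") and uses toy 3-D vectors in place of learned tag embeddings; the learned video-to-tag embedding itself is not reproduced, and `suggest_tags` is a hypothetical name.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def suggest_tags(video_embedding, tag_vectors, k=3):
    """Rank tags by cosine similarity to a video already mapped into tag space."""
    ranked = sorted(tag_vectors,
                    key=lambda t: cosine(video_embedding, tag_vectors[t]),
                    reverse=True)
    return ranked[:k]
```

For a corpus of millions of tags a linear scan like this would normally be replaced by an approximate nearest-neighbor index, but the ranking logic stays the same.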
Intrinsic Image Decomposition using Focal Stacks
SAURABH SAINI,PARIKSHIT VISHWAS SAKURIKAR,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2016
@inproceedings{bib_Intr_2016, AUTHOR = {SAURABH SAINI and PARIKSHIT VISHWAS SAKURIKAR and Narayanan P J}, TITLE = {Intrinsic Image Decomposition using Focal Stacks}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2016}}
In this paper, we present a novel method (RGBF-IID) for intrinsic image decomposition of a wild scene without any restrictions on the complexity, illumination, or scale of the image. We use focal stacks of the scene as input. A focal stack captures a scene at varying focal distances. Since focus depends on the distance to the object, this representation carries information beyond an RGB image, approaching an RGBD image with depth. We call our representation an RGBF image to highlight this. We use a robust focus measure and a generalized random walk algorithm to compute dense probability maps across the stack. These maps are used to define sparse local and global pixel neighbourhoods that adhere to the structure of the underlying 3D scene. We use these neighbourhood correspondences, together with standard chromaticity assumptions, as constraints in an optimization system. We present results on both indoor and outdoor scenes using manually captured stacks of random objects under natural as well as artificial lighting conditions. We also test our system on a larger dataset of synthetically generated focal stacks from the NYUv2 and MPI Sintel datasets and show competitive performance against current state-of-the-art IID methods that use RGBD images. Our method provides strong evidence for the potential of the RGBF modality in place of RGBD in computer vision.
Large-scale Virtual Texturing on a Distributed Rendering System
REVANTH REDDY N R,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2015
@inproceedings{bib_Larg_2015, AUTHOR = {REVANTH REDDY N R and Narayanan P J}, TITLE = {Large-scale Virtual Texturing on a Distributed Rendering System}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2015}}
There has been a profound interest of late in the digitization and reconstruction of historical monuments. Rendering massive monument models requires a cluster of workstations because rendering them on a single machine is computationally infeasible. Moreover, interactive rendering of these massive models in an immersive environment is only possible over a cluster of machines. In this paper, we present the design of a distributed rendering system that efficiently handles models with massive textures. A server holds the skeleton of the whole model and divides the screen space, balancing the rendering load among multiple clients. Each client loads only the geometry and textures required to render its sub-scene. We present a virtual texturing method for handling massive textures over the distributed rendering system. These textures are combined into a texture atlas, which is split into equally sized tiles. A virtual texture is built over this atlas, with each pixel representing a tile in the atlas. An efficient caching module loads into memory only the required tiles, which are identified using the virtual texture. A fragment shader uses the virtual texture as a mapping to the physical texture in memory to generate the fragments. We demonstrate the performance of our system on a textured model of the Vittala temple with 350M triangles and 500 gigapixels of texture.
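The tile lookup and caching described above can be modelled on the CPU as follows. In the paper the virtual texture is read by a fragment shader on the GPU; here a dictionary page table stands in for it, and least-recently-used eviction is an assumed policy, since the abstract does not name one. `VirtualTextureCache` and `load_tile` are hypothetical names.

```python
from collections import OrderedDict


class VirtualTextureCache:
    """Toy CPU model of tile-based virtual texturing.

    Atlas texel (u, v) lives in tile (u // tile_size, v // tile_size). The page
    table maps resident tiles to physical cache slots and is kept in LRU order.
    """

    def __init__(self, tile_size, cache_slots, load_tile):
        self.tile_size = tile_size
        self.cache_slots = cache_slots
        self.load_tile = load_tile        # callback (tx, ty) -> 2-D list of texels
        self.page_table = OrderedDict()   # (tx, ty) -> (slot, tile_data)
        self.loads = 0

    def _resident(self, tile):
        if tile in self.page_table:
            self.page_table.move_to_end(tile)   # hit: mark as most recently used
            return self.page_table[tile]
        if len(self.page_table) >= self.cache_slots:
            # miss with a full cache: evict the least recently used tile, reuse its slot
            _, (slot, _) = self.page_table.popitem(last=False)
        else:
            slot = len(self.page_table)
        self.loads += 1
        self.page_table[tile] = (slot, self.load_tile(*tile))
        return self.page_table[tile]

    def sample(self, u, v):
        """Look up atlas texel (u, v), streaming its tile in on a miss."""
        _, data = self._resident((u // self.tile_size, v // self.tile_size))
        return data[v % self.tile_size][u % self.tile_size]
```

Only tiles that are actually sampled are ever loaded, which mirrors the abstract's point that each client keeps just the tiles its sub-scene needs.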
Geometry-aware Feature Matching for Structure from Motion Applications
RAJVI SHAH,VANSHIKA SRIVASTAVA,Narayanan P J
Winter Conference on Applications of Computer Vision, WACV, 2015
@inproceedings{bib_Geom_2015, AUTHOR = {RAJVI SHAH and VANSHIKA SRIVASTAVA and Narayanan P J}, TITLE = {Geometry-aware Feature Matching for Structure from Motion Applications}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2015}}
We present a two-stage, geometry-aware approach for matching SIFT-like features in a fast and reliable manner. Our approach first uses a small sample of features to estimate the epipolar geometry between the images and leverages it for guided matching of the remaining features. This simple and generalized two-stage matching approach produces denser feature correspondences while allowing us to formulate an accelerated search strategy that gains a significant speedup over traditional matching. Traditional matching punitively rejects many true feature matches due to a global ratio test. The adverse effect of this is particularly visible when matching image pairs with repetitive structures. The geometry-aware approach prevents such preemptive rejection by using a selective ratio test and works effectively even on scenes with repetitive structures. We also show that the proposed algorithm is easy to parallelize, and we implement it on the GPU. We experimentally validate our algorithm on publicly available datasets and compare the results with state-of-the-art methods.
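The second, guided stage can be sketched as below, assuming the fundamental matrix `F` has already been estimated from the small feature sample. Brute-force filtering by epipolar distance replaces the paper's accelerated search, and the thresholds `max_epi` and `ratio` are illustrative values. The ratio test is applied only among candidates near the epipolar line, which is one possible reading of the selective ratio test described above; the function names are hypothetical.

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance of p2 (image 2) from the epipolar line l = F @ (x1, y1, 1)."""
    x, y = p1
    a = F[0][0] * x + F[0][1] * y + F[0][2]
    b = F[1][0] * x + F[1][1] * y + F[1][2]
    c = F[2][0] * x + F[2][1] * y + F[2][2]
    norm = math.hypot(a, b)
    return math.inf if norm == 0 else abs(a * p2[0] + b * p2[1] + c) / norm


def desc_dist(d1, d2):
    return math.sqrt(sum((s - t) ** 2 for s, t in zip(d1, d2)))


def guided_match(kps1, desc1, kps2, desc2, F, max_epi=2.0, ratio=0.8):
    """Match each image-1 feature only against image-2 features near its epipolar
    line, and apply the ratio test within that restricted candidate set."""
    matches = []
    for i, (p1, d1) in enumerate(zip(kps1, desc1)):
        cands = [j for j, p2 in enumerate(kps2) if epipolar_distance(F, p1, p2) <= max_epi]
        if not cands:
            continue
        cands.sort(key=lambda j: desc_dist(d1, desc2[j]))
        best = cands[0]
        # a lone candidate on the epipolar line is accepted without a ratio test
        if len(cands) == 1 or desc_dist(d1, desc2[best]) < ratio * desc_dist(d1, desc2[cands[1]]):
            matches.append((i, best))
    return matches
```

With a very large `max_epi` the function degenerates to ordinary matching with a global ratio test, which makes it easy to compare the two behaviours on pairs with repeated structures.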
Fast Burrows Wheeler Compression using All-Cores
Aditya Deshpande,Narayanan P J
International Parallel & Distributed Processing Symposium Workshops, IPDPS-W, 2015
@inproceedings{bib_Fast_2015, AUTHOR = {Aditya Deshpande and Narayanan P J}, TITLE = {Fast Burrows Wheeler Compression using All-Cores}, BOOKTITLE = {International Parallel \& Distributed Processing Symposium Workshops}, YEAR = {2015}}
In this paper, we present an all-core implementation of the Burrows Wheeler Compression (BWC) algorithm that exploits all computing resources on a system. Our focus is to provide significant benefit to everyday users on common end-to-end applications by exploiting the parallelism of multiple CPU cores and additional accelerators, viz. many-core GPUs, on their machines. The all-core framework is suitable for problems that process large files or buffers in blocks. We consider a system to be made up of compute stations and use a work queue to dynamically divide the tasks among them. Each compute station uses an implementation that optimally exploits its architecture. We develop a fast GPU BWC algorithm by extending the state-of-the-art GPU string sort to efficiently perform the Burrows Wheeler Transform (BWT) step of BWC. Our hybrid BWC with GPU acceleration achieves a 2.9× speedup over the best CPU implementation. Our all-core framework allows concurrent processing of blocks by both the GPU and all available CPU cores. We achieve a 3.06× speedup by using all CPU cores and a 4.87× speedup when we additionally use an accelerator, i.e., the GPU. Our approach will scale to the number and different types of computing resources or accelerators found on a system.
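For reference, the BWT step that the GPU string sort accelerates can be written naively by sorting all rotations of a block, with the inverse computed via the last-to-first mapping. This is a textbook sketch, not the paper's GPU algorithm; the move-to-front and entropy-coding stages of BWC are omitted. Python's `ThreadPoolExecutor`, which hands blocks to workers from an internal queue, stands in for the all-core work queue only to show block-level decomposition; in CPython, threads will not speed up this pure-Python work.

```python
from concurrent.futures import ThreadPoolExecutor

SENTINEL = "\0"  # assumed absent from the input and smaller than every other character


def bwt(block):
    """Burrows-Wheeler transform of one block by sorting all rotations (O(n^2 log n))."""
    assert SENTINEL not in block
    s = block + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


def inverse_bwt(last):
    """Invert the transform using the last-to-first (LF) mapping."""
    n = len(last)
    order = sorted(range(n), key=lambda i: (last[i], i))  # stable sort = first column
    lf = [0] * n
    for k, i in enumerate(order):
        lf[i] = k
    out, row = [], 0  # row 0 is the rotation that starts with the sentinel
    for _ in range(n - 1):
        out.append(last[row])
        row = lf[row]
    return "".join(reversed(out))


def bwt_blocks(data, block_size, workers=4):
    """Split data into fixed-size blocks and transform them on a worker pool."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(bwt, blocks))
```

Because each block is transformed independently, blocks can be dispatched to any mix of CPU cores and accelerators, which is the property the all-core framework exploits.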
Coherent and Importance Sampled LVC BDPT on the GPU
SRINATH R,Narayanan P J
SIGGRAPH Asia Technical Briefs, SATB, 2015
@inproceedings{bib_Cohe_2015, AUTHOR = {SRINATH R and Narayanan P J}, TITLE = {Coherent and Importance Sampled LVC BDPT on the GPU}, BOOKTITLE = {SIGGRAPH Asia Technical Briefs}, YEAR = {2015}}
Bidirectional path tracing (BDPT) can render highly realistic scenes with complicated lighting scenarios. The Light Vertex Cache (LVC) based BDPT method by Davidovič et al. [Davidovič et al. 2014] provided good performance on scenes with simple materials in a progressive rendering scenario. In this paper, we propose a new bidirectional path tracing formulation based on the LVC approach that handles scenes with complex, layered materials efficiently on the GPU. We use sorting to achieve coherent material evaluation while conserving GPU memory. We propose a modified method for selecting light vertices using their contribution importance, which improves image quality for a given amount of work. Progressive rendering can empower artists in the production pipeline to iterate and preview their work quickly. We hope the work presented here will enable the use of GPUs in the production pipeline with complex materials and complicated lighting scenarios.
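Selecting light vertices by importance rather than uniformly amounts to sampling from a discrete distribution and dividing each contribution by its selection probability. The sketch assumes a precomputed non-negative importance per cached light vertex; how the paper computes contribution importance is not stated in the abstract, and the function names are hypothetical.

```python
import bisect
import random


def build_cdf(weights):
    """Cumulative distribution over light vertices from non-negative importances."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("at least one light vertex needs positive importance")
    cdf, acc = [], 0.0
    for w in weights:
        acc += w / total
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against round-off so every u in [0, 1) maps to a vertex
    return cdf


def sample_vertex(cdf, u):
    """Map a uniform u in [0, 1) to (index, probability of choosing that index)."""
    i = bisect.bisect_right(cdf, u)
    return i, cdf[i] - (cdf[i - 1] if i > 0 else 0.0)


def estimate_total(contribs, weights, n_samples, rng=None):
    """Importance-sampled estimate of sum(contribs); dividing by p keeps it unbiased
    as long as every vertex with a nonzero contribution has positive weight."""
    rng = rng or random.Random()
    cdf = build_cdf(weights)
    acc = 0.0
    for _ in range(n_samples):
        i, p = sample_vertex(cdf, rng.random())
        acc += contribs[i] / p
    return acc / n_samples
```

When the importances are proportional to the true contributions, every sample returns the same value and the variance vanishes; that is the intuition behind improving image quality for a fixed amount of work.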
Dense view interpolation on mobile devices using focal stacks
PARIKSHIT VISHWAS SAKURIKAR,Narayanan P J
Computer Vision and Pattern Recognition Conference Workshops, CVPR-W, 2014
@inproceedings{bib_Dens_2014, AUTHOR = {PARIKSHIT VISHWAS SAKURIKAR and Narayanan P J}, TITLE = {Dense view interpolation on mobile devices using focal stacks}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference Workshops}, YEAR = {2014}}
Light field rendering is a widely used technique to generate views of a scene from novel viewpoints. Interpolative methods for light field rendering require a dense description of the scene in the form of closely spaced images. In this work, we present a simple method for dense view interpolation over general static scenes using commonly available mobile devices. We capture an approximate focal stack of the scene from adjacent camera locations and interpolate intermediate images by shifting each focal region according to the appropriate disparity. We do not rely on focus distance control to capture focal stacks, and we describe an automatic method for estimating the focal textures and the blur and disparity parameters required for view interpolation.
Fast Burrows Wheeler Compression Using CPU and GPU
Deshpande Aditya Rajiv,Narayanan P J
International Symposium on Parallel and Distributed Processing Workshops and PhD Forum, IPDPSW, 2014
@inproceedings{bib_Fast_2014, AUTHOR = {Deshpande Aditya Rajiv and Narayanan P J}, TITLE = {Fast Burrows Wheeler Compression Using CPU and GPU}, BOOKTITLE = {International Symposium on Parallel and Distributed Processing Workshops and PhD Forum}, YEAR = {2014}}
In this paper, we present an all-core implementation of the Burrows Wheeler Compression (BWC) algorithm that exploits all computing resources on a system. Our focus is to provide significant benefit to everyday users on common end-to-end applications by exploiting the parallelism of multiple CPU cores and many-core GPUs on their machines. The all-core framework is suitable for problems that process large files or buffers in blocks. We consider a system to be made up of compute stations and use a work queue to dynamically divide the tasks among them. Each compute station uses an implementation that optimally exploits its architecture. We develop a fast GPU BWC algorithm by extending the state-of-the-art GPU string sort to efficiently perform the Burrows Wheeler Transform (BWT) step of BWC. Our hybrid BWC implementation achieves a 2.9× speedup over the best CPU implementation. Our all-core framework allows concurrent processing of blocks by both the GPU and all available CPU cores. We achieve a 3.06× speedup by using all CPU cores and a 4.87× speedup when the GPU is also used in the all-core framework. Our approach will scale to the number and type of computing resources on a system.
Multistage SfM: Revisiting incremental structure from motion
RAJVI SHAH,Aditya Deshpande,Narayanan P J
International Conference on 3D Vision, 3DV, 2014
@inproceedings{bib_Mult_2014, AUTHOR = {RAJVI SHAH and Aditya Deshpande and Narayanan P J}, TITLE = {Multistage SfM: Revisiting incremental structure from motion}, BOOKTITLE = {International Conference on 3D Vision}, YEAR = {2014}}
In this paper, we present a new multistage approach for SfM reconstruction of a single component. Our method begins by building a coarse 3D reconstruction using high-scale features of the given images. This step uses only a fraction of the features and is fast. We enrich the model in stages by localizing the remaining images to it and matching and triangulating the remaining features. Unlike in traditional incremental SfM, the localization and triangulation steps in our approach are made efficient and embarrassingly parallel using the geometry of the coarse model. The coarse model allows us to use direct localization techniques based on 3D-2D correspondences to register the remaining images. We further utilize the geometry of the coarse model to reduce the pairwise image matching effort, as well as to perform fast guided feature matching for the majority of features. Our method produces models of similar quality to those of incremental SfM methods while being notably faster and highly parallel. Our algorithm can reconstruct a 1000-image dataset in 15 hours using a single core, in about 2 hours using 8 cores, and in a few minutes by utilizing the full parallelism of about 200 cores.
Interactive Simulation of Generalised Newtonian Fluids using GPUs
SOMAY JAIN,NITISH TRIPATHI,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2014
@inproceedings{bib_Inte_2014, AUTHOR = {SOMAY JAIN and NITISH TRIPATHI and Narayanan P J}, TITLE = {Interactive Simulation of Generalised Newtonian Fluids using GPUs}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2014}}
We present a method to interactively simulate and visualise Generalised Newtonian Fluids (GNFs) using GPUs. GNFs include regular constant-viscosity fluids as well as other fluids, such as blood, which display variable viscosity due to variable shear rate. We use a statistical approach called the Lattice Boltzmann Method (LBM) for the simulation. LBM is easy to understand and implement and does not involve the discretisation of differential equations. We exploit the inherent parallelism of LBM, coupled with its memory access pattern, to create a fast GPU implementation that gives scientifically accurate results, including interactive real-time simulations for reasonable domain sizes. Multi-GPU implementations provide the potential to scale to larger problem sizes.
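The abstract does not specify the lattice, collision operator, or viscosity model, so the following is only a sketch under common assumptions: a D2Q9 lattice with the BGK collision operator, and a power-law model in which the local viscosity, and hence the relaxation time, depends on the shear rate. The shear rate is taken as an input here; a full solver would estimate it per node (for example from the non-equilibrium populations) and would also need streaming and boundary handling, which are omitted. Function names are hypothetical.

```python
# D2Q9 lattice: discrete velocities and their weights (lattice units)
E = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1), (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4


def equilibrium(rho, ux, uy):
    """Second-order Maxwell-Boltzmann equilibrium populations for one node."""
    usq = ux * ux + uy * uy
    feq = []
    for (ex, ey), w in zip(E, W):
        eu = ex * ux + ey * uy
        feq.append(w * rho * (1 + 3 * eu + 4.5 * eu * eu - 1.5 * usq))
    return feq


def power_law_tau(shear_rate, k, n, nu_min=1e-4, nu_max=1.0):
    """BGK relaxation time for a power-law fluid, nu = k * shear_rate**(n - 1).

    n < 1 is shear-thinning, n = 1 Newtonian. nu is clamped for stability and
    converted with the lattice relation nu = (tau - 0.5) / 3.
    """
    nu = k * max(shear_rate, 1e-12) ** (n - 1)
    nu = min(max(nu, nu_min), nu_max)
    return 3 * nu + 0.5


def collide(f, tau):
    """BGK collision at one node: relax the populations toward equilibrium."""
    rho = sum(f)
    ux = sum(fi * ex for fi, (ex, _) in zip(f, E)) / rho
    uy = sum(fi * ey for fi, (_, ey) in zip(f, E)) / rho
    feq = equilibrium(rho, ux, uy)
    return [fi - (fi - fe) / tau for fi, fe in zip(f, feq)]
```

Each node's collision depends only on its own populations, which is the locality that makes LBM map well onto GPU threads.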
Generalized Newtonian Fluid Simulations
NITISH TRIPATHI,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2013
@inproceedings{bib_Gene_2013, AUTHOR = {NITISH TRIPATHI and Narayanan P J}, TITLE = {Generalized Newtonian Fluid Simulations}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2013}}
We present an approach to simulate both Newtonian and generalized Newtonian fluids using the Lattice Boltzmann Method. Past work has focused on accurately modelling non-Newtonian biological fluids at the microchannel level. Our method models the macroscopic behaviour of such fluids by simulating the variation of properties such as viscosity through the bulk of the fluid. The method works regardless of the magnitude of the flow, be it through a thin tube or a large quantity of liquid splashing in a container. We simulate the change in viscosity of a generalized Newtonian fluid and its free-surface interactions with obstacles and boundaries. We harness the inherent parallelism of the Lattice Boltzmann Method to obtain a fast GPU implementation.
Interactive video manipulation using object trajectories and scene backgrounds
RAJVI SHAH,Narayanan P J
IEEE Transactions on Circuits and Systems for Video Technology, TCSVTech, 2013
@article{bib_Inte_2013, AUTHOR = {RAJVI SHAH and Narayanan P J}, TITLE = {Interactive video manipulation using object trajectories and scene backgrounds}, JOURNAL = {IEEE Transactions on Circuits and Systems for Video Technology}, YEAR = {2013}}
Traditional video editing interfaces model and represent videos as a collection of frames against a timeline, which makes object-centric manipulation of videos a laborious task. We enable simple and meaningful interaction for object-centric navigation and manipulation of long-shot videos by introducing operators on three high-level video semantics: background mosaics, object motions, and camera motions. We estimate the scene background and represent the object motion using 3D space-time trajectories. We use the 3D object trajectories as basic interaction elements and define several object and camera operations as simple and intuitive curve manipulations. These allow users to perform various temporal manipulations of video objects by interactively manipulating the object trajectories. The camera operations model the camera as a movable and scalable aperture and allow users to simulate pan, tilt, and zoom effects by creating new camera trajectories. With several example compositions, we demonstrate that our representation and operations allow users to perform numerous seemingly complex, high-level video manipulation tasks simply and interactively.
Parallel divide and conquer ray tracing
SRINATH R,Narayanan P J
International Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, 2013
@inproceedings{bib_Para_2013, AUTHOR = {SRINATH R and Narayanan P J}, TITLE = {Parallel divide and conquer ray tracing}, BOOKTITLE = {International Conference on Computer Graphics and Interactive Techniques}, YEAR = {2013}}
Divide and Conquer Ray Tracing (DACRT) is a recent technique that constructs no explicit acceleration structure. It recursively creates and traverses an implicit hierarchy in depth-first fashion and is suited to dynamic scenes that change constantly. In this paper, we present a parallel version of DACRT that runs entirely on the GPU and exploits efficient primitives such as sort and reduce. Our approach suits the GPU well and has a low memory footprint. Our implementation outperforms the serial CPU algorithm for both primary and secondary ray passes. We show good performance on the primary pass and on advanced effects.
Can GPUs sort strings efficiently?
Deshpande Aditya Rajiv,Narayanan P J
International Conference on High Performance Computing, HiPC, 2013
@inproceedings{bib_Can__2013, AUTHOR = {Deshpande Aditya Rajiv and Narayanan P J}, TITLE = {Can GPUs sort strings efficiently?}, BOOKTITLE = {International Conference on High Performance Computing}, YEAR = {2013}}
String sorting, or variable-length key sorting, has lagged in performance on the GPU even as fixed-length key sorting has improved dramatically. Radix sort is the fastest sorting method on GPUs. In this paper, we present a fast and efficient string sort on the GPU that is built on the available radix sort. Our method sorts strings from left to right in steps, moving only indexes and small prefixes for efficiency. We reduce the number of sort steps by adaptively consuming the maximum number of string bytes based on the number of segments in each step. Performance is improved by using Thrust primitives for most steps and by removing singleton segments from consideration. Over 70% of the string sort time is spent in Thrust primitives, which provides high performance along with high adaptability to future GPUs. We achieve speedups of up to 10× over current GPU methods, especially on large datasets. We also scale to much larger input sizes. We present results on easy and difficult strings, defined by their after-sort tie lengths.
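The left-to-right, index-only strategy can be sketched serially in Python. This is an illustration under stated assumptions: it consumes a fixed `chunk` of characters per step (the paper adapts this amount to the number of segments), uses Python's built-in sort in place of a GPU radix sort, and the function name `prefix_string_sort` is hypothetical. Runs of equal prefixes become the segments for the next step, while singleton runs, and runs of identical strings that have already ended, drop out, mirroring the pruning described above.

```python
def prefix_string_sort(strings, chunk=4):
    """Return the indexes of `strings` in sorted order.

    Strings are compared left to right, `chunk` characters per step; only
    integer indexes move. After each step, runs of equal prefixes form the
    segments for the next step, and runs that cannot be split further are dropped.
    """
    result = list(range(len(strings)))
    segments = [(0, result[:])] if len(strings) > 1 else []   # (start, indexes)
    offset = 0
    while segments:
        next_segments = []
        for start, idxs in segments:
            def key(i):
                return strings[i][offset:offset + chunk]
            idxs = sorted(idxs, key=key)                      # stable per-segment sort
            result[start:start + len(idxs)] = idxs
            run_start = 0
            for j in range(1, len(idxs) + 1):
                if j == len(idxs) or key(idxs[j]) != key(idxs[run_start]):
                    run = idxs[run_start:j]
                    # keep the run only if some string continues past this chunk
                    if len(run) > 1 and any(len(strings[i]) > offset + chunk for i in run):
                        next_segments.append((start + run_start, run))
                    run_start = j
        segments = next_segments
        offset += chunk
    return result
```

For example, `["app", "apple", "abcd"]` returns `[2, 0, 1]`: the first chunk separates `"abcd"`, and only the `"app"`/`"appl"` prefixes are compared further.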
Generalised Newtonian Flow Simulations using Lattice Boltzmann Method
NITISH TRIPATHI,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2013
@inproceedings{bib_Gene_2013, AUTHOR = {NITISH TRIPATHI and Narayanan P J}, TITLE = {Generalised Newtonian Flow Simulations using Lattice Boltzmann Method}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2013}}
We present an approach to simulate both Newtonian and generalized Newtonian fluids using the Lattice Boltzmann Method. Past work has focused on accurately modelling non-Newtonian biological fluids at the microchannel level. Our method models the macroscopic behaviour of such fluids by simulating the variation of properties such as viscosity through the bulk of the fluid. The method works regardless of the magnitude of the flow, be it through a thin tube or a large quantity of liquid splashing in a container. We simulate the change in viscosity of a generalized Newtonian fluid and its free-surface interactions with obstacles and boundaries. We harness the inherent parallelism of the Lattice Boltzmann Method to obtain a fast GPU implementation.
Fast registration of articulated objects from depth images
Sourabh Prajapati,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2013
@inproceedings{bib_Fast_2013, AUTHOR = {Sourabh Prajapati and Narayanan P J}, TITLE = {Fast registration of articulated objects from depth images}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2013}}
We present an approach for fast registration of a global articulated 3D model to RGB-D data from the Kinect. Our approach uses geometry-based matching of the rigid parts of articulated objects in depth images. The registration is performed independently for each segment in a parametric space of transformations. This greatly reduces the time needed to register each frame with the global model. We tested the algorithm on different articulated-object datasets and obtained significantly lower execution times than the ICP algorithm applied to each rigid part of the articulated object.
Fast graph cuts using shrink-expand reparameterization
PARIKSHIT VISHWAS SAKURIKAR,Narayanan P J
Winter Conference on Applications of Computer Vision Workshops, WACV-W, 2012
@inproceedings{bib_Fast_2012, AUTHOR = {PARIKSHIT VISHWAS SAKURIKAR and Narayanan P J}, TITLE = {Fast graph cuts using shrink-expand reparameterization}, BOOKTITLE = {Winter Conference on Applications of Computer Vision Workshops}, YEAR = {2012}}
Global optimization of MRF energy using graph cuts is widely used in computer vision. As images get larger, faster graph cuts are needed without sacrificing optimality. Initializing or reparameterizing a graph using the results of a similar one has provided efficiency in the past. In this paper, we present a method to speed up graph cuts using shrink-expand reparameterization. Our scheme merges the nodes of a given graph to shrink it. The resulting graph and its mincut are expanded and used to reparameterize the original graph for faster convergence. Graph shrinking can be done in different ways. In our Multiresolution Cuts algorithm, we use block-wise shrinking, similar to multiresolution processing of images. We also develop a hybrid approach that can mix nodes from different levels without affecting optimality. Our algorithm is particularly suited to processing large images. The processing time on the full-detail graph is reduced by nearly a factor of 4. The overall application time, including all book-keeping, is faster by a factor of 2 on various types of images.
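The block-wise shrink and expand steps can be sketched for a binary labelling on a 4-connected grid. This is an assumption-laden illustration: it uses 2x2 blocks on a grid with even dimensions and hypothetical helper names, and it omits the mincut solver and the reparameterization itself. Merging a block removes its internal edges and sums the terminal costs and the boundary-crossing edge weights, so any block-constant labelling has the same cost on both graphs, which is what makes the coarse solution a useful starting point.

```python
import numpy as np


def shrink_grid(unary, h_w, v_w):
    """Merge each 2x2 block of a 4-connected grid graph into one node.

    unary: (H, W, 2) costs of labels 0 and 1; h_w: (H, W-1) weights between
    (i, j) and (i, j+1); v_w: (H-1, W) weights between (i, j) and (i+1, j).
    H and W are assumed even. Edges inside a block disappear; edges that
    cross a block boundary are summed into the coarse edge.
    """
    H, W, _ = unary.shape
    u = unary.reshape(H // 2, 2, W // 2, 2, -1).sum(axis=(1, 3))
    h = h_w[:, 1::2].reshape(H // 2, 2, W // 2 - 1).sum(axis=1)
    v = v_w[1::2, :].reshape(H // 2 - 1, W // 2, 2).sum(axis=2)
    return u, h, v


def expand_labels(coarse_labels):
    """Copy each coarse node's label back to the 2x2 block it came from."""
    return np.repeat(np.repeat(coarse_labels, 2, axis=0), 2, axis=1)


def energy(unary, h_w, v_w, labels):
    """Cost of a binary labelling: unary terms plus weights of cut edges."""
    e = np.take_along_axis(unary, labels[..., None], axis=2).sum()
    e += (h_w * (labels[:, :-1] != labels[:, 1:])).sum()
    e += (v_w * (labels[:-1, :] != labels[1:, :])).sum()
    return float(e)
```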
Designing perspectively correct multiplanar displays
PAWAN KUMAR HARISH,Narayanan P J
IEEE Transactions on Visualization and Computer Graphics, TVCG, 2012
@article{bib_Desi_2012, AUTHOR = {PAWAN KUMAR HARISH and Narayanan P J}, TITLE = {Designing perspectively correct multiplanar displays}, JOURNAL = {IEEE Transactions on Visualization and Computer Graphics}, YEAR = {2012}}
Displays remain flat and passive amidst the many changes in their fundamental technologies. One natural step ahead is to create displays that merge seamlessly in shape and appearance with one's natural surroundings. In this paper, we present a system to design, render to, and build view-dependent multiplanar displays of arbitrary piecewise-planar shapes, built using polygonal facets. Our system provides high-quality, interactive rendering of 3D environments to a head-tracked viewer on arbitrary multiplanar displays. We develop a novel rendering scheme that produces an exact image and depth map at each facet, giving artifact-free images on and across facet boundaries. The system scales to a large number of display facets by rendering all facets in a single pass of rasterization. This is achieved using a parallel, per-frame, view-dependent binning and prewarping of scene triangles. The display is driven using one or more target quilt images into which facet pixels are packed. Our method places no constraints on the scene or the display and allows fully dynamic scenes to be rendered interactively at high resolutions. The steps of our system are implemented efficiently on commodity GPUs. We present a few prototype displays to establish the scalability of our system across display shapes, form factors, and complexity: from a cube made out of LCD panels, to spherical and cylindrical projected setups, to arbitrarily complex shapes in simulation. The performance of our system is demonstrated in both rendering quality and speed, for increasing scene and display facet sizes. A subjective user study is also presented to evaluate the user experience of a walk-around display compared to a flat panel in a game-like setting.
Mixed-resolution patch-matching
HARSHIT SUREKA,Narayanan P J
European Conference on Computer Vision, ECCV, 2012
@inproceedings{bib_Mixe_2012, AUTHOR = {HARSHIT SUREKA and Narayanan P J}, TITLE = {Mixed-resolution patch-matching}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2012}}
Matching patches of a source image with patches of itself or of a target image is a first step for many operations. Finding the optimum nearest neighbors of each patch using a global search of the image is expensive, so optimality is often sacrificed for speed. We present the Mixed-Resolution Patch-Matching (MRPM) algorithm, which uses a pyramid representation to perform fast global search. We compare mixed-resolution patches at coarser pyramid levels to alleviate the effects of smoothing. We store more matches at coarser resolutions to ensure wider search ranges and better accuracy at finer levels. Our method achieves near-optimality in terms of average error compared to exhaustive search. Our approach is simple compared to the complex trees or hash tables used by others. This enables fast parallel implementations on the GPU, yielding up to 70× speedup compared to other iterative approaches. Our approach is best suited when multiple global matches are needed.
Visibility probability structure from SfM datasets and applications
SIDDHARTH CHOUDHARY,Narayanan P J
European Conference on Computer Vision, ECCV, 2012
@inproceedings{bib_Visi_2012, AUTHOR = {SIDDHARTH CHOUDHARY and Narayanan P J}, TITLE = {Visibility probability structure from {SfM} datasets and applications}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2012}}
Large-scale reconstructions of camera matrices and point clouds have been created by applying structure from motion to community photo collections. Such a dataset is rich in information; it represents a sampling of the geometry and appearance of the underlying space. In this paper, we encode the visibility information between and among points and cameras as visibility probabilities. The conditional visibility probability of a set of points on a point (or of a set of cameras on a camera) can rank points (or cameras) based on their mutual dependence. We combine the conditional probability with a distance measure to prioritize points for fast guided search in the image localization problem. We define the dual problem of feature triangulation as finding the 3D coordinates of a given image feature point. We use conditional visibility probability to quickly identify a subset of cameras in which a feature is visible.
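One simple way to estimate conditional visibility from an SfM dataset is from co-visibility counts: among the cameras that see every point in a given set, the fraction that also see a candidate point. The sketch below assumes that estimator and a hypothetical `vis` map from point ids to the set of camera ids observing them; the paper's exact formulation and its combination with a distance measure are not reproduced. Swapping the roles of points and cameras gives the dual ranking of cameras.

```python
def conditional_visibility(vis, given, q):
    """Estimate P(q visible | every point in `given` visible).

    vis maps a point id to the set of camera ids that observe it; `given`
    must be non-empty. The estimate is the fraction of cameras seeing all
    of `given` that also see q.
    """
    common = set.intersection(*(vis[p] for p in given))
    if not common:
        return 0.0
    return len(common & vis[q]) / len(common)


def rank_candidates(vis, given, candidates):
    """Order candidate points by decreasing conditional visibility."""
    return sorted(candidates, key=lambda q: -conditional_visibility(vis, given, q))
```

In a localization loop, points ranked highest given the matches found so far would be the next ones to try matching.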
Distributed massive model rendering
REVANTH REDDY N R,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2012
@inproceedings{bib_Dist_2012, AUTHOR = {REVANTH REDDY N R and Narayanan P J}, TITLE = {Distributed massive model rendering}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2012}}
Graphics models are getting increasingly bulky, with detailed geometry, textures, normal maps, etc. There is considerable interest in modelling and navigating through detailed models of large monuments. Many monuments of interest have both rich detail and large spatial extent. Rendering them for navigation on a single workstation is practically impossible, even given the power of today's CPUs and GPUs. Many models may not fit in GPU memory, CPU memory, or even the secondary storage of the CPU. Distributed rendering using a cluster of workstations is the only way to navigate through such models. In this paper, we present the design of a distributed rendering system intended for massive models. Our design has a server that holds the skeleton of the whole model, namely its scenegraph with the actual geometry replaced by bounding boxes at all levels. The server divides the screen space among a number of clients and, using a frustum culling step, sends them a list of the objects they need to render. The clients use 2 GPUs, one devoted to visibility culling and the other to rendering. Frustum culling at the server, visibility culling on one GPU, and rendering on the second GPU form the stages of our distributed rendering pipeline. We describe the design and implementation of our system and demonstrate the results of rendering relatively large models using different clusters of clients.
Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy
ROHIT NIGAM,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2012
@inproceedings{bib_Hybr_2012, AUTHOR = {ROHIT NIGAM and Narayanan P J}, TITLE = {Hybrid ray tracing and path tracing of Bezier surfaces using a mixed hierarchy}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2012}}
We present a scheme for interactive ray tracing of Bezier bicubic patches using Newton iteration. We use a mixed hierarchy representation as the acceleration structure: a bounding volume hierarchy above the patches and a fixed-depth subpatch tree below them. This helps reduce the number of ray-patch intersections that need to be evaluated and provides a good initialization for the iterative step, while keeping memory requirements low. We apply Newton iteration to the generated list of ray-patch intersections in parallel. Our method can exploit the cores of both the CPU and the GPU, using OpenMP on the CPU and CUDA on the GPU, by sharing work between them according to their relative speeds. A data-parallel framework is used throughout: a list of rays is transformed by traversal into a list of ray-patch intersection candidates, and then by root finding into intersections and a list of secondary rays. As a result, shadow and reflection rays can be handled in exactly the same manner. We also show how our method extends easily to soft shadows using area light sources, and to path tracing by tracing a large number of rays per pixel. We render a one-megapixel image of the Teapot model at 125 fps for primary rays only, and at about 65 fps with one pass of shadow and reflection rays, on a system with an Intel i7 920 and an Nvidia GTX 580.
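The Newton step for a single ray and a single bicubic patch can be sketched as follows. It assumes the common formulation of the ray as the intersection of two planes, which turns the problem into two equations in the patch parameters (u, v); the function names, the fixed starting point (0.5, 0.5) in place of the subpatch-tree initialization, and the tolerances are illustrative assumptions. Hierarchy traversal and CPU/GPU work sharing are not shown.

```python
import numpy as np
from math import comb


def bernstein(t):
    """Cubic Bernstein basis values and derivatives at parameter t."""
    b = np.array([comb(3, i) * t ** i * (1 - t) ** (3 - i) for i in range(4)])
    b2 = [comb(2, i) * t ** i * (1 - t) ** (2 - i) for i in range(3)] + [0.0]
    # d/dt B_{i,3} = 3 * (B_{i-1,2} - B_{i,2}), with out-of-range terms zero
    db = np.array([3 * ((b2[i - 1] if i > 0 else 0.0) - b2[i]) for i in range(4)])
    return b, db


def patch_eval(P, u, v):
    """Point and partial derivatives of a bicubic patch with control net P (4, 4, 3)."""
    bu, dbu = bernstein(u)
    bv, dbv = bernstein(v)
    S = np.einsum('i,j,ijk->k', bu, bv, P)
    Su = np.einsum('i,j,ijk->k', dbu, bv, P)
    Sv = np.einsum('i,j,ijk->k', bu, dbv, P)
    return S, Su, Sv


def intersect_ray_patch(P, origin, direction, u=0.5, v=0.5, iters=20, tol=1e-10):
    """Return (t, u, v) of a ray-patch hit found by Newton iteration, or None."""
    d = direction / np.linalg.norm(direction)
    # Two unit normals orthogonal to d: the ray is where both planes meet
    helper = np.array([1.0, 0.0, 0.0]) if abs(d[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    n1 = np.cross(d, helper)
    n1 /= np.linalg.norm(n1)
    n2 = np.cross(d, n1)
    N = np.stack([n1, n2])
    offsets = -N @ origin                      # both planes pass through the ray origin
    for _ in range(iters):
        S, Su, Sv = patch_eval(P, u, v)
        F = N @ S + offsets                    # signed distances of S(u, v) to the planes
        if np.max(np.abs(F)) < tol:
            break
        J = np.column_stack([N @ Su, N @ Sv])  # 2x2 Jacobian of F with respect to (u, v)
        du, dv = np.linalg.solve(J, -F)
        u, v = u + du, v + dv
    else:
        return None                            # did not converge
    if not (0.0 <= u <= 1.0 and 0.0 <= v <= 1.0):
        return None                            # root lies outside the patch
    S, _, _ = patch_eval(P, u, v)
    t = float(np.dot(S - origin, d))           # distance along the normalized ray
    return (t, u, v) if t > 0 else None
```

Because each ray-patch pair is solved independently, a list of such pairs maps naturally onto the data-parallel framework the abstract describes.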
Increasing intensity resolution on a single display using spatio-temporal mixing
PAWAN KUMAR HARISH,PARIKSHIT VISHWAS SAKURIKAR,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2012
@inproceedings{bib_Incr_2012, AUTHOR = {PAWAN KUMAR HARISH and PARIKSHIT VISHWAS SAKURIKAR and Narayanan P J}, TITLE = {Increasing intensity resolution on a single display using spatio-temporal mixing}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2012}}
Displays have seen many improvements over the years, with enhancements in spatial resolution, vertical refresh rate, etc., that provide better and smoother visual experiences. Color intensity resolution, however, has not changed much over the past few decades; most displays are still limited to 8 bits per channel. Meanwhile, much work has gone into capturing high dynamic range images, and mapping these directly to current displays loses information that may be critical to many applications. We present a way to enhance the intensity resolution of a given display by mixing intensities over the spatial or temporal domains. Our system sacrifices high vertical refresh rate and spatial resolution in order to gain intensity resolution. We present three ways to mix intensities: spatially, temporally, and spatio-temporally. The systems produce in-between intensities not present on the base display, which are clearly distinguishable by the naked eye. We evaluate our systems using both a camera and human subjects, checking whether they scale the intensity resolution and ensuring that the newly generated intensities follow the display model.
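The arithmetic behind spatial or temporal mixing can be illustrated with a small helper that splits a higher-precision intensity into 2^extra_bits base-display levels whose mean equals the target. This assumes a display whose output is linear in code values, which real gamma-encoded displays are not (the paper evaluates against the display model), and the name `mix_levels` is hypothetical.

```python
def mix_levels(value, extra_bits, max_level=255):
    """Split a higher-precision intensity into 2**extra_bits display levels.

    `value` is measured in steps of 1 / 2**extra_bits of a display level
    (e.g. a 10-bit value for an 8-bit display when extra_bits == 2). The mean
    of the returned levels equals value / 2**extra_bits, assuming the display
    output is linear in its code values.
    """
    n = 2 ** extra_bits
    value = min(value, max_level * n)      # levels above max_level cannot be shown
    base, frac = divmod(value, n)
    # `frac` of the n slots show the next level up; the rest show `base`
    return [base + 1 if i < frac else base for i in range(n)]
```

The returned levels can be laid out over a 2x2 pixel block (spatial mixing), shown over four consecutive frames (temporal mixing), or split as two pixels by two frames (spatio-temporal mixing). For example, the 10-bit value 513 becomes `[129, 128, 128, 128]`, whose mean is 128.25.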
Geometry directed browser for personal photographs
Deshpande Aditya Rajiv,SIDDHARTH CHOUDHARY,Narayanan P J,KRISHNA KUMAR SINGH,KAUSTAV KUNDU,ADITYA SINGH,APURVA KUMAR
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2012
@inproceedings{bib_Geom_2012, AUTHOR = {Deshpande Aditya Rajiv and SIDDHARTH CHOUDHARY and Narayanan P J and KRISHNA KUMAR SINGH and KAUSTAV KUNDU and ADITYA SINGH and APURVA KUMAR}, TITLE = {Geometry directed browser for personal photographs}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2012}}
Browsers of personal digital photographs essentially all follow the slide-show paradigm, sequencing through the photos in the order they were taken. A more engaging way to browse personal photographs, especially of a large space like a popular monument, should involve the geometric context of the space. In this paper, we present a geometry-directed photo browser that enables users to browse their personal pictures with the underlying geometry of the space guiding the process. The browser uses a pre-computed package of geometric information about the monument for this task. The package is used to register a set of photographs taken by the user with the common geometric space of the monument. This involves localizing each image in the monument space by computing its camera matrix. We use a state-of-the-art method for fast localization. Registered photographs can be browsed using a visualization module that shows them in the proper geometric context with respect to a point-based 3D model of the monument. We present the design of the geometry-directed browser and demonstrate its utility on a few sequences of personal images of well-known monuments. We believe personal photo browsers can provide an enhanced sense of one's own experience of a monument by using its underlying geometric context.
Discrete range searching primitive for the GPU and its applications
JYOTHISH SOMAN,Kishore Kothapalli,Narayanan P J
ACM Journal of Experimental Algorithmics, JEA, 2012
@article{bib_Disc_2012, AUTHOR = {JYOTHISH SOMAN and Kishore Kothapalli and Narayanan P J}, TITLE = {Discrete range searching primitive for the GPU and its applications}, JOURNAL = {ACM Journal of Experimental Algorithmics}, YEAR = {2012}}
Graphics processing units (GPUs) provide large computational power at a very low price, which positions them well as a ubiquitous accelerator. However, GPUs are space constrained, and hence applications developed for GPUs are space sensitive. Space-constrained computational devices such as GPUs can benefit greatly from representations that drastically reduce space consumption. One such representation is the succinct representation of trees, which generally supports operations such as parent queries, least common ancestor queries, and so on. Mapping such a robust representation to the GPU for targeted applications can substantially increase the problem sizes that can be processed at a given time. Space-saving methods such as succinct data structures remain largely unexplored on the GPU. In this work, a succinct representation of ordered trees on the GPU is explored, with application to discrete range searching (DRS). Based on the succinct representations found applicable, a space-saving solution for DRS is presented. In our method, DRS is mapped to a least common ancestor query on a Cartesian tree. For space-efficient DRS queries, we store the succinct representation of the Cartesian tree of an array. Our method uses at most 7.5 bits of additional space per element. Furthermore, our method achieves speedups of 20–25 for preprocessing and 25–35 for batch querying over a sequential implementation. Compared to an 8-threaded implementation, our preprocessing and querying methods obtain a speedup of 6–8. We also study the applications of DRS on the GPU. Efficient primitives expand the range of applications performed on the GPU. DRS is one such primitive, with direct applications to string processing, document and text retrieval systems, and least common ancestor queries. We suggest that graph algorithms that use the least common ancestor can be enabled on the GPU using the DRS primitive.
We also show some applications of DRS in tree queries and string querying.
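The reduction from range-minimum queries to least common ancestor queries on a Cartesian tree can be sketched with an ordinary parent array. This illustrates only the reduction, not the succinct GPU representation: the stack-based construction is the standard linear-time one, the LCA uses naive depth-equalizing walks, and all function names are hypothetical.

```python
def cartesian_tree(a):
    """Parent array of the min-Cartesian tree of `a`, built with a stack in O(n)."""
    parent = [-1] * len(a)
    stack = []                       # indexes on the tree's right spine
    for i, x in enumerate(a):
        last = -1
        while stack and a[stack[-1]] > x:
            last = stack.pop()       # larger values move below i
        if last != -1:
            parent[last] = i         # popped subtree becomes i's left child
        if stack:
            parent[i] = stack[-1]    # i becomes right child of the new spine top
        stack.append(i)
    return parent


def node_depths(parent):
    """Depth of every node (root has depth 0), memoized along parent paths."""
    depth = [-1] * len(parent)
    for i in range(len(parent)):
        path, j = [], i
        while j != -1 and depth[j] == -1:
            path.append(j)
            j = parent[j]
        d = -1 if j == -1 else depth[j]
        for k in reversed(path):
            d += 1
            depth[k] = d
    return depth


def range_min_index(parent, depth, i, j):
    """Index of the minimum of a[i..j]: the LCA of i and j in the Cartesian tree."""
    while depth[i] > depth[j]:
        i = parent[i]
    while depth[j] > depth[i]:
        j = parent[j]
    while i != j:
        i, j = parent[i], parent[j]
    return i
```

For `a = [3, 1, 4, 1, 5, 9, 2, 6]`, the query over indexes 5 to 7 returns index 6, the position of the value 2.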
Person de-identification in videos
PRACHI AGRAWAL,Narayanan P J
IEEE Transactions on Circuits and Systems for Video Technology, TCSVTech, 2011
@article{bib_Pers_2011, AUTHOR = {PRACHI AGRAWAL and Narayanan P J}, TITLE = {Person de-identification in videos}, JOURNAL = {IEEE Transactions on Circuits and Systems for Video Technology}, YEAR = {2011}}
Advances in cameras and web technology have made it easy to capture large amounts of video data and share it with a large number of people. A large number of cameras oversee public and semi-public spaces today. These raise concerns about the unintentional and unwarranted invasion of the privacy of individuals caught in the videos. To address these concerns, automated methods to de-identify individuals in these videos are necessary. De-identification does not aim at destroying all information involving the individuals; its ideal goal is to obscure the identity of the actor without obscuring the action. This paper outlines the scenarios in which de-identification is required and the issues they raise. We also present an approach to de-identify individuals in videos. Our approach involves tracking and segmenting individuals in a conservative voxel space involving x, y, and time. A de-identification transformation is applied per frame using these voxels to obscure the identity. Face, silhouette, gait, and other characteristics should ideally be obscured. We show results of our scheme on a number of videos and for several variations of the transformations. We present the results of applying algorithmic identification to the transformed videos. We also present the results of a user study evaluating how well humans can identify individuals from the transformed videos.
Raytracing dynamic scenes on the GPU using grids
SASHIDHAR GUNTURY,Narayanan P J
IEEE Transactions on Visualization and Computer Graphics, TVCG, 2011
@article{bib_Rayt_2011, AUTHOR = {SASHIDHAR GUNTURY and Narayanan P J}, TITLE = {Raytracing dynamic scenes on the GPU using grids}, JOURNAL = {IEEE Transactions on Visualization and Computer Graphics}, YEAR = {2011}}
Raytracing dynamic scenes at interactive rates has received a lot of attention recently. We present a few strategies for high-performance raytracing on a commodity GPU. The construction of grids needs sorting, which is fast on today's GPUs. The grid is thus the acceleration structure of choice for dynamic scenes, as per-frame rebuilding is required. We advocate the use of appropriate data structures for each stage of raytracing, resulting in multiple structure builds per frame. A perspective grid built for the camera achieves perfect coherence for primary rays. A perspective grid built with respect to each light source provides the best performance for shadow rays. Spherical grids handle lights positioned inside the model space, as well as spotlights. Uniform grids are best for reflection and refraction rays, which have little coherence. We propose an Enforced Coherence method that brings coherence to them by rearranging the ray-to-voxel mapping using sorting. This gives the best performance on GPUs with only user-managed caches. We also propose a simple Independent Voxel Walk method, which performs best by taking advantage of the L1 and L2 caches on recent GPUs. We achieve over 10 fps of total rendering on the Conference model with one light source and one reflection bounce, while rebuilding the data structure for each stage. The ideas presented here are likely to give high performance on future GPUs as well as on other manycore architectures.
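Building a uniform grid by sorting can be sketched with NumPy: each triangle emits one (cell, triangle) pair per overlapped cell, the pairs are sorted by cell id, and per-cell ranges are recovered by binary search. The sketch assumes axis-aligned bounding boxes as a conservative overlap test, a row-major cell numbering, at least one triangle, and a hypothetical function name; the perspective and spherical grids and the ray traversal from the abstract are not shown.

```python
import numpy as np


def build_uniform_grid(tri_min, tri_max, scene_min, scene_max, res):
    """Uniform grid built by sorting (cell id, triangle id) pairs.

    tri_min, tri_max: (T, 3) bounding-box corners of the triangles (T >= 1).
    res: cells per axis. Returns (cell_start, cell_end, tri_ids) such that the
    triangles overlapping cell c are tri_ids[cell_start[c]:cell_end[c]].
    """
    res = np.asarray(res)
    scene_min = np.asarray(scene_min, dtype=float)
    cell_size = (np.asarray(scene_max, dtype=float) - scene_min) / res
    lo = np.clip(((tri_min - scene_min) / cell_size).astype(int), 0, res - 1)
    hi = np.clip(((tri_max - scene_min) / cell_size).astype(int), 0, res - 1)
    cells, tris = [], []
    for t in range(len(tri_min)):
        # Emit one pair per cell overlapped by the triangle's bounding box
        xs, ys, zs = (np.arange(lo[t, a], hi[t, a] + 1) for a in range(3))
        gx, gy, gz = np.meshgrid(xs, ys, zs, indexing='ij')
        ids = (gx * res[1] + gy) * res[2] + gz
        cells.append(ids.ravel())
        tris.append(np.full(ids.size, t))
    cells, tris = np.concatenate(cells), np.concatenate(tris)
    order = np.argsort(cells, kind='stable')       # the sort a GPU would run in parallel
    cells, tris = cells[order], tris[order]
    all_cells = np.arange(int(np.prod(res)))
    cell_start = np.searchsorted(cells, all_cells, side='left')
    cell_end = np.searchsorted(cells, all_cells, side='right')
    return cell_start, cell_end, tris
```

Since the whole structure is rebuilt from a sort, rebuilding it every frame for a dynamic scene costs roughly one sort of the pair list.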
Trajectory based video object manipulation
RAJVI SHAH,Narayanan P J
International Conference on Multimedia and Expo, ICME, 2011
@inproceedings{bib_Traj_2011, AUTHOR = {RAJVI SHAH and Narayanan P J}, TITLE = {Trajectory based video object manipulation}, BOOKTITLE = {International Conference on Multimedia and Expo}, YEAR = {2011}}
We propose an object-centric representation for easy and intuitive navigation and manipulation of videos. An object-centric representation allows a user to directly access and process objects as basic video components. We demonstrate a trajectory-based interface and example operations that allow users to retime, reorder, remove, or clone video objects in a ‘click and drag’ fashion. This interface is created by extracting object motion information from the video. We use object detection and tracking to obtain spatio-temporal video object tubes. The corresponding object motion trajectories are represented in a 3D (x, y, t) grid. Users can navigate and manipulate video objects by scrubbing or manipulating the corresponding trajectories. We show some example applications of the proposed interface, such as object synchronization, saliency magnification, visual effects, and composite video creation.
A GPU-assisted personal video organizing system
KHAJA WASIF MOHIUDDIN,Narayanan P J
International Conference on Computer Vision Workshops, ICCV-W, 2011
@inproceedings{bib_A_GP_2011, AUTHOR = {KHAJA WASIF MOHIUDDIN and Narayanan P J}, TITLE = {A GPU-assisted personal video organizing system}, BOOKTITLE = {International Conference on Computer Vision Workshops}, YEAR = {2011}}
Video data is increasing rapidly, along with the capacity of the storage devices owned by lay users. Users have moderate to large personal collections of videos and would like to keep them organized by content. Video organizing tools for personal users lag far behind even primitive image organizing tools. In this paper, we present a mechanism to help ordinary users organize their personal video collections based on categories they choose. During the learning phase, we cluster the PHOG features extracted from selected key frames to form a representation for each user-selected category. During the organization phase, the labels assigned to each key frame by a k-NN classifier on these cluster centres are aggregated to give the video a label. Video processing is computationally intensive. To perform the computationally intensive steps, we exploit the CPU as well as the GPU, which is common even on personal systems. Effective use of the parallel hardware on the system is the only way to make the tool scale reasonably to the large collections that will soon be available. Our tool organizes a set of 100 sports videos with a total duration of 1375 minutes in about 9.5 minutes. Learning the categories from 12 annotated videos (165 minutes) took 75 seconds on a GTX 580 card. These timings were obtained on a standard desktop with an off-the-shelf GPU. The labeling accuracy is about 96% over all videos.
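The organization-phase labelling can be sketched as follows, assuming the key-frame features (for example, PHOG descriptors) and the labelled cluster centres are already computed, Euclidean distance between them, and a majority vote as the aggregation rule. The abstract does not specify how the per-frame labels are aggregated, so the vote and the name `label_video` are assumptions.

```python
import numpy as np
from collections import Counter


def label_video(keyframe_feats, centres, centre_labels, k=3):
    """Label a video by majority vote over per-key-frame k-NN labels."""
    votes = []
    for feat in keyframe_feats:
        dist = np.linalg.norm(centres - feat, axis=1)      # Euclidean distance to each centre
        nearest = np.argsort(dist)[:k]
        frame_label = Counter(centre_labels[i] for i in nearest).most_common(1)[0][0]
        votes.append(frame_label)
    return Counter(votes).most_common(1)[0][0]
```

The per-frame distance computations are independent, which is the part a GPU would accelerate.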
A New Measure of Detail for Triangulated Meshes
ISHAAN SINGH,ROHITH B V,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2011
@inproceedings{bib_A_Ne_2011, AUTHOR = {ISHAAN SINGH and ROHITH B V and Narayanan P J}, TITLE = {A New Measure of Detail for Triangulated Meshes}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2011}}
As the complexity of 3D models used in computer graphics applications grows, there arises a need to visualize the overall distribution of detail on them. Detail is a function of the amount of information present on a surface. In this paper, we present a method to quantify detail using a combination of local measures of curvature and density. We show that detail can be used for applications such as ordering for mesh decimation, visualizing abnormalities in a mesh, and so on.
The de-identification camera
MRITYUNJAY,Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2011
@inproceedings{bib_The__2011, AUTHOR = {MRITYUNJAY and Narayanan P J}, TITLE = {The de-identification camera}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2011}}
Visual surveillance is increasingly prevalent today, but the privacy issues of individuals captured in surveillance videos have not been dealt with adequately so far. In most cases, if not all, the chief purpose of placing a camera can be served without knowing the identity of the individuals involved, unless the activity is of some predetermined kind. One needs to transform these videos to protect the identity of the individuals involved, possibly at the source camera itself. In this paper, we present the De-Identification Camera, a scalable, low-cost, real-time solution to these privacy protection issues. Our main contribution is a privacy protection architecture that transforms the video at the camera level itself. We also present and implement a de-identification pipeline that is suitable for real-time implementation. We implemented the system on a Texas Instruments OMAP4-based embedded platform and were able to de-identify videos in real time. Transforming the video on the camera itself ensures protection from various attacks. We also address issues such as the data utility of surveillance videos by making this solution customizable. We propose that such systems could replace traditional surveillance cameras in the future by providing all the surveillance and privacy protection solutions in hardware, probably with a few performance upgrades.
Hybrid implementation of error diffusion dithering
Deshpande Aditya Rajiv,MISRA ISHAN SATISH,Narayanan P J
International Conference on High Performance Computing, HiPC, 2011
@inproceedings{bib_Hybr_2011, AUTHOR = {Deshpande Aditya Rajiv, MISRA ISHAN SATISH, Narayanan P J}, TITLE = {Hybrid implementation of error diffusion dithering}, BOOKTITLE = {International Conference on High Performance Computing}. YEAR = {2011}}
Many image filtering operations provide ample parallelism, but progressive non-linear processing of images is among the hardest to parallelize due to long, sequential, and non-linear data dependencies. A typical example of such an operation is error diffusion dithering, exemplified by the Floyd-Steinberg algorithm. In this paper, we present its parallelization on multicore CPUs using a block-based approach and on the GPU using a pixel-based approach. We also present a hybrid approach in which the CPU and the GPU operate in parallel during the computation. High Performance Computing has traditionally been associated with high-end CPUs and GPUs. Our focus is on everyday computers such as laptops and desktops, where significant compute power is available on the GPU as well as on the CPU. Our implementation can dither an 8K × 8K image on an off-the-shelf laptop with an Nvidia 8600M GPU in about 400 milliseconds, whereas the sequential implementation on its CPU took about 4 seconds.
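To make the data dependency concrete, here is a minimal sequential Floyd-Steinberg sketch in Python. The function name `floyd_steinberg`, the list-of-rows image layout, and the fixed threshold of 128 are illustrative assumptions, not details from the paper; this is the baseline that a parallel version must reproduce, not the paper's CPU/GPU hybrid scheme.

```python
def floyd_steinberg(image, threshold=128):
    """Dither a grayscale image (list of rows, values 0..255) to 0/255.

    Sequential reference version: each pixel's quantization error is pushed
    to not-yet-visited neighbours with the classic 7/16, 3/16, 5/16, 1/16 weights.
    """
    h = len(image)
    w = len(image[0]) if h else 0
    # Work on a float copy so diffused error can accumulate.
    buf = [[float(v) for v in row] for row in image]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 255 if old >= threshold else 0
            out[y][x] = new
            err = old - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16          # right neighbour
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16  # below-left
                buf[y + 1][x] += err * 5 / 16          # below
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16  # below-right
    return out
```

Each pixel receives error from its left neighbour and from three pixels in the row above. So pixels become independent only along a skewed diagonal front, which is one common way parallel versions of the algorithm are organized.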
Scalable clustering using multiple GPUs
KHAJA WASIF MOHIUDDIN,Narayanan P J
International Conference on High Performance Computing, HiPC, 2011
@inproceedings{bib_Scal_2011, AUTHOR = {KHAJA WASIF MOHIUDDIN, Narayanan P J}, TITLE = {Scalable clustering using multiple GPUs}, BOOKTITLE = {International Conference on High Performance Computing}. YEAR = {2011}}
K-Means is a popular clustering algorithm with wide applications in Computer Vision, Data Mining, Data Visualization, etc. Clustering is an important step for indexing and searching documents, images, video, etc. Clustering large numbers of high-dimensional vectors is very computation intensive. In this paper, we present the design and implementation of the K-Means clustering algorithm on the modern GPU. In our approach, all steps are performed efficiently and entirely on the GPU. We also present a load-balanced multi-node, multi-GPU implementation which can handle up to 6 million 128-dimensional vectors. We use an efficient memory layout for all steps to get high performance. GPU accelerators are now present on high-end workstations and low-end laptops. Scalability in the number and dimensionality of the vectors, the number of clusters, and the number of cores available for processing is important for usability by different users. Our implementation scales linearly or near-linearly with different problem parameters. We achieve up to a 2 times increase in speed compared to the best GPU implementation for K-Means on a single GPU. We obtain a speedup of over 170 on a single Nvidia Fermi GPU compared to a standard sequential implementation. We are able to execute one iteration of K-Means in 136 seconds on off-the-shelf GPUs to cluster 6 million vectors of 128 dimensions into 4K clusters, and in 2.5 seconds to cluster 125K vectors of 128 dimensions into 2K clusters.
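The timings above are per iteration. The following minimal Python sketch shows what one K-Means (Lloyd) iteration computes: an assignment step followed by a centroid update. The name `kmeans_step` and the policy of keeping an empty cluster's old centroid are assumptions made for illustration. On a GPU, the assignment loop is typically the part distributed across threads, one point per thread.

```python
def kmeans_step(points, centroids):
    """One Lloyd iteration: assign each point to its nearest centroid
    (squared Euclidean distance), then move each centroid to the mean
    of its assigned points. An empty cluster keeps its old centroid."""
    k = len(centroids)
    dim = len(centroids[0])
    # Assignment step: independent per point.
    labels = []
    for p in points:
        best = min(range(k),
                   key=lambda c: sum((p[d] - centroids[c][d]) ** 2 for d in range(dim)))
        labels.append(best)
    # Update step: per-cluster sums and counts (a reduction).
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    new_centroids = [
        [s / counts[c] for s in sums[c]] if counts[c] else list(centroids[c])
        for c in range(k)
    ]
    return labels, new_centroids
```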
Fast two dimensional convex hull on the GPU
SRIKANTH SRUNGARAPU,B DURGA PRASAD REDDY,Kishore Kothapalli,Narayanan P J
International Conference on Advanced Information Networking and Applications Workshops, AINA-W, 2011
@inproceedings{bib_Fast_2011, AUTHOR = {SRIKANTH SRUNGARAPU, B DURGA PRASAD REDDY, Kishore Kothapalli, Narayanan P J}, TITLE = {Fast two dimensional convex hull on the GPU}, BOOKTITLE = {International Conference on Advanced Information Networking and Applications Workshops}. YEAR = {2011}}
General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer large computational power at a very low price. GPGPU is best suited for regular data-parallel algorithms. GPUs are not directly amenable to algorithms which have irregular data access patterns, such as convex hull, list ranking, etc. In this paper, we present a GPU-optimized implementation for finding the convex hull of a two-dimensional point set. Our implementation tries to minimize the impact of irregular data access patterns. It can find the convex hull of 10 million random points in less than 0.2 seconds and achieves a speedup of up to 14 over the standard sequential CPU implementation. We also discuss some of the practical issues relating to the implementation of convex hull algorithms on massively multi-threaded architectures like that of the GPU.
Interactive Visualization and Tuning of SIFT Indexing.
DASARI PAVAN KUMAR,Narayanan P J
International Symposium on Vision, Modeling, and Visualization, VMV, 2010
@inproceedings{bib_Inte_2010, AUTHOR = {DASARI PAVAN KUMAR, Narayanan P J}, TITLE = {Interactive Visualization and Tuning of SIFT Indexing.}, BOOKTITLE = {International Symposium on Vision, Modeling, and Visualization}. YEAR = {2010}}
Indexing image data for content-based image search is an important area in Computer Vision. The state of the art uses the 128-dimensional SIFT as low-level descriptors. Indexing even a moderate collection involves several millions of such vectors. The search performance depends on the quality of indexing, and there is often a need to interactively tune the process for better accuracy. In this paper, we propose a visualization-based tool to tune the indexing process for images and videos. We use a feature selection approach to improve the clustering of SIFT vectors. Users can visualize the quality of clusters and easily control the importance of individual feature dimensions or groups of them interactively. The results of the process can be visualized quickly and the process can be repeated. The user can use a filter or a wrapper model in our tool. We use input sampling, GPU-based processing, and visual tools to analyze correlations to provide interactivity. We present results of tuning the indexing for a few standard datasets. A few tuning iterations result in an improvement of over 4% in the final classification performance, which is significant.
A fast GPU algorithm for graph connectivity
JYOTHISH SOMAN,Kishore Kothapalli,Narayanan P J
International Parallel & Distributed Processing Symposium Workshops, IPDPS-W, 2010
@inproceedings{bib_A_fa_2010, AUTHOR = {JYOTHISH SOMAN, Kishore Kothapalli, Narayanan P J}, TITLE = {A fast GPU algorithm for graph connectivity}, BOOKTITLE = {International Parallel & Distributed Processing Symposium Workshops}. YEAR = {2010}}
Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. General purpose programming on graphics processing units (GPGPU) is best suited for regular data-parallel algorithms. GPUs are not directly amenable to algorithms which have irregular data access patterns, such as list ranking, finding the connected components of a graph, and the like. In this work, we present a GPU-optimized implementation for finding the connected components of a given graph. Our implementation tries to minimize the impact of irregularity, both at the data level and at the functional level. Our implementation achieves a speedup of 9 to 12 times over the best sequential CPU implementation. For instance, our implementation finds the connected components of a graph of 10 million nodes and 60 million edges in about 500 milliseconds on a GPU, given a random edge list. We also draw interesting observations on why PRAM algorithms, such as the Shiloach-Vishkin algorithm, may not be a good fit for the GPU and how they should be modified.
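PRAM-style connected-components algorithms alternate two data-parallel steps. A "hooking" step runs over the edge list and links trees together, and a "pointer jumping" step shortens parent chains. The minimal Python sketch below shows that structure, assuming a hypothetical `connected_components` function over an edge list. It is a simplified hook-and-jump scheme emulated sequentially, not the exact Shiloach-Vishkin variant or the paper's GPU kernels. Each loop body corresponds to what would be one GPU thread per edge or per node.

```python
def connected_components(num_nodes, edges):
    """Label each node with the smallest node id in its component,
    using alternating hook and pointer-jump sweeps over an edge list."""
    parent = list(range(num_nodes))
    while True:
        changed = False
        # Hooking: for each edge whose endpoints carry different labels,
        # link the larger label under the smaller one (labels only decrease).
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu != pv:
                hi, lo = max(pu, pv), min(pu, pv)
                if lo < parent[hi]:
                    parent[hi] = lo
                    changed = True
        # Pointer jumping: shortcut every node to its grandparent until
        # every tree is a star (each node points directly at its root).
        jumped = True
        while jumped:
            jumped = False
            for x in range(num_nodes):
                gp = parent[parent[x]]
                if gp != parent[x]:
                    parent[x] = gp
                    jumped = True
        if not changed:
            return parent
```

Because labels only ever decrease and are bounded below, the loop terminates. At the fixed point, both endpoints of every edge share a root, so each component carries a single label.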
GPU-accelerated genetic algorithms
RAJVI SHAH,Narayanan P J,Kishore Kothapalli
Technical Report, arXiv, 2010
@inproceedings{bib_GPU-_2010, AUTHOR = {RAJVI SHAH, Narayanan P J, Kishore Kothapalli}, TITLE = {GPU-accelerated genetic algorithms}, BOOKTITLE = {Technical Report}. YEAR = {2010}}
Genetic algorithms are effective in solving many optimization tasks. However, the long execution time associated with them prevents their use in many domains. In this paper, we propose a new approach for the parallel implementation of genetic algorithms on graphics processing units (GPUs) using the CUDA programming model. We exploit the parallelism within a chromosome in addition to the parallelism across multiple chromosomes. The use of one thread per chromosome by previous efforts does not utilize the GPU resources effectively. Our approach uses multiple threads per chromosome, thereby exploiting the massively multithreaded GPU more effectively. This results in good utilization of GPU resources even at small population sizes, while maintaining impressive speedups for large population sizes. Our approach is modeled after the GAlib library and is adaptable to a variety of problems. We obtain a speedup of over 1500 over the CPU on problems involving a million chromosomes. Problems of such magnitude are not ordinarily attempted due to the prohibitive computation times.
Practical time bundle adjustment for 3D reconstruction on the GPU
SIDDHARTH CHOUDHARY,SHUBHAM GUPTA,Narayanan P J
European Conference on Computer Vision, ECCV, 2010
@inproceedings{bib_Prac_2010, AUTHOR = {SIDDHARTH CHOUDHARY, SHUBHAM GUPTA, Narayanan P J}, TITLE = {Practical time bundle adjustment for 3D reconstruction on the GPU}, BOOKTITLE = {European Conference on Computer Vision}. YEAR = {2010}}
Large-scale 3D reconstruction has received a lot of attention recently. Bundle adjustment is a key component of the reconstruction pipeline and often its slowest and most computationally intensive step. It has not been parallelized effectively so far. In this paper, we present a hybrid implementation of sparse bundle adjustment on the GPU using CUDA, with the CPU working in parallel. The algorithm is decomposed into smaller steps, each of which is scheduled on the GPU or the CPU. We develop efficient kernels for the steps and make use of existing libraries for several steps. Our implementation significantly outperforms the CPU implementation, achieving a speedup of 30-40 times over the standard CPU implementation for datasets with up to 500 images on an Nvidia Tesla C2050 GPU.
Fast GPU algorithms for graph connectivity
JYOTHISH SOMAN,Kishore Kothapalli,Narayanan P J
Workshop on Large Scale Parallel Processing, LSPP, 2010
@inproceedings{bib_Fast_2010, AUTHOR = {JYOTHISH SOMAN, Kishore Kothapalli, Narayanan P J}, TITLE = {Fast GPU algorithms for graph connectivity}, BOOKTITLE = {Workshop on Large Scale Parallel Processing}. YEAR = {2010}}
Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. General purpose programming on graphics processing units (GPGPU) is best suited for regular data-parallel algorithms. GPUs are not directly amenable to algorithms which have irregular data access patterns, such as list ranking, finding the connected components of a graph, and the like. In this work, we present a GPU-optimized implementation for finding the connected components of a given graph. Our implementation tries to minimize the impact of irregularity, both at the data level and at the functional level. Our implementation achieves a speedup of 9 to 12 times over the best sequential CPU implementation. For instance, our implementation finds the connected components of a graph of 10 million nodes and 60 million edges in about 500 milliseconds on a GPU, given a random edge list. We also draw interesting observations on why PRAM algorithms, such as the Shiloach-Vishkin algorithm, may not be a good fit for the GPU and how they should be modified.
Some GPU algorithms for graph connected components and spanning tree
JYOTHISH SOMAN,Kishore Kothapalli,Narayanan P J
Parallel Processing Letters, JPPL, 2010
@inproceedings{bib_Some_2010, AUTHOR = {JYOTHISH SOMAN, Kishore Kothapalli, Narayanan P J}, TITLE = {Some GPU algorithms for graph connected components and spanning tree}, BOOKTITLE = {Parallel Processing Letters}. YEAR = {2010}}
Graphics Processing Units (GPUs) are application-specific accelerators which provide a high performance-to-cost ratio and are widely available and used, which positions them as a ubiquitous accelerator. A computing paradigm based on them is the general purpose computing on the GPU (GPGPU) model. Due to its graphics lineage, the GPU is better suited for data-parallel, data-regular algorithms. The hardware architecture of the GPU is not suitable for data-parallel but data-irregular algorithms such as graph connected components and list ranking. In this paper, we present results that show how to use GPUs efficiently for graph algorithms, which are known to have irregular data access patterns. We consider two fundamental graph problems: finding the connected components and finding a spanning tree. These two problems find applications in several graph theoretical problems. In this paper, we arrive at efficient GPU implementations for these two problems. The algorithms focus on minimising irregularity at both the algorithmic and the implementation level. Our implementation achieves a speedup of 11-16 times over the corresponding best sequential implementation.
Efficient Discrete Range Searching primitives on the GPU with applications
JYOTHISH SOMAN,KIRAN KUMAR M,Kishore Kothapalli,Narayanan P J
International Conference on High Performance Computing, HiPC, 2010
@inproceedings{bib_Effi_2010, AUTHOR = {JYOTHISH SOMAN, KIRAN KUMAR M, Kishore Kothapalli, Narayanan P J}, TITLE = {Efficient Discrete Range Searching primitives on the GPU with applications}, BOOKTITLE = {International Conference on High Performance Computing}. YEAR = {2010}}
Graphics processing units provide large computational power at a very low price, which positions them as a ubiquitous accelerator. Efficient primitives that can expand the range of operations performed on the GPU are thus important. Discrete Range Searching (DRS) is one such primitive, with direct applications to string processing, document and text retrieval systems, and least common ancestor queries. In this work, we present a GPU-specific implementation of DRS with an optimal space-time trade-off. Toward this end, we also present GPU-amenable succinct representations and discuss limitations on the GPU. Our method uses 7.5 bits of additional space per element. The speedup achieved by our method is in the range of 20-25 for preprocessing and 25-35 for batch querying over a sequential implementation. Compared to an 8-threaded implementation, our methods obtain a speedup of 6-8. We study applications of DRS on the GPU. We also suggest that most graph algorithms which rely on least common ancestor queries can easily be enabled on the GPU based on the range minima primitive. Beyond this, we show applications of DRS in string querying and tree queries, and suggest how DRS can be helpful in implementing tree-based graph algorithms on the GPU.
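The range minima query underlying DRS asks for the position of the smallest value in `values[lo..hi]`. As a point of reference, here is a minimal Python sketch using a sparse table. Note that this is a deliberately swapped-in textbook technique: it stores O(n log n) indices, unlike the succinct 7.5-bits-per-element representation described in the paper. The function names `build_sparse_table` and `range_min_index` are hypothetical.

```python
def build_sparse_table(values):
    """table[j][i] is the index of the minimum of values[i : i + 2**j]."""
    n = len(values)
    table = [list(range(n))]
    j = 1
    while (1 << j) <= n:
        prev = table[j - 1]
        half = 1 << (j - 1)
        row = []
        for i in range(n - (1 << j) + 1):
            # Combine two overlapping-free halves of length 2**(j-1).
            a, b = prev[i], prev[i + half]
            row.append(a if values[a] <= values[b] else b)
        table.append(row)
        j += 1
    return table


def range_min_index(values, table, lo, hi):
    """Index of the minimum of values[lo..hi] (inclusive), answered in O(1)
    from two power-of-two blocks that together cover the range."""
    j = (hi - lo + 1).bit_length() - 1
    a, b = table[j][lo], table[j][hi - (1 << j) + 1]
    return a if values[a] <= values[b] else b
```

A batch of queries is embarrassingly parallel once the table exists, which is why batch querying suits the GPU. Least common ancestor queries reduce to this primitive by running a range minimum over the depths along an Euler tour of the tree.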
Large Graph Algorithms for Massively Multithreaded Architectures
PAWAN KUMAR HARISH,VIBHAV VINEET,Narayanan P J
Technical Report, arXiv, 2009
@inproceedings{bib_Larg_2009, AUTHOR = {PAWAN KUMAR HARISH, VIBHAV VINEET, Narayanan P J}, TITLE = {Large Graph Algorithms for Massively Multithreaded Architectures}, BOOKTITLE = {Technical Report}. YEAR = {2009}}
Graphics Processing Units (GPUs) provide high computation power at a low cost and are an important compute accelerator with a massively multithreaded architecture. In this paper, we present fast implementations of common graph operations like breadth-first search, st-connectivity, single-source shortest path, all-pairs shortest path, minimum spanning tree, and maximum flow for undirected graphs on the GPU using the CUDA programming model. Our implementations exhibit high performance, especially on large graphs. We use two data-parallel programming methodologies for these algorithms. One is an iterative, mask-based approach that processes valid data elements like vertices and edges using independent threads for each. The other is a divide-and-conquer approach that reduces the problem into smaller problems that are handled later using the same approach. Parallel algorithms for such problems have been reported in the literature before, especially on supercomputers. The massively multithreaded model of the GPU makes it possible to adopt the data-parallel approach even for irregular algorithms like graph algorithms, using O(V) or O(E) simultaneous threads. The algorithms and the underlying techniques presented in this paper are likely to be applicable to many irregular algorithms. We show the impact of our implementations on random, scale-free, and real-life graphs of up to millions of vertices on high-end and low-end GPUs. The availability and spread of GPUs to desktops and laptops make them ideal candidates to accelerate graph operations over CPU-only implementations. Practical implementations of basic operations go a long way in realizing their potential.
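The iterative, mask-based approach can be illustrated with breadth-first search. In the minimal Python sketch below, boolean arrays mark the current frontier, and a separate "updating" mask records newly reached vertices. Because the new frontier is committed in a second pass, no vertex is visited twice within one level. The function name `bfs_levels` and the adjacency-list input are assumptions. Each `for` loop over vertices stands in for one thread per vertex; this is not the paper's CUDA code.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS driven by vertex masks.

    adjacency: list of neighbour lists (undirected graphs stored symmetrically).
    Returns the BFS level of each vertex, or -1 if it is unreachable.
    """
    n = len(adjacency)
    cost = [-1] * n
    frontier = [False] * n
    visited = [False] * n
    frontier[source] = visited[source] = True
    cost[source] = 0
    level = 0
    while any(frontier):
        level += 1
        updating = [False] * n
        # Pass 1 (one thread per vertex): frontier vertices mark unvisited neighbours.
        for v in range(n):
            if frontier[v]:
                for w in adjacency[v]:
                    if not visited[w]:
                        updating[w] = True
        # Pass 2 (one thread per vertex): commit the marks as the next frontier.
        for w in range(n):
            if updating[w]:
                visited[w] = True
                cost[w] = level
        frontier = updating
    return cost
```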
Scalable Split and Gather Primitives for the GPU
SURYAKANT PATIDAR,Narayanan P J
Technical Report, arXiv, 2009
@inproceedings{bib_Scal_2009, AUTHOR = {SURYAKANT PATIDAR, Narayanan P J}, TITLE = {Scalable Split and Gather Primitives for the GPU}, BOOKTITLE = {Technical Report}. YEAR = {2009}}
We present efficient implementations of two primitives for data mapping and distribution on the massively multithreaded architecture of GPUs in this paper. The split primitive distributes elements of a list according to their category. Split is an important operation for data mapping and is used to build data structures, distribute workload, etc., in a massively parallel environment. The gather/scatter primitive performs fast, distributed data movement. Efficient data movement is critical to high performance on GPUs, as suboptimal memory accesses can pay heavy penalties. The split we implement is a generalization of the binary split [Blelloch 1990] and is implemented using the shared memory and the atomic operations available on the GPU. The split performance scales logarithmically with the number of categories, linearly with the list length, and linearly with the number of cores on the GPU. This makes it useful for applications that deal with large data sets. We also present a variant of split that partitions the indexes of records. This facilitates the use of the GPU as a coprocessor for split or sort, with the actual data movement handled separately. We can compute the split indexes for a list of 32 million records in 180 milliseconds for a 32-bit key and in 800 ms for a 96-bit key. The instantaneous locality of memory references plays a critical role in data movement on current GPU memory architectures. For scatter and gather involving large records, we use collective data movement, in which multiple threads cooperate on individual records to improve the instantaneous locality. The split, gather, and their combinations find many applications, and we expect our primitives will be used by future GPU programmers. We show sorting of 16 million 128-byte records in 379 milliseconds with 4-byte keys and in 556 ms with 8-byte keys.
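Semantically, split is a stable partition by category. The minimal Python sketch below expresses it with the classic histogram, exclusive-scan, and scatter steps. It also returns the per-record destination indexes, which correspond to the index-only variant mentioned in the abstract. The function name `split` and its signature are hypothetical. The paper's implementation uses shared-memory atomics rather than this sequential counting formulation.

```python
def split(keys, categories, num_categories):
    """Stable split: reorder keys so category 0 comes first, then 1, and so on,
    preserving input order within each category.

    Returns (reordered keys, destination index of each input record)."""
    # 1. Histogram: how many records fall in each category.
    counts = [0] * num_categories
    for c in categories:
        counts[c] += 1
    # 2. Exclusive prefix sum: the starting offset of each category's output block.
    offsets = [0] * num_categories
    running = 0
    for c in range(num_categories):
        offsets[c] = running
        running += counts[c]
    # 3. Scatter: send each record to the next free slot of its category.
    out = [None] * len(keys)
    index = [0] * len(keys)
    for i, c in enumerate(categories):
        index[i] = offsets[c]
        out[offsets[c]] = keys[i]
        offsets[c] += 1
    return out, index
```

Returning `index` separately lets the (possibly large) records be moved later by a gather or scatter, matching the separation of index computation from data movement described above.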
Real-time ray tracing of implicit surfaces on the GPU
JAG MOHAN SINGH,Narayanan P J
IEEE Transactions on Visualization and Computer Graphics, TVCG, 2009
@inproceedings{bib_Real_2009, AUTHOR = {JAG MOHAN SINGH, Narayanan P J}, TITLE = {Real-time ray tracing of implicit surfaces on the GPU}, BOOKTITLE = {IEEE Transactions on Visualization and Computer Graphics}. YEAR = {2009}}
Compact representation of geometry using a suitable procedural or mathematical model, together with a ray-tracing mode of rendering, fits the programmable graphics processor units (GPUs) well. Several such representations, including parametric and subdivision surfaces, have been explored in recent research. The important and widely applicable category of the general implicit surface has received less attention. In this paper, we present a ray-tracing procedure to render general implicit surfaces efficiently on the GPU. Though only fourth or lower order surfaces can be rendered using analytical roots, our adaptive marching points algorithm can ray trace arbitrary implicit surfaces without multiple roots, by sampling the ray at selected points until a root is found. Adapting the sampling step size based on a proximity measure and a horizon measure delivers high speed. The sign test can handle any surface without multiple roots. The Taylor test, which uses ideas from interval analysis, can ray trace many surfaces with complex roots. Overall, a simple algorithm that fits the SIMD architecture of the GPU results in high performance. We demonstrate the ray tracing of algebraic surfaces up to order 50 and non-algebraic surfaces, including a Blinn's blobby with 75 spheres, at better than interactive frame rates.
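The core of the sign test is to sample the implicit function along the ray and stop at the first sign change, then refine the root inside that bracket. The minimal Python sketch below uses a fixed step size and bisection. The paper's adaptive step sizes (proximity and horizon measures) and its Taylor test are not reproduced. The function name `march_ray` and its parameters are assumptions. Like any sign test, it can miss roots of even multiplicity, where the function touches zero without changing sign.

```python
def march_ray(f, origin, direction, t_max, step=0.01, refine_iters=40):
    """Return the first t in [0, t_max] where f(origin + t * direction) changes
    sign (fixed-step sign test plus bisection refinement), or None if no
    sign change is found."""
    def g(t):
        x, y, z = (o + t * d for o, d in zip(origin, direction))
        return f(x, y, z)

    t_prev, g_prev = 0.0, g(0.0)
    t = step
    while t <= t_max:
        g_cur = g(t)
        if g_prev == 0.0:
            return t_prev                      # landed exactly on the surface
        if (g_prev < 0) != (g_cur < 0):        # sign change: root bracketed
            lo, hi = t_prev, t
            for _ in range(refine_iters):      # bisection inside the bracket
                mid = 0.5 * (lo + hi)
                if (g(lo) < 0) != (g(mid) < 0):
                    hi = mid
                else:
                    lo = mid
            return 0.5 * (lo + hi)
        t_prev, g_prev = t, g_cur
        t += step
    return None
```

For example, with `f = lambda x, y, z: x*x + y*y + z*z - 1.0`, a ray from `(0, 0, -3)` along `+z` hits the unit sphere near `t = 2`. Because each ray runs this same loop independently, the method maps naturally onto one GPU thread per pixel.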
Singular value decomposition on GPU using CUDA
LAHABAR SHEETAL MADHUKAR,Narayanan P J
International Parallel and Distributed Processing Symposium, IPDPS, 2009
@inproceedings{bib_Sing_2009, AUTHOR = {LAHABAR SHEETAL MADHUKAR, Narayanan P J}, TITLE = {Singular value decomposition on GPU using CUDA}, BOOKTITLE = {International Parallel and Distributed Processing Symposium}. YEAR = {2009}}
Linear algebra algorithms are fundamental to many computing applications. Modern GPUs are suited for many general purpose processing tasks and have emerged as inexpensive high performance co-processors due to their tremendous computing power. In this paper, we present the implementation of singular value decomposition (SVD) of a dense matrix on the GPU using the CUDA programming model. SVD is implemented using the twin steps of bidiagonalization followed by diagonalization. It has not been implemented on the GPU before. Bidiagonalization is implemented using a series of Householder transformations which map well to BLAS operations. Diagonalization is performed by applying the implicitly shifted QR algorithm. Our complete SVD implementation significantly outperforms the MATLAB and Intel® Math Kernel Library (MKL) LAPACK implementations on the CPU. We show a speedup of up to 60 over the MATLAB implementation and up to 8 over the Intel MKL implementation on an Intel Dual Core 2.66 GHz PC, using an NVIDIA GTX 280, for large matrices. We also give results for very large matrices on an NVIDIA Tesla S1070.
Fast minimum spanning tree for large graphs on the GPU
VIBHAV VINEET,PAWAN KUMAR HARISH,SURYAKANT PATIDAR,Narayanan P J
Conference on High Performance Graphics, HPG, 2009
@inproceedings{bib_Fast_2009, AUTHOR = {VIBHAV VINEET, PAWAN KUMAR HARISH, SURYAKANT PATIDAR, Narayanan P J}, TITLE = {Fast minimum spanning tree for large graphs on the GPU}, BOOKTITLE = {Conference on High Performance Graphics}. YEAR = {2009}}
Graphics Processor Units are used for many general purpose processing tasks due to the high compute power available on them. Regular, data-parallel algorithms map well to the SIMD architecture of current GPUs. Irregular algorithms on discrete structures like graphs are harder to map to them. Efficient data-mapping primitives can play a crucial role in mapping such algorithms onto the GPU. In this paper, we present a minimum spanning tree algorithm on Nvidia GPUs under CUDA, as a recursive formulation of Borůvka's approach for undirected graphs. We implement it using scalable primitives such as scan, segmented scan, and split. The irregular steps of supervertex formation and recursive graph construction are mapped to primitives like split to categories involving vertex ids and edge weights. We obtain a 30 to 50 times speedup over the CPU implementation on most graphs and a 3 to 10 times speedup over our previous GPU implementation. We construct the minimum spanning tree on a 5 million node and 30 million edge graph in under 1 second on one quarter of the Tesla S1070 GPU.
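Borůvka's approach proceeds in rounds. In each round, every component picks its cheapest outgoing edge, and those edges merge components into supervertices for the next round. The minimal sequential Python sketch below uses a union-find structure to track supervertices. It does not reproduce the paper's recursive graph construction with scan, segmented scan, and split. The function name `boruvka_mst` and the `(weight, u, v)` edge format are assumptions. Ties are broken by comparing whole tuples, so every round sees a consistent total order on edges.

```python
def boruvka_mst(num_vertices, edges):
    """Minimum spanning forest by Borůvka's algorithm.

    edges: list of (weight, u, v) tuples for an undirected graph.
    Returns the chosen edges."""
    parent = list(range(num_vertices))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    components = num_vertices
    while components > 1:
        # Each supervertex records its cheapest edge leaving the component.
        cheapest = {}
        for w, u, v in edges:
            ru, rv = find(u), find(v)
            if ru == rv:
                continue
            for r in (ru, rv):
                if r not in cheapest or (w, u, v) < cheapest[r]:
                    cheapest[r] = (w, u, v)
        if not cheapest:
            break  # the remaining components are disconnected
        # Merge along the chosen edges to form the next round's supervertices.
        for w, u, v in cheapest.values():
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                mst.append((w, u, v))
                components -= 1
    return mst
```

Since the number of components at least halves in every round, there are O(log V) rounds. Each round is a data-parallel sweep over the edges, which is what makes the approach attractive on a GPU.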
Solving multilabel MRFs using incremental α-expansion on the GPUs
VIBHAV VINEET,Narayanan P J
Asian Conference on Computer Vision, ACCV, 2009
@inproceedings{bib_Solv_2009, AUTHOR = {VIBHAV VINEET, Narayanan P J}, TITLE = {Solving multilabel MRFs using incremental α-expansion on the GPUs}, BOOKTITLE = {Asian Conference on Computer Vision}. YEAR = {2009}}
Many vision problems map to the minimization of an energy function over a discrete MRF. Fast performance is needed if the energy minimization is one step in a control loop. In this paper, we present the incremental α-expansion algorithm for high-performance multilabel MRF optimization on the GPU. Our algorithm utilizes the grid structure of the MRFs for good parallelism on the GPU. We improve the basic push-relabel implementation of graph cuts using the atomic operations of the GPU and by processing blocks stochastically. We also reuse the flow, using reparametrization of the graph from cycle to cycle and iteration to iteration, for fast performance. We show results on various vision problems on standard datasets. Our approach takes 950 milliseconds on the GPU for stereo correspondence on the Tsukuba image with 16 labels, compared to 5.4 seconds on the CPU.
A view-dependent, polyhedral 3D display
PAWAN KUMAR HARISH,Narayanan P J
International Conference on Virtual Reality Continuum and Its Applications in Industry, VRCAI, 2009
@inproceedings{bib_A_vi_2009, AUTHOR = {PAWAN KUMAR HARISH, Narayanan P J}, TITLE = {A view-dependent, polyhedral 3D display}, BOOKTITLE = {International Conference on Virtual Reality Continuum and Its Applications in Industry}. YEAR = {2009}}
In this paper, we present the design and construction of a simple and inexpensive 3D display made up of polygonal elements. We use a per-pixel transformation of image and depth to produce an accurate picture and depth map on an arbitrary planar display facet from any viewpoint. Though the facets are rendered independently, our method produces the image and depth for rays from the eye-point through the facet pixels. Thus, there are no artifacts on a facet or at facet boundaries. Our method can be extended to any polygonal display surface, as demonstrated using synthetic setups. We also show a real display constructed using off-the-shelf LCD panels and computers. The display uses a simple calibration procedure and can be set up in minutes. Frame-sequential and anaglyphic stereo modes can be supported for any eye orientation and at high resolutions.
Parallelizing two dimensional convex hull on NVIDIA GPU and Cell BE
SRIKANTH S,B DURGA PRASAD REDDY,Kishore Kothapalli,Govindarajulu Regeti,Narayanan P J
International Conference on High Performance Computing, HiPC, 2009
@inproceedings{bib_Para_2009, AUTHOR = {SRIKANTH S, B DURGA PRASAD REDDY, Kishore Kothapalli, Govindarajulu Regeti, Narayanan P J}, TITLE = {Parallelizing two dimensional convex hull on NVIDIA GPU and Cell BE}, BOOKTITLE = {International Conference on High Performance Computing}. YEAR = {2009}}
Multicore processors are a paradigm shift in computer architecture that promises a dramatic increase in performance, but they also make algorithm design more complex. In this paper, we describe the challenges and design issues involved in parallelizing two-dimensional convex hull on both CUDA and the Cell Broadband Engine (Cell BE). We have parallelized the quickhull algorithm for two-dimensional convex hull. The major advantage of this algorithm is that it greatly reduces interprocessor communication cost.
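Quickhull starts from the leftmost and rightmost points and finds, on each side of the line joining them, the point farthest from it. It discards every point inside the resulting triangle and recurses on the two outer sub-problems. Those sub-problems share no points, which is what keeps communication low when they are processed in parallel. The minimal sequential Python sketch below assumes hypothetical function names `cross` and `quickhull`. It returns the hull in counter-clockwise order and drops collinear boundary points. It is not the paper's CUDA or Cell BE implementation.

```python
def cross(o, a, b):
    """Twice the signed area of triangle o-a-b; negative when b lies to the
    right of the directed line o -> a."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def quickhull(points):
    """Convex hull of 2D points, counter-clockwise from the leftmost point."""
    pts = sorted(set(points))
    if len(pts) < 3:
        return pts
    left, right = pts[0], pts[-1]

    def hull_side(a, b, candidates):
        # Only points strictly to the right of a -> b can be hull vertices here.
        outside = [p for p in candidates if cross(a, b, p) < 0]
        if not outside:
            return []
        # The point farthest from line a-b is certainly on the hull.
        far = min(outside, key=lambda p: cross(a, b, p))
        # Points inside triangle a-far-b are discarded; the two outer
        # sub-problems are independent and could run in parallel.
        return hull_side(a, far, outside) + [far] + hull_side(far, b, outside)

    # Lower chain (left to right), then upper chain (right to left).
    return [left] + hull_side(left, right, pts) + [right] + hull_side(right, left, pts)
```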
Hybrid multi-core algorithms for regular image filtering applications
SHRENIK LAD,KRISHNA KUMAR SINGH,Kishore Kothapalli,Narayanan P J
Genetic Programming and Evolvable Machines, GPEM, 2009
@inproceedings{bib_Hybr_2009, AUTHOR = {SHRENIK LAD, KRISHNA KUMAR SINGH, Kishore Kothapalli, Narayanan P J}, TITLE = {Hybrid multi-core algorithms for regular image filtering applications}, BOOKTITLE = {Genetic Programming and Evolvable Machines}. YEAR = {2009}}
GPGPU has received a lot of attention in the past few years, mainly because of the performance gain GPUs offer at a low price. Recently, researchers have identified hybrid multi-core computing as a better solution than accelerator-based computing for several problems. In this paper, we evaluate two regular problems in image processing, bilateral filtering and convolution, on a hybrid multi-core platform. We provide a detailed analysis of these algorithms by comparing their performance on three platforms: 1) CPU+GPU hybrid, 2) pure GPU, and 3) pure CPU. We show that performance gains as high as 30-40% can be obtained by using basic techniques like data decomposition and overlapped execution when a hybrid computing model is used. Finally, we conclude by discussing some future prospects in the area of hybrid computing.
Fast and scalable list ranking on the GPU
MOHAMMED SUHAIL REHMAN,Kishore Kothapalli,Narayanan P J
ACM International Conference on Supercomputing, ICS, 2009
@inproceedings{bib_Fast_2009, AUTHOR = {MOHAMMED SUHAIL REHMAN, Kishore Kothapalli, Narayanan P J}, TITLE = {Fast and scalable list ranking on the GPU}, BOOKTITLE = {ACM International Conference on Supercomputing}. YEAR = {2009}}
General purpose programming on graphics processing units (GPGPU) has received a lot of attention in the parallel computing community, as it promises to offer the highest performance per dollar. GPUs have been used extensively on regular problems that can be easily parallelized. In this paper, we describe two implementations of List Ranking, a traditional irregular algorithm that is difficult to parallelize on such massively multi-threaded hardware. We first present an implementation of Wyllie's algorithm based on pointer jumping. This technique does not scale well to large lists due to the suboptimal work done. We then present a GPU-optimized, Recursive Helman-JaJa (RHJ) algorithm. Our RHJ implementation can rank a random list of 32 million elements in about a second and achieves a speedup of about 8-9 over a CPU implementation, as well as a speedup of 3-4 over the best reported implementation on the Cell Broadband Engine. We also discuss the practical issues relating to the implementation of irregular algorithms on massively multi-threaded architectures like that of the GPU. Regular or coalesced memory access patterns and balanced load are critical to achieving good performance on the GPU.
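Wyllie's pointer-jumping algorithm is short enough to show in full. In every round, each node adds its successor's rank to its own and jumps its pointer two hops ahead, so the list is ranked in O(log n) rounds. The total work is O(n log n), which is the "suboptimal work" that limits its scalability. The minimal Python sketch below assumes a hypothetical function `wyllie_list_rank` that takes a successor array with `-1` marking the tail. It writes into fresh arrays each round to mimic synchronous parallel threads.

```python
def wyllie_list_rank(successor):
    """Rank list nodes by pointer jumping (Wyllie's algorithm).

    successor[i] is the node after i, or -1 for the tail.
    Returns rank[i] = number of links from i to the tail (the tail has rank 0)."""
    n = len(successor)
    nxt = list(successor)
    rank = [0 if s == -1 else 1 for s in successor]
    while any(s != -1 for s in nxt):
        # Read old arrays, write new ones: every "thread" sees the same snapshot.
        new_rank = rank[:]
        new_nxt = nxt[:]
        for i in range(n):
            if nxt[i] != -1:
                new_rank[i] = rank[i] + rank[nxt[i]]  # accumulate the skipped distance
                new_nxt[i] = nxt[nxt[i]]              # jump two hops ahead
        rank, nxt = new_rank, new_nxt
    return rank
```

For the list `2 -> 0 -> 3 -> 1` (successor array `[3, -1, 0, 1]`), this sketch finishes in two rounds with ranks `[2, 0, 3, 1]`.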
A performance prediction model for the CUDA GPGPU platform
Kishore Kothapalli,RISHABH MUKHERJEE,MOHAMMED SUHAIL REHMAN,SURYAKANT PATIDAR,Narayanan P J,Srinathan Kannan
International Conference on High Performance Computing, HiPC, 2009
@inproceedings{bib_A_pe_2009, AUTHOR = {Kishore Kothapalli, RISHABH MUKHERJEE, MOHAMMED SUHAIL REHMAN, SURYAKANT PATIDAR, Narayanan P J, Srinathan Kannan}, TITLE = {A performance prediction model for the CUDA GPGPU platform}, BOOKTITLE = {International Conference on High Performance Computing}. YEAR = {2009}}
The significant growth in computational power of modern Graphics Processing Units (GPUs), coupled with the advent of general purpose programming environments like NVIDIA's CUDA, has seen GPUs emerge as a very popular parallel computing platform. Until recently, there was no performance model for GPGPUs. The absence of such a model makes it difficult to definitively assess the suitability of the GPU for solving a particular problem and is a significant impediment to the mainstream adoption of GPUs as a massively parallel (super)computing platform. In this paper, we present a performance prediction model for the CUDA GPGPU platform. This model encompasses the various facets of the GPU architecture, like scheduling, memory hierarchy, and pipelining, among others. We also perform experiments that demonstrate the effects of various memory access strategies. The proposed model can be used to analyze pseudo code for a CUDA kernel to obtain a performance estimate, in a way that is similar to performing asymptotic analysis. We illustrate the usage of our model and its accuracy with three case studies: matrix multiplication, list ranking, and histogram generation.
A Parametric Proxy-Based Compression of Depth Movies
POOJA VERLANI,Narayanan P J
Data Compression Conference, DCC, 2008
@inproceedings{bib_A_Pa_2008, AUTHOR = {POOJA VERLANI, Narayanan P J}, TITLE = {A Parametric Proxy-Based Compression of Depth Movies}, BOOKTITLE = {Data Compression Conference}. YEAR = {2008}}
Depth movies provide 2-D representations of a time-varying 3D scene. Multistream depth movies arise in many structure-capturing setups and have been used for image based rendering. They are bulky and carry redundant information. We propose a proxy-based compression scheme for multistream depth movies of a scene involving dynamic human actors. The input to our system is depth movies from different viewpoints and the calibration parameters to relate depths to 3D points. We use an articulated human model as a proxy to represent the common structure of the scene. The proxy is parametrized by various bone angles.
CUDA cuts: Fast graph cuts on the GPU
VIBHAV VINEET, Narayanan P J
Computer Vision and Pattern Recognition Conference workshops, CVPR-W, 2008
@inproceedings{bib_CUDA_2008, AUTHOR = {VIBHAV VINEET and Narayanan P J}, TITLE = {CUDA cuts: Fast graph cuts on the GPU}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference workshops}, YEAR = {2008}}
Graph cuts have become a powerful and popular optimization tool for energies defined over an MRF and have found applications in image segmentation, stereo vision, image restoration, etc. The maxflow/mincut algorithm to compute graph cuts is computationally heavy. The best-reported implementation of graph cuts takes over 100 milliseconds even on images of size 640×480 and cannot be used for real-time applications or when iterated applications are needed. The commodity Graphics Processor Unit (GPU) has recently emerged as an economical and fast computation co-processor. In this paper, we present an implementation of the push-relabel algorithm for graph cuts on the GPU. We can perform over 60 graph cuts per second on 1024×1024 images and over 150 graph cuts per second on 640×480 images on an Nvidia 8800 GTX. The time for each complete graph cut is about 1 millisecond when only a few weights change from the previous graph, as on dynamic graphs resulting from videos. The CUDA code, with a well-defined interface, can be downloaded for anyone’s use.
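The push-relabel method named above can be illustrated with a small sequential sketch. It is a generic FIFO push-relabel max-flow on a dense capacity matrix, chosen here for readability; it is not the authors' CUDA code, which applies push and relabel to many grid nodes in parallel. The function name `push_relabel_max_flow` and the matrix layout are assumptions for illustration.

```python
from collections import deque


def push_relabel_max_flow(capacity, source, sink):
    """Maximum source-sink flow by FIFO push-relabel on a dense capacity matrix."""
    n = len(capacity)
    flow = [[0] * n for _ in range(n)]  # skew-symmetric: flow[v][u] == -flow[u][v]
    height = [0] * n
    excess = [0] * n
    height[source] = n

    def residual(u, v):
        return capacity[u][v] - flow[u][v]

    # Initial preflow: saturate every edge leaving the source.
    for v in range(n):
        c = capacity[source][v]
        if c > 0:
            flow[source][v] += c
            flow[v][source] -= c
            excess[v] += c
            excess[source] -= c

    active = deque(v for v in range(n) if v not in (source, sink) and excess[v] > 0)
    while active:
        u = active.popleft()
        # Discharge u: push along admissible edges, relabel when none remain.
        while excess[u] > 0:
            pushed = False
            for v in range(n):
                if residual(u, v) > 0 and height[u] == height[v] + 1:
                    delta = min(excess[u], residual(u, v))
                    flow[u][v] += delta
                    flow[v][u] -= delta
                    excess[u] -= delta
                    excess[v] += delta
                    if v not in (source, sink) and excess[v] == delta:
                        active.append(v)  # v has just become active
                    pushed = True
                    if excess[u] == 0:
                        break
            if not pushed:
                # Relabel: lift u just above its lowest residual neighbour.
                height[u] = 1 + min(height[v] for v in range(n) if residual(u, v) > 0)
    return sum(flow[source][v] for v in range(n))
```

For a graph cut, the source and sink stand for the two terminal labels and the capacities for the MRF edge weights; after the flow is computed, the nodes still reachable from the source in the residual graph form one side of the minimum cut.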
Parametric Proxy-Based Compression of Multiple Depth Movies of Humans
POOJA VERLANI, Narayanan P J
Data Compression Conference, DCC, 2008
@inproceedings{bib_Para_2008, AUTHOR = {POOJA VERLANI and Narayanan P J}, TITLE = {Parametric Proxy-Based Compression of Multiple Depth Movies of Humans}, BOOKTITLE = {Data Compression Conference}, YEAR = {2008}}
Capturing dynamic scenes in 3D using a suitable depth or structure recovery mechanism has become popular today for image-based modelling, 3D teleconferencing, etc. The 2½D geometric structure and the aligned texture, together called a depth image, are captured by such setups. Time-varying sequences of depth images are called depth movies. Depth movies of humans performing interesting actions are being captured using specialized setups in different research labs today. The data so captured is often streamed to another location for immersive viewing using standard depth-image rendering techniques. The depth maps are bulky and need innovative algorithms for efficient representation and compression. In this paper, we present a compression scheme that uses parametric proxy models for the underlying action. We use a generic articulated human model as the proxy and the various joint angles as its parameters for each time instant. The proxy model represents a common prediction of the scene structure for that time instant. It can be projected to each view, and the difference or residue between the depth due to the proxy and the original depth can be used to represent the scene. The residues compress better and can exploit the temporal and spatial relationships between multiple depth movies. We show results on several synthetic actions to demonstrate the usefulness of the scheme.
Augmented reality using over-segmentation
Visesh Chari, JAG MOHAN SINGH, Narayanan P J
National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, NCVPRIPG, 2008
@inproceedings{bib_Augm_2008, AUTHOR = {Visesh Chari and JAG MOHAN SINGH and Narayanan P J}, TITLE = {Augmented reality using over-segmentation}, BOOKTITLE = {National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics}, YEAR = {2008}}
Augmenting virtual objects between real objects of a video poses a challenging problem, primarily because it necessitates accurate knowledge of the scene’s depth. In this paper, we propose an approach that generates an approximate depth for every pixel in the vicinity of the virtual object, which we show is enough to decide the ordering of objects in every image. We also draw a similarity between layered segmentation and augmentation. We argue that augmentation only needs to know which layers are in front of the object and which are behind it. Our algorithm makes effective use of object boundaries detected in an image using image segmentation. The assumption that these boundaries correspond to depth discontinuities provides a useful simplification of the problem, sufficient to produce realistic results. We use a combination of segmentation and feature detection for sparse depth assignment. Results on challenging data sets show the effectiveness of our approach.
Proxy-Based Compression of 2-1/2D Structure of Dynamic Events for Tele-immersive Systems
POOJA VERLANI, Narayanan P J
International Symposium on 3D Data Processing Visualization and Transmission, 3DPVT, 2008
@inproceedings{bib_Prox_2008, AUTHOR = {POOJA VERLANI and Narayanan P J}, TITLE = {Proxy-Based Compression of 2-1/2D Structure of Dynamic Events for Tele-immersive Systems}, BOOKTITLE = {International Symposium on 3D Data Processing Visualization and Transmission}, YEAR = {2008}}
Capture of dynamic events is an active research area today. Capturing the 2½D geometric structure and photometric appearance of dynamic scenes finds applications in 3D teleconferencing systems, 3DTV, etc. The captured “depth movies” contain aligned sequences of depth maps and textures and are often streamed to a distant location for immersive viewing. The depth maps are heavy and need efficient compression schemes. In this paper, we present a scheme to compress depth movies of human actors using a parametric proxy model for the underlying action. We use a generic articulated human model as the proxy and the various joint angles as its parameters for each time instant to represent a common prediction of the scene structure. The difference or residue between the captured depth and the depth of the proxy represents the scene and exploits spatial coherence. Differences in residues across time are used to exploit temporal coherence. Intra-frame coded frames and difference-coded frames provide random access and high compression. We show results on several synthetic and real actions to demonstrate the compression ratio and resulting quality using a depth-based rendering of the decoded scene.
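The residue and difference coding described above can be sketched on plain integer depth maps. This is a minimal sketch, assuming each depth map and the proxy depth projected into the same view are flat lists of equal length; the function names, the `key_interval` parameter that spaces the intra-coded frames, and the omission of any entropy coder are assumptions, not the paper's codec.

```python
def encode_depth_movie(depths, proxies, key_interval=4):
    """Turn one view's depth maps into intra-coded (I) and difference-coded (D) frames.

    depths[t] and proxies[t] are flat lists: the captured depth and the depth
    obtained by projecting the articulated proxy into this view at time t.
    """
    frames = []
    prev_residue = None
    for t, (depth, proxy) in enumerate(zip(depths, proxies)):
        residue = [d - p for d, p in zip(depth, proxy)]  # spatial prediction by the proxy
        if t % key_interval == 0:
            frames.append(("I", residue))  # random-access point
        else:
            delta = [r - q for r, q in zip(residue, prev_residue)]  # temporal coherence
            frames.append(("D", delta))
        prev_residue = residue
    return frames


def decode_depth_movie(frames, proxies):
    """Invert encode_depth_movie, given the same proxy depth maps."""
    depths = []
    residue = None
    for (kind, payload), proxy in zip(frames, proxies):
        if kind == "I":
            residue = list(payload)
        else:
            residue = [q + d for q, d in zip(residue, payload)]
        depths.append([r + p for r, p in zip(residue, proxy)])
    return depths
```

When the proxy fits the actor well, the residues and their frame-to-frame differences are mostly small values near zero, which is what a subsequent entropy coder would exploit.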
Hexagonal geometry clipmaps for spherical terrain rendering
SHIBEN BHATTACHARJEE, Narayanan P J
International Conference on Computer Graphics and Interactive Techniques, SIGGRAPH, 2008
@inproceedings{bib_Hexa_2008, AUTHOR = {SHIBEN BHATTACHARJEE and Narayanan P J}, TITLE = {Hexagonal geometry clipmaps for spherical terrain rendering}, BOOKTITLE = {International Conference on Computer Graphics and Interactive Techniques}, YEAR = {2008}}
Terrains can be rendered efficiently with a rectangular 2D grid of heights. Terrain on spheres, on the other hand, can be rendered using a Hierarchical Triangular Mesh (HTM), but that representation does not map directly onto a 2D grid of heights. We present a unified representation of HTM and clipmapping using Hexagonal Geometry Clipmaps. This provides a one-to-one correspondence between vertices and heights, low and constant memory usage, lower storage requirements, no pole singularity, fast vertex look-ups, and a large range of view distances. Hexagonal clipmaps fit the equilateral triangles of HTM and achieve a uniform triangle count on the screen. We keep the clipmaps on the GPU for fast access from shaders and use vertex buffer objects for fast triangle rendering.
Real-time rendering and manipulation of large terrains
SHIBEN BHATTACHARJEE, SURYAKANT PATIDAR, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2008
@inproceedings{bib_Real_2008, AUTHOR = {SHIBEN BHATTACHARJEE and SURYAKANT PATIDAR and Narayanan P J}, TITLE = {Real-time rendering and manipulation of large terrains}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2008}}
Terrains are challenging geometric objects for real-time rendering and interactive manipulation. State-of-the-art terrain rendering systems use custom multiresolution representations like geometry clipmaps for fast rendering on the GPU. In this paper, we present a system that exploits the power and flexibility of modern GPUs to store, render, and manipulate terrains with minimal CPU involvement. The central idea is to use a regular-grid representation and fixed-size blocks/tiles that change in resolution. The potentially visible portion of the terrain is cached at the highest necessary resolution and is rendered from the GPU. The CPU sends a light geometry template, which is expanded by the Geometry Shader to the triangles using the heights stored in the GPU cache. The CPU performs a coarse culling of the tiles, with the GPU performing fine culling. The GPU cache is updated continuously as the viewpoint changes. Our system enables the terrain to be modified procedurally or edited interactively on the GPU with no CPU involvement. The terrain can also interact with a large number of external objects in real time entirely within the GPU. We achieve a consistent rendering rate of 100 frames per second with terrain modification and interactions, as well as a triangle rate of up to 350 million per second on an Nvidia 8800 GTX GPU for large terrains, with a CPU load below 10%.
Ray casting deformable models on the GPU
SURYAKANT PATIDAR, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2008
@inproceedings{bib_Ray__2008, AUTHOR = {SURYAKANT PATIDAR and Narayanan P J}, TITLE = {Ray casting deformable models on the GPU}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2008}}
GPUs pack high computation power and a restricted architecture into easily available hardware today. They are now used as computation co-processors and come with programming models that treat them as standard parallel architectures. In this paper, we explore the problem of real-time ray casting of large deformable models (over a million triangles) on large displays (a million pixels) on an off-the-shelf GPU. Ray casting is an inherently parallel and highly compute-intensive operation. We build a GPU-efficient three-dimensional data structure for this purpose and a corresponding algorithm that uses it for fast ray casting. We also present fast methods to build the data structure on SIMD GPUs, including a fast multi-split operation. We achieve real-time ray casting of a million-triangle model onto a million pixels on current Nvidia GPUs using the CUDA model. Results are presented on data structure building and ray casting for a number of models. The ideas presented here are likely to extend to later models and architectures of the GPU, as well as to other multicore architectures.
Algebraic splats representation for point based models
NAVEEN KUMAR REDDY BOLLA, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2008
@inproceedings{bib_Alge_2008, AUTHOR = {NAVEEN KUMAR REDDY BOLLA and Narayanan P J}, TITLE = {Algebraic splats representation for point based models}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2008}}
The primitives of point-based representations are independent but are rendered using surfels, which approximate the immediate neighborhood of each point linearly. A large number of surfels are needed to convey the exact shape. Higher-order approximations of the local neighborhood have the potential to represent the shape using fewer primitives, simultaneously achieving higher rendering speeds. In this paper, we propose algebraic splats as a basic primitive of representation for point-based models. An algebraic-splat-based representation can be computed using a moving least squares procedure. We specifically study low-order polynomial splats in this paper. Quadratic and cubic splats provide good quality and high rendering speed using far fewer primitives on a wide range of models. They can also be rendered fast using ray tracing on modern GPUs. We also present an algorithm to construct a representation of a model with a user-specified number of primitives. Our method generates a hole-free representation parametrized by a smoothing radius. The hole-free representation reduces the number of primitives needed by a factor of 20 to 30 on most models and by a factor of over 100 on dense models like David, with little or no drop in visual quality. We also present a two-pass GPU algorithm that ray-traces the algebraic splats and blends them using a Gaussian weighting scheme for a smooth appearance. We are able to render models like David at upwards of 200 fps on a commodity GPU using algebraic splats.
Real-time painterly rendering of terrains
SHIBEN BHATTACHARJEE, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2008
@inproceedings{bib_Real_2008, AUTHOR = {SHIBEN BHATTACHARJEE and Narayanan P J}, TITLE = {Real-time painterly rendering of terrains}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2008}}
We present a non-photorealistic, real-time painterly rendering technique for terrains. The painterly appearance and impression of terrains is created by effectively rendering several brush strokes. The strokes have fixed locations on the surface of the terrain during animation to enable frame-to-frame coherency. The strokes are rendered as alpha-blended sprites in two dimensions and are oriented along the slope of the terrain, analogous to the way artists paint on canvas. By exploiting the regular nature of terrain data, we create pre-decided rendering depth orders for primitives for any camera orientation. With this, we avoid the need to sort the sprite primitives for alpha blending. We use DirectX10/SM4.0-based shaders to render strokes to improve performance. Because strokes are distributed over the terrain, they get cluttered when they are closely located on screen. We follow a level-of-detail scheme that maintains a uniform stroke density in screen space. Various styles can be achieved with different stroke variations. Phong shading of the rendered output in real time is possible for more varied styles. We achieve painterly rendering in real time with a combination of object-space positioning and image-space rendering of strokes. We illustrate our method with images and performance results.
VIDEO FRAME ALIGNMENT IN MULTIPLE VIEWS
SUJIT KUTHIRUMMAL, Jawahar C V, Narayanan P J
International Conference on Image Processing, ICIP, 2008
@inproceedings{bib_VIDE_2008, AUTHOR = {SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {VIDEO FRAME ALIGNMENT IN MULTIPLE VIEWS}, BOOKTITLE = {International Conference on Image Processing}, YEAR = {2008}}
Many events are captured using multiple cameras today. Frames of each video stream have to be synchronized and aligned to a common time axis before processing them. Synchronization of the video streams necessarily needs a hardware based solution that is applied while capturing. The alignment problem between the frames of multiple videos can be posed as a search using traditional measures for image similarity. Multiview relations and constraints developed in Computer Vision recently can provide more elegant solutions to this problem. In this paper, we provide two solutions for the video frame alignment problem using two view and three view constraints. We present solutions to this problem for the case when the videos are taken using affine cameras and for general projective cameras. Excellent experimental results are achieved by our algorithms.
Video Completion as Noise Removal
Visesh Chari, Narayanan P J, Jawahar C V
National Conference on Communications, NCC, 2008
@inproceedings{bib_Vide_2008, AUTHOR = {Visesh Chari and Narayanan P J and Jawahar C V}, TITLE = {Video Completion as Noise Removal}, BOOKTITLE = {National Conference on Communications}, YEAR = {2008}}
Video completion algorithms have concentrated on obtaining visually consistent solutions to fill in the missing portions, without any emphasis on the physical correctness of the video. The resulting solutions thus use texture or image-structure-based cues and are limited in the situations they can handle. In this paper, we take a model-based signal processing approach to video completion [1]. Completion of the video is then defined as satisfying the given model by detecting and removing the error (selected parts of the video to be replaced). Given a probabilistic model, video completion then becomes an unsupervised learning task, with the input video giving a “noisy” version. Dense completion is the automatic inference of the “noise-less” or “true” video from the input. This approach finds a solution that satisfies visual coherence and is applicable to a wide variety of scenarios. We demonstrate the efficacy of our approach and its wide applicability using two scenarios.
Attention-based super resolution from videos
DILEEP REDDY VAKA, Narayanan P J, Jawahar C V
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2008
@inproceedings{bib_Atte_2008, AUTHOR = {DILEEP REDDY VAKA and Narayanan P J and Jawahar C V}, TITLE = {Attention-based super resolution from videos}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2008}}
A video from a moving camera produces different numbers of observations of different scene areas. We can construct an attention map of the scene by bringing the frames to a common reference and counting the number of frames that observed each scene point. Different representations can be constructed from this. The base of the attention map gives the scene mosaic. Super-resolved images of parts of the scene can be obtained using a subset of observations or video frames. We can combine mosaicing with super-resolution by using all observations, but the magnification factor will vary across the scene based on the attention received. The height of the attention map indicates the amount of super-resolution for that scene point. In this paper, we modify the traditional super-resolution framework to generate a varying-resolution image for panning cameras. The varying-resolution image uses all useful data available in a video. We introduce the concept of attention-based super-resolution and give the modified framework for it. We also show its applicability on a few indoor and outdoor videos.
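The attention map described above can be sketched for the simplest case. This assumes the frames have already been registered to the common reference by pure integer translations (as for an idealized panning camera), given as non-negative top-left offsets; real registration would typically use homographies. The function name `attention_map` is hypothetical.

```python
def attention_map(frame_shape, offsets):
    """Count how many registered frames observe each pixel of the common reference.

    frame_shape: (height, width) shared by every video frame.
    offsets: non-negative integer (row, col) of each frame's top-left corner
        in the reference (pure translation is an assumption of this sketch).
    """
    h, w = frame_shape
    rows = max(r for r, _ in offsets) + h
    cols = max(c for _, c in offsets) + w
    attention = [[0] * cols for _ in range(rows)]
    for r0, c0 in offsets:
        for r in range(r0, r0 + h):
            for c in range(c0, c0 + w):
                attention[r][c] += 1
    return attention
```

Pixels with a count of at least one form the mosaic (the base of the map); pixels with higher counts have more observations available and can support a larger magnification factor.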
Machine vision analysis of the energy efficiency of intermodal freight trains
Y-C Lai, C P L Barkan, J Drapa, N Ahuja, J M Hart, Narayanan P J, Jawahar C V, A Kumar, L R Milhon
Journal of Railway and Rapid Transit, JRRT, 2007
@article{bib_Mach_2007, AUTHOR = {Y-C Lai and C P L Barkan and J Drapa and N Ahuja and J M Hart and Narayanan P J and Jawahar C V and A Kumar and L R Milhon}, TITLE = {Machine vision analysis of the energy efficiency of intermodal freight trains}, JOURNAL = {Journal of Railway and Rapid Transit}, YEAR = {2007}}
Intermodal (IM) trains are typically the fastest freight trains operated in North America. The aerodynamic characteristics of many of these trains are often relatively poor, resulting in high fuel consumption. However, considerable variation in fuel efficiency is possible depending on how the loads are placed on railcars in the train. Consequently, substantial fuel savings are potentially possible if more attention is paid to the loading configurations of trains. A wayside machine vision (MV) system was developed to automatically scan passing IM trains and assess their aerodynamic efficiency. MV algorithms are used to analyse the captured images and to detect and measure the gaps between loads. To make use of the data, a scoring system was developed based on two attributes: the aerodynamic coefficient and slot efficiency. The aerodynamic coefficient is calculated using the Aerodynamic Subroutine of the Train Energy Model. Slot efficiency represents the difference between the actual and ideal loading configurations, given the particular set of railcars in the train. This system can provide IM terminal managers with feedback on loading performance for trains and can be integrated into the software support systems used for loading assignment.
Point Based Representations for Hierarchical Environments
THANGUDU KEDARNATH, GADE LAKSHMI, JAG MOHAN SINGH, Narayanan P J
International Conference on Computer Theory and Applications, ICCTA, 2007
@inproceedings{bib_Poin_2007, AUTHOR = {THANGUDU KEDARNATH and GADE LAKSHMI and JAG MOHAN SINGH and Narayanan P J}, TITLE = {Point Based Representations for Hierarchical Environments}, BOOKTITLE = {International Conference on Computer Theory and Applications}, YEAR = {2007}}
The advent of advanced graphics technologies and improved hardware has enabled the generation of highly complex models with a huge number of triangles. Point-based representations and rendering have emerged as viable representations for high-quality models in this scenario. These methods have so far been demonstrated only on high-resolution, compact objects. They have to be adapted to environments that extend over large areas to be considered serious representations. In this paper, we present an adaptation of the point-based representation to large, hierarchical environments. We show how point-based data can be generated by sampling polygon-based representations. We also show the combination of an object hierarchy and multiresolution point representations and develop rendering algorithms for the same. The multiresolution representation is constructed during the generation process. We then show a hybrid representation in which the more complex portions of the environment are represented using points and others using the original polygon-based representation. This produces better rendering performance by keeping large, flat regions as triangles. We demonstrate the method on the model of Fatehpur Sikri, which has 14,000 objects with over 500,000 triangles.
Virtualized reality: perspectives on 4D digitization of dynamic events
Takeo Kanade, Narayanan P J
IEEE Computer Graphics and Applications, CGApp, 2007
@article{bib_Virt_2007, AUTHOR = {Takeo Kanade and Narayanan P J}, TITLE = {Virtualized reality: perspectives on 4D digitization of dynamic events}, JOURNAL = {IEEE Computer Graphics and Applications}, YEAR = {2007}}
Digitally recording dynamic events, such as sporting events, for experiencing in a spatio-temporally distant and arbitrary setting requires 4D capture: three dimensions for their geometry and appearance over the fourth dimension of time. Today's computer vision techniques make 4D capture possible. The virtualized reality system serves as an example of the general problem of digitizing dynamic events. In this article, we present the virtualized reality system's details from a historical perspective.
Garuda: A scalable tiled display wall using commodity PCs
NIRNIMESH, PAWAN KUMAR HARISH, Narayanan P J
IEEE Transactions on Visualization and Computer Graphics, TVCG, 2007
@article{bib_Garu_2007, AUTHOR = {NIRNIMESH and PAWAN KUMAR HARISH and Narayanan P J}, TITLE = {Garuda: A scalable tiled display wall using commodity PCs}, JOURNAL = {IEEE Transactions on Visualization and Computer Graphics}, YEAR = {2007}}
Cluster-based tiled display walls can provide cost-effective and scalable displays with high resolution and a large display area. The software to drive them needs to scale too if arbitrarily large displays are to be built. Chromium is a popular software API used to construct such displays. Chromium transparently renders any OpenGL application to a tiled display by partitioning and sending individual OpenGL primitives to each client per frame. Visualization applications often deal with massive geometric data with millions of primitives. Transmitting them every frame results in huge network requirements that adversely affect the scalability of the system. In this paper, we present Garuda, a client-server-based display wall framework that uses off-the-shelf hardware and a standard network. Garuda is scalable to large tile configurations and massive environments. It can transparently render any application built using the Open Scene Graph (OSG) API to a tiled display without any modification by the user. The Garuda server uses an object-based scene structure represented using a scene graph. The server determines the objects visible to each display tile using a novel adaptive algorithm that culls the scene graph to a hierarchy of frustums. Required parts of the scene graph are transmitted to the clients, which cache them to exploit the interframe redundancy. A multicast-based protocol is used to transmit the geometry to exploit the spatial redundancy present in tiled display systems. A geometry push philosophy from the server helps keep the clients in sync with one another. Neither the server nor a client needs to render the entire scene, making the system suitable for interactive rendering of massive models. Transparent rendering is achieved by intercepting the cull, draw, and swap functions of OSG and replacing them with our own. We demonstrate the performance and scalability of the Garuda system for different configurations of display wall. We also show that the server and network loads grow sublinearly with the increase in the number of tiles, which makes our scheme suitable to construct very large displays.
Accelerating large graph algorithms on the GPU using CUDA
PAWAN KUMAR HARISH, Narayanan P J
International Conference on High Performance Computing, HiPC, 2007
@inproceedings{bib_Acce_2007, AUTHOR = {PAWAN KUMAR HARISH and Narayanan P J}, TITLE = {Accelerating large graph algorithms on the GPU using CUDA}, BOOKTITLE = {International Conference on High Performance Computing}, YEAR = {2007}}
Large graphs involving millions of vertices are common in many practical applications and are challenging to process. Practical-time implementations using high-end computers are reported but are accessible only to a few. Graphics Processing Units (GPUs) of today have high computation power and low price. They have a restrictive programming model and are tricky to use. The G80 line of Nvidia GPUs can be treated as a SIMD processor array using the CUDA programming model. We present a few fundamental algorithms, including breadth-first search, single-source shortest path, and all-pairs shortest path, using CUDA on large graphs. We can compute the single-source shortest path on a 10-million-vertex graph in 1.5 seconds using the Nvidia 8800 GTX GPU costing $600. In some cases, the optimal sequential algorithm is not the fastest on the GPU architecture. GPUs have great potential as high-performance co-processors.
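Breadth-first search, the first algorithm listed above, can be sketched in a level-synchronous, frontier-based form over a compact adjacency (CSR) layout, a common way to map BFS onto one GPU thread per vertex. This sequential Python version is an illustration under those assumptions, not the paper's CUDA kernel; the function name `bfs_levels` and the array names `offsets` and `edges` are hypothetical.

```python
def bfs_levels(offsets, edges, source):
    """Level-synchronous BFS; returns the hop count to every vertex (-1 if unreachable).

    The neighbours of v are edges[offsets[v]:offsets[v + 1]].
    """
    n = len(offsets) - 1
    cost = [-1] * n
    frontier = [False] * n
    cost[source] = 0
    frontier[source] = True
    while any(frontier):  # one iteration corresponds to one kernel launch
        next_frontier = [False] * n
        for v in range(n):  # conceptually one thread per vertex
            if not frontier[v]:
                continue
            for u in edges[offsets[v]:offsets[v + 1]]:
                if cost[u] == -1:  # not reached yet
                    cost[u] = cost[v] + 1
                    next_frontier[u] = True
        frontier = next_frontier
    return cost
```

Within one level, every frontier vertex that reaches an unvisited `u` writes the same value to `cost[u]`, so a parallel version can tolerate that write race without atomics.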
A vision system for monitoring intermodal freight trains
Avinash Kumar, Narendra Ahuja, John M Hart, UDAY KUMAR VISESH, Narayanan P J, Jawahar C V
Winter Conference on Applications of Computer Vision, WACV, 2007
@inproceedings{bib_A_vi_2007, AUTHOR = {Avinash Kumar and Narendra Ahuja and John M Hart and UDAY KUMAR VISESH and Narayanan P J and Jawahar C V}, TITLE = {A vision system for monitoring intermodal freight trains}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2007}}
We describe the design and implementation of a vision-based Intermodal Train Monitoring System (ITMS) for extracting various features, such as the length of gaps in an intermodal (IM) train, which can later be used for higher-level inferences. An intermodal train is a freight train consisting of two basic types of loads: containers and trailers. Our system first captures the video of an IM train and applies image processing and machine learning techniques developed in this work to identify the various loads as containers or trailers. The whole process relies on the following sequence of tasks: robust background subtraction in each frame of the video, estimation of train velocity, creation of a mosaic of the whole train from the video, and classification of train loads into containers and trailers. Finally, the length of the gaps between the loads of the IM train is estimated and used to analyze the aerodynamic efficiency of the loading pattern of the train, which is a critical aspect of freight trains. This paper focuses on the machine vision aspect of the whole system.
On using classical poetry structure for Indian language post-processing
Anoop Namboodiri, Narayanan P J, Jawahar C V
International Conference on Document Analysis and Recognition, ICDAR, 2007
@inproceedings{bib_On_u_2007, AUTHOR = {Anoop Namboodiri and Narayanan P J and Jawahar C V}, TITLE = {On using classical poetry structure for Indian language post-processing}, BOOKTITLE = {International Conference on Document Analysis and Recognition}, YEAR = {2007}}
Post-processors are critical to the performance of language recognizers like OCRs, speech recognizers, etc. Dictionary-based post-processing commonly employs either an algorithmic approach or a statistical approach. Other linguistic features are not exploited for this purpose. The language analysis is also largely limited to the prose form. This paper proposes a framework to use the rich metric and formal structure of classical poetic forms in Indian languages for post-processing a recognizer like an OCR engine. We show that the structure present in the form of the vṛtta and prāsa can be efficiently used to disambiguate some cases that may be difficult for an OCR. The approach is efficient and complementary to other post-processing approaches, and can be used in conjunction with them.
Real-time streaming and rendering of terrains
SOUMYAJIT DEB, SHIBEN BHATTACHARJEE, SURYAKANT PATIDAR, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2006
@inproceedings{bib_Real_2006, AUTHOR = {SOUMYAJIT DEB and SHIBEN BHATTACHARJEE and SURYAKANT PATIDAR and Narayanan P J}, TITLE = {Real-time streaming and rendering of terrains}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2006}}
Terrains and other geometric models have traditionally been stored locally. Remote access to them combines the characteristics of file serving and of real-time streaming of audio-visual media. This paper presents a terrain streaming system based on a client-server architecture that handles heterogeneous clients over low-bandwidth networks. We present an efficient representation for terrain streaming. We design a client-server system that uses this representation to stream virtual environments containing terrains and overlaid geometry efficiently. We handle dynamic entities in the environment and their synchronization between multiple clients. We also present a method of sharing and storing terrain annotations for collaboration between multiple users. We conclude by presenting preliminary performance data for the streaming system.
GPU objects
SUNIL MOHAN RANTA, JAG MOHAN SINGH, Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2006
@inproceedings{bib_GPU__2006, AUTHOR = {SUNIL MOHAN RANTA and JAG MOHAN SINGH and Narayanan P J}, TITLE = {GPU objects}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2006}}
Points, lines, and polygons have been the fundamental primitives in graphics. Graphics hardware is optimized to handle them in a pipeline. Other objects are converted to these primitives before rendering. Programmable GPUs have made it possible to introduce a wide class of computations on each vertex and on each fragment. In this paper, we outline a procedure to draw high-level procedural elements accurately and efficiently using the GPU. The CPU and the vertex shader set up the drawing area on screen and pass the required parameters. The pixel shader uses ray casting to compute the 3D point that projects to it and shades it using a general shading model. We demonstrate the fast rendering of 2D and 3D primitives like circles, conics, triangles, spheres, quadrics, boxes, etc., with a combination of specularity, refraction, and environment mapping. We also show that combinations of objects, like Constructive Solid Geometry (CSG) objects, can be rendered fast on the GPU. We believe that customized GPU programs for a new set of high-level primitives, which we call GPU Objects, are a way to exploit the power of GPUs and to provide interactive rendering of scenes otherwise considered too complex.
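The per-pixel ray casting described above can be sketched for one primitive, a sphere. This is a CPU-side Python stand-in for what the paper runs in a pixel shader; the orthographic camera looking down the negative z axis, the Lambert (diffuse-only) shading in place of the paper's general shading model, and the function name `ray_cast_sphere` are all assumptions of this sketch.

```python
import math


def ray_cast_sphere(width, height, center, radius, light_dir):
    """Return a height x width grid of diffuse intensities (0.0 where the ray misses).

    Assumes an orthographic camera: the pixel at (i, j) casts a ray parallel to
    -z through (x, y) with x, y in [-1, 1]. light_dir points toward the light.
    """
    lx, ly, lz = light_dir
    norm = math.sqrt(lx * lx + ly * ly + lz * lz)
    lx, ly, lz = lx / norm, ly / norm, lz / norm
    cx, cy, cz = center
    image = []
    for j in range(height):
        y = 1 - 2 * (j + 0.5) / height
        row = []
        for i in range(width):
            x = -1 + 2 * (i + 0.5) / width
            dx, dy = x - cx, y - cy
            disc = radius * radius - dx * dx - dy * dy
            if disc < 0:
                row.append(0.0)  # miss: a shader would discard this fragment
                continue
            z = cz + math.sqrt(disc)  # first hit travelling along -z has the larger z
            nx, ny, nz = dx / radius, dy / radius, (z - cz) / radius  # unit normal
            row.append(max(0.0, nx * lx + ny * ly + nz * lz))
        image.append(row)
    return image
```

Because the hit point and normal are computed exactly for each pixel, the sphere stays smooth at any zoom level without tessellation, which is the appeal of drawing such high-level primitives directly on the GPU.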
Progressive decomposition of point clouds without local planes
JAG MOHAN SINGH,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2006
@inproceedings{bib_Prog_2006, AUTHOR = {JAG MOHAN SINGH and Narayanan P J}, TITLE = {Progressive decomposition of point clouds without local planes}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2006}}
We present a reordering-based procedure for the multiresolution decomposition of a point cloud. The points are first reordered recursively based on an optimal pairing. Each level of reordering induces a division of the points into approximation and detail values. A balanced quantization at each level yields further compression. The original point cloud can be reconstructed without loss from the decomposition. Our scheme is progressive and does not require local reference planes for encoding or decoding. The points also lie on the original manifold at all levels of decomposition. The scheme can be used to generate different discrete levels of detail (LODs) of the point set, each with fewer points, at low bits-per-point (BPP) rates. We also present a scheme for the progressive representation of the point set by adding the detail values selectively. This yields a progressive approximation of the original shape with dense points even at low BPP rates. The shape is refined as more details are added and can reproduce the original point set. This scheme uses a wavelet decomposition of the detail coefficients of the multiresolution decomposition. Progressiveness is achieved by including different levels of the DWT decomposition at all multiresolution representation levels. We show that this scheme generates much better approximations of the point set at equivalent BPP rates.
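One level of a pairing-based decomposition can be sketched as below, under loud assumptions: a greedy nearest-neighbour pairing stands in for the optimal pairing, the approximation keeps one original point of each pair (so it stays on the sampled surface), the detail is the offset to its partner, and coordinates are assumed to be already quantized to integers so that reconstruction is exact. The function names are hypothetical, and the balanced quantization and wavelet coding of the details are not shown.

```python
import math

def pair_points(points):
    """Greedily pair each unpaired point with its nearest unpaired neighbour.

    A simple stand-in for an optimal pairing (assumption).
    Returns (pairs, leftover), where leftover is None or one unpaired point.
    """
    remaining = list(points)
    pairs = []
    while len(remaining) >= 2:
        p = remaining.pop(0)
        j = min(range(len(remaining)), key=lambda k: math.dist(p, remaining[k]))
        pairs.append((p, remaining.pop(j)))
    leftover = remaining[0] if remaining else None
    return pairs, leftover

def decompose_level(points):
    """Keep one point per pair (approximation) and the integer offset to its
    partner (detail). Approximation points are original points."""
    pairs, leftover = pair_points(points)
    approx = [p for p, _ in pairs]
    detail = [tuple(qi - pi for pi, qi in zip(p, q)) for p, q in pairs]
    return approx, detail, leftover

def reconstruct_level(approx, detail, leftover):
    """Exactly invert decompose_level (yields the points in pair order)."""
    points = []
    for p, d in zip(approx, detail):
        points.append(p)
        points.append(tuple(pi + di for pi, di in zip(p, d)))
    if leftover is not None:
        points.append(leftover)
    return points
```

Repeating `decompose_level` on `approx` gives coarser levels, but each level's detail list must then be reordered along with its points; that bookkeeping is what a recursive reordering handles, and this sketch omits it.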
Culling an object hierarchy to a frustum hierarchy
NIRNIMESH,PAWAN KUMAR HARISH,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2006
@inproceedings{bib_Cull_2006, AUTHOR = {NIRNIMESH and PAWAN KUMAR HARISH and Narayanan P J}, TITLE = {Culling an object hierarchy to a frustum hierarchy}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2006}}
Visibility culling of a scene is a crucial stage for interactive graphics applications, particularly for scenes with thousands of objects. The culling time must be small for culling to be effective. A hierarchical representation of the scene is used for efficient culling tests. However, when there are multiple view frustums (as in a tiled display wall), visibility culling time becomes substantial and cannot be hidden by pipelining it with other stages of rendering. In this paper, we address the problem of culling an object hierarchy to a hierarchically organized set of frustums, such as those found in tiled displays and shadow volume computation. We present an adaptive algorithm to unfold the twin hierarchies at every stage in the culling procedure. Our algorithm computes from-point visibility and is conservative. The precomputation required is minimal, allowing our approach to be applied to dynamic scenes as well. We compare the performance of our technique with different variants of culling a scene to multiple frustums. We also show results for dynamic scenes.
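The basic test underneath hierarchical culling is a conservative box-versus-frustum check. The sketch below shows that test (the common "p-vertex/n-vertex" trick for axis-aligned bounding boxes) and a recursive cull of an object hierarchy against a single frustum given as inward-facing planes. It does not reproduce the paper's adaptive unfolding of a frustum hierarchy; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Plane:
    n: tuple    # plane normal; the frustum interior is where n . x + d >= 0
    d: float

@dataclass
class Node:
    lo: tuple   # axis-aligned bounding box, min corner
    hi: tuple   # axis-aligned bounding box, max corner
    children: list = field(default_factory=list)
    name: str = ""

OUTSIDE, INTERSECT, INSIDE = 0, 1, 2

def classify(lo, hi, planes):
    """Conservative AABB-vs-frustum classification."""
    result = INSIDE
    for pl in planes:
        # p-vertex: box corner furthest along the normal; n-vertex: furthest against it.
        p = [h if ni >= 0 else l for l, h, ni in zip(lo, hi, pl.n)]
        q = [l if ni >= 0 else h for l, h, ni in zip(lo, hi, pl.n)]
        if sum(a * b for a, b in zip(pl.n, p)) + pl.d < 0:
            return OUTSIDE      # the whole box is behind this plane
        if sum(a * b for a, b in zip(pl.n, q)) + pl.d < 0:
            result = INTERSECT  # the box straddles this plane
    return result

def cull(node, planes, visible):
    """Append names of leaves that may be visible in one frustum."""
    c = classify(node.lo, node.hi, planes)
    if c == OUTSIDE:
        return
    if c == INSIDE or not node.children:
        # Fully inside (or a straddling leaf): accept the subtree without more tests.
        stack = [node]
        while stack:
            m = stack.pop()
            if m.children:
                stack.extend(m.children)
            else:
                visible.append(m.name)
        return
    for child in node.children:
        cull(child, planes, visible)
```

The early-out for nodes classified `INSIDE` is what makes the object hierarchy pay off: a whole subtree is accepted with one test. With several frustums, as on a tiled display, this test would run once per (node, frustum) pair, which is the cost a twin-hierarchy traversal aims to reduce.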
Layer extraction using graph cuts and feature tracking
VARDHMAN JAIN,Narayanan P J
International Conference on Digital Libraries, ICDLi, 2006
@inproceedings{bib_Laye_2006, AUTHOR = {VARDHMAN JAIN and Narayanan P J}, TITLE = {Layer extraction using graph cuts and feature tracking}, BOOKTITLE = {International Conference on Digital Libraries}, YEAR = {2006}}
In this paper, we present a new method for layer extraction in video by tracking a non-rigid body that follows no fixed motion model. The method integrates the graph cuts approach with robust point-based tracking to track the whole object well over the frames of a video. With a little user interaction, our method can perform layer extraction under irregular motion and difficult object boundaries. To achieve this, we apply 3D graph cuts on a pair of frames and propagate the labels obtained in the earlier frame to the new frame using a robust tracking method. The user is shown the results of the layer extraction and can provide extra strokes to improve them.
Video completion for indoor scenes
VARDHMAN JAIN,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2006
@inproceedings{bib_Vide_2006, AUTHOR = {VARDHMAN JAIN and Narayanan P J}, TITLE = {Video completion for indoor scenes}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2006}}
In this paper, we present a new approach for object removal and video completion of indoor scenes. In indoor videos, the frames are not related by an affine transformation. The region near the object to be removed can have multiple planes with sharply different motions, and dense motion estimation may fail for such scenes due to missing pixels. We use feature tracking to find the dominant motion between two frames. The geometry of the motion of multiple planes is used to segment the motion layers into component planes. The homography corresponding to each hole pixel is used to warp a frame in the future or past to fill it. We show the application of our technique on some typical indoor videos.
Depth Images: Representations and Real-Time Rendering
POOJA VERLANI,ADITI GOSWAMI,Narayanan P J,Shekhar Dwivedi,Sashi Kumar Penta
International Symposium on 3D Data Processing Visualization and Transmission, 3DPVT, 2006
@inproceedings{bib_Dept_2006, AUTHOR = {POOJA VERLANI and ADITI GOSWAMI and Narayanan P J and Shekhar Dwivedi and Sashi Kumar Penta}, TITLE = {Depth Images: Representations and Real-Time Rendering}, BOOKTITLE = {International Symposium on 3D Data Processing Visualization and Transmission}, YEAR = {2006}}
Depth Images are viable representations that can be computed from the real world using cameras and/or other scanning devices. The depth map provides the 2.5D structure of the scene. A set of Depth Images (DIs) can provide hole-free rendering of the scene; however, multiple views need to be blended to provide smooth, hole-free rendering. Such a representation of the scene is bulky and needs good algorithms for real-time rendering and efficient representation. In this paper, we discuss the Depth Image representation and provide a GPU-based algorithm that can render large models represented using DIs in real time. We then present a proxy-based compression scheme for Depth Images and provide results for it. Results are shown on synthetic scenes under different conditions and on some scenes generated from images. Lastly, we initiate a discussion on varying quality levels in IBR and show a way to create representations using DIs with different trade-offs between model size and rendering quality. This enables the use of this representation in a variety of rendering situations.
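The geometric structure a depth map carries becomes explicit when each pixel is back-projected into 3D. A minimal sketch, assuming a pinhole camera with known intrinsics `fx, fy, cx, cy` and depth measured along the optical axis (the abstract does not specify the camera model, and `depth_to_points` is a hypothetical name):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (H, W, 3) array of
    camera-space points under an assumed pinhole model."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]        # v: row (y) index, u: column (x) index
    z = depth.astype(float)
    x = (u - cx) * z / fx            # invert u = fx * x / z + cx
    y = (v - cy) * z / fy            # invert v = fy * y / z + cy
    return np.stack([x, y, z], axis=-1)
```

Rendering a Depth Image from a new viewpoint then amounts to transforming these points by the relative camera pose and projecting them again; blending several such views fills the holes that a single view leaves.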
Historical perspectives on 4D virtualized reality
Takeo Kanade,Narayanan P J
Computer Vision and Pattern Recognition Conference workshops, CVPR-W, 2006
@inproceedings{bib_Hist_2006, AUTHOR = {Takeo Kanade and Narayanan P J}, TITLE = {Historical perspectives on 4D virtualized reality}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference workshops}, YEAR = {2006}}
Recording dynamic events, such as a sports event, a ballet performance, or a lecture, digitally for experiencing in a spatiotemporally distant setting requires 4D capture: three dimensions for their geometry/appearance over the fourth dimension of time. Cameras are suitable for this task as they are non-intrusive, universal, and inexpensive. Computer Vision techniques have advanced sufficiently to make 4D capture possible. In this paper, we present a historical perspective on the Virtualized Reality™ system developed at CMU from the early 1990s to the early 2000s for the 4D capture of dynamic events.
Streaming terrain rendering
Soumyajit Deb,Narayanan P J,SHIBEN BHATTACHARJEE
SIGGRAPH ASIA Technical Briefs, SATB, 2006
@inproceedings{bib_Stre_2006, AUTHOR = {Soumyajit Deb and Narayanan P J and SHIBEN BHATTACHARJEE}, TITLE = {Streaming terrain rendering}, BOOKTITLE = {SIGGRAPH ASIA Technical Briefs}, YEAR = {2006}}
Terrains and other geometric models have traditionally been stored locally. Remote access to them combines the characteristics of data serving, such as file serving, with those of real-time streaming of audio-visual media. In this sketch, we describe a client-server system to serve and stream large terrains to heterogeneous clients. This process is sensitive to both the client's capabilities and the available network bandwidth. Level of detail and view prediction are used to alleviate the effects of changing latency and bandwidth. We discuss the design of a terrain streaming system and present preliminary results.
Compression of multiple depth maps for IBR
SASHI KUMAR PENTA,Narayanan P J
The Visual Computer, VC, 2005
@inproceedings{bib_Comp_2005, AUTHOR = {SASHI KUMAR PENTA and Narayanan P J}, TITLE = {Compression of multiple depth maps for IBR}, BOOKTITLE = {The Visual Computer}, YEAR = {2005}}
Image-based rendering (IBR) techniques include those with geometry and those without. Geometric information in the form of a depth map aligned with the image holds a lot of promise for IBR because several methods are available to capture it. It can improve the quality of the generated views using a limited number of input views. Compression of light fields or multiple images has attracted a lot of attention in the past, but compression of multiple depth maps of the same scene has not been explored much in the literature. In this paper, we propose a method for compressing multiple depth maps using a geometric proxy. Different rendering qualities and compression ratios can be achieved by varying its parameters. Experiments show the effectiveness of the compression technique on several models.
Design of a Geometry Streaming System
SOUMYAJIT DEB,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2004
@inproceedings{bib_Desi_2004, AUTHOR = {SOUMYAJIT DEB and Narayanan P J}, TITLE = {Design of a Geometry Streaming System}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2004}}
The size and detail of graphics environments in everyday use have gone up considerably in recent years. Still, most applications use locally resident geometric content for rendering. In this paper, we present a system to stream large graphics environments from a central server to multiple clients. The streaming is transparent to the user, who can treat remote models just like local ones. The streaming system automatically adapts to the rendering capabilities, network bandwidth, and latency of the client and transmits an optimized model. We present the design of the streaming system and give results of streaming a large model using it.
RepVis: A Remote Visualization System for Large Environments
SOUMYAJIT DEB,Narayanan P J
National Conference on Communications, NCC, 2004
@inproceedings{bib_RepV_2004, AUTHOR = {SOUMYAJIT DEB and Narayanan P J}, TITLE = {RepVis: A Remote Visualization System for Large Environments}, BOOKTITLE = {National Conference on Communications}, YEAR = {2004}}
Large virtual environments arise in many applications and may be used by many users simultaneously for walkthroughs. Because of their size, it is not feasible to store a copy of the whole environment at each client location. The dynamic nature of the environment also makes it more convenient for a central server to ensure consistency across different users. In this paper, we present a client-server system for visualizing large environments for remote users. The client receives only the portion of the geometry necessary for navigating around the current user location. The system can adapt to different client parameters such as graphics capability, network bandwidth, and communication latency. We present the design and implementation of the system and report results obtained with it.
Remotevis: Remote visualization of massive virtual environments
SOUMYAJIT DEB,Narayanan P J
National Conference on Communications, NCC, 2004
@inproceedings{bib_Remo_2004, AUTHOR = {SOUMYAJIT DEB and Narayanan P J}, TITLE = {Remotevis: Remote visualization of massive virtual environments}, BOOKTITLE = {National Conference on Communications}, YEAR = {2004}}
Graphics models for virtual environments are becoming increasingly massive, requiring large amounts of memory to store and high-end graphics capabilities to render. It may not be possible or desirable to store the entire model at the station where it will be rendered, especially for environments that change. Such virtual models could be stored on a server and streamed to remote clients on demand. In this paper, we present the design and implementation of a client-server system to render large virtual environments. Our system can provide the best visualization quality over a wide range of connection bandwidths, latencies, and client rendering capacities. The system uses a visibility-based geometry representation so that little invisible geometry is sent to the client. Incremental updating and client-side prediction of user motion optimize the communication requirements for a given client. We present results of a study conducted over a representative range of the relevant parameters.
Depth+Texture Representation for Image Based Rendering
Narayanan P J,SASHI KUMAR PENTA,SIREESH REDDY KOMATIREDDY
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2004
@inproceedings{bib_Dept_2004, AUTHOR = {Narayanan P J and SASHI KUMAR PENTA and SIREESH REDDY KOMATIREDDY}, TITLE = {Depth+Texture Representation for Image Based Rendering}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2004}}
Image Based Rendering (IBR) holds a lot of promise for navigating through a real-world scene without modeling it manually. Different representations have been proposed for IBR in the literature. In this paper, we argue that a representation using depth maps and texture images from a number of viewpoints is a rich and viable representation for IBR. We discuss different aspects of this representation, including capture, representation, compression, and rendering. We present several results showing how the representation can be used to model and render complex scenes.
Fourier domain representation of planar curves for recognition in multiple views
SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
Pattern Recognition, PR, 2004
@inproceedings{bib_Four_2004, AUTHOR = {SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Fourier domain representation of planar curves for recognition in multiple views}, BOOKTITLE = {Pattern Recognition}, YEAR = {2004}}
Recognition of planar shapes is an important problem in computer vision and pattern recognition. The same planar object contour imaged from different cameras or from different viewpoints looks different, and their recognition is non-trivial. Traditional shape recognition deals with views of the shapes that differ only by simple rotations, translations, and scaling. However, shapes suffer more serious deformation between two general views, and hence recognition approaches designed to handle translations, rotations, and/or scaling would prove to be insufficient. Many algebraic relations between matching primitives in multiple views have been identified recently. In this paper, we explore how shape properties and multiview relations can be combined to recognize planar shapes across multiple views. We propose novel recognition constraints that a planar shape boundary must satisfy in multiple views. The constraints are on the rank of a Fourier-domain measurement matrix computed from the points on the shape boundary. Our method can additionally compute the correspondence between the curve points after a match is established. We demonstrate the applications of these constraints experimentally on a number of synthetic and real images.
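A rank constraint of this kind can be illustrated in its simplest setting. Under an assumed affine camera model, with boundary points sampled from a common starting point in every view, each view's non-DC Fourier coefficients are a 2×2 linear map of one shared 2×K coefficient matrix, so stacking the x and y coefficient rows of all views gives a matrix of rank at most 2. This sketch covers only that affine case (the abstract's constraints may be more general), and the function names are hypothetical.

```python
import numpy as np

def fourier_rows(curve):
    """x and y rows of the non-DC Fourier coefficients of a closed curve.

    curve: (N, 2) array of boundary points, sampled from a common start point.
    """
    coeffs = np.fft.fft(curve, axis=0)   # (N, 2) complex; FFT is linear
    return coeffs[1:].T                  # drop k = 0, which absorbs translation

def measurement_matrix(views):
    """Stack the coefficient rows of all views: shape (2 * views, N - 1)."""
    return np.vstack([fourier_rows(v) for v in views])

def affine_consistent(views, tol=1e-8):
    """If view_j = curve @ A_j.T + t_j, then W = [A_1; A_2; ...] @ C has rank <= 2,
    so the third singular value should vanish relative to the first."""
    s = np.linalg.svd(measurement_matrix(views), compute_uv=False)
    return s[2] <= tol * s[0]
```

A boundary from a different shape breaks the constraint, because its coefficients do not lie in the same two-dimensional row space.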
Constraints on coplanar moving points
SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
European Conference on Computer Vision, ECCV, 2004
@inproceedings{bib_Cons_2004, AUTHOR = {SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Constraints on coplanar moving points}, BOOKTITLE = {European Conference on Computer Vision}, YEAR = {2004}}
Configurations of dynamic points viewed by one or more cameras have not been studied much. In this paper, we present several view and time-independent constraints on different configurations of points moving on a plane. We show that 4 points with constant independent velocities or accelerations under affine projection can be characterized in a view independent manner using 2 views. Under perspective projection, 5 coplanar points under uniform linear velocity observed for 3 time instants in a single view have a view-independent characterization. The best known constraint for this case involves 6 points observed for 35 frames. Under uniform acceleration, 5 points in 5 time instants have a view-independent characterization. We also present constraints on a point undergoing arbitrary planar motion under affine projections in the Fourier domain. The constraints introduced in this paper involve fewer points or views than similar results reported in the literature and are simpler to compute in most cases. The constraints developed can be applied to many aspects of computer vision. Recognition constraints for several planar point configurations of moving points can result from them. We also show how time-alignment of views captured independently can follow from the constraints on moving point configurations.
Building blocks for autonomous navigation using contour correspondences
PAWAN KUMAR M,Jawahar C V,Narayanan P J
International Conference on Image Processing, ICIP, 2004
@inproceedings{bib_Buil_2004, AUTHOR = {PAWAN KUMAR M and Jawahar C V and Narayanan P J}, TITLE = {Building blocks for autonomous navigation using contour correspondences}, BOOKTITLE = {International Conference on Image Processing}, YEAR = {2004}}
We address a few problems in the navigation of automated vehicles using images captured by a mounted camera. Specifically, we look at the recognition of sign boards, the rectification of planar objects imaged by the camera, and the estimation of the position of a vehicle with respect to a fixed sign board. Our solutions are based on contour correspondence between a reference view and the current view. The mapping between corresponding points of a planar object in two different views is given by a 3×3 matrix called a homography. We first develop a novel two-step linear algorithm for computing the homography from contour correspondence. Our algorithm requires the identification of an image contour as the projection of a known planar world contour and the selection of a known starting point. The homography between the reference view and the target view is applied to several real-life navigation applications, results of which are presented in this paper.
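The two-step contour-based algorithm in the abstract is not reproduced here. To make the notion of a homography concrete, the sketch below substitutes a different, standard technique, the Direct Linear Transform (DLT), which estimates the 3×3 matrix from four or more point correspondences, and shows how it maps reference-view points into the current view. Function names are hypothetical, and no coordinate normalization or outlier handling is included.

```python
import numpy as np

def homography_dlt(src, dst):
    """Estimate H (3x3, scaled so H[2, 2] = 1) with dst ~ H @ src, from
    >= 4 point pairs, using the plain Direct Linear Transform."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence gives two linear equations in the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)             # right singular vector of the smallest singular value
    return H / H[2, 2]

def apply_homography(H, pts):
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.asarray(pts, dtype=float)
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]
```

Once H is known, `apply_homography` transfers any point of the planar object from the reference view to the target view, which is the operation that rectification and position estimation build on.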
Geometric Structure Computation from Conics
Pawan Kumar Mudigonda,Jawahar C V,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2004
@inproceedings{bib_Geom_2004, AUTHOR = {Pawan Kumar Mudigonda and Jawahar C V and Narayanan P J}, TITLE = {Geometric Structure Computation from Conics}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2004}}
This paper presents several results on images of various configurations of conics. We extract information about the plane from single and multiple views of known and unknown conics, based on planar homography and conic correspondences. We show that a single conic section cannot provide sufficient information. Metric rectification of the plane can be performed from a single view if two conics can be identified as images of circles, without knowing their centers or radii. The homography between two views of a planar scene can be computed if two arbitrary conics are identified in them, without knowing anything specific about them. The scene can be reconstructed from a single view if images of a pair of circles can be identified in two planes. Our results are simpler and require less information from the images than previously known results. The results presented here involve univariate polynomial equations of degree 4 or 8 and always have solutions. Applications to metric rectification, homography calculation, 3D reconstruction, and projective OCR are presented to demonstrate the usefulness of our scheme.
Discrete contours in multiple views: approximation and recognition
PAWAN KUMAR M,SAURABH GOYAL,SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
Image and Vision Computing, IVC, 2004
@inproceedings{bib_Disc_2004, AUTHOR = {PAWAN KUMAR M and SAURABH GOYAL and SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Discrete contours in multiple views: approximation and recognition}, BOOKTITLE = {Image and Vision Computing}, YEAR = {2004}}
Recognition of discrete planar contours under similarity transformations has received a lot of attention, but little work has been reported on recognizing them under more general transformations. Planar object boundaries undergo projective or affine transformations across multiple views. We present two methods to recognize discrete curves in this paper. The first method computes a piecewise parametric approximation of the discrete curve that is projectively invariant. A polygon approximation scheme and a piecewise conic approximation scheme are presented here. The second method computes an invariant sequence directly from the sequence of discrete points on the curve in a Fourier transform space. The sequence is shown to be identical up to a scale factor in all affine-related views of the curve. We present the theory and demonstrate its applications to several problems including numeral recognition, aircraft recognition, and homography computation. © 2004 Elsevier B.V. All rights reserved.
Planar homography from Fourier domain representation
PAWAN KUMAR M,SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
International Conference on Signal Processing and Communications, SPCOM, 2004
@inproceedings{bib_Plan_2004, AUTHOR = {PAWAN KUMAR M and SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Planar homography from Fourier domain representation}, BOOKTITLE = {International Conference on Signal Processing and Communications}, YEAR = {2004}}
Computing the transformation between two views of a planar scene is an important step in many computer vision applications. Spatial approaches to this problem need corresponding sets of primitives, such as points, lines, or conics. Identifying corresponding primitives in two images is non-trivial, which limits the applicability of such approaches. In this paper, we present a novel Fourier domain approach that uses image intensities to compute the image-to-image transformation. Our approach transforms the images to the Fourier domain and then represents them in a coordinate system in which the affine transformation is reduced to an anisotropic scaling. The anisotropic scale factors can be computed using cross-correlation methods, and working backwards from them, we compute the entire transformation. The approach does not require any correspondences, which makes it very useful in practice. Applications to registration and recognition are discussed.
Tools for developing OCRs for Indian scripts
M N S S K PAVAN KUMAR,SANTOSH RAVI KIRAN S,ABHISHEK NAYANI,Jawahar C V,Narayanan P J
Computer Vision and Pattern Recognition Conference workshops, CVPR-W, 2003
@inproceedings{bib_Tool_2003, AUTHOR = {M N S S K PAVAN KUMAR and SANTOSH RAVI KIRAN S and ABHISHEK NAYANI and Jawahar C V and Narayanan P J}, TITLE = {Tools for developing OCRs for Indian scripts}, BOOKTITLE = {Computer Vision and Pattern Recognition Conference workshops}, YEAR = {2003}}
Development of OCRs for Indian scripts is an active area of research today. Indian scripts present great challenges to an OCR designer due to the large number of letters in the alphabet, the sophisticated ways in which they combine, and the complicated graphemes that result. The problem is compounded by the unstructured manner in which popular fonts are designed. There is a lot of common structure across the different Indian scripts. In this paper, we argue that a number of automatic and semi-automatic tools can ease the development of recognizers for new font styles and new scripts. We briefly discuss three such tools we developed and show how they have helped build new OCRs. An integrated approach to the design of OCRs for all Indian scripts has great benefits. We are building OCRs for all Indian languages following this approach, as part of a system that provides tools to create content in them.
Towards fuzzy calibration
Jawahar C V,Narayanan P J
International Conference on Fuzzy Systems, FUZZ, 2002
@inproceedings{bib_Towa_2002, AUTHOR = {Jawahar C V and Narayanan P J}, TITLE = {Towards fuzzy calibration}, BOOKTITLE = {International Conference on Fuzzy Systems}, YEAR = {2002}}
Algebraic Constraints on Moving Points in Multiple Views
SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2002
@inproceedings{bib_Alge_2002, AUTHOR = {SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Algebraic Constraints on Moving Points in Multiple Views}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2002}}
Multiview analysis of scenes includes the study of scene-independent constraints satisfied by a configuration of cameras for all types of scenes, as well as the study of view-independent constraints satisfied by any camera on a configuration of points. In this paper, we derive new constraints involving configurations of points that move with constant velocity, with constant acceleration, and for unconstrained planar motion. We show how these constraints can be applied to problems like motion recognition, frame alignment, etc.
Polygonal Approximation of Closed Curves across Multiple Views
PAWAN KUMAR M,SAURABH GOYAL,Jawahar C V,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2002
@inproceedings{bib_Poly_2002, AUTHOR = {PAWAN KUMAR M and SAURABH GOYAL and Jawahar C V and Narayanan P J}, TITLE = {Polygonal Approximation of Closed Curves across Multiple Views}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2002}}
Polygonal approximation is an important step in the recognition of planar shapes. Traditional polygonal approximation algorithms handle only images that are related by a similarity transformation. As the viewpoint of a perspective camera changes, however, a planar shape undergoes a general projective transformation. In this paper, we present a novel method for polygonal approximation of closed curves that is invariant to projective transformation. The polygons generated by our algorithm from two images related by a projective homography are isomorphic. We also describe an application of this in the form of numeral recognition. We demonstrate the importance of this algorithm for real-life applications such as number plate recognition, aircraft recognition, and metric rectification.
A Multifeature Correspondence Algorithm Using Dynamic Programming
Jawahar C V,Narayanan P J
Asian Conference on Computer Vision, ACCV, 2002
@inproceedings{bib_A_Mu_2002, AUTHOR = {Jawahar C V and Narayanan P J}, TITLE = {A Multifeature Correspondence Algorithm Using Dynamic Programming}, BOOKTITLE = {Asian Conference on Computer Vision}, YEAR = {2002}}
Correspondence between pixels is an important problem in stereo vision. Several algorithms have been proposed in the literature to carry out this task. Almost all of them employ only gray values. We show here that adding primary or secondary evidence maps can improve the correspondence computation. However, no particular combination is guaranteed to provide proper results in a general situation. What is needed is a mechanism to select the evidence maps that are appropriate for a particular pair of images. We present an algorithm for stereo correspondence that can take advantage of different image features adaptively for matching. Our algorithm uses a match measure that combines individual measures computed from different features, so the advantages of each feature can be combined in a single correspondence computation. We describe an unsupervised scheme to compute the relevance of each feature to a particular situation, given a set of possibly useful features. We present an implementation of the scheme using dynamic programming for pixel-to-pixel correspondence.
An adaptive multifeature correspondence algorithm for stereo using dynamic programming
Jawahar C V,Narayanan P J
Pattern Recognition Letters, PRLJ, 2002
@inproceedings{bib_An_a_2002, AUTHOR = {Jawahar C V and Narayanan P J}, TITLE = {An adaptive multifeature correspondence algorithm for stereo using dynamic programming}, BOOKTITLE = {Pattern Recognition Letters}, YEAR = {2002}}
We present an algorithm for stereo correspondence that can take advantage of different image features adaptively for matching. A match measure combining different match measures computed from different features is used by our algorithm. It is possible to compute correspondences using the gray value, multispectral components, derived features such as the edge strength, texture, etc., in a flexible manner using this algorithm. The advantages of each feature can be combined in a single correspondence computation. We describe a non-supervised scheme to compute the relevance of each feature to a particular situation, given a set of possibly useful features. We present an implementation of the scheme using dynamic programming for pixel-to-pixel correspondence. Results demonstrate the advantages of our scheme under different conditions.
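Dynamic-programming matching of this kind can be sketched for a single scanline pair. Assumptions, none of them taken from the paper: the combined match cost is a weighted sum of per-feature absolute differences, the feature weights are supplied by the caller rather than estimated by the paper's unsupervised scheme, and unmatched (occluded) pixels pay a fixed penalty `occlusion`. The function name `dp_scanline` is hypothetical.

```python
import numpy as np

def dp_scanline(left, right, weights, occlusion=0.1):
    """Align two scanlines of per-pixel feature vectors by dynamic programming.

    left: (n, F) array, right: (m, F) array, one column per feature
    (e.g. gray value, edge strength). weights: length-F feature weights.
    Returns the disparity i - j for each left pixel i, or NaN where the
    pixel is left unmatched (treated as occluded).
    """
    w = np.asarray(weights, dtype=float)
    n, m = len(left), len(right)
    # Combined match cost for every (i, j): weighted sum over features.
    match = (np.abs(left[:, None, :] - right[None, :, :]) * w).sum(axis=2)
    cost = np.zeros((n + 1, m + 1))
    cost[:, 0] = np.arange(n + 1) * occlusion
    cost[0, :] = np.arange(m + 1) * occlusion
    # Back-pointers: 0 = match, 1 = skip a left pixel, 2 = skip a right pixel.
    move = np.zeros((n + 1, m + 1), dtype=int)
    move[1:, 0] = 1
    move[0, 1:] = 2
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = (cost[i - 1, j - 1] + match[i - 1, j - 1],
                       cost[i - 1, j] + occlusion,
                       cost[i, j - 1] + occlusion)
            k = int(np.argmin(options))
            move[i, j] = k
            cost[i, j] = options[k]
    # Trace the optimal path back from the bottom-right corner.
    disp = np.full(n, np.nan)
    i, j = n, m
    while i > 0 or j > 0:
        k = move[i, j]
        if k == 0:
            disp[i - 1] = (i - 1) - (j - 1)
            i, j = i - 1, j - 1
        elif k == 1:
            i -= 1
        else:
            j -= 1
    return disp
```

Changing `weights` changes which feature dominates the match cost; the paper's contribution is choosing those relevances automatically for each image pair, which this sketch leaves to the caller.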
Generalised correlation for multi-feature correspondence
Jawahar C V,Narayanan P J
Pattern Recognition, PR, 2002
@inproceedings{bib_Gene_2002, AUTHOR = {Jawahar C V and Narayanan P J}, TITLE = {Generalised correlation for multi-feature correspondence}, BOOKTITLE = {Pattern Recognition}, YEAR = {2002}}
Computing correspondences between pairs of images is fundamental to all structure-from-motion algorithms. Correlation is a popular method to estimate similarity between image patches. In the standard formulation, the correlation function uses only one feature, such as the gray-level values of a small neighbourhood. Research has shown that different features, such as colour, edge strength, corners, and texture measures, work better under different conditions. We propose a framework of generalized correlation that can compute a real-valued similarity measure using a feature vector whose components can be dissimilar. The framework can combine the effects of different image features, such as multi-spectral features, edges, corners, and texture measures, into a single similarity measure in a flexible manner. Additionally, it can combine the results of different window sizes used for correlation, with proper weighting for each. The relative importance of the features can be estimated from the image itself for accurate correspondence. In this paper, we present the framework of generalised correlation, provide a few examples demonstrating its power, and discuss implementation issues. © 2002 Published by Elsevier Science Ltd on behalf of Pattern Recognition Society.
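A much simpler relative of this idea is a weighted average of per-feature normalized cross-correlation (NCC) scores over the same window. The sketch below is that stand-in, not the paper's generalized correlation: the weights are assumed to be given, window-size combination is omitted, and the function names are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two equally sized patches, in [-1, 1]."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def combined_similarity(features_a, features_b, weights):
    """Weighted average of per-feature NCC scores.

    features_a, features_b: lists of patches, one per feature channel
    (e.g. gray level, edge strength), all taken over the same window.
    weights: relative importance of each feature (assumed to be given).
    """
    w = np.asarray(weights, dtype=float)
    scores = np.array([ncc(fa, fb) for fa, fb in zip(features_a, features_b)])
    return float((w * scores).sum() / w.sum())
```

Because NCC is invariant to gain and offset within each channel, the weights alone decide how much, say, edge strength counts relative to gray level.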
Planar shape recognition across multiple views
SUJIT KUTHIRUMMAL,Jawahar C V,Narayanan P J
International Conference on Pattern Recognition, ICPR, 2002
@inproceedings{bib_Plan_2002, AUTHOR = {SUJIT KUTHIRUMMAL and Jawahar C V and Narayanan P J}, TITLE = {Planar shape recognition across multiple views}, BOOKTITLE = {International Conference on Pattern Recognition}, YEAR = {2002}}
Multiview studies in Computer Vision have concentrated on the constraints satisfied by individual primitives such as points and lines. Not much attention has been paid to the properties of a collection of primitives in multiple views, which could be studied in the spatial domain or in an appropriate transform domain. We derive an algebraic constraint for planar shape recognition across multiple views based on the rank of a matrix of Fourier domain descriptor coefficients of the shape in different views. We also show how, for matching shapes, correspondence between points on the boundary can be computed using the phase of the measure used for recognition.
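A sketch of a rank test in this spirit, under assumptions that are ours rather than the paper's: the views are related by a 2-D affine map (for example, weak-perspective images of a planar shape), and each boundary is sampled with the same number of corresponding points from a corresponding starting point. Because the DFT is linear, an affine map acts on the non-DC Fourier coefficients of the x and y coordinate sequences through the same 2x2 matrix, while translation only affects the DC term. Stacking the coefficient rows of all views of one shape therefore gives a matrix of rank at most 2, so the ratio of the third to the first singular value is near zero for matching shapes. A shifted starting point multiplies coefficient k by a phase factor, which this sketch does not handle. The function names are hypothetical.

```python
import numpy as np

def fourier_descriptors(boundary):
    """boundary: (N, 2) array of points sampled along a closed contour.

    Returns a (2, N-1) complex array whose rows are the DFTs of the x and y
    coordinate sequences with the DC term (k = 0) dropped, since the DC term
    is the only one affected by translation.
    """
    coeffs = np.fft.fft(boundary, axis=0)  # shape (N, 2)
    return coeffs[1:].T

def rank_residual(views):
    """Stack descriptor rows of all views and return sigma_3 / sigma_1.

    For views related by affine maps (with corresponding sampling) the stacked
    matrix has rank <= 2, so the ratio is ~0; a different shape raises it.
    With a single view there is no third singular value and 0.0 is returned.
    """
    m = np.vstack([fourier_descriptors(v) for v in views])
    s = np.linalg.svd(m, compute_uv=False)  # singular values, descending
    return float(s[2] / s[0]) if s.size > 2 else 0.0
```

Comparing `rank_residual` against a small threshold gives a recognition test that needs no explicit estimate of the affine maps between views.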
A multimedia-based City information system
KRANTHI KUMAR RAVI,Narayanan P J,Jawahar C V
IETE Technical Review, TR, 2001
@inproceedings{bib_A_mu_2001, AUTHOR = {KRANTHI KUMAR RAVI, Narayanan P J, Jawahar C V}, TITLE = {A multimedia-based City information system}, BOOKTITLE = {IETE Technical Review}, YEAR = {2001}}
A great deal of information about a particular geographical area is available today from different sources. However, none of it is reliable enough to meet the differing needs of tourists, city planners, administrators, and lay people. We have designed a multimedia-based city information system that attempts to solve this problem. In this paper, we explain the design and implementation of our system. The paper discusses in detail the need for such a system, how it compares with similar existing systems, and the features expected in such a system. It also discusses our implementation of the system and the future work we want to pursue in this direction.
A flexible scheme for representation, matching, and retrieval of images
Jawahar C V,Narayanan P J,Subrata Rakshit
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2000
@inproceedings{bib_A_fl_2000, AUTHOR = {Jawahar C V, Narayanan P J, Subrata Rakshit}, TITLE = {A flexible scheme for representation, matching, and retrieval of images}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2000}}
Image databases index their contents using features extracted from the images. The indexing scheme is decided a priori and is optimized for a specific querying criterion. This is not suitable for a generic database of images, which may be queried by multiple users based on different criteria. In this paper, we present a flexible scheme that adapts itself to the user's preferences. Though the method uses a conservative set of features during indexing, including a large number and variety of fundamental features, the query processing time does not increase due to this redundancy. A bootstrapping mechanism allows the user to build up a "desired class" from a few samples. The feature selection computation scales linearly with the size of the desired class, rather than with that of the entire database. This property makes our algorithm viable for very large databases. We present an implementation of our scheme and some results from it.
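A minimal sketch of desired-class feature weighting, assuming every image is already indexed as a fixed-length vector of F features. Weighting each feature by the inverse of its spread within the user's examples is a common relevance-feedback heuristic that we substitute here for illustration; the abstract does not give the paper's own selection rule. Computing the weights touches only the n examples, so its cost is O(nF), linear in the size of the desired class; ranking the database is a separate query-time step. The names `feature_weights` and `rank_database` are hypothetical.

```python
import numpy as np

def feature_weights(desired, eps=1e-6):
    """Per-feature weights from the desired class (n examples x F features).

    Features that vary little across the user's examples are assumed to be
    the ones that characterise the class, so they get larger weights.
    Cost is O(n * F): linear in the desired-class size, not the database size.
    """
    spread = desired.std(axis=0)
    w = 1.0 / (spread + eps)
    return w / w.sum()

def rank_database(database, desired, k=5):
    """Indices of the k database vectors closest to the desired-class mean
    under the weighted Euclidean distance."""
    w = feature_weights(desired)
    centre = desired.mean(axis=0)
    dist = np.sqrt(((database - centre) ** 2 * w).sum(axis=1))
    return np.argsort(dist)[:k]
```

Adding an example to the desired class only requires recomputing the mean and spread over the examples, which is what keeps the bootstrapping loop cheap even when the database itself is large.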
Feature Integration and Selection for Pixel Correspondence
Jawahar C V,Narayanan P J
Indian Conference on Computer Vision, Graphics and Image Processing, ICVGIP, 2000
@inproceedings{bib_Feat_2000, AUTHOR = {Jawahar C V, Narayanan P J}, TITLE = {Feature Integration and Selection for Pixel Correspondence}, BOOKTITLE = {Indian Conference on Computer Vision, Graphics and Image Processing}, YEAR = {2000}}
Pixel correspondence is an important problem in stereo vision, motion, structure from motion, etc. Several procedures have been proposed in the literature for this problem, using a variety of image features to identify correspondences. Different features work well under different conditions. An algorithm that can seamlessly integrate multiple features in a flexible manner can combine the advantages of each. In this paper, we propose a framework to combine heterogeneous features, each with a different measure of importance, into a single correspondence computation. We also present an unsupervised procedure to select the optimal combination of features for a given pair of images by computing the relative importance of each feature. A unique aspect of our framework is that it is independent of the specific correspondence algorithm used. Optimal feature selection can be done using any correspondence mechanism that can be extended to use multiple features. We also present a few examples that demonstrate the effectiveness of the feature selection framework.
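One way to make feature selection unsupervised and independent of the matcher, assumed here purely for illustration, is to score each feature by how distinctive its matches are on the given image pair: if the best-matching disparity for a pixel is clearly cheaper than the second best, the feature is informative, while a textureless feature with a flat cost curve scores near zero. The sketch below computes that margin per feature on one scanline and normalises the scores into weights that any multi-feature correspondence routine could consume. The criterion and the names `cost_curves` and `feature_relevance` are hypothetical, not the paper's procedure.

```python
import numpy as np

def cost_curves(left, right, max_disp):
    """Absolute-difference cost[x, d] of one feature on one scanline.

    Left pixel x is compared with right pixel x - d; entries with x < d
    have no partner and are set to infinity.
    """
    width = left.shape[0]
    cost = np.full((width, max_disp + 1), np.inf)
    for d in range(max_disp + 1):
        cost[d:, d] = np.abs(left[d:] - right[:width - d])
    return cost

def feature_relevance(left_feats, right_feats, max_disp, eps=1e-9):
    """Hypothetical unsupervised weights: mean best-vs-second-best margin per feature."""
    scores = []
    for lf, rf in zip(left_feats, right_feats):
        # Keep only pixels where every candidate disparity is valid.
        cost = cost_curves(lf, rf, max_disp)[max_disp:]
        two_best = np.sort(cost, axis=1)[:, :2]
        # Margin in [0, 1): near 1 for a clear winner, 0 for a flat cost curve.
        margin = (two_best[:, 1] - two_best[:, 0]) / (two_best[:, 1] + eps)
        scores.append(margin.mean())
    scores = np.asarray(scores)
    if scores.sum() == 0:
        # No feature is informative: fall back to equal weights.
        return np.full(len(scores), 1.0 / len(scores))
    return scores / scores.sum()
```

For instance, when a textured channel is paired with a constant one, the constant channel receives weight 0, so a downstream matcher effectively ignores it for that image pair.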