VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning
Madhavaram Vivek Vardhan, Vartika Sengar, Arkadipta De, Charu Sharma
Winter Conference on Applications of Computer Vision, WACV, 2026
@inproceedings{bib_VIZO_2026, AUTHOR = {Vardhan, Madhavaram Vivek and Sengar, Vartika and De, Arkadipta and Sharma, Charu}, TITLE = {VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}
Scene understanding and reasoning are fundamental problems in 3D computer vision, requiring models to identify objects, their properties, and the spatial or comparative relationships among them. Existing approaches enable this by creating scene graphs using multiple inputs such as 2D images, depth maps, object labels, and annotated relationships from a specific reference view. However, these methods often struggle with generalization and produce inaccurate spatial relationships like "left/right", which become inconsistent across different viewpoints. To address these limitations, we propose Viewpoint-Invariant ZerO-shot scene graph generation for 3D scene Reasoning (VIZOR). VIZOR is a training-free, end-to-end framework that constructs dense, viewpoint-invariant 3D scene graphs directly from raw 3D scenes. The generated scene graph is unambiguous, as spatial relationships are defined relative to each object's front-facing direction, making them consistent regardless of the reference view. Furthermore, it infers open-vocabulary spatial and proximity relationships among scene objects without requiring annotated training data. We conduct extensive quantitative and qualitative evaluations to assess the effectiveness of VIZOR on scene graph generation and downstream tasks, such as query-based object grounding. VIZOR outperforms state-of-the-art methods, showing clear improvements in scene graph generation and achieving 22% and 4.81% gains in zero-shot grounding accuracy on the Replica and Nr3D datasets, respectively.
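VIZOR's key idea of anchoring spatial relations to each object's own front-facing direction, rather than the camera view, can be illustrated with a minimal sketch (not the paper's implementation; the function and variable names here are hypothetical):

```python
import math

def object_centric_relation(anchor_pos, anchor_front, target_pos):
    """Classify where `target` lies relative to `anchor` on the ground
    plane, using the anchor object's own front-facing direction rather
    than the camera view. Inputs are (x, y) tuples."""
    fx, fy = anchor_front
    n = math.hypot(fx, fy)
    fx, fy = fx / n, fy / n               # unit front axis
    lx, ly = -fy, fx                      # left axis = 90 deg CCW of front
    dx = target_pos[0] - anchor_pos[0]
    dy = target_pos[1] - anchor_pos[1]
    fwd = dx * fx + dy * fy               # projection onto front axis
    lat = dx * lx + dy * ly               # projection onto left axis
    if abs(fwd) >= abs(lat):
        return "in front of" if fwd > 0 else "behind"
    return "left of" if lat > 0 else "right of"

# A chair at the origin facing +x: an object at (0, 1) is on the
# chair's left no matter where the camera observes the scene.
print(object_centric_relation((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))
```

Because the relation depends only on the anchor's pose and the target's position, the prediction is unchanged under any camera viewpoint, which is the viewpoint-invariance property the abstract describes.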
InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation
Sreehari Rajan, Kunal Kamalkishor Bhosikar, Charu Sharma
Winter Conference on Applications of Computer Vision, WACV, 2026
@inproceedings{bib_Inte_2026, AUTHOR = {Rajan, Sreehari and Bhosikar, Kunal Kamalkishor and Sharma, Charu}, TITLE = {InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}
Generating realistic human motions that naturally respond to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speech-driven gestures or object interactions independently, limiting real-world applicability due to a lack of integrated, comprehensive datasets. To overcome this, we introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. We achieve this by employing a multi-stage training process to learn a unified motion, speech, and prompt embedding space. To support this, we curate a rich human-object interaction dataset, formed by augmenting an existing text-to-motion dataset with detailed object interaction annotations. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition, which is then dynamically combined during inference. To address the imbalance between heterogeneous conditioning signals, we propose an adaptive fusion strategy, which dynamically reweights the conditioning signals during diffusion sampling.
InteracTalker successfully unifies these previously separate tasks, outperforming prior methods in both co-speech gesture generation and object-interaction synthesis, and yields realistic, object-aware full-body motions with enhanced flexibility and control. (https://sreeharirajan.github.io/projects/InteracTalker/)
SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data
Ven Janaksinh Vanabhai, Charu Sharma, Syed Azeemuddin
Winter Conference on Applications of Computer Vision, WACV, 2026
@inproceedings{bib_SegM_2026, AUTHOR = {Vanabhai, Ven Janaksinh and Sharma, Charu and Azeemuddin, Syed}, TITLE = {SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}
Early-stage fruit yield prediction plays a key role in supporting timely agronomic decisions, enhancing market planning, and empowering farmers with data-driven insights. Over the years, most approaches to yield estimation have focused on fruit counting techniques, typically performed just before harvest. While these methods have proven useful, they often come into play late in the cultivation cycle, limiting their impact on early planning and resource optimization. In this work, we introduce a comprehensive baseline framework for predicting mango yield at an earlier stage - during flowering - using image-based learning. Our contributions are twofold. (i) Our approach combines a SegFormer-based segmentation model with a regression pipeline to estimate yield from images, while also exploring the role of contextual features such as weather and scale. (ii) This work introduces a novel benchmark and an enriched dataset, paving the way for scalable, automated tools that can assist farmers and stakeholders in making proactive decisions throughout the mango growing season. Our work demonstrates that for multi-modal yield prediction, integrating features that complement visual representations (like scale) can be more impactful than using features with a stronger standalone linear correlation (like weather). Our single-image model, based on the SegFormer-B1 encoder, achieved a mean absolute error (MAE) of 7.68, R² of 0.76, and mean squared error (MSE) of 115.48. These results highlight the promise of vision-based models for yield estimation from early-stage flowering cues. To the best of our knowledge, this is the first work to address the prediction of mango yield using images from the flowering stage and weather data.
LORETTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs
Himanshu Pal, Venkata Sai Pranav Bachina, Ankit Gangwal, Charu Sharma
Association for the Advancement of Artificial Intelligence, AAAI, 2026
@inproceedings{bib_LORE_2026, AUTHOR = {Pal, Himanshu and Bachina, Venkata Sai Pranav and Gangwal, Ankit and Sharma, Charu}, TITLE = {LORETTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence}, YEAR = {2026}}
Temporal Graph Neural Networks (TGNNs) are increasingly used in high-stakes domains, such as financial forecasting, recommendation systems, and fraud detection. However, their susceptibility to poisoning attacks poses a critical security risk. We introduce LORETTA (Low Resource Two-phase Temporal Attack), a novel adversarial framework on Continuous-Time Dynamic Graphs, which degrades TGNN performance by an average of 29.47% across 4 widely used benchmark datasets and 4 State-of-the-Art (SotA) models. LORETTA operates through a two-stage approach: (1) sparsify the graph by removing high-impact edges using any of the 16 tested temporal importance metrics, (2) strategically replace removed edges with adversarial negatives via LORETTA's novel degree-preserving negative sampling algorithm. Our plug-and-play design eliminates the need for expensive surrogate models while adhering to realistic unnoticeability constraints. LORETTA degrades performance by up to 42.0% on MOOC, 31.5% on Wikipedia, 28.8% on UCI, and 15.6% on Enron. LORETTA outperforms 11 attack baselines, remains undetectable to 4 leading anomaly detection systems, and is robust to 4 SotA adversarial defense training methods, establishing its effectiveness, unnoticeability, and robustness.
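The two-stage recipe in the abstract can be sketched as a toy attack on an event-list graph. This is an illustrative version under assumed inputs, not the released LORETTA code; `importance` stands in for any of the paper's 16 temporal importance metrics, and the replacement step keeps each source node's event count intact as a simple proxy for degree preservation:

```python
import random

def poison_ctdg(events, importance, budget, rng=random.Random(0)):
    """Two-stage poisoning sketch on a continuous-time dynamic graph
    given as a list of (src, dst, t) events."""
    # Stage 1: drop the `budget` highest-impact events.
    ranked = sorted(events, key=importance, reverse=True)
    removed, kept = ranked[:budget], ranked[budget:]
    nodes = {n for u, v, _ in events for n in (u, v)}
    seen = {(u, v) for u, v, _ in events}
    # Stage 2: re-insert each removed event with an adversarial
    # destination the source never interacted with, preserving the
    # source's number of events.
    for u, _, t in removed:
        cands = [n for n in nodes if n != u and (u, n) not in seen]
        v_new = rng.choice(cands) if cands else u  # fallback: self-loop
        kept.append((u, v_new, t))
    return sorted(kept, key=lambda e: e[2])        # restore time order
```

A surrogate-free design like this only needs the observable event stream and a cheap importance score, which is what makes the attack "low resource".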
Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Vinit Mehta, Charu Sharma, Karthick Thiyagarajan
Sensors, 2025
@article{bib_Larg_2025, AUTHOR = {Mehta, Vinit and Sharma, Charu and Thiyagarajan, Karthick}, TITLE = {Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy}, JOURNAL = {Sensors}, YEAR = {2025}}
With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robotic sensing technologies. This convergence enables machines to perceive, reason, and interact with complex environments through natural language and spatial understanding, bridging the gap between linguistic intelligence and spatial perception. This review provides a comprehensive analysis of state-of-the-art methodologies, applications, and challenges at the intersection of LLMs and 3D vision, with a focus on next-generation robotic sensing technologies. We first introduce the foundational principles of LLMs and 3D data representations, followed by an in-depth examination of 3D sensing technologies critical for robotics. The review then explores key advancements in scene understanding, text-to-3D generation, object grounding, and embodied agents, highlighting cutting-edge techniques such as zero-shot 3D segmentation, dynamic scene synthesis, and language-guided manipulation. Furthermore, we discuss multimodal LLMs that integrate 3D data with touch, auditory, and thermal inputs, enhancing environmental comprehension and robotic decision-making. To support future research, we catalog benchmark datasets and evaluation metrics tailored for 3D-language and vision tasks. Finally, we identify key challenges and future research directions, including adaptive model architectures, enhanced cross-modal alignment, and real-time processing capabilities, which pave the way for more intelligent, context-aware, and autonomous robotic sensing systems.
MOGRAS: Human Motion with Grasping in 3D Scenes
Kunal Kamalkishor Bhosikar, Katageri Siddharth Gangadhar, Madhavaram Vivek Vardhan, Kai Han, Charu Sharma
British Machine Vision Conference, BMVC, 2025
@inproceedings{bib_MOGR_2025, AUTHOR = {Bhosikar, Kunal Kamalkishor and Gangadhar, Katageri Siddharth and Vardhan, Madhavaram Vivek and Han, Kai and Sharma, Charu}, TITLE = {MOGRAS: Human Motion with Grasping in 3D Scenes}, BOOKTITLE = {British Machine Vision Conference}, YEAR = {2025}}
Generating realistic full-body motion interacting with objects is critical for applications in robotics, virtual reality, and human-computer interaction. While existing methods can generate full-body motion within 3D scenes, they often lack the fidelity for fine-grained tasks like object grasping. Conversely, methods that generate precise grasping motions typically ignore the surrounding 3D scene. This gap, generating full-body grasping motions that are physically plausible within a 3D scene, remains a significant challenge. To address this, we introduce MOGRAS (Human MOtion with GRAsping in 3D Scenes), a large-scale dataset that bridges this gap. MOGRAS provides pre-grasping full-body walking motions and final grasping poses within richly annotated 3D indoor scenes. We leverage MOGRAS to benchmark existing full-body grasping methods and demonstrate their limitations in scene-aware generation. Furthermore, we propose a simple yet effective method to adapt existing approaches to work seamlessly within 3D scenes. Through extensive quantitative and qualitative experiments, we validate the effectiveness of our dataset and highlight the significant improvements our proposed method achieves, paving the way for more realistic human-scene interactions.
Reducing Misclassification Risk in Dynamic Graph Neural Networks through Abstention
Jayadratha Gayen, Himanshu Pal, Naresh Manwani, Charu Sharma
IEEE International Conference on Advances in Social Networks Analysis and Mining, ASONAM, 2025
@inproceedings{bib_Redu_2025, AUTHOR = {Gayen, Jayadratha and Pal, Himanshu and Manwani, Naresh and Sharma, Charu}, TITLE = {Reducing Misclassification Risk in Dynamic Graph Neural Networks through Abstention}, BOOKTITLE = {IEEE International Conference on Advances in Social Networks Analysis and Mining}, YEAR = {2025}}
Many real-world systems can be modeled as dynamic graphs, where nodes and edges evolve over time, requiring specialized models to capture their evolving dynamics in risk-sensitive applications effectively. Graph neural networks (GNNs) for temporal graphs are one such category of specialized models. For the first time, our approach integrates a reject option strategy within the framework of GNNs for continuous time dynamic graphs (CTDGs). This allows the model to strategically abstain from making predictions when the uncertainty is high and confidence is low, thus minimizing the risk of critical misclassification and enhancing the results and reliability. We propose a coverage-based abstention prediction model to implement the reject option that maximizes prediction within a specified coverage. It improves the prediction score for link prediction and node classification tasks. Temporal GNNs deal with extremely skewed datasets for the next state prediction or node classification task. In the case of class imbalance, our method can be further tuned to provide a higher weight to the minority class. Exhaustive experiments are presented on four datasets for dynamic link prediction and two datasets for dynamic node classification tasks. This demonstrates the effectiveness of our approach in improving the reliability and area under the curve (AUC)/average precision (AP) scores for predictions in dynamic graph scenarios. The results highlight our model’s ability to efficiently handle the trade-offs between prediction confidence and coverage, making it a dependable solution for applications requiring high precision in dynamic and uncertain environments.
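The coverage-based reject option described above can be illustrated with a minimal selective-prediction sketch (a hypothetical helper, not the paper's trained abstention model): given per-node class probabilities, keep only the most confident fraction of predictions matching a target coverage and abstain on the rest.

```python
def predict_with_coverage(probs, coverage):
    """Coverage-based reject option sketch: keep the most confident
    `coverage` fraction of predictions and abstain (-1) on the rest.
    `probs` is a list of per-node class-probability lists."""
    conf = [max(p) for p in probs]
    keep = max(1, round(coverage * len(probs)))
    # Confidence threshold that admits exactly `keep` predictions
    # (ties at the threshold may admit slightly more).
    threshold = sorted(conf, reverse=True)[keep - 1]
    return [p.index(max(p)) if c >= threshold else -1
            for p, c in zip(probs, conf)]
```

For example, with four nodes and a 50% target coverage, only the two highest-confidence nodes receive a class label; the other two return -1, deferring the risky decisions to a downstream process.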
Confidence First: Reliability-Driven Temporal Graph Neural Networks
Jayadratha Gayen, Himanshu Pal, Naresh Manwani, Charu Sharma
Knowledge Discovery and Data Mining Workshops, KDD-W, 2025
@inproceedings{bib_Conf_2025, AUTHOR = {Gayen, Jayadratha and Pal, Himanshu and Manwani, Naresh and Sharma, Charu}, TITLE = {Confidence First: Reliability-Driven Temporal Graph Neural Networks}, BOOKTITLE = {Knowledge Discovery and Data Mining Workshops}, YEAR = {2025}}
Many real-world systems can be modeled as dynamic graphs, where nodes and edges evolve over time, requiring specialized models to capture their evolving dynamics in risk-sensitive applications effectively. Graph neural networks (GNNs) for temporal graphs are one such category of specialized models. For the first time, our approach integrates a reject option strategy within the framework of GNNs for continuous-time dynamic graphs (CTDGs). This allows the model to strategically abstain from making predictions when the uncertainty is high and confidence is low, thus minimizing the risk of critical misclassification and enhancing the results and reliability. We propose a coverage-based abstention prediction model to implement the reject option that maximizes prediction within a specified coverage. It improves the prediction score for link prediction and node classification tasks. Temporal GNNs deal with extremely skewed datasets for the next state prediction or node classification task. In the case of class imbalance, our method can be further tuned to provide a higher weight to the minority class. Exhaustive experiments are presented on four datasets for dynamic link prediction and two datasets for dynamic node classification tasks. This demonstrates the effectiveness of our approach in improving the reliability and area under the curve (AUC)/average precision (AP) scores for predictions in dynamic graph scenarios. The results highlight our model's ability to efficiently handle the trade-offs between prediction confidence and coverage, making it a dependable solution for applications requiring high precision in dynamic and uncertain environments.
MangoSense: A time-series vision sensing dataset for mango tree segmentation and detection toward yield prediction
Ven Janaksinh Vanabhai, Charu Sharma, Syed Azeemuddin
Computers and Electronics in Agriculture, CEAG, 2025
@article{bib_Mang_2025, AUTHOR = {Vanabhai, Ven Janaksinh and Sharma, Charu and Azeemuddin, Syed}, TITLE = {MangoSense: A time-series vision sensing dataset for mango tree segmentation and detection toward yield prediction}, JOURNAL = {Computers and Electronics in Agriculture}, YEAR = {2025}}
Mangoes hold significant economic and cultural importance in India and globally, making accurate yield prediction crucial for optimizing domestic consumption, enhancing international trade, and supporting farmers in decision-making. Traditional yield estimation methods, such as manual counting, are labor-intensive, error-prone, and impractical for large-scale orchards, necessitating automated solutions. This study presents a novel time-series vision dataset for mango yield prediction, capturing images from 12 trees across eight spatial orientations (SW, S, SE, E, NE, N, NW, W) over 2.5 months to analyze fruit growth and flowering patterns. A high-resolution image dataset was created and annotated using a semi-automatic pipeline integrating YOLO for object detection and SAM for precise segmentation, significantly reducing manual annotation efforts. Benchmarking was conducted using state-of-the-art deep learning models for segmentation (DeepLabV3+, PSPNet, SegFormer, Swin-S, YOLO+SAM) and detection (Mask R-CNN, Faster R-CNN, DETR, YOLO). For flower segmentation, Swin-S achieved the best results with 67.35 IoU, followed closely by SegFormer with 66.44 IoU, while for fruitlet segmentation, YOLO+SAM obtained the highest IoU of 78.97. In detection tasks, YOLO achieved the best performance for both flowers and fruitlets, with 63.8 mAP50 and 80.5 mAP50, respectively. Additionally, segmentation is identified as a suitable approach for flower feature extraction in yield prediction models, with SegFormer emerging as a strong choice due to its lower computational cost. Furthermore, this study discusses the relationship between wind direction patterns and the flower-to-fruit conversion ratio, challenging previous research that attributed yield variations solely to canopy structure differences with respect to orientation.
Federated Spectral Graph Transformers Meet Neural Ordinary Differential Equations for Non-IID Graphs
Kishan Gurumurthy, Himanshu Pal, Charu Sharma
Transactions on Machine Learning Research, TMLR, 2025
@article{bib_Fede_2025, AUTHOR = {Gurumurthy, Kishan and Pal, Himanshu and Sharma, Charu}, TITLE = {Federated Spectral Graph Transformers Meet Neural Ordinary Differential Equations for Non-IID Graphs}, JOURNAL = {Transactions on Machine Learning Research}, YEAR = {2025}}
Graph Neural Network (GNN) research is rapidly advancing due to GNNs’ capacity to learn distributed representations from graph-structured data. However, centralizing large volumes of real-world graph data for GNN training is often impractical due to privacy concerns, regulatory restrictions, and commercial competition. Federated learning (FL), a distributed learning paradigm, offers a solution by preserving data privacy with collaborative model training. Despite progress in training huge vision and language models, federated learning for GNNs remains underexplored. To address this challenge, we present a novel method for federated learning on GNNs based on spectral GNNs equipped with neural ordinary differential equations (ODE) for better information capture, showing promising results across both homophilic and heterophilic graphs. Our approach effectively handles non-Independent and Identically Distributed (non-IID) data, while also achieving performance comparable to existing methods that only operate on IID data. It is designed to be privacy-preserving and bandwidth-optimized, making it suitable for real-world applications such as social network analysis, recommendation systems, and fraud detection, which often involve complex, non-IID, and heterophilic graph structures. Our results in the area of federated learning on non-IID heterophilic graphs demonstrate significant improvements while also achieving better performance on homophilic graphs. This work highlights the potential of federated learning in diverse and challenging graph settings.
Node Classification With Reject Option
Uday Bhaskar K, Jayadratha Gayen, Charu Sharma, Naresh Manwani
Transactions on Machine Learning Research, TMLR, 2025
@article{bib_Node_2025, AUTHOR = {K, Uday Bhaskar and Gayen, Jayadratha and Sharma, Charu and Manwani, Naresh}, TITLE = {Node Classification With Reject Option}, JOURNAL = {Transactions on Machine Learning Research}, YEAR = {2025}}
One of the key tasks in graph learning is node classification. While graph neural networks (GNNs) have been used for various applications, their adaptivity to reject option settings has not been previously explored. In this paper, we propose NCwR, a novel approach to node classification in GNNs with an integrated reject option. This allows the model to abstain from making predictions when uncertainty is high. We propose cost-based and coverage-based methods for classification with abstention in node classification settings using GNNs. We perform experiments using our method on three standard citation network datasets (Cora, Citeseer, and Pubmed) and compare with relevant baselines. We also model the legal judgment prediction problem on the ILDC dataset as a node classification problem, where nodes represent legal cases and edges represent citations. We further interpret the model by analyzing the cases in which it abstains from predicting and visualizing which parts of the input features influenced this decision.
Higher Order Structures For Graph Explanations
Akshit Sinha, Sreeram Reddy Vennam, Charu Sharma, Ponnurangam Kumaraguru
AAAI Conference on Artificial Intelligence, AAAI, 2025
@inproceedings{bib_High_2025, AUTHOR = {Sinha, Akshit and Vennam, Sreeram Reddy and Sharma, Charu and Kumaraguru, Ponnurangam}, TITLE = {Higher Order Structures For Graph Explanations}, BOOKTITLE = {AAAI Conference on Artificial Intelligence}, YEAR = {2025}}
Graph Neural Networks (GNNs) have emerged as powerful tools for learning representations of graph-structured data, demonstrating remarkable performance across various tasks. Recognising their importance, there has been extensive research focused on explaining GNN predictions, aiming to enhance their interpretability and trustworthiness. However, GNNs and their explainers face a notable challenge: graphs are primarily designed to model pair-wise relationships between nodes, which can make it tough to capture higher-order, multi-node interactions. This characteristic can pose difficulties for existing explainers in fully representing multi-node relationships. To address this gap, we present Framework For Higher-Order Representations In Graph Explanations (FORGE), a framework that enables graph explainers to capture such interactions by incorporating higher-order structures, resulting in more accurate and faithful explanations. Extensive evaluation shows that, averaged across various graph explainers, FORGE improves explanation accuracy by 1.9x on real-world datasets from the GraphXAI benchmark and by 2.25x on synthetic datasets. We perform ablation studies to confirm the importance of higher-order relations in improving explanations, while our scalability analysis demonstrates FORGE's efficacy on large graphs.
Node Classification With Integrated Reject Option For Legal Judgement Prediction
Uday Bhaskar K, Jayadratha Gayen, Charu Sharma, Naresh Manwani
Association for the Advancement of Artificial Intelligence Workshop, AAAI-W, 2025
@inproceedings{bib_Node_2025, AUTHOR = {K, Uday Bhaskar and Gayen, Jayadratha and Sharma, Charu and Manwani, Naresh}, TITLE = {Node Classification With Integrated Reject Option For Legal Judgement Prediction}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence Workshop}, YEAR = {2025}}
One of the key tasks in graph learning is node classification. While graph neural networks (GNNs) have been used for various applications, their adaptivity to the reject option setting has not previously been explored. In this paper, we propose NCwR, a novel approach to node classification in GNNs with an integrated reject option, which allows the model to abstain from making predictions when uncertainty is high. We propose both cost-based and coverage-based methods for classification with abstention in the node classification setting using GNNs. We perform experiments using our method on three standard citation network datasets (Cora, Citeseer, and Pubmed) and compare with relevant baselines. We also model the legal judgment prediction problem on the ILDC dataset as a node classification problem, where nodes represent legal cases and edges represent citations. We further interpret the model by analyzing the cases in which it abstains from predicting and visualizing which parts of the input features influenced this decision.
Adversarial Learning based Knowledge Distillation on 3D Point Clouds
Sanjay S J, Akash J, Sreehari Rajan, Dimple A Shajahan, Charu Sharma
Winter Conference on Applications of Computer Vision, WACV, 2025
@inproceedings{bib_Adve_2025, AUTHOR = {J, Sanjay S and J, Akash and Rajan, Sreehari and Shajahan, Dimple A and Sharma, Charu}, TITLE = {Adversarial Learning based Knowledge Distillation on 3D Point Clouds}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2025}}
The significant improvements in point cloud representation learning have increased its applicability in many real-life applications, resulting in the need for lightweight, better-performing models. One widely proposed efficient method is knowledge distillation, where a lightweight model uses knowledge from large models. Very few works exist on distilling the knowledge for point clouds. Most of the work focuses on cross-modal-based approaches that make the method expensive to train. This paper proposes PointKAD, an adversarial knowledge distillation framework for point cloud-based tasks. PointKAD includes adversarial feature distillation and response distillation with the help of discriminators to extract and distill the representation of feature maps and logits. We conduct extensive experimental studies on both synthetic (ModelNet40) and real (ScanObjectNN) datasets to show that PointKAD achieves state-of-the-art results compared to the existing knowledge distillation methods for point cloud classification. Additionally, we present results on the part segmentation task, highlighting the efficacy of the PointKAD framework. Our experiments further reveal that PointKAD is capable of transferring knowledge across different tasks and datasets, showcasing its versatility. Furthermore, we demonstrate that PointKAD can be applied to a cross-modal training setup, achieving competitive performance with cross-modal-based point cloud methods for classification.
Towards a Training Free Approach for 3D Scene Editing
Madhavaram Vivek Vardhan, Shivangana Rawat, Chaitanya Devaguptapu, Charu Sharma, Manohar Kaul
Winter Conference on Applications of Computer Vision, WACV, 2025
@inproceedings{bib_Towa_2025, AUTHOR = {Vardhan, Madhavaram Vivek and Rawat, Shivangana and Devaguptapu, Chaitanya and Sharma, Charu and Kaul, Manohar}, TITLE = {Towards a Training Free Approach for 3D Scene Editing}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2025}}
Text-driven diffusion models have shown remarkable capabilities in editing images. However, when editing 3D scenes, existing works mostly rely on training a NeRF for 3D editing. Recent NeRF editing methods perform edit operations by deploying 2D diffusion models and projecting these edits into 3D space. They require strong positional priors alongside the text prompt to identify the edit location. These methods operate on small 3D scenes and remain tied to a particular scene; they require training for each specific edit and cannot support real-time editing. To address these limitations, we propose a novel method, FreeEdit, that makes edits in a training-free manner using mesh representations as a substitute for NeRF. Training-free methods are now possible because of advances in the foundation-model space. We leverage these models to provide a training-free alternative and introduce solutions for insertion, replacement, and deletion, which serve as basic blocks for performing intricate edits through combinations of these operations. Given a text prompt and a 3D scene, our model is capable of identifying which object should be inserted, replaced, or deleted and the location where the edit should be performed. We also introduce a novel algorithm as part of FreeEdit to find the optimal location on the grounding object for placement. We evaluate our model by comparing it with baseline models on a wide range of scenes using quantitative and qualitative metrics and showcase the merits of our method with respect to others.
Coverage Path Planning using Multiple AUVs with Nadir Gap
Nikhil Chandak, Charu Sharma, Kamalakar Karlapalem
Autonomous Robots and Multirobot Systems Workshop, ARMS-W, 2024
@inproceedings{bib_Cove_2024, AUTHOR = {Chandak, Nikhil and Sharma, Charu and Karlapalem, Kamalakar}, TITLE = {Coverage Path Planning using Multiple AUVs with Nadir Gap}, BOOKTITLE = {Autonomous Robots and Multirobot Systems Workshop}, YEAR = {2024}}
Autonomous Underwater Vehicles (AUVs) play a vital role in exploring and mapping underwater environments. However, the presence of nadir gaps, or blind zones, in commercial AUVs can lead to unexplored areas during mission execution, limiting their effectiveness. Our work addresses the challenges of path planning in the presence of nadir gaps and presents scalable coverage strategies for AUVs minimizing either the mission completion time or the total number of turns performed while ensuring complete exploration, eliminating the risk of leaving critical areas unexplored. We provide provably complete strategies and perform extensive simulations on diverse input configurations based on real-world instances to demonstrate the efficacy of our strategies.
Autonomous Inspection of High-Rise Buildings for Façade Detection and 3D Modeling Using UAVs
Prayushi Mathur, Charu Sharma, Azeemuddin Syed
IEEE Access, ACCESS, 2024
@article{bib_Auto_2024, AUTHOR = {Mathur, Prayushi and Sharma, Charu and Syed, Azeemuddin}, TITLE = {Autonomous Inspection of High-Rise Buildings for Façade Detection and 3D Modeling Using UAVs}, JOURNAL = {IEEE Access}, YEAR = {2024}}
Given the current emphasis on maintaining and inspecting high-rise buildings, conventional inspection approaches are costly, slow, error-prone, and labor-intensive due to manual processes and lack of automation. In this paper, we provide an automated, periodic, accurate and economical solution for the inspection of such buildings on real-world images. We propose a novel end-to-end integrated autonomous pipeline for building inspection which consists of three modules: i) Autonomous Drone Navigation, ii) Façade Detection, and iii) Model Construction. Our first module computes a collision-free trajectory for the UAV around the building for surveillance. The images captured in this step are used for façade detection and 3D building model construction. The façade detection module is a deep learning-based object detection method which detects cracks. Finally, the model construction module focuses on reconstructing a 3D model of a building from captured images to mark the corresponding cracks on the 3D model for efficient and accurate inferences from the inspection. We conduct experiments for each module, including collision avoidance for drone navigation, façade detection, model construction and mapping. Our experimental analysis shows the promising performance of i) our crack detection model with a precision and recall of 0.95 and mAP score of 0.96; ii) our 3D reconstruction method includes finer details of the building without having additional information on the sequence of images; and iii) our 2D-3D mapping to compute the original location/world coordinates of cracks for a building.
Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation
Katageri Siddharth Gangadhar,Arkadipta De,Chaitanya Devaguptapu,VSSV Prasad,Charu Sharma,Manohar Kaul
Winter Conference on Applications of Computer Vision, WACV, 2024
@inproceedings{bib_Syne_2024, AUTHOR = {Gangadhar, Katageri Siddharth and De, Arkadipta and Devaguptapu, Chaitanya and Prasad, VSSV and Sharma, Charu and Kaul, Manohar}, TITLE = {Synergizing Contrastive Learning and Optimal Transport for 3D Point Cloud Domain Adaptation}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2024}}
Recently, the fundamental problem of unsupervised domain adaptation (UDA) on 3D point clouds has been motivated by a wide variety of applications in robotics, virtual reality, and scene understanding, to name a few. The point cloud data acquisition procedures manifest themselves as significant domain discrepancies and geometric variations among both similar and dissimilar classes. The standard domain adaptation methods developed for images do not directly translate to point cloud data because of their complex geometric nature. To address this challenge, we leverage the idea of multimodality and alignment between distributions. We propose a new UDA architecture for point cloud classification that benefits from multimodal contrastive learning to get better class separation in both domains individually. Further, the use of optimal transport (OT) aims at learning source and target data distributions jointly to reduce the cross-domain shift and provide a better alignment. We conduct a comprehensive empirical study on PointDA-10 and GraspNetPC-10 and show that our method achieves state-of-the-art performance on GraspNetPC-10 (with ≈ 4-12% margin) and best average performance on PointDA-10. Our ablation studies and decision boundary analysis also validate the significance of our contrastive learning module and OT alignment.
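The OT alignment idea can be illustrated with the standard entropic-regularized Sinkhorn iteration, which couples two discrete distributions through a cost matrix. This is a generic textbook sketch, not the paper's implementation; the histograms and cost values below are toy assumptions:

```python
import math

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Entropic-regularized OT between histograms a and b under cost matrix C.
    Returns the transport plan P whose row/column sums match a and b."""
    K = [[math.exp(-c / eps) for c in row] for row in C]  # Gibbs kernel
    u = [1.0] * len(a)
    v = [1.0] * len(b)
    for _ in range(n_iters):
        # Alternate scalings so the plan's marginals converge to a and b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(len(b))) for i in range(len(a))]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(len(a))) for j in range(len(b))]
    return [[u[i] * K[i][j] * v[j] for j in range(len(b))] for i in range(len(a))]

# Toy source/target "clusters": the plan should favour the cheap diagonal.
P = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

In a UDA setting, `a` and `b` would be (uniform) weights over source and target features and `C` their pairwise feature distances.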
Metric Learning for 3D Point Clouds Using Optimal Transport
Katageri Siddharth Gangadhar,Srinjay Sarkar,Charu Sharma
Winter Conference on Applications of Computer Vision Workshops, WACV-W, 2024
@inproceedings{bib_Metr_2024, AUTHOR = {Gangadhar, Katageri Siddharth and Sarkar, Srinjay and Sharma, Charu}, TITLE = {Metric Learning for 3D Point Clouds Using Optimal Transport}, BOOKTITLE = {Winter Conference on Applications of Computer Vision Workshops}, YEAR = {2024}}
Learning embeddings of any data largely depends on the ability of the target space to capture semantic relations. The widely used Euclidean space, where embeddings are represented as point vectors, is known to be lacking in its potential to exploit complex structures and relations. Contrary to standard Euclidean embeddings, in this work, we embed point clouds as discrete probability distributions in Wasserstein space. We build a contrastive learning setup to learn Wasserstein embeddings that can be used as a pre-training method with or without supervision towards any downstream task. We show that the features captured by Wasserstein embeddings are better in preserving the point cloud geometry, including both global and local information, thus resulting in improved quality embeddings. We perform exhaustive experiments and demonstrate the effectiveness of our method for point cloud classification, transfer learning, segmentation, and interpolation tasks over multiple datasets including synthetic and real-world objects. We also compare against recent methods that use Wasserstein space and show that our method outperforms them in all downstream tasks. Additionally, our study reveals a promising interpretation of capturing critical points of point clouds that makes our proposed method self-explainable.
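The core idea of treating a point set as a discrete probability distribution is easiest to see in one dimension, where the Wasserstein-1 distance between two equal-size empirical distributions reduces to matching sorted samples. This toy sketch is only an illustration of the metric, not the paper's higher-dimensional method:

```python
def w1_empirical(xs, ys):
    """1-D Wasserstein-1 distance between two equal-size point sets, each
    viewed as a uniform empirical distribution. In 1-D the monotone
    (sorted) matching is the optimal transport plan."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Shifting every point by 0.5 moves the whole distribution by 0.5.
d = w1_empirical([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
```

Unlike a Euclidean distance between pooled feature vectors, this metric compares the full geometry of the two point sets.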
GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering
Dhaval Taunk,Lakshya Khanna,Kandru Siri Venkata Pavan Kumar,Vasudeva Varma Kalidindi,Charu Sharma,Makarand Tapaswi
WWW Workshop on Natural Language Processing for Knowledge Graph Construction, NLP4KGc, 2023
@inproceedings{bib_Grap_2023, AUTHOR = {Taunk, Dhaval and Khanna, Lakshya and Kumar, Kandru Siri Venkata Pavan and Kalidindi, Vasudeva Varma and Sharma, Charu and Tapaswi, Makarand}, TITLE = {GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering}, BOOKTITLE = {WWW Workshop on Natural Language Processing for Knowledge Graph Construction}, YEAR = {2023}}
Commonsense question-answering (QA) methods combine the power of pre-trained Language Models (LM) with the reasoning provided by Knowledge Graphs (KG). A typical approach collects nodes relevant to the QA pair from a KG to form a Working Graph (WG) followed by reasoning using Graph Neural Networks (GNNs). This faces two major challenges: (i) it is difficult to capture all the information from the QA in the WG, and (ii) the WG contains some irrelevant nodes from the KG. To address these, we propose GrapeQA with two simple improvements on the WG: (i) Prominent Entities for Graph Augmentation identifies relevant text chunks from the QA pair and augments the WG with corresponding latent representations from the LM, and (ii) Context-Aware Node Pruning removes nodes that are less relevant to the QA pair. We evaluate our results on OpenBookQA, CommonsenseQA and MedQA-USMLE and see that GrapeQA shows consistent improvements over its LM + KG predecessor (QA-GNN in particular) and large improvements on OpenBookQA.
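Context-aware pruning of a working graph can be sketched as ranking nodes by their similarity to a QA-context embedding and keeping only the top fraction. The embeddings, node names, and keep ratio below are illustrative assumptions; the paper's actual scoring uses LM representations:

```python
import math

def cosine(u, v):
    """Cosine similarity between two (non-zero) vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def prune_working_graph(node_vecs, context_vec, keep_ratio=0.5):
    """Score each working-graph node against the QA context and keep only
    the top keep_ratio fraction, dropping the least relevant nodes."""
    scores = {n: cosine(vec, context_vec) for n, vec in node_vecs.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: max(1, int(len(ranked) * keep_ratio))]

# Toy 2-D node embeddings; the context points along the first axis.
nodes = {"dog": [1.0, 0.0], "cat": [0.9, 0.1], "car": [0.0, 1.0], "bus": [0.1, 0.9]}
kept = prune_working_graph(nodes, [1.0, 0.0], keep_ratio=0.5)
```

The pruned graph is then what the GNN reasons over, reducing noise from irrelevant KG nodes.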
JobXMLC: EXtreme Multi-Label Classification of Job Skills with Graph Neural Networks
Nidhi Goyal,Jushaan Singh Kalra,Charu Sharma,Raghava Mutharaju,Niharika Sachdeva,Ponnurangam Kumaraguru
Conference of the European Chapter of the Association for Computational Linguistics (EACL), EACL, 2023
@inproceedings{bib_JobX_2023, AUTHOR = {Goyal, Nidhi and Kalra, Jushaan Singh and Sharma, Charu and Mutharaju, Raghava and Sachdeva, Niharika and Kumaraguru, Ponnurangam}, TITLE = {JobXMLC: EXtreme Multi-Label Classification of Job Skills with Graph Neural Networks}, BOOKTITLE = {Conference of the European Chapter of the Association for Computational Linguistics (EACL)}, YEAR = {2023}}
Writing a good job description is an important step in the online recruitment process to hire the best candidates. Most recruiters forget to include some relevant skills in the job description. These missing skills affect the performance of recruitment tasks such as job suggestions, job search, candidate recommendations, etc. Existing approaches are limited to contextual modelling, do not exploit inter-relational structures like job-job and job-skill relationships, and are not scalable. In this paper, we exploit these structural relationships using a graph-based approach. We propose a novel skill prediction framework called JobXMLC, which uses graph neural networks with skill attention to predict missing skills using job descriptions. JobXMLC enables joint learning over a job-skill graph consisting of 22.8K entities (jobs and skills) and 650K relationships. We experiment with real-world recruitment datasets to evaluate our proposed approach. We train JobXMLC on 20,298 jobs and 2,548 skills within 30 minutes on a single GPU machine. JobXMLC outperforms the state-of-the-art approaches by 6% on precision and 3% on recall. JobXMLC is 18X faster for training tasks and up to 634X faster in skill prediction on benchmark datasets enabling JobXMLC to scale up on larger datasets. We have made our code and dataset public at https://precog.iiit.ac.in/resources.html.
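Message passing over a job-skill graph can be sketched as each node averaging its own features with its neighbours'. This single mean-aggregation layer is a generic GNN building block, not JobXMLC's architecture (which adds skill attention); the toy graph and features are assumptions:

```python
def gnn_layer(adj, feats):
    """One mean-aggregation message-passing step: each node's new feature
    is the average of its own and its neighbours' feature vectors."""
    out = {}
    for node, nbrs in adj.items():
        group = [feats[node]] + [feats[n] for n in nbrs]
        out[node] = [sum(col) / len(group) for col in zip(*group)]
    return out

# Toy bipartite job-skill graph: one job linked to two skills.
adj = {"job1": ["skill_a", "skill_b"], "skill_a": ["job1"], "skill_b": ["job1"]}
feats = {"job1": [0.0, 0.0], "skill_a": [1.0, 0.0], "skill_b": [0.0, 1.0]}
out = gnn_layer(adj, feats)
```

Stacking such layers lets a job node absorb information from skills two or more hops away, which is how structural job-job and job-skill relations enter the prediction.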
An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy
Anmol Goel,Charu Sharma,Ponnurangam Kumaraguru
Conference on Empirical Methods in Natural Language Processing, EMNLP, 2022
@inproceedings{bib_An_U_2022, AUTHOR = {Goel, Anmol and Sharma, Charu and Kumaraguru, Ponnurangam}, TITLE = {An Unsupervised, Geometric and Syntax-aware Quantification of Polysemy}, BOOKTITLE = {Conference on Empirical Methods in Natural Language Processing}, YEAR = {2022}}
Polysemy is the phenomenon where a single word form possesses two or more related senses. It is an extremely ubiquitous part of natural language and analyzing it has sparked rich discussions in the linguistics, psychology and philosophy communities alike. With scarce attention paid to polysemy in computational linguistics, and even scarcer attention toward quantifying polysemy, in this paper, we propose a novel, unsupervised framework to compute and estimate polysemy scores for words in multiple languages. We infuse our proposed quantification with syntactic knowledge in the form of dependency structures. This informs the final polysemy scores of the lexicon motivated by recent linguistic findings that suggest there is an implicit relation between syntax and ambiguity/polysemy. We adopt a graph-based approach by computing the discrete Ollivier-Ricci curvature on a graph of the contextual nearest neighbors. We test our framework on curated datasets controlling for different sense distributions of words in 3 typologically diverse languages - English, French and Spanish. The effectiveness of our framework is demonstrated by significant correlations of our quantification with expert human annotated language resources like WordNet. We observe a 0.3 point increase in the correlation coefficient as compared to previous quantification studies in English. Our research leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.
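Ollivier-Ricci curvature of an edge (x, y) is defined as κ = 1 − W1(m_x, m_y)/d(x, y), where m_v is a probability measure on v's neighbours (uniform here) and W1 is the Wasserstein-1 distance over graph distances. A minimal sketch for tiny unweighted graphs, brute-forcing the optimal matching (the graphs below are toy assumptions, not the paper's nearest-neighbour graphs):

```python
from itertools import permutations

def shortest_paths(adj):
    """All-pairs shortest-path lengths in an unweighted graph via BFS."""
    dist = {}
    for s in adj:
        d = {s: 0}
        frontier = [s]
        while frontier:
            nxt = []
            for u in frontier:
                for v in adj[u]:
                    if v not in d:
                        d[v] = d[u] + 1
                        nxt.append(v)
            frontier = nxt
        dist[s] = d
    return dist

def ollivier_ricci(adj, x, y):
    """Ollivier-Ricci curvature of edge (x, y) with uniform neighbour
    measures. For equal-size uniform supports, W1 equals the cheapest
    bijective matching, brute-forced over permutations here."""
    dist = shortest_paths(adj)
    nx, ny = sorted(adj[x]), sorted(adj[y])
    assert len(nx) == len(ny), "brute-force matching needs equal degrees"
    w1 = min(sum(dist[u][v] for u, v in zip(nx, perm)) / len(nx)
             for perm in permutations(ny))
    return 1 - w1 / dist[x][y]

# Triangle graph: tightly clustered neighbourhoods give positive curvature.
triangle = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}
kappa = ollivier_ricci(triangle, "a", "b")
```

Positive curvature signals overlapping neighbourhoods (one coherent sense region), while low or negative curvature signals neighbourhoods that pull apart, which is the geometric intuition behind using curvature to score polysemy.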