IIITH

Graph Signal Processing-Based Road Object Detection Using mmWave Radar for ADAS Application

IEEE Internet of Things Journal, IOT, 2026

Core Rank : - Google Rank :186

@article{bib_Grap_2026, AUTHOR = {Mohan, Anand and Meena, Hemant Kumar and Wajid, Mohd and Srivastava, Abhishek}, TITLE = {Graph Signal Processing-Based Road Object Detection Using mmWave Radar for ADAS Application}, JOURNAL = {IEEE Internet of Things Journal}, YEAR = {2026}}

Abstract

Reliable road object detection is crucial for self-driving cars and intelligent transportation systems. However, vision-only techniques fail in low-visibility conditions such as fog, rain, or nightfall, whereas deep radar-based models frequently suffer from high latency and energy consumption. To overcome these restrictions, we propose a millimeter-wave (mmWave) radar-based object identification system, implemented on a PYNQ-ZU Field-Programmable Gate Array (FPGA), that balances accuracy, efficiency, and real-time performance. Our proposed pipeline transforms 3D radar point clouds into 2D Top-View (TV) representations and uses the Spectral Graph Wavelet Transform (SGWT) to extract discriminative spatial-temporal features at low computational cost. A Random Forest (RF) classifier achieves 97% accuracy across seven object classes with an end-to-end latency of 61 ms. In terms of power, the FPGA implementation consumes 119.32 mW at idle, 268.48 mW on average, and 357.97 mW at peak, demonstrating its energy efficiency under dynamic workloads. At 6.47 MB, the model is substantially smaller and more power-efficient than deep learning alternatives such as a Tiny Convolutional Neural Network (CNN) (73.60 MB, 4493 mW, 449 ms). By overcoming the limitations of vision-only systems (poor visibility) and deep radar models (high latency, energy, and memory), the proposed SGWT + RF framework enables real-time, multi-class detection in resource-constrained environments, offering a practical and robust solution for autonomous navigation and traffic monitoring applications.
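To put the quoted comparison in proportion, the following minimal sketch divides the Tiny CNN figures by the SGWT + RF figures from the abstract. The dictionary layout and the function name `reduction_factors` are illustrative, and comparing against the FPGA's average power (rather than idle or peak) is an assumption, since the abstract does not say which operating point the 4493 mW figure corresponds to.

```python
# Figures quoted in the abstract; the data layout here is illustrative only.
sgwt_rf = {"size_mb": 6.47, "power_mw": 268.48, "latency_ms": 61.0}  # average power (assumed basis)
tiny_cnn = {"size_mb": 73.60, "power_mw": 4493.0, "latency_ms": 449.0}


def reduction_factors(baseline, proposed):
    """Return baseline / proposed for every metric in `proposed`."""
    return {name: baseline[name] / proposed[name] for name in proposed}


factors = reduction_factors(tiny_cnn, sgwt_rf)
for name, factor in factors.items():
    print(f"{name}: {factor:.1f}x lower than Tiny CNN")
```

Under these assumptions the ratios come out at roughly 11x for model size, 17x for power, and 7x for latency.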

DFORD: Directional Feedback based Online Ordinal Regression Learning

International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2026

Core Rank : A* Google Rank :54

@inproceedings{bib_DFOR_2026, AUTHOR = {Manwani, Naresh and Elamparithy, M and Taneja, Tanish}, TITLE = {DFORD: Directional Feedback based Online Ordinal Regression Learning}, BOOKTITLE = {International Conference on Autonomous Agents and Multiagent Systems}, YEAR = {2026}}

Abstract

In this paper, we introduce directional feedback in the ordinal regression setting, in which the learner receives feedback on whether the predicted label is on the left or the right side of the actual label. This is a weak supervision setting for ordinal regression compared to the full information setting, where the learner can access the labels. We propose an online algorithm for ordinal regression using directional feedback. The proposed algorithm uses an exploration-exploitation scheme to learn from directional feedback efficiently. Furthermore, we introduce its kernel-based variant to learn non-linear ordinal regression models in an online setting. We use a truncation trick to make the kernel implementation more memory efficient. The proposed algorithm maintains the ordering of the thresholds in the expected sense. Moreover, it achieves an expected regret of O(log T). We compare our approach with a full information and a weakly supervised algorithm for ordinal regression on synthetic and real-world datasets. The proposed approach, which learns using directional feedback, performs comparably to (and sometimes better than) its full information counterpart.
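The feedback signal described above can be illustrated with a minimal sketch. It is not the DFORD algorithm itself: it only shows a threshold-based ordinal prediction and the directional signal a learner would observe. The function names, the 1-based label convention, and the extra value 0 for an exact prediction are assumptions made for illustration.

```python
def predict_label(score, thresholds):
    """Threshold-based ordinal prediction.

    `thresholds` is an ascending list b_1 <= ... <= b_{K-1}; the predicted
    label is 1 plus the number of thresholds the score exceeds (labels 1..K).
    """
    return 1 + sum(score > b for b in thresholds)


def directional_feedback(predicted, actual):
    """Return -1 if the prediction lies left of the actual label,
    +1 if it lies right of it, and 0 if it is exact (assumed convention)."""
    return (predicted > actual) - (predicted < actual)


thresholds = [-1.0, 0.0, 1.0]          # hypothetical thresholds for 4 labels
label = predict_label(0.5, thresholds)  # score exceeds two thresholds
print(label, directional_feedback(label, actual=2))
```

The learner in this setting never sees `actual` directly; it only observes the sign returned by `directional_feedback`, which is what makes the supervision weaker than full information.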

MonoMPC: Monocular Vision Based Navigation With Learned Collision Model and Risk-Aware Model Predictive Control

IEEE Robotics and Automation Letters, RAL, 2026

Core Rank : - Google Rank :117

@article{bib_Mono_2026, AUTHOR = {Sharma, Basant and Jadhav, Prajyot and Paul, Pranjal and Krishna, K Madhava and Singh, Arun Kumar}, TITLE = {MonoMPC: Monocular Vision Based Navigation With Learned Collision Model and Risk-Aware Model Predictive Control}, JOURNAL = {IEEE Robotics and Automation Letters}, YEAR = {2026}}

Abstract

Navigating unknown environments with a single RGB camera is challenging, as the lack of depth information prevents reliable collision-checking. While some methods use estimated depth to build collision maps, we found that depth estimates from vision foundation models are too noisy for zero-shot navigation in cluttered environments. We propose an alternative approach: instead of using noisy estimated depth for direct collision-checking, we use it as a rich context input to a learned collision model. This model predicts the distribution of minimum obstacle clearance that the robot can expect for a given control sequence. At inference, these predictions inform a risk-aware MPC planner that minimizes estimated collision risk. We propose a joint learning pipeline that co-trains the collision model and risk metric using both safe and unsafe trajectories. Crucially, our joint training ensures well-calibrated uncertainty in our collision model, which improves navigation in highly cluttered environments. Consequently, real-world experiments show reductions in collision rate and improvements in goal reaching and speed over several strong baselines.

A Current Injection Based Constant-gm Rail to Rail OTA Achieving Uniform Small And Large Signal Behaviour

International Conference on VLSI Design, VLSID, 2026

Core Rank : - Google Rank :-

@inproceedings{bib_A_Cu_2026, AUTHOR = {Khan, Mohammed Hammad and Zope, Saurabh and Acharyya, Ishan and Srivastava, Abhishek}, TITLE = {A Current Injection Based Constant-gm Rail to Rail OTA Achieving Uniform Small And Large Signal Behaviour}, BOOKTITLE = {International Conference on VLSI Design}, YEAR = {2026}}

Abstract

This paper presents a constant-gm, 1.2 V operational transconductance amplifier (OTA) designed in TSMC 65 nm CMOS technology. The proposed OTA achieves a constant-gm characteristic over a full rail-to-rail input/output range by employing a constant-gm rail-to-rail input stage, a folded cascode summing stage, and a quiescent-current-controlled Class-AB output stage. The input stage comprises complementary NMOS and PMOS differential pairs connected in parallel to achieve rail-to-rail input operation. Conventional parallel input stages suffer from significant gm variation with input common-mode voltage, leading to changes in small- and large-signal parameters such as open-loop gain, slew rate, and unity-gain bandwidth (UGB). In this work, a current-injection mechanism dynamically adjusts the active input pair’s bias, maintaining a nearly constant transconductance across the entire common-mode range. Post-layout simulation shows that the proposed OTA achieves less than 4.8% gm variation across the entire input range while maintaining uniform large-signal characteristics. Performance metrics include a DC gain of 80–87 dB, UGB of 9.72 MHz, phase margin of 64°, CMRR of 97.9 dB, and a power consumption of 260 μW when driving a 100 pF capacitive load.

Let Leaders Play Games: Improving Timing in Leader-based Consensus

International Conference on Autonomous Agents and Multiagent Systems, AAMAS, 2026

Core Rank : A* Google Rank :54

@inproceedings{bib_Let__2026, AUTHOR = {Ahmed, Mohammad Rasheed and Desai, Parth Nimish and Gujar, Sujit Prakash}, TITLE = {Let Leaders Play Games: Improving Timing in Leader-based Consensus}, BOOKTITLE = {International Conference on Autonomous Agents and Multiagent Systems}, YEAR = {2026}}

Abstract

Propagation latency is inherent to any distributed network, including blockchains. Typically, blockchain protocols allow some timing buffer for block propagation in the network. In leader-based blockchains, the leader (the block proposer) is known a priori for each slot. A fast (or low-latency) proposer may delay the block proposal in anticipation of more rewards from the transactions that otherwise would have been in the subsequent block. Manipulating the timing in this way is known as playing timing games. It increases the risk of missed blocks due to reduced time for other nodes to vote on the block, affecting the overall efficiency of the blockchain. Additionally, as the proposers who play timing games essentially steal MEV that otherwise would have gone to the next block, it is unfair to the subsequent block proposers. We propose a dual block-proposal mechanism, 2-Prop, to curtail timing games. 2-Prop selects two proposers per slot to propose blocks, out of which one is finalized. To avoid strategic deviations, we design a reward-sharing policy for the proposers based on how fast these blocks are propagated. In the induced game, which we call the Latency Game, we show that it is a Nash Equilibrium for the proposers to propose the block as quickly as possible if both are under the same network conditions. We also study many configurations under disparate network conditions. Our analysis shows that a faster proposer would prefer not to delay unless the other proposer is extremely slow. Thus, the efficacy of 2-Prop in mitigating the effect of timing games is established.

STRinGS: Selective Text Refinement in Gaussian Splatting

Winter Conference on Applications of Computer Vision, WACV, 2026

Core Rank : - Google Rank :109

@inproceedings{bib_STRi_2026, AUTHOR = {Raundhal, Abhinav Digambar and Behera, Gaurav and J, Narayanan P and Sarvadevabhatla, Ravi Kiran and Tapaswi, Makarand}, TITLE = {STRinGS: Selective Text Refinement in Gaussian Splatting}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}

Abstract

Text in the form of signs, labels, or instructions is a critical element of real-world scenes, as it can convey important contextual information. 3D representations such as 3D Gaussian Splatting (3DGS) achieve high visual fidelity but struggle to preserve fine-grained text details. Small errors in textual element reconstruction can lead to significant semantic loss. We propose STRinGS, a text-aware, selective refinement framework to address this issue for 3DGS reconstruction. Our method treats text and non-text regions separately, refining text regions first and merging them with non-text regions later for full-scene optimization. STRinGS produces sharp, readable text even in challenging configurations. We introduce a text readability measure, OCR Character Error Rate (CER), to evaluate the efficacy on text regions. STRinGS results in a 63.6% relative improvement over 3DGS at just 7K iterations. We also introduce a curated dataset, STRinGS-360, with diverse text scenarios to evaluate text readability in 3D reconstruction. Our method and dataset together push the boundaries of 3D scene understanding in text-rich environments, paving the way for more robust text-aware reconstruction methods.
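For readers unfamiliar with Character Error Rate, here is a minimal sketch of the standard definition: the Levenshtein edit distance between the OCR output and the reference text, divided by the reference length. The paper's exact OCR pipeline and evaluation protocol are not given in the abstract, and reading "relative improvement" as the fractional drop in CER is an assumption; the example strings and CER values are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref, hyp):
    """Character Error Rate of `hyp` against a non-empty reference."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_improvement(baseline_cer, new_cer):
    """Fractional CER reduction relative to the baseline (assumed reading)."""
    return (baseline_cer - new_cer) / baseline_cer


print(cer("EXIT", "EX1T"))              # one substitution in four characters
print(relative_improvement(0.5, 0.25))  # made-up CER values
```

Lower CER means more readable rendered text, so a positive relative improvement corresponds to a drop in CER.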

VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning

IEEE Workshop on Applications of Computer Vision, IEEE WACV, 2026

Core Rank : A Google Rank :-

@inproceedings{bib_VIZO_2026, AUTHOR = {Vardhan, Madhavaram Vivek and Sengar, Vartika and De, Arkadipta and Sharma, Charu}, TITLE = {VIZOR: Viewpoint-Invariant Zero-Shot Scene Graph Generation for 3D Scene Reasoning}, BOOKTITLE = {IEEE Workshop on Applications of Computer Vision}, YEAR = {2026}}

Abstract

Scene understanding and reasoning has been a fundamental problem in 3D computer vision, requiring models to identify objects, their properties, and spatial or comparative relationships among the objects. Existing approaches enable this by creating scene graphs using multiple inputs such as 2D images, depth maps, object labels, and annotated relationships from a specific reference view. However, these methods often struggle with generalization and produce inaccurate spatial relationships like "left/right", which become inconsistent across different viewpoints. To address these limitations, we propose Viewpoint-Invariant ZerO-shot scene graph generation for 3D scene Reasoning (VIZOR). VIZOR is a training-free, end-to-end framework that constructs dense, viewpoint-invariant 3D scene graphs directly from raw 3D scenes. The generated scene graph is unambiguous, as spatial relationships are defined relative to each object’s front-facing direction, making them consistent regardless of the reference view. Furthermore, it infers open-vocabulary relationships that describe spatial and proximity relationships among scene objects without requiring annotated training data. We conduct extensive quantitative and qualitative evaluations to assess the effectiveness of VIZOR on scene graph generation and downstream tasks, such as query-based object grounding. VIZOR outperforms state-of-the-art methods, showing clear improvements in scene graph generation and achieving 22% and 4.81% gains in zero-shot grounding accuracy on the Replica and Nr3D datasets, respectively.
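The idea of defining "left/right" relative to an object's own front-facing direction, rather than a camera view, can be illustrated with a small geometric sketch. This is not VIZOR's implementation: the function name `lateral_relation`, the 2D ground-plane simplification, and the right-handed z-up world frame are all assumptions made for illustration.

```python
def lateral_relation(anchor_pos, anchor_front, target_pos):
    """Classify the target as 'left' or 'right' of the anchor object,
    measured in the anchor's own frame.

    All arguments are (x, y) pairs on the ground plane of a right-handed,
    z-up world frame (an assumption of this sketch). No camera pose is used.
    """
    dx = target_pos[0] - anchor_pos[0]
    dy = target_pos[1] - anchor_pos[1]
    # z-component of front x offset: positive means counterclockwise from front.
    cross_z = anchor_front[0] * dy - anchor_front[1] * dx
    if cross_z > 0:
        return "left"
    if cross_z < 0:
        return "right"
    return "aligned"


# Hypothetical scene: a chair at the origin facing +x, a lamp at (0, 2).
print(lateral_relation((0.0, 0.0), (1.0, 0.0), (0.0, 2.0)))
```

Because the function never takes a viewer position, the relation it returns is the same from any reference view, which is the property the abstract describes.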

InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation

Winter Conference on Applications of Computer Vision, WACV, 2026

Core Rank : - Google Rank :109

@inproceedings{bib_Inte_2026, AUTHOR = {Rajan, Sreehari and Bhosikar, Kunal Kamalkishor and Sharma, Charu}, TITLE = {InteracTalker: Prompt-Based Human-Object Interaction with Co-Speech Gesture Generation}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}

Abstract

Generating realistic human motions that naturally respond to both spoken language and physical objects is crucial for interactive digital experiences. Current methods, however, address speech-driven gestures or object interactions independently, limiting real-world applicability due to a lack of integrated, comprehensive datasets. To overcome this, we introduce InteracTalker, a novel framework that seamlessly integrates prompt-based object-aware interactions with co-speech gesture generation. We achieve this by employing a multi-stage training process to learn a unified motion, speech, and prompt embedding space. To support this, we curate a rich human-object interaction dataset, formed by augmenting an existing text-to-motion dataset with detailed object interaction annotations. Our framework utilizes a Generalized Motion Adaptation Module that enables independent training, adapting to the corresponding motion condition, which is then dynamically combined during inference. To address the imbalance between heterogeneous conditioning signals, we propose an adaptive fusion strategy, which dynamically reweights the conditioning signals during diffusion sampling. InteracTalker successfully unifies these previously separate tasks, outperforming prior methods, including gesture-focused diffusion methods, in both co-speech gesture generation and object-interaction synthesis, and yielding highly realistic, object-aware full-body motions with enhanced flexibility and control. Project page: https://sreeharirajan.github.io/projects/InteracTalker/

SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data

Winter Conference on Applications of Computer Vision, WACV, 2026

Core Rank : - Google Rank :109

@inproceedings{bib_SegM_2026, AUTHOR = {Vanabhai, Ven Janaksinh and Sharma, Charu and Azeemuddin, Syed}, TITLE = {SegMango: Early Deep Mango Yield Prediction based on Flower Segmentation and Weather Data}, BOOKTITLE = {Winter Conference on Applications of Computer Vision}, YEAR = {2026}}

Abstract

Early-stage fruit yield prediction plays a key role in supporting timely agronomic decisions, enhancing market planning, and empowering farmers with data-driven insights. Over the years, most approaches to yield estimation have focused on fruit counting techniques, typically performed just before harvest. While these methods have proven useful, they often come into play late in the cultivation cycle, limiting their impact on early planning and resource optimization. In this work, we introduce a comprehensive baseline framework for predicting mango yield at an earlier stage, during flowering, using image-based learning. Our contributions are twofold. (i) Our approach combines a SegFormer-based segmentation model with a regression pipeline to estimate yield from images, while also exploring the role of contextual features such as weather and scale. (ii) This work introduces a novel benchmark and an enriched dataset, paving the way for scalable, automated tools that can assist farmers and stakeholders in making proactive decisions throughout the mango growing season. Our work demonstrates that for multi-modal yield prediction, integrating features that complement visual representations (like scale) can be more impactful than using features with a stronger standalone linear correlation (like weather). Our single-image model, based on the SegFormer-B1 encoder, achieved a mean absolute error (MAE) of 7.68, R² of 0.76, and mean squared error (MSE) of 115.48. These results highlight the promise of vision-based models for yield estimation from early-stage flowering cues. To the best of our knowledge, this is the first work to address the prediction of mango yield using images from the flowering stage and weather data.
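The three reported metrics follow standard definitions, sketched below in plain Python. The yield values are toy numbers invented for illustration and are unrelated to the paper's dataset; the function name `regression_metrics` is likewise illustrative.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (MAE, MSE, R^2) for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - (mse * n) / ss_tot                       # 1 - SS_res / SS_tot
    return mae, mse, r2


# Toy data, not from the paper.
mae, mse, r2 = regression_metrics([10, 20, 30, 40], [12, 18, 33, 39])
print(mae, mse, round(r2, 3))

# RMSE implied by the MSE quoted in the abstract, in the same units as the yield.
print(math.sqrt(115.48))
```

Taking the square root of the quoted MSE gives an RMSE of about 10.7, which is larger than the MAE of 7.68, as expected when a few predictions have comparatively large errors.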

LORETTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs

Association for the Advancement of Artificial Intelligence, AAAI, 2026

Core Rank : A* Google Rank :220

@inproceedings{bib_LORE_2026, AUTHOR = {Pal, Himanshu and Bachina, Venkata Sai Pranav and Gangwal, Ankit and Sharma, Charu}, TITLE = {LORETTA: A Low Resource Framework To Poison Continuous Time Dynamic Graphs}, BOOKTITLE = {Association for the Advancement of Artificial Intelligence}, YEAR = {2026}}

Abstract

Temporal Graph Neural Networks (TGNNs) are increasingly used in high-stakes domains, such as financial forecasting, recommendation systems, and fraud detection. However, their susceptibility to poisoning attacks poses a critical security risk. We introduce LORETTA (Low Resource Two-phase Temporal Attack), a novel adversarial framework on Continuous-Time Dynamic Graphs, which degrades TGNN performance by an average of 29.47% across 4 widely used benchmark datasets and 4 State-of-the-Art (SotA) models. LORETTA operates through a two-stage approach: (1) sparsify the graph by removing high-impact edges using any of the 16 tested temporal importance metrics, and (2) strategically replace the removed edges with adversarial negatives via LORETTA’s novel degree-preserving negative sampling algorithm. Our plug-and-play design eliminates the need for expensive surrogate models while adhering to realistic unnoticeability constraints. LORETTA degrades performance by up to 42.0% on MOOC, 31.5% on Wikipedia, 28.8% on UCI, and 15.6% on Enron. LORETTA outperforms 11 attack baselines, remains undetectable to 4 leading anomaly detection systems, and is robust to 4 SotA adversarial defense training methods, establishing its effectiveness, unnoticeability, and robustness.