Abstract
Neural models have seen exponential growth, both in scale and in deployment, in the years
since transformers and large language models were developed. The scale of these models and of their
training data has enabled them to reach near- (and in some cases super-) human performance on several
tasks. However, this progress raises concerns about value misalignment and potential misbehaviour of these models
in high-stakes situations. This creates the need for a more fine-grained, general, and mathematical
understanding of how these models function, with the objective of reliably and generally
predicting and controlling their behaviour. This is the central effort of interpretability, a field of study
aiming to reduce the heavily overparameterized functions implemented by neural nets to simple, sparse,
and abstract causal models.
However, the relative immaturity of the discipline has meant that no consensus has formed on rigorous
paradigms, techniques, and experimental standards. In this thesis, we present a proof of concept that analogy with
the natural sciences can form a valuable foundation for achieving the long-term aims of interpretability;
in particular, we leverage the reductionist approach to understanding complex systems, and apply it to
the study of deep models.
We restrict our scope to models that operate on natural language – or, more generally, text – rather than
other modalities such as images, audio, or time series. We therefore take inspiration from computational
linguistics, which in its early phases relied on a remarkably expressive reduction of natural language
– formal grammars. We exploit this concept to idealize the conditions under which we examine neural
language models, and present a study that operationalizes this intuition.
Concretely, we examine the recently popular sparse autoencoder (SAE) method for interpretability.
This method centres on two-layer MLPs with a sparse, overcomplete hidden representation, trained
to encode a latent space of a large model, in the hope that a meaningful semantic decomposition of this
space arises. We use language models trained on formal grammars, attempt to uncover relevant features
with this approach, and try to identify properties of the approach that bear on its usability. Our
findings align for the most part with existing conclusions on the properties of SAEs (although these were
based mostly on experiments in the image domain), such as their sensitivity to inductive biases and their lack
of robustness. Most significantly, we note that the features identified by SAEs are rarely causally relevant
– ablating them fails to produce the expected effects most of the time. As causality has emerged as a
widely agreed-upon sine qua non among interpretability researchers, this is a major deficiency of the method. We accordingly propose a modification of the pipeline that aims to incentivize the causality
of the identified features, and demonstrate its efficacy in the same formal-grammar setting.
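To fix notation, a minimal sketch of the SAE formulation assumed in this summary is the standard one (the nonlinearity $\sigma$, the sparsity penalty, and the symbol names below are generic choices for illustration, not those of any particular implementation): an encoder–decoder pair
\[
z = \sigma\!\left(W_{\text{enc}}\, x + b_{\text{enc}}\right), \qquad \hat{x} = W_{\text{dec}}\, z + b_{\text{dec}},
\]
trained to minimize a reconstruction loss with a sparsity penalty on the overcomplete code $z$,
\[
\mathcal{L}(x) = \lVert x - \hat{x} \rVert_2^2 + \lambda \lVert z \rVert_1,
\]
in the hope that individual coordinates of $z$ correspond to interpretable features of the model's latent space.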
Overall, we believe that our results demonstrate the potential of importing scientific modi operandi
into interpretability, and more specifically, the capacity of reductionism to provide useful insights into
the functioning of deep models.