DATA ENGINEERING


Areas of focus

The aim of this centre is to conduct research, transfer cutting-edge technologies and disseminate knowledge in the broad area of data engineering. Data engineering deals with the storage, management, retrieval and dissemination of data; knowledge discovery from data; e-commerce, workflow and e-contract technologies for supporting business processes; and automatic processing of data. A major application domain for these problems is bioinformatics. The term data is used in its broadest sense, covering structured relational and object-oriented data, semi-structured XML data, text and marked-up documents, genome data, contract documents, manuals, and scientific literature, whether held within an organization or spread across the World Wide Web.

Projects

      The aim is to build a query optimizer based on stored query execution plans that can optimize complex TPC-R/H queries in less than one second. It uses the notion of join-DAGs and unary operator sprinkling. This tool kit can also be used to teach query optimization concepts and to select materialized views and indices for data warehouses. Current work is on developing novel index structures and storage mechanisms for query processing.

      The 'data cube' operator has been added to the MySQL DBMS. The notion of a qualitative data cube, 'Qube', has been developed and is available with a cube visualizer. Current work is on providing support for materialized views in MySQL.
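
As a rough illustration of what a data cube operator computes, the sketch below aggregates a measure over every subset of the grouping dimensions, using a placeholder value for dimensions that have been rolled up. It is a minimal in-memory sketch and not the MySQL implementation described above; the function name `data_cube`, the `ALL` marker and the sample rows are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # assumed placeholder for a dimension that has been rolled up


def data_cube(rows, dims, measure):
    """Return SUM(measure) grouped by every subset of `dims`."""
    cube = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Keep the value of retained dimensions, roll up the rest.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical rows, invented purely for illustration.
sales = [
    {"region": "north", "product": "rice", "amount": 10},
    {"region": "north", "product": "wheat", "amount": 5},
    {"region": "south", "product": "rice", "amount": 7},
]

cube = data_cube(sales, ["region", "product"], "amount")
for key in sorted(cube):
    print(key, cube[key])
```

With n dimensions the cube contains 2^n group-bys, which is one reason materialized views are a natural follow-on topic: precomputing some of these aggregates avoids rescanning the base table.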

      The objective of this project is to provide both query and browsing support for accessing relational databases. A prototype has been built. Current work is on improving the efficiency of this system and enhancing its functionality to support a large number of joins and materialized views.

      An Indic XML DBMS based on the level of semi-structuredness of XML documents has been built, with support for concurrency control. Current work is on implementing an Indic XML DBMS for handheld devices such as Simputers.

      The aim of this project is to develop a data mining tool kit supporting cutting-edge algorithms for association rules, clustering and classification. Domain-specific Indic Data Miners for web mining, microarray data, and banking are being planned.
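
To make the association-rule part concrete, the sketch below computes the two standard measures, support and confidence, on a handful of transactions. It is a minimal sketch of the general idea, not the tool kit's algorithms; the function names and the sample baskets are assumptions introduced for this example.

```python
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs whose support (fraction of transactions) >= min_support."""
    n = len(transactions)
    counts = {}
    for t in transactions:
        # Sort items so each pair has one canonical ordering.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}


def rule_confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        return 0.0
    hits = sum(1 for t in with_antecedent if consequent in t)
    return hits / len(with_antecedent)


# Hypothetical market baskets, invented purely for illustration.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

pairs = frequent_pairs(baskets, min_support=0.5)
confidence = rule_confidence(baskets, "bread", "milk")
print(pairs, confidence)
```

Practical miners avoid counting every pair up front by pruning candidates level by level; this sketch skips that step to keep the definitions visible.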

      Data clustering is in itself a very important sub-problem in data mining. We have developed a multiple clustering algorithm, and are currently working on designing algorithms for generating the top-k clusters from a given data set.

      Speculative locking takes advantage of cheap additional resources, such as main memory, to scale up two-phase locking algorithms. A thorough simulation-based evaluation of speculative locking has been done, and the approach is currently being implemented. Speculative locking for nested and mobile transactions is being researched.

      Discovering new communities on the World Wide Web is useful from a sociological and organizational point of view. Algorithms based on dense bipartite graphs have been designed to efficiently discover communities on the World Wide Web. Current research concentrates on the evolution of communities.
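
The intuition behind bipartite-graph-based community discovery is that pages sharing an interest tend to link to the same set of pages. The sketch below illustrates a simplified variant only: it looks for groups of `i` "fan" pages that all link to at least `j` common "center" pages (a complete bipartite core), by brute force. The project's dense-bipartite-graph algorithms are not specified here, so the function name, the parameters and the toy link graph are all assumptions for this example.

```python
from itertools import combinations


def bipartite_cores(links, i, j):
    """Return (fans, centers) pairs where `i` fans share at least `j` link targets.

    `links` maps each page to the pages it links to.
    """
    cores = []
    for fans in combinations(sorted(links), i):
        # Targets that every fan in this group links to.
        common = set.intersection(*(set(links[f]) for f in fans))
        common -= set(fans)  # a fan is not counted as its own center
        if len(common) >= j:
            cores.append((fans, tuple(sorted(common))))
    return cores


# Hypothetical link graph, invented purely for illustration.
web = {
    "a": ["x", "y", "z"],
    "b": ["x", "y"],
    "c": ["y", "q"],
}

cores = bipartite_cores(web, i=2, j=2)
print(cores)
```

Enumerating every group of fans is exponential in `i`, so this only conveys the structure being searched for; relaxing "all fans link to all centers" to "most fans link to most centers" gives the dense, rather than complete, bipartite pattern.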

      India-related web data, especially multilingual data, is very difficult to come by for research on organizing and disseminating such data. The aim of this project is to collect all India-related web pages and make them available to others.

      Agricultural Information Dissemination System (P. Krishna Reddy)
      A solution for delivering timely expert advice to farmers has been developed. A prototype implementation is in progress, and it is expected to be incorporated in the village- and mandal-level Internet kiosks.

      Agent technologies provide functional encapsulation with support for environmental percepts. We have implemented an agent-based traffic simulation system for Indian traffic. Current work is on distributed and cluster-computing-based solutions to speed up the simulation.

      The aim of this project is to use language processing technologies to extract information and logical statements that help bioinformatics researchers quickly browse related scientific literature. A prototype system has been built and is being tested. Current work includes processing the experimental and results sections of bioinformatics research papers.

      An EREC data model and a framework for conceptual modeling of e-contracts were developed. A methodology to map e-contracts to workflows has been formulated and tested on a couple of case studies. A system for e-contract modeling and enactment is being built.