DATA ENGINEERING
Areas of focus
- Query Processing, Browsing and Optimization
- Transaction Management
- Database Design
- XML Data Management
- Data Warehousing
- Data Mining
- E-Contracts, E-Commerce and Workflow Management Systems
- Agent Systems
The aim of this centre is to conduct research, transfer cutting-edge technologies and disseminate knowledge in the broad area of data engineering. Data engineering deals with storage, management, retrieval and dissemination of data; knowledge discovery from data; and e-commerce, workflow and e-contract technologies for supporting business processes and automatic processing of data. A major application domain for addressing these problems is bioinformatics. The term data is used in its broadest sense, including structured relational and object-oriented data, semi-structured XML data, text and marked-up documents, genome data, contract documents, manuals, and scientific literature, spread within an organization or across the world-wide web.
Projects
- Indic Query Optimizer Tool Kit
The aim is to build a query optimizer, based on stored query execution plans, that optimizes complex TPC-R/H queries in less
than one second. It uses the notion of join-DAGs and unary operator sprinkling. This tool kit can also be used to teach query
optimization concepts and to select materialized views and indices for data warehouses. The current work is on developing
novel index structures and storage mechanisms for query processing.
The 'data cube' operator has been added to the MySQL DBMS. The notion of a qualitative data cube, 'Qube', has been developed
and is available with a cube visualizer. The current work is on providing support for materialized views in the MySQL DBMS.
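As a rough illustration of what a data cube operator computes, the following Python sketch aggregates a measure over every subset of the grouping dimensions. It is not the MySQL extension described above; the function name `data_cube`, the `ALL` marker and the sample sales rows are assumptions made for this example only.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"  # hypothetical marker for a dimension that has been rolled up


def data_cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims` (2^n group-bys in total)."""
    cube = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                # Keep the value of a grouped dimension, roll up the others.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                cube[key] += row[measure]
    return dict(cube)


# Hypothetical sample data, not taken from the project.
sales = [
    {"region": "north", "product": "rice", "qty": 10},
    {"region": "north", "product": "wheat", "qty": 5},
    {"region": "south", "product": "rice", "qty": 7},
]
cube = data_cube(sales, ["region", "product"], "qty")
```

Each key in `cube` is one cell: `("north", "ALL")` holds the total for the north region across all products, and `("ALL", "ALL")` holds the grand total.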
- Indic Query + Browsing Wizard for MS SQL Server 2000
The objective of this project is to provide both query and browsing support for accessing relational databases. A prototype
has been built. Current work is on improving the efficiency of this system and enhancing its functionality to support a large
number of joins and materialized views.
An Indic XML DBMS, based on the level of semi-structuredness of XML documents, has been built with support for concurrency
control. Current work is on implementing an Indic XML DBMS for handheld devices such as Simputers.
The aim of this project is to develop a data mining tool kit supporting cutting-edge algorithms for association rules,
clustering and classification. Domain-specific Indic Data Miners for web mining, microarray data, and banking are being
planned.
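To make the association-rule task concrete, here is a minimal brute-force Python sketch that derives single-item rules from support and confidence thresholds. It is only an illustration, not one of the tool kit's algorithms; the helper names, thresholds and basket data are assumptions.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def rules(transactions, min_support, min_confidence):
    """Brute-force rules {lhs} -> {rhs} over all pairs of items."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is too rare to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            # confidence = support(lhs and rhs) / support(lhs)
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found


# Hypothetical market-basket data.
baskets = [frozenset(t) for t in (
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
)]
result = rules(baskets, min_support=0.5, min_confidence=0.6)
```

Practical miners avoid this exhaustive enumeration by pruning infrequent itemsets early, but the support and confidence definitions are the same.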
- Indic Multiclustering Tool Kit
Data clustering is by itself a very important sub-problem in data mining. We have developed a multiple clustering
algorithm, and are currently working on designing algorithms for generating top-k clusters from a given data set.
- Speculative Locking-based Transaction Manager
Speculative locking takes advantage of cheap additional resources, such as main memory, to scale up two-phase locking
algorithms. Speculative locking has been thoroughly evaluated through simulation and is currently being
implemented. Speculative locking for nested and mobile transactions is being researched.
- Community Extraction in the World-wide Web
Discovering new communities in the world-wide web is useful from sociological and organizational points of view. Algorithms
based on dense bipartite graphs have been designed to efficiently discover communities in the world-wide web. Current research
concentrates on the evolution of communities.
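The idea of finding communities through bipartite link structure can be sketched in Python as follows. The sketch looks only for complete bipartite cores (a set of "fan" pages that all link to the same "centre" pages), which is a simplified special case of a dense bipartite graph and not the centre's own algorithm. The function name, parameters and link data are assumptions.

```python
from itertools import combinations


def bipartite_cores(links, n_fans, n_centers):
    """Find groups of `n_fans` pages sharing at least `n_centers` out-links.

    `links` maps each page to the set of pages it links to.
    The search is exhaustive, so it only suits small examples.
    """
    cores = []
    for fans in combinations(sorted(links), n_fans):
        # Centres are the pages that every fan in the group links to.
        common = set.intersection(*(links[f] for f in fans))
        if len(common) >= n_centers:
            cores.append((fans, sorted(common)))
    return cores


# Hypothetical link graph: p1..p4 are fan pages, a..d are linked pages.
links = {
    "p1": {"a", "b", "c"},
    "p2": {"a", "b"},
    "p3": {"a", "b", "d"},
    "p4": {"d"},
}
cores = bipartite_cores(links, n_fans=3, n_centers=2)
```

Here p1, p2 and p3 all link to both a and b, so together they form one candidate community. A dense (rather than complete) variant would relax the requirement that every fan links to every centre.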
- Extraction of India-Related Web Community Resources
India-related web data, especially multilingual data, is very difficult to come by, which hampers research on organizing and
disseminating such data. The aim of this project is to collect all India-related web pages and make them available to others.
- Agricultural Information Dissemination System (P. Krishna Reddy)
A solution for delivering appropriate expert advice to farmers at the right time has been developed. A prototype implementation is in progress, and it is expected to be incorporated into village- and mandal-level
Internet kiosks.
- Agent-based Simulation Tool Kit
Agent technologies provide functional encapsulation with support for environmental percepts. We have implemented an
agent-based traffic simulation system for Indian traffic. Current work is on solutions based on distributed and cluster computing to speed up the simulation.
The aim of this project is to use language processing technologies to extract information and logical statements that help
bioinformatics researchers quickly browse related scientific literature. A prototype system has been built and is being
tested. Current work includes processing the experimental and result sections of bioinformatics research papers.
- E-Contract Enactment using the EREC Data Model and Workflows
An EREC data model and a framework for conceptual modeling of e-contracts were developed. A methodology to map
e-contracts to workflows has been formulated and tested on a couple of case studies. A system for e-contract modeling and
enactment is being built.