Cloud Computing and Distributed Systems

Background
The Search and Information Extraction Lab (SIEL) at LTRC, IIIT Hyderabad is actively involved in research in many areas relevant to cloud computing. At SIEL, the choice of research area is driven by the problems we face ourselves. The main motivation behind establishing a cloud computing research team at SIEL was to enable researchers in the lab to experiment with very large datasets, which are now becoming the norm in search and information extraction research. To handle such datasets, we explored several methods for processing data on a cluster of machines. Eventually, we chose MapReduce as the preferred model because it is well suited to data-intensive applications, and we began working with its most popular implementation, Apache Hadoop. One of the major limitations we faced when we began using Hadoop was resource management and scheduling. We soon realized that there was considerable research potential in improving the core MapReduce framework in areas such as fault tolerance, resource management and user accessibility. As a result, we established a team dedicated to research on improving these areas in Hadoop and related software.

Research Areas
We are currently working in the following areas of cloud computing:

Resource management for MapReduce: Scheduling, task assignment, throttling and federation of resources, and power-aware resource management.
Virtualization and private clouds: Use of virtual machines to make better use of existing hardware and to provide isolated, guaranteed resources to researchers.
Easy-to-use interfaces for accessing large datasets: Innovative methods for handling large datasets such as web-crawl data, XML documents, and various other data formats.