Research Areas

  • Indian language search engines
  • Indian language stemmers
  • Conversion of transliterated (romanized) Indian language content to UTF-8
  • Language identification on web pages
  • Clustering and categorization for CLIA
  • Cross-language document classification
  • Multilingual news clustering
  • Language-specific and domain-specific focused crawling
  • Page life calculation and re-crawl scheduling


  • Sandhan: Cross-Language Indian Search Engine
    Sandhan is a mission-mode project executed by a consortium of academic and research institutions and industry partners, and funded by TDIL, Ministry of Information Technology, Government of India.

    Sandhan (also known as CLIA, Cross-Lingual Information Access) was started on 29 August 2006 with the aim of building a search engine in which:
    1. a user can enter a query in one Indian language;
    2. the user can access documents available in
       a. the language of the query,
       b. Hindi (if the query language is not Hindi), and
       c. English; and
    3. results are presented in the language of the query, or optionally in the language in which the information originally resided.
    The languages involved are Bengali, Hindi, Marathi, Punjabi, Tamil and Telugu.
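    The document-language rule in item 2 can be sketched as a small function. This is a minimal illustration, assuming the six query languages listed above and plain language names as identifiers; the names QUERY_LANGUAGES and document_languages are hypothetical and do not describe Sandhan's actual interface.

```python
# Query languages listed for Sandhan. Plain language names are used as
# identifiers here (a hypothetical representation, not the project's own).
QUERY_LANGUAGES = ("Bengali", "Hindi", "Marathi", "Punjabi", "Tamil", "Telugu")


def document_languages(query_lang: str) -> list[str]:
    """Return the document languages searched for a query in query_lang."""
    if query_lang not in QUERY_LANGUAGES:
        raise ValueError(f"unsupported query language: {query_lang}")
    langs = [query_lang]       # (a) the language of the query
    if query_lang != "Hindi":  # (b) Hindi, unless the query is already Hindi
        langs.append("Hindi")
    langs.append("English")    # (c) English is always included
    return langs


if __name__ == "__main__":
    print(document_languages("Tamil"))
    print(document_languages("Hindi"))
```

    For example, a Tamil query is matched against Tamil, Hindi and English documents, while a Hindi query needs only Hindi and English, because Hindi is already the query language.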

  • WebKhoj: Indian Language Search Engine Technology
    WebKhoj, an Indian language web search engine, was developed at SIEL. General-purpose search engines such as Google and Yahoo can search UTF-8 content, but they cannot search the many Indian language sites that use proprietary encodings. WebKhoj overcomes this hurdle and also handles the agglutinative morphology of India's morphologically rich languages.
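    The text does not say how WebKhoj handles proprietary encodings. One common approach for such sites, which often rely on font-specific glyph encodings, is to transcode their text to Unicode before indexing. The sketch below shows that idea with greedy longest-match replacement; the mapping table LEGACY_TO_UNICODE and its entries are hypothetical and illustrative only, and this is not WebKhoj's actual converter.

```python
# Hypothetical mapping from a proprietary (legacy) encoding's character
# sequences to Unicode. Real tables are font-specific and far larger.
LEGACY_TO_UNICODE: dict[str, str] = {
    "dk": "\u0915\u093e",  # KA + VOWEL SIGN AA
    "d": "\u0915",         # DEVANAGARI LETTER KA
    "e": "\u092e",         # DEVANAGARI LETTER MA
    "y": "\u0932",         # DEVANAGARI LETTER LA
}


def transcode(text: str, table: dict[str, str] = LEGACY_TO_UNICODE) -> str:
    """Convert legacy-encoded text to Unicode by greedy longest match.

    Characters with no entry in the table are copied through unchanged.
    """
    max_len = max((len(key) for key in table), default=1)
    out: list[str] = []
    i = 0
    while i < len(text):
        # Try the longest possible sequence first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No mapping matched: keep the original character.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(transcode("dey"))
    print(transcode("dkey"))
```

    Longest-match ordering matters because multi-character sequences such as the hypothetical "dk" must win over their single-character prefixes. A production converter would also need to reorder vowel signs that legacy encodings store in visual rather than logical order.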

    This search technology has been licensed to a few commercial organizations to power real-world search engines.

    Recent Publications

    • A. Mogadala and V. Varma. Finding Influence by Cross-Lingual Blog Mining through Multiple Language Lists. Information Systems for Indian Languages, Communications in Computer and Information Science, Vol. 139, Part 1, pp. 54-59 (2011).
    • A. Mogadala, RamBhupal K and V. Varma. Language Modeling Based Retrieval for SMS and FAQ Matching. SMS-based FAQ Retrieval Task, Forum for Information Retrieval Evaluation (FIRE), Mumbai, India (2011).
    • Book chapter: V. Varma and A. Mogadala. Issues and Challenges in Building Multilingual Information Access Systems. In S. Bandyopadhyay, S. K. Naskar and A. Ekbal (eds.), Emerging Applications of Natural Language Processing: Concepts and New Research. IGI Global (2012).
    Team

    • Ram Bhupal Reddy (Research Engineer)
    • Aditya Mogadala
    • Nikhil Priyatam
    • Mahathi Bhagavatula
    • Krish Perumal (Project Engineer)