Wathsala Anupama Mohotti

    BSc(Hons) in Information Technology, University of Moratuwa, Sri Lanka,

    MSc in Information Technology, University of Moratuwa, Sri Lanka.

    Research interests:

    Text Mining, Clustering, Outlier Analysis

    Project details:

    The rapid growth of large repositories of digital documents calls for effective methods to analyse their content. Sophisticated analysis is required to give users insight into document collections that are large in both volume and number of features. Document clustering is a popular method for discovering useful information in large document corpora, such as main themes, topics and interesting outliers. However, the high dimensionality of documents creates challenges of accuracy, scalability and efficiency for existing document clustering methods. This research introduces novel density-based document clustering methods to identify pure clusters and interesting anomalies in document collections. Hub and Information Retrieval system ranking concepts will be explored together with density-based clustering to address the challenges of efficiency and accuracy.