Ecology meets Big Data: Adventures in Extreme Clustering

Speaker: Dr. Alan Woodley, Research Fellow in Data Science, School of Electrical Engineering and Computer Science (EECS), QUT

Title: Ecology meets Big Data: Adventures in Extreme Clustering

Date: 9 May 2016

Abstract: Environmental scientists are increasingly taking advantage of datasets that are far larger and more complex than any they have used before, colloquially known as big data. However, just processing these datasets, let alone analysing them, poses computational challenges. Clustering is a technique that can help make large datasets more manageable. Yet most clustering techniques are too computationally expensive for datasets that are large (i.e. contain more than 10 million objects) or complex (i.e. need to be clustered into more than 1,000 groups). At QUT’s School of Electrical Engineering and Computer Science, we have developed a scalable clustering algorithm, called the K-tree, that can work with very large datasets. The K-tree was recently tested on 22 years of Landsat 5 data from NSW (8 TB), clustering 500 billion objects into 8 billion clusters. We are investigating areas of mutually beneficial collaboration between environmental science and computer science.

Details:

Start Date: 09/05/2016