Data Science Under the Hood

Our next seminar

Stay tuned for our next seminar at the end of October 2020!

 

Previous seminars

Human interaction with data – Professor Margot Brereton

Date: 24 September 2020

While effective gathering and analysis of data promises to create better understandings of situations and systems, the human experience of data is often left unexamined. In this talk I will discuss case studies in maternal health, ageing and radiology to reveal human experiences of data, and how we might rethink the design of systems to support better human experiences.

Professor Margot Brereton is a national and international leader in the collaborative design (co-design) of new humanitarian technologies and their interfaces. She designs to support real user communities in selected challenging contexts, with a particular focus on agency and better futures for older people and people with intellectual disabilities. She also designs technologies to connect people to nature and to support the use of endangered indigenous languages. Margot and her team’s prototypes are deployed and evolved over significant periods of time (6 months to years) within communities. Margot began her career as an apprentice at Rolls-Royce aircraft engines and holds a PhD in Mechanical Engineering Design from Stanford University.


Practice, theory and future – Dr Sander Leemans

Date: 27 August 2020

Every organisation runs business processes, typically supported by information systems. In process mining, we aim to construct algorithms and techniques to analyse historical process behaviour recorded in event logs. Process mining has been applied in countless process optimisation projects, including at QUT, financial institutions, manufacturers, insurers, HR, mining, governments, and many more, to inform decision making. In this talk, we will look at the practical applicability of process mining through a few examples, and then at some of the theoretical challenges facing process mining techniques such as process model discovery, conformance checking and stochastic process mining. Finally, we will look at the areas of process mining research that researchers will take on in the future.
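As a rough illustration of what analysing an event log can look like, the sketch below counts how often one activity directly follows another within the same case, a common first step in process discovery. The log format (case id, activity, timestamp), the activity names and the function name `directly_follows` are assumptions made for this example; it is not taken from the talk.

```python
from collections import Counter, defaultdict


def directly_follows(event_log):
    """Count how often activity A is directly followed by activity B in a case.

    `event_log` is an iterable of (case_id, activity, timestamp) tuples;
    this tuple layout is an assumption for the example.
    """
    traces = defaultdict(list)
    # Order events by case and then by time so each trace is chronological.
    for case_id, activity, _timestamp in sorted(event_log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)

    counts = Counter()
    for activities in traces.values():
        # Pair each activity with its immediate successor in the same case.
        counts.update(zip(activities, activities[1:]))
    return counts


# Hypothetical toy log with two cases.
toy_log = [
    ("case-1", "receive", 1), ("case-1", "check", 2), ("case-1", "approve", 3),
    ("case-2", "receive", 1), ("case-2", "check", 2), ("case-2", "reject", 3),
]
dfg = directly_follows(toy_log)
```

The resulting counts can be read as the weighted edges of a directly-follows graph, which discovery techniques typically refine into a process model.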

Born in Boxtel, the Netherlands, Dr Sander Leemans is a lecturer at the Queensland University of Technology, Brisbane, Australia, in the School of Information Systems and an Associate Investigator at the QUT Centre for Data Science. His research interests include process mining, process discovery, conformance checking, stochastic process mining, and robotic process automation. In particular, he specialises in making solid academic techniques available to end-users, analysts and industry partners. He teaches business process management, business process modelling and business process improvement.

Resources:


Using notebook technologies to support data analytics in research and teaching – Dr Andrew Gibson

Date: 30 July 2020

In this talk Dr Andrew Gibson provides a high-level view of how he uses notebook technologies in both data analytics teaching and research. Specifically, he shows how he and Catarina Moreira are using a cloud-based implementation of JupyterLab for Data Analytics at undergraduate and postgraduate level (and recently online), and how he uses both JupyterLab and Polynote together with AWS in his research. He also details how Polynote enables him to move easily between the research and software development tasks that he undertakes when doing Reflective Writing Analytics. He also provides some information on technical infrastructure and workflow, and on opportunities for scaling up within the QUT Data Science community.

Resources:


Parameter estimation using only model simulations – Associate Professor Chris Drovandi

Date: 26 June 2020

Complex statistical models are abundant in many areas such as ecology, biology and finance, and often possess several unknown parameters. A key task in statistical inference is to estimate the parameters based on observed data, which is often based on the likelihood function. Once a statistical model is ‘calibrated’, it can be used for hypothesis testing and prediction, and can inform decision making. However, off-the-shelf parameter estimation methods are often not applicable for complex models due to computational intractability of their likelihood functions. Fortunately, despite likelihood intractability, it is often feasible to simulate from complex models for any given parameter value. This talk will take a look under the hood at some parameter estimation methods, such as approximate Bayesian computation and Bayesian synthetic likelihood, that are applicable when only model simulation is feasible, and discuss some challenges.
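To make the idea of estimating parameters from simulations alone more concrete, here is a minimal sketch of rejection-based approximate Bayesian computation. The toy model (normal data with unknown mean), the uniform prior, the sample mean as summary statistic and the tolerance are all assumptions chosen for illustration; the methods discussed in the talk may differ.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (an assumption): n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, n_draws, tolerance, rng):
    """Keep prior draws whose simulated summary is close to the observed one.

    The likelihood is never evaluated; only `simulate` is called.
    """
    obs_summary = statistics.fmean(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # assumed Uniform(-5, 5) prior
        simulated = simulate(theta, len(observed), rng)
        # Accept theta when the summary statistics agree within the tolerance.
        if abs(statistics.fmean(simulated) - obs_summary) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # pretend data with "true" mean 2.0
posterior_sample = abc_rejection(observed, n_draws=5000, tolerance=0.1, rng=rng)
posterior_mean = statistics.fmean(posterior_sample)
```

A smaller tolerance gives a closer approximation to the posterior but accepts fewer draws, which is one of the practical trade-offs such methods have to manage.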

Resources:


Detecting and analysing coordinated inauthentic behaviour on social media – Dr Timothy Graham

Date: 29 May 2020

In recent years, the global surge of bots and trolls that manipulate public discussions on social media has created serious problems for political elections, natural disaster communications, and global health crises such as the COVID-19 pandemic. Although there has been progress in the use of supervised machine learning to accurately classify malicious actors such as bots, such tools and methods are not suited to detect coordinated inauthentic behaviour, for example orchestrated disinformation campaigns that involve hundreds or even thousands of coordinated accounts that amplify particular content and narratives. This seminar presents a novel approach to detecting coordinated activity on social media using a pairwise comparison algorithm and network analysis methods. It provides case study examples of application areas and preliminary results, along with challenges and open problems in this new field of research.
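The abstract does not spell out the algorithm, so the following is only a sketch of one plausible pairwise-comparison approach: link two accounts whenever they share the same URL within a short time window, keep links that recur, and read clusters off the resulting network. The post format, the window, the threshold and all function names are assumptions for this example.

```python
from collections import defaultdict
from itertools import combinations


def coordination_edges(posts, window_seconds, min_weight):
    """Weight account pairs by how often they co-share a URL within the window.

    `posts` is an iterable of (account, url, unix_timestamp) tuples
    (an assumed format).
    """
    by_url = defaultdict(list)
    for account, url, timestamp in posts:
        by_url[url].append((timestamp, account))

    weights = defaultdict(int)
    for shares in by_url.values():
        # Compare every pair of shares of the same URL.
        for (t1, a1), (t2, a2) in combinations(shares, 2):
            if a1 != a2 and abs(t1 - t2) <= window_seconds:
                weights[tuple(sorted((a1, a2)))] += 1
    # Keep only pairs that co-shared at least `min_weight` times.
    return {pair: w for pair, w in weights.items() if w >= min_weight}


def connected_components(edges):
    """Group accounts linked (directly or indirectly) by the kept edges."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical toy data: a, b and c repeatedly share the same links seconds apart.
posts = [
    ("a", "u1", 0), ("b", "u1", 5), ("c", "u1", 8),
    ("a", "u2", 100), ("b", "u2", 103), ("c", "u2", 109),
    ("d", "u1", 5000), ("e", "u3", 42),
]
edges = coordination_edges(posts, window_seconds=10, min_weight=2)
clusters = connected_components(edges)
```

Comparing all pairs is quadratic in the number of shares per URL, so a real pipeline would need a more scalable comparison step than this toy version.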

Resources:


Factorisation based text mining algorithms and application – Associate Professor Richi Nayak

Date: April 2020

The rapid increase in the number of large repositories containing digital documents underscores the need for effective methods to manage the embedded content. Factorisation methods have been used to analyse text collections and obtain meaningful insights from them. This talk presents how a factorisation method can be used to represent high-dimensional data and obtain a low-order representation. It further provides insights on how the low-order representation can be used in applications such as clustering, anomaly detection, community detection, and personalisation.
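As a small illustration of a low-order representation, the sketch below factorises a toy document-term matrix with non-negative matrix factorisation using multiplicative updates, then clusters documents by their strongest factor. The choice of NMF, the toy matrix and the function name `nmf` are assumptions for this example; the talk may cover other factorisation methods.

```python
import numpy as np


def nmf(X, rank, n_iter=200, seed=0, eps=1e-9):
    """Approximate X (docs x terms) as W @ H with non-negative factors.

    Uses multiplicative updates for the squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    W = rng.random((n_docs, rank)) + eps  # document-by-factor weights
    H = rng.random((rank, n_terms)) + eps  # factor-by-term weights
    for _ in range(n_iter):
        # Each update keeps the factors non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical term counts: documents 0-1 use terms 0-1, documents 2-3 use terms 2-3.
X = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

W, H = nmf(X, rank=2)
clusters = W.argmax(axis=1)  # assign each document to its strongest factor
```

Here each row of `W` is a two-dimensional representation of a document, and the same rows could feed anomaly detection or recommendation instead of clustering.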

Resources: