Professor Antonietta Mira
Università della Svizzera italiana, Switzerland and University of Insubria, Italy
Public Lecture: “Data Science Meets Life Science: Some Success Stories”
How can Data Science help improve our lives?
I will review some personal research success stories that range from analyzing gene expression, protein folding and fMRI data to studying the optimal positioning of defibrillators, Great Barrier Reef biodiversity and COVID-19 data dynamics. In more detail, some genes are differentially expressed in patients when compared to controls. Identifying them is of paramount importance for diagnosis and for identifying possible treatments. The Intrinsic Dimension (ID) is a novel methodology that allows us to identify differentially expressed genes and, at the same time, to quantify the uncertainty of the resulting nonlinear clustering of patients. We have used the same methodology to study when a protein is in a folded versus an unfolded state, which helps us better understand its properties and functionality: failure to fold into the correct three-dimensional structure can produce inactive or even toxic proteins. Functional magnetic resonance imaging (fMRI) measures brain activity by detecting changes associated with blood flow. By analyzing fMRI data using the ID, we better understand when an area of the brain is in use, because blood flow to that region increases. Using statistical models, we design risk maps of out-of-hospital cardiac arrests and help with optimally positioning new defibrillators or relocating existing ones. Finally, I will present two lines of research, carried out jointly with the QUT Centre for Data Science, on the evolution of biodiversity in the Great Barrier Reef and on clustering countries based on COVID-19 data dynamics. All these examples share an interdisciplinary flavor and highlight the ability of complex datasets, analyzed with the proper statistical tools, to improve our understanding of the world, ultimately making our lives better in many ways.
Distinguished Visitor Lecture: “Bayesian Estimation of Data Intrinsic Dimensions”
With the advent of big data, it is increasingly common to deal with cases where data are collected in a high-dimensional space and little is known a priori about their distribution. Quite often, however, this distribution has support on a subspace (manifold) whose dimension, called the intrinsic dimension (ID) of the data, is much lower than the dimensionality of the data embedding space. Under very weak assumptions on the data generating mechanism, the nearest-neighbor (NN) distances among points follow distributions that depend parametrically on the ID. Facco et al. (Scientific Reports, 2017) leveraged this, developing an ID estimator (TWO-NN) based on the ratio of the distances from each data point to its first two NNs. This result was then extended to ratios of distances between NNs of generic orders, deriving alternative estimators (GRIDE) that are more robust to noise in the data (Allegra et al., Scientific Reports, 2020). We also extended TWO-NN to the case where the ID is not constant within the data, i.e., the distribution has support on the union of several manifolds with different IDs. This situation may trivially occur if data sets with heterogeneous IDs are merged, but, as we reveal, it also happens quite naturally in data from diverse disciplines. Within a Bayesian framework, we can robustly estimate the IDs of the manifolds and assign each data point to one of the manifolds. In many real-world datasets, we find widely heterogeneous collections of IDs corresponding to variations in core properties of the data. This allows us, for example, to identify different financial risk levels using companies’ balance sheets or key stages of offensive actions using NBA basketball players’ tracking data.
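As an illustration of the TWO-NN idea described in the abstract, here is a minimal Python sketch, not the speaker's implementation. It assumes the standard TWO-NN result that the ratio mu = r2/r1 of each point's second to first NN distance follows a Pareto law with density d * mu^(-d-1), so the maximum-likelihood estimate of the ID is N / sum(log mu). The function names (`nn_ratios`, `two_nn_id`, `two_nn_posterior`) and the Gamma prior hyperparameters are assumptions made for this example; the Gamma prior is chosen only because it is conjugate to the Pareto likelihood, and the sketch covers a single constant ID, not the mixture of manifolds discussed in the lecture.

```python
import numpy as np


def nn_ratios(points):
    """Return mu_i = r2_i / r1_i, the ratio of second to first NN distance."""
    x = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances: O(N^2) memory, fine for a sketch.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # After partitioning at index 1, columns 0 and 1 hold the two smallest distances.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    r1, r2 = nearest[:, 0], nearest[:, 1]
    keep = r1 > 0  # drop exact duplicates, whose ratio is undefined
    return r2[keep] / r1[keep]


def two_nn_id(points):
    """Maximum-likelihood ID estimate under the Pareto model for mu."""
    mu = nn_ratios(points)
    return len(mu) / np.log(mu).sum()


def two_nn_posterior(points, prior_shape=1.0, prior_rate=1.0):
    """Gamma(shape, rate) posterior for the ID under a Gamma prior.

    The prior hyperparameters are illustrative defaults (an assumption).
    """
    mu = nn_ratios(points)
    return prior_shape + len(mu), prior_rate + np.log(mu).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A 2-dimensional Gaussian cloud linearly embedded in 5 dimensions.
    latent = rng.normal(size=(1000, 2))
    embedded = latent @ rng.normal(size=(2, 5))
    print("TWO-NN estimate:", two_nn_id(embedded))
    shape, rate = two_nn_posterior(embedded)
    print("posterior mean:", shape / rate)
```

On data like the embedded plane above, both the estimate and the posterior mean should come out close to 2, even though the points live in a 5-dimensional space.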
Details:
Location: QUT Gardens Point, P-419 and online (Zoom)
Start Date: 05/06/2023