This is an Australian Research Council (ARC) Linkage Project
Project Overview & Aims
In-situ sensors produce high-volume, high-velocity water quality data describing fine-scale patterns, trends and extremes throughout river networks. There is great potential for data from in-situ sensors to improve our understanding and management of water resources, biodiversity, agriculture and industry, but numerous challenges must be addressed before these benefits can be achieved. For example, the
- data are often prone to errors caused by miscalibration, biofouling, battery and technical failures;
- technical anomalies and the ability to detect them can differ according to the geographic characteristics of the environmental system and spatial placement of the sensors;
- sheer volume and velocity of data mean we can no longer continue the current practice of manual anomaly identification; and
- locations within river networks have unique spatial relationships (e.g. branching network structure, within-network connectivity, flow direction, and flow volume) that must be accounted for in statistical models.
Our aim is to develop novel statistical methods to detect technical anomalies in high-frequency in-situ sensor data collected on branching river networks using computationally efficient spatio-temporal models. The applied goals are to automate anomaly detection in water-quality data and to generate predictions of sediment and nutrient concentrations throughout river networks in near-real time.
- Activity 1: Investigate relationships between water-quality parameters collected using in-situ sensor data from different river networks, and use those relationships to distinguish between technical anomalies and real water quality events;
- Activity 2: Develop new methods for detecting technical anomalies (e.g. miscalibration, biofouling, battery and technical failures) in near real-time in-situ sensor data;
- Activity 3: Develop space-time models based on in-situ sensor data that can be used to predict at unsampled locations and/or times;
- Activity 4: Develop adaptive sampling designs for river networks to optimise the deployment of in-situ sensors; and
- Activity 5: Build skills and capacity in partner organisations through real-time monitoring workflows and open source tools, ensuring that new methods lead to useful industry outcomes.
This interdisciplinary research addresses a significant research gap relevant to our partner organisations, and any organisation transitioning to environmental monitoring using spatially distributed, low-cost in-situ sensors.
- Better adaptive management of natural resources
- Clean, trustworthy and comprehensive data from in-situ sensors in near real time
- Improved scientific understanding, assisted pollutant source identification, and quantitative real-time feedback for landholders
- Optimal placement of low- and high-cost sensors that trades off costs against information gained
- New research that enhances national and international water resource management
- Professor Kerrie Mengersen (Project Leader), Queensland University of Technology
- Katie Buchhorn, Queensland University of Technology
- Dr Puwasala Gamakumara, Monash University
- Professor Rob Hyndman, Monash University
- Professor Jay Jones, University of Alaska
- Dr Claire Kermorvant, University of Pau
- Dr Catherine Leigh, RMIT University
- Professor Benoit Liquet-Weiland, Macquarie University
- Professor James McGree, Queensland University of Technology
- Dr Catherine Neelamraju, Queensland Department of Environment and Science
- Dr Erin Peterson, EP Consulting and Queensland University of Technology
- Dr Emily Saeck, Healthy Land and Water
- Dr Edgar Santos-Fernandez, Queensland University of Technology
- Dr Priyanga Dilini Talagala, University of Moratuwa
- Dr Ryan Turner, University of Queensland
- Valentina di Marco, Monash University
- Rachel White, RMIT University
- Dr Dan Isaak, Rocky Mountain Research Station, US Forest Service
- Dr Guy Litt, US National Ecological Observatory Network (NEON)
- Dr Jay Ver Hoef, Marine Mammal Laboratory, US NOAA-NMFS Alaska Fisheries Science Center
Kermorvant, C., Liquet, B., Litt, G., Jones, J.B., Mengersen, K., Peterson, E.E., Hyndman, R.J. and Leigh, C., 2021. Reconstructing missing and anomalous data collected from high-frequency in-situ sensors in fresh waters. International Journal of Environmental Research and Public Health, 18(23), p.12803. https://www.mdpi.com/1660-4601/18/23/12803
Kermorvant, C., Liquet, B., Litt, G., Mengersen, K., Peterson, E.E., Hyndman, R., Jones Jr., J.B. and Leigh, C., In Review. Understanding links between water-quality variables and nitrate concentration in freshwater streams using high-frequency sensor data. https://arxiv.org/abs/2106.01719
Leigh, C., Alsibai, O., Hyndman, R.J., Kandanaarachchi, S., King, O.C., McGree, J.M., Neelamraju, C., Strauss, J., Talagala, P.D., Turner, R.D., Mengersen, K. and Peterson, E.E., 2019. A framework for automated anomaly detection in high frequency water-quality data from in situ sensors. Science of The Total Environment, 664, pp.885-898. https://doi.org/10.1016/j.scitotenv.2019.02.085
Leigh, C., Kandanaarachchi, S., McGree, J.M., Hyndman, R.J., Alsibai, O., Mengersen, K. and Peterson, E.E., 2019. Predicting sediment and nutrient concentrations from high-frequency water-quality data. PloS One, 14(8), p.e0215503. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0215503
Pearse, A.R., McGree, J.M., Som, N.A., Leigh, C., Maxwell, P., Ver Hoef, J.M. and Peterson, E.E., 2020. SSNdesign—An R package for pseudo-Bayesian optimal and adaptive sampling designs on stream networks. PloS One, 15(9), p.e0238422. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0238422
Rodriguez-Perez, J., Leigh, C., Liquet, B., Kermorvant, C., Peterson, E., Sous, D. and Mengersen, K., 2020. Detecting technical anomalies in high-frequency water-quality data using artificial neural networks. Environmental Science & Technology, 54(21), pp.13719-13730. https://pubs.acs.org/doi/abs/10.1021/acs.est.0c04069
Santos-Fernandez, E., Ver Hoef, J.M., Peterson, E.E., McGree, J., Isaak, D.J. and Mengersen, K., 2022. Bayesian spatio-temporal models for stream networks. Computational Statistics & Data Analysis, 170, p.107446. https://www.sciencedirect.com/science/article/pii/S0167947322000263
Santos-Fernandez, E., Ver Hoef, J.M., McGree, J.M., Isaak, D.J., Mengersen, K. and Peterson, E.E., 2022. SSNbayes: An R package for Bayesian spatio-temporal modelling on stream networks. https://arxiv.org/abs/2202.07166 (under review).
Talagala, P.D., Hyndman, R.J., Leigh, C., Mengersen, K. and Smith‐Miles, K., 2019. A feature‐based procedure for detecting technical outliers in water‐quality data from in situ sensors. Water Resources Research, 55(11), pp.8547-8568. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019WR024906
Buchhorn, K., Mengersen, K., Santos-Fernandez, E., Peterson, E.E., and McGree, J.M., In Review. Bayesian design with sampling windows for complex spatial processes. https://arxiv.org/abs/2206.05369
Open Source Software, training materials and example datasets
- oddwater – R package: Feature-based outlier detection in data from water-quality sensors, as described in Talagala et al. (2019).
- conduits – R package: Conditional normalisation of time series data and graphical tools for visualisation. conduits can also be used to estimate the time delay between two sensor locations in rivers. For an overview, watch this demo of conduits and follow along with the R code.
- SSNbayes – R package: Bayesian spatio-temporal models for data collected on stream networks, as described in Santos-Fernandez et al. (2022) and Santos-Fernandez et al. (In Review).
- SSNdatasets – R package: Datasets for recreating the examples in the SSNbayes vignette with the SSNbayes package.
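The idea of estimating a time delay between two sensor locations can be illustrated with a minimal Python sketch. It does not use conduits or reproduce its conditional cross-correlation method; it simply assumes two equally spaced, gap-free series and picks the lag that maximises the ordinary Pearson cross-correlation. The function names `pearson` and `estimate_lag` and the synthetic data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def estimate_lag(upstream, downstream, max_lag):
    """Return (lag, correlation) for the lag, in time steps, at which the
    downstream series correlates most strongly with the upstream series."""
    best_lag, best_corr = 0, -2.0
    for lag in range(max_lag + 1):
        # Pair upstream[t] with downstream[t + lag]
        x = upstream[: len(upstream) - lag]
        y = downstream[lag:]
        r = pearson(x, y)
        if r > best_corr:
            best_lag, best_corr = lag, r
    return best_lag, best_corr


# Synthetic example (assumption): the downstream sensor records the same
# signal as the upstream sensor, three time steps later.
upstream = [math.sin(0.3 * t) + 0.1 * math.sin(1.7 * t) for t in range(200)]
downstream = [0.0] * 3 + upstream[:-3]

lag, corr = estimate_lag(upstream, downstream, max_lag=10)
print(f"estimated lag: {lag} steps (correlation {corr:.3f})")
```

Real sensor series contain gaps, noise and flow-dependent travel times, which is why a single fixed lag is only a starting point; conditioning the correlation on covariates is the package's way of handling that.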
Project Stakeholder Workshop – December 2021: Playlist of videos from December 2021 Project Workshop
- Overview of ARC Linkage Project: Revolutionising water quality monitoring in the information age: Erin Peterson gives a brief overview of the ARC Linkage Project and what we aim to achieve.
- Great Barrier Reef Loads Monitoring Program – Needs and Challenges: Ryan Turner from the Queensland Department of Environment and Science, Water Quality and Investigations team provides an overview of their monitoring program and describes some of the needs and challenges they face when it comes to in-situ water-quality sensing.
- An Anomaly Detection Framework: Catherine Leigh describes a general anomaly detection framework used to prioritise different anomaly types, with a case study highlighting how it can be implemented with real data from Far North Queensland, Australia.
- Developing an anomaly reporting framework – improving shared interpretation of anomalous data: Rachel White provides an overview of current anomaly identification, labelling and reporting practices used by national and international organisations collecting in-situ water quality data in rivers.
- Technical Anomalies in Water-Quality Data from In-Situ Sensors: What, Why, and How?: Priyanga Dilini Talagala explains the difference between a technical anomaly and a statistical anomaly, and how conditional correlation and cross-correlation can be used to distinguish between the two in in-situ water quality data using the R packages oddwater and conduits.
- Detecting anomalies in water-quality variables utilising the temporal correlation: Puwasala Gamakumara provides an overview of a novel anomaly detection approach implemented in the R package conduits. The generalised additive model uses conditional cross-correlation between data from pairs of sensors to estimate lag times and can accommodate covariates, which improves the performance of the algorithm.
- Spatio-temporal modelling in stream networks and anomaly detection: Edgar Santos-Fernandez describes a new suite of Bayesian space-time models for stream networks implemented in the SSNbayes package for R. In the second half of the talk, he compares the ability of more traditional time series models and space-time models to detect anomalies in data from multiple in-situ sensors distributed across a branching river network.
- The Ecosystem Health Monitoring program (EHMP) – Current spatial-temporal design challenges & the barriers to resolving them: Emily Saeck from Healthy Land & Water provides an overview of the long-term EHMP monitoring program for Southeast Queensland, with a particular emphasis on survey design challenges associated with evolving stakeholder needs and potential benefits that data from new technologies and novel models could provide.
- Bayesian Design on River Networks: Katie Buchhorn describes her PhD research focussing on robust Bayesian survey design and sampling windows for river networks, demonstrating the utility and flexibility of the approach using a case study from the Clearwater River Basin, USA.
- A framework to infill missing data from freshwater high-frequency sensor data: Removing anomalous data creates missing values in in-situ sensor data. Benoit Liquet-Weiland demonstrates how time series models that include other water quality variables as covariates can be implemented in a computationally efficient manner, using freely available software, and used to infill missing water-quality data from in-situ sensors deployed in three diverse river systems within the USA.
- R demonstration: Data wrangling for near-real time in-situ sensor data: Priyanga Dilini Talagala demonstrates a variety of data wrangling techniques used to transform raw in-situ sensor data from the National Ecological Observatory Network (NEON) into more readily usable formats and then visualise outliers in the data.
- R demonstration: Anomaly detection using the conduits package: Puwasala Gamakumara demonstrates how to implement anomaly detection algorithms using the conduits package, first on NEON in-situ data from a single sensor and then on data from pairs of sensors.
- Download the R code: https://github.com/PuwasalaG/ARCLP-workshop/tree/main/code-sharing/src
- R demonstration: Spatio-temporal modelling using the SSNbayes package: Edgar Santos-Fernandez demonstrates how to implement Bayesian spatio-temporal models for river networks using stream temperature data from the Clearwater Basin, USA, including visualisation of space-time data, model fitting, prediction at unobserved locations and times, and the calculation of exceedance probabilities. Access the tutorial in the Kaggle Notebook and download the R code.
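An exceedance probability of the kind mentioned in the last demonstration can be approximated from posterior predictive draws as the share of draws above a threshold. The Python sketch below is illustrative only: it does not call SSNbayes, and the site names, the 20 °C threshold and the simulated draws are assumptions standing in for real model output.

```python
import random


def exceedance_probability(draws, threshold):
    """Fraction of posterior predictive draws that exceed the threshold."""
    if not draws:
        raise ValueError("need at least one draw")
    return sum(d > threshold for d in draws) / len(draws)


# Hypothetical posterior predictive draws of stream temperature (deg C)
# at two prediction sites. In practice these would come from a fitted
# space-time model; here they are simulated for illustration.
rng = random.Random(42)
draws = {
    "site_A": [rng.gauss(17.0, 1.0) for _ in range(4000)],
    "site_B": [rng.gauss(20.5, 1.0) for _ in range(4000)],
}

threshold = 20.0  # assumed temperature threshold of interest
probs = {
    site: exceedance_probability(values, threshold)
    for site, values in draws.items()
}

for site, p in probs.items():
    print(f"{site}: P(temperature > {threshold}) is approximately {p:.3f}")
```

Because the estimate is a simple proportion, its precision depends on the number of draws; probabilities near 0 or 1 need more draws to resolve.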