Automated model selection for structured data (inaugural core project)

Project Overview

This project will focus on two related challenges in non-linear regression.

Automated model selection for structured data

Selecting the form of a non-linear regression model is largely a manual process, guided by exploratory data analysis and iterative refinement.

The number of ways in which variables can interact in non-linear relationships grows exponentially with the number of variables. For complex observed systems in particular, where there is little intuition about how the variables relate to each other, manual exploration can cover only a small part of the model space and may lead to an unsuccessful or sub-optimal choice of model.

Mathematical expressions can be represented as tree-like data structures, which makes them convenient to manipulate with the perturbation operators used in meta-heuristics (approximate optimisation algorithms), for example by replacing one sub-expression with another. This project proposes to develop techniques for automating and optimising the selection of non-linear regression models for complex data.
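To illustrate the idea of perturbing expression trees, below is a minimal sketch of a generic, genetic-programming-style subtree mutation used inside a simple (1+1) hill climber. It is not the project's method: the operator set, the `Node`, `random_tree`, `subtree_mutation`, `sse` and `hill_climb` names, the node-count cap and the toy data are all hypothetical choices made for illustration.

```python
import copy
import random
from dataclasses import dataclass
from typing import Optional

# Illustrative (hypothetical) operator set for internal nodes.
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


@dataclass
class Node:
    """Expression-tree node holding an operator symbol, a variable name or a constant."""
    value: object
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def evaluate(self, env: dict) -> float:
        if self.value in OPS:                    # internal node: apply the operator
            return OPS[self.value](self.left.evaluate(env), self.right.evaluate(env))
        if isinstance(self.value, str):          # leaf: variable lookup
            return env[self.value]
        return float(self.value)                 # leaf: numeric constant

    def __str__(self) -> str:
        if self.value in OPS:
            return f"({self.left} {self.value} {self.right})"
        return str(self.value)


def random_tree(depth: int, variables: list, rng: random.Random) -> Node:
    """Grow a random tree with at most `depth` levels of operators."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.7:
            return Node(rng.choice(variables))
        return Node(round(rng.uniform(-2.0, 2.0), 2))
    op = rng.choice(list(OPS))
    return Node(op, random_tree(depth - 1, variables, rng), random_tree(depth - 1, variables, rng))


def nodes(tree: Node) -> list:
    """Collect every node of the tree (depth-first)."""
    out, stack = [], [tree]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(c for c in (n.left, n.right) if c is not None)
    return out


def subtree_mutation(tree: Node, variables: list, rng: random.Random) -> Node:
    """Perturbation operator: copy the tree and regrow one randomly chosen subtree."""
    child = copy.deepcopy(tree)
    target = rng.choice(nodes(child))
    new = random_tree(2, variables, rng)
    # Overwrite the chosen node in place, so its parent now points at the new subtree.
    target.value, target.left, target.right = new.value, new.left, new.right
    return child


def sse(tree: Node, data: list) -> float:
    """Sum of squared errors of the expression over (inputs, target) pairs."""
    return sum((tree.evaluate(env) - y) ** 2 for env, y in data)


def hill_climb(data, variables, iters=500, max_nodes=31, seed=0):
    """(1+1) local search over model forms: keep a mutation only if it fits better."""
    rng = random.Random(seed)
    best = random_tree(3, variables, rng)
    best_err = sse(best, data)
    for _ in range(iters):
        cand = subtree_mutation(best, variables, rng)
        if len(nodes(cand)) > max_nodes:         # crude guard against tree bloat
            continue
        err = sse(cand, data)
        if err < best_err:
            best, best_err = cand, err
    return best, best_err


# Toy data generated from y = x0 * x1 + x0 (illustrative only).
data = [({"x0": a, "x1": b}, a * b + a) for a in (0.0, 0.5, 1.0, 1.5, 2.0) for b in (0.0, 1.0, 2.0)]
model, err = hill_climb(data, ["x0", "x1"])
print(model, err)
```

Because the tree is the unit being perturbed, the search explores model forms (which variables interact and how) rather than only the coefficients of a fixed form. A practical system would typically add constant fitting, crossover and a penalty on model complexity.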

Application and selection of meta-models for stochastic optimisation

There are many problems where discrete-event simulation and optimisation can be used together to determine optimal strategic or operational plans for systems with high complexity and uncertainty. One example is determining the optimal allocation of infrastructure and resources in a large hospital to optimise patient flow and minimise waiting lists.

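Combining simulation and optimisation is computationally challenging, because the optimiser may need to evaluate many candidate solutions and each evaluation requires running the simulation, often over several replications to account for uncertainty. Using meta-models to approximate the simulation model has therefore proven to be an effective way to reduce the computational burden.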

This project aims to apply automated model selection techniques to find well-fitting mathematical forms for meta-models, whereas previous research has always used forms defined a priori. The project will also aim to improve on existing techniques for incorporating meta-models into optimisation frameworks.
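For context, here is a minimal sketch of the conventional approach described above: a meta-model whose form is fixed a priori (here a quadratic response surface) is fitted to replicated simulation outputs and then optimised in place of the simulation. The single-server queue, the cost coefficients, the design points and the `simulate_cost` name are hypothetical illustrations, not taken from the project or its partners.

```python
import numpy as np


def simulate_cost(capacity, rng, n_customers=2000, arrival_rate=1.0,
                  unit_cost=2.0, wait_penalty=10.0):
    """Hypothetical stochastic simulation: one replication of a single-server queue.

    The service rate equals `capacity`; waiting times follow the Lindley recursion
    W[n] = max(0, W[n-1] + S[n-1] - A[n]). Returns the capacity cost plus a
    penalty on the mean waiting time.
    """
    inter_arrivals = rng.exponential(1.0 / arrival_rate, n_customers)
    services = rng.exponential(1.0 / capacity, n_customers)
    wait, total_wait = 0.0, 0.0
    for n in range(1, n_customers):
        wait = max(0.0, wait + services[n - 1] - inter_arrivals[n])
        total_wait += wait
    return unit_cost * capacity + wait_penalty * total_wait / n_customers


rng = np.random.default_rng(42)
design = np.linspace(1.5, 3.5, 9)     # design points (candidate capacities)
replications = 10
# Average several replications at each design point to reduce simulation noise.
responses = np.array([np.mean([simulate_cost(c, rng) for _ in range(replications)])
                      for c in design])

# Meta-model with an a priori quadratic form: cost ~ b0 + b1*c + b2*c^2.
X = np.column_stack([np.ones_like(design), design, design ** 2])
coef, *_ = np.linalg.lstsq(X, responses, rcond=None)

# Optimise the cheap meta-model instead of the expensive simulation.
grid = np.linspace(design.min(), design.max(), 201)
predicted = coef[0] + coef[1] * grid + coef[2] * grid ** 2
best_capacity = grid[np.argmin(predicted)]
print(f"coefficients={coef}, suggested capacity={best_capacity:.2f}")
```

The simulation is run only at the design points, while the optimiser queries the fitted surface as often as it likes. The quadratic form is chosen before seeing any data, so it can fit poorly where the response is strongly asymmetric (here, the steep rise in waiting cost at low capacity); an automated model-selection step would instead choose the form from the data.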


Automating the non-linear model selection process would be a useful contribution for data-science researchers and practitioners, potentially yielding significant efficiency gains. It may also make regression more attractive relative to machine learning approaches, giving end-users the additional insight that comes with an explicit regression model.

The motivation for addressing stochastic optimisation comes from recent Food Agility projects involving the beef and vegetable production industries, along with an ARC Linkage project involving Queensland Health. Stakeholders in these projects share a common desire to optimise the allocation of resources across their value chains under complex and uncertain operating environments.

Project team

Paul Corry (project leader)