Contextually Informed Joint Perception and Localization for Autonomous Vehicles

Positioning is a core capability that underpins many autonomous vehicle functions, including techniques that rely on a prior map. Current approaches are dominated by range-based systems that use expensive, active sensors such as LIDAR. Camera-based positioning approaches offer the potential to use cheaper camera systems already installed in cars, but they must be robust to the wide range of appearance change experienced in the environment and must provide accurate metric pose estimates. This project will develop new visual positioning techniques that incorporate machine learning and semantic understanding of the environment and conditions, and will benchmark their performance against existing range-based techniques under real-world conditions.

This 200,000 USD project runs from 2021 to 2023 and is a collaboration between QUT and Ford Motor Company.


Ming Xu, Sourav Garg, Michael Milford, Punarjay Chakravarty, Shubham Shrivastava, “Vehicle localization”,

Research Publications

S Hausler, M Xu, S Garg, P Chakravarty, S Shrivastava, A Vora, M Milford, “Improving worst case visual localization coverage via place-specific sub-selection in multi-camera systems”, in IEEE Robotics and Automation Letters, vol. 7, no. 4, pp. 10112-10119, 2022.

Abstract: 6-DoF visual localization systems utilize principled approaches rooted in 3D geometry to perform accurate camera pose estimation of images to a map. Current techniques use hierarchical pipelines and learned 2D feature extractors to improve scalability and increase performance. However, despite gains in typical recall@0.25m type metrics, these systems still have limited utility for real-world applications like autonomous vehicles because of their ‘worst’ areas of performance – the locations where they provide insufficient recall at a certain required error tolerance. Here we investigate the utility of using ‘place specific configurations’, where a map is segmented into a number of places, each with its own configuration for modulating the pose estimation step, in this case selecting a camera within a multi-camera system. On the Ford AV benchmark dataset, we demonstrate substantially improved worst-case localization performance compared to using off-the-shelf pipelines – minimizing the percentage of the dataset which has low recall at a certain error tolerance, as well as improved overall localization performance. Our proposed approach is particularly applicable to the crowdsharing model of autonomous vehicle deployment, where a fleet of AVs are regularly traversing a known route.
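
The idea of a per-place camera configuration can be illustrated with a minimal sketch. It assumes that localization errors from a calibration traverse are already available as (place, camera, error) tuples, and it picks, for each place, the camera with the highest recall at a chosen tolerance. The function names, the data layout and the selection rule are illustrative assumptions, not the procedure used in the paper.

```python
from collections import defaultdict


def recall_at(errors, tol):
    """Fraction of position errors (metres) that fall within tol."""
    return sum(e <= tol for e in errors) / len(errors) if errors else 0.0


def select_camera_per_place(calib, tol=0.25):
    """Pick one camera per place from calibration results.

    calib: iterable of (place_id, camera_id, error_m) tuples (assumed layout).
    Returns {place_id: camera_id}, choosing the camera with the highest
    recall at tol; ties go to the alphabetically first camera.
    """
    errs = defaultdict(lambda: defaultdict(list))
    for place, cam, err in calib:
        errs[place][cam].append(err)
    return {
        place: max(sorted(cams), key=lambda c: recall_at(cams[c], tol))
        for place, cams in errs.items()
    }


# Hypothetical calibration data: two places, two cameras.
calib = [
    ("A", "front", 0.10), ("A", "front", 0.20),
    ("A", "rear", 0.50), ("A", "rear", 0.10),
    ("B", "front", 0.90), ("B", "front", 0.40),
    ("B", "rear", 0.20), ("B", "rear", 0.30),
]
config = select_camera_per_place(calib, tol=0.25)
print(config)
```

At query time, a deployed system would look up the current place in `config` and run pose estimation only on the selected camera.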

[IEEE Link] [arXiv Link]

S Hausler, S Garg, P Chakravarty, S Shrivastava, A Vora, M Milford, “DisPlacing Objects: Improving Dynamic Vehicle Detection via Visual Place Recognition under Adverse Conditions”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023.

Abstract: Can knowing where you are assist in perceiving objects in your surroundings, especially under adverse weather and lighting conditions? In this work we investigate whether a prior map can be leveraged to aid in the detection of dynamic objects in a scene without the need for a 3D map or pixel-level map-query correspondences. We contribute an algorithm which refines an initial set of candidate object detections and produces a refined subset of highly accurate detections using a prior map. We begin by using visual place recognition (VPR) to retrieve a reference map image for a given query image, then use a binary classification neural network that compares the query and mapping image regions to validate the query detection. Once our classification network is trained, on approximately 1000 query-map image pairs, it is able to improve the performance of vehicle detection when combined with an existing off-the-shelf vehicle detector. We demonstrate our approach using standard datasets across two cities (Oxford and Zurich) under different settings of train-test separation of map-query traverse pairs. We further emphasize the performance gains of our approach against alternative design choices and show that VPR suffices for the task, eliminating the need for precise ground truth localization.
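
The two stages described above (retrieve a reference map image, then validate each candidate detection against it) can be sketched as follows. This is a toy illustration under strong assumptions: global descriptors are plain lists compared by cosine similarity, images are small nested lists, the same box is cropped from both images, and `mean_abs_diff` is a hypothetical stand-in for the trained binary classification network. None of these names or choices come from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def retrieve_reference(query_desc, map_descs):
    """VPR step: index of the map image most similar to the query."""
    return max(range(len(map_descs)), key=lambda i: cosine(query_desc, map_descs[i]))


def crop(image, box):
    """Cut the region (x0, y0, x1, y1) out of a row-major image."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]


def refine_detections(query_img, map_img, boxes, classifier, threshold=0.5):
    """Keep only the boxes that the classifier accepts as dynamic vehicles."""
    kept = []
    for box in boxes:
        score = classifier(crop(query_img, box), crop(map_img, box))
        if score >= threshold:
            kept.append(box)
    return kept


def mean_abs_diff(query_region, map_region):
    """Stand-in classifier: a large query/map difference suggests a new object."""
    diffs = [abs(a - b) for rq, rm in zip(query_region, map_region) for a, b in zip(rq, rm)]
    return sum(diffs) / len(diffs)


# Toy data: three map descriptors, an empty map image, a query with one "vehicle".
map_descs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ref_idx = retrieve_reference([0.9, 0.1], map_descs)
map_img = [[0.0] * 10 for _ in range(10)]
query_img = [row[:] for row in map_img]
for y in range(2, 5):
    for x in range(2, 5):
        query_img[y][x] = 1.0
boxes = [(2, 2, 5, 5), (6, 6, 9, 9)]
kept = refine_detections(query_img, map_img, boxes, mean_abs_diff)
print(ref_idx, kept)
```

In this sketch the second candidate box is rejected because the query and map regions look identical, which mimics a false detection on static background.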

[IEEE Link] [arXiv Link]

S Hausler, S Garg, P Chakravarty, S Shrivastava, A Vora, M Milford, “Locking On: Leveraging Dynamic Vehicle-Imposed Motion Constraints to Improve Visual Localization”, in IEEE/RSJ International Conference on Intelligent Robots and Systems, 2023.

Abstract: Most 6-DoF localization and SLAM systems use static landmarks but ignore dynamic objects because they cannot be usefully incorporated into a typical pipeline. Where dynamic objects have been incorporated, typical approaches have attempted relatively sophisticated identification and localization of these objects, limiting their robustness or general utility. In this research, we propose a middle ground, demonstrated in the context of autonomous vehicles, using dynamic vehicles to provide limited pose constraint information in a 6-DoF frame-by-frame PnP-RANSAC localization pipeline. We refine initial pose estimates with a motion model and propose a method for calculating the predicted quality of future pose estimates, triggered based on whether or not the autonomous vehicle’s motion is constrained by the relative frame-to-frame location of dynamic vehicles in the environment. Our approach detects and identifies suitable dynamic vehicles to define these pose constraints to modify a pose filter, resulting in improved recall across a range of localization tolerances from 0.25m to 5m, compared to a state-of-the-art baseline single image PnP method and its vanilla pose filtering. Our constraint detection system is active for approximately 35% of the time on the Ford AV dataset and localization is particularly improved when the constraint detection is active.
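
The evaluation described above, recall across several tolerances reported separately for frames where the constraint detection is active, can be computed along these lines. The per-frame errors, the activity flags and the tolerance list are made-up illustrative values, and the function name is an assumption rather than anything from the paper's code.

```python
def recall_curve(errors, tolerances):
    """Map each tolerance (metres) to the fraction of errors within it."""
    if not errors:
        return {tol: 0.0 for tol in tolerances}
    return {tol: sum(e <= tol for e in errors) / len(errors) for tol in tolerances}


# Hypothetical per-frame position errors (metres) and constraint flags.
errors = [0.1, 0.3, 0.6, 2.0, 7.0]
constraint_active = [True, True, False, False, False]
tolerances = [0.25, 0.5, 1.0, 5.0]

overall = recall_curve(errors, tolerances)
active_only = recall_curve(
    [e for e, on in zip(errors, constraint_active) if on], tolerances
)
active_share = sum(constraint_active) / len(constraint_active)

print(overall)
print(active_only)
print(active_share)
```

Comparing `overall` with `active_only` shows whether localization improves specifically on the frames where the constraint is in effect, and `active_share` gives the fraction of frames for which that applies.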

[IEEE Link] [arXiv Link]

Media Coverage

8 August, 2023: Location key to improved autonomous vehicle vision.

Chief Investigators