Scene Understanding and Semantic SLAM

Making a robot understand what it sees is one of the fascinating goals of our current research. We develop novel methods for Semantic Mapping and Semantic SLAM by combining object detection with simultaneous localisation and mapping (SLAM) techniques. Semantic SLAM creates semantically meaningful maps by combining geometric and semantic information. We believe such semantically enriched maps will help robots understand our complex world and will ultimately increase the range and sophistication of interactions that robots can have in domestic and industrial deployment scenarios.


Selected Publications

QuadricSLAM: Constrained Dual Quadrics from Object Detections as Landmarks in Object-oriented SLAM

Lachlan Nicholson, Michael Milford, Niko Sünderhauf. IEEE Robotics and Automation Letters (RA-L), 2018.

In this paper, we use 2D object detections from multiple views to simultaneously estimate a 3D quadric surface for each object and localize the camera position. We derive a SLAM formulation that uses dual quadrics as 3D landmark representations, exploiting their ability to compactly represent the size, position and orientation of an object, and show how 2D object detections can directly constrain the quadric parameters via a novel geometric error formulation. We develop a sensor model for object detectors that addresses the challenge of partially visible objects, and demonstrate how to jointly estimate the camera pose and constrained dual quadric parameters in factor graph-based SLAM with a general perspective camera.



QuadricSLAM uses objects as landmarks and represents them as constrained dual quadrics in 3D space. QuadricSLAM jointly estimates camera poses and quadric parameters from odometry measurements and object detections, implicitly performing loop closures based on the object observations.

With this research, we make the following contributions:

  • We show how to parametrize object landmarks in factor graph-based SLAM as constrained dual quadrics.
  • We demonstrate that visual object detection systems such as Faster R-CNN, SSD, or Mask R-CNN can be used as sensors in SLAM, and that their observations – the bounding boxes around objects – can directly constrain dual quadric parameters via our novel geometric error formulation.
  • To incorporate quadrics into SLAM, we derive a factor graph-based SLAM formulation that jointly estimates the dual quadric and robot pose parameters.
  • We provide a large-scale evaluation using 250 indoor trajectories through a high-fidelity simulation environment, combined with real-world experiments on the TUM RGB-D dataset, to show how object detections and dual quadric parametrization aid the SLAM solution.
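
To illustrate how a bounding box can constrain quadric parameters, the sketch below builds a constrained dual quadric from an object pose and three radii, projects it into a pinhole camera as a dual conic, and compares the conic's axis-aligned bounds with a detected box. It is a minimal sketch and not the paper's implementation. The function names, the pinhole intrinsics and the plain box residual are assumptions made for illustration, and the handling of partially visible objects is omitted.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def constrained_dual_quadric(rotation, translation, radii):
    """Return the 4x4 dual quadric Z * diag(a^2, b^2, c^2, -1) * Z^T.

    Z is the homogeneous object pose built from a 3x3 rotation and a
    3-vector translation; radii are the ellipsoid semi-axes (a, b, c).
    """
    pose = [list(rotation[i]) + [translation[i]] for i in range(3)]
    pose.append([0.0, 0.0, 0.0, 1.0])
    shape = [[0.0] * 4 for _ in range(4)]
    for i in range(3):
        shape[i][i] = radii[i] ** 2
    shape[3][3] = -1.0
    return matmul(matmul(pose, shape), transpose(pose))


def quadric_to_bbox(quadric, intrinsics, rotation_cw, translation_cw):
    """Project a dual quadric and return [u_min, v_min, u_max, v_max].

    The dual conic is C = P Q P^T with P = K [R | t] (world to camera).
    A vertical line u = const is tangent to the conic when
    C11 - 2 u C13 + u^2 C33 = 0, which gives the horizontal bounds;
    the vertical bounds follow in the same way from C22 and C23.
    """
    extrinsics = [list(rotation_cw[i]) + [translation_cw[i]] for i in range(3)]
    projection = matmul(intrinsics, extrinsics)
    conic = matmul(matmul(projection, quadric), transpose(projection))

    def bounds(axis):
        disc = conic[axis][2] ** 2 - conic[axis][axis] * conic[2][2]
        if disc < 0:
            # No real tangent lines, e.g. the camera is inside the ellipsoid.
            raise ValueError("quadric does not project to an ellipse")
        root = math.sqrt(disc)
        first = (conic[axis][2] - root) / conic[2][2]
        second = (conic[axis][2] + root) / conic[2][2]
        return min(first, second), max(first, second)

    u_min, u_max = bounds(0)
    v_min, v_max = bounds(1)
    return [u_min, v_min, u_max, v_max]


def bbox_residual(detected, predicted):
    """Element-wise difference between a detected and a predicted box."""
    return [d - p for d, p in zip(detected, predicted)]


# Hypothetical example: a unit sphere 5 m in front of an identity-pose camera.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
camera_matrix = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
sphere = constrained_dual_quadric(identity, [0.0, 0.0, 5.0], [1.0, 1.0, 1.0])
predicted_box = quadric_to_bbox(sphere, camera_matrix, identity, [0.0, 0.0, 0.0])
```

In a factor graph, a residual of this kind would be evaluated for every detection, and the optimizer would adjust both the camera poses and the quadric parameters to reduce it.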

 

Meaningful Maps With Object-Oriented Semantic Mapping

Niko Sünderhauf, Trung T. Pham, Yasir Latif, Michael Milford. In Proc. of IEEE International Conference on Intelligent Robots and Systems (IROS), 2017.

For intelligent robots to interact in meaningful ways with their environment, they must understand both the geometric and semantic properties of the scene surrounding them. The majority of research to date has addressed these mapping challenges separately, focusing on either geometric or semantic mapping. In this paper we address the problem of building environmental maps that include both semantically meaningful, object-level entities and point- or mesh-based geometrical representations. We simultaneously build geometric point cloud models of previously unseen instances of known object classes and create a map that contains these object models as central entities. Our system leverages sparse, feature-based RGB-D SLAM, image-based deep-learning object detection and 3D unsupervised segmentation.
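
One step that object-level point cloud models rely on is lifting the depth pixels inside a detection box into 3D. The sketch below shows this for a pinhole camera. It is an assumption-laden illustration rather than the system described in the paper: the function name, the box convention (u_min, v_min, u_max, v_max, upper bounds exclusive) and the intrinsics fx, fy, cx, cy are hypothetical, and the unsupervised 3D segmentation that would separate the object from its background is not shown.

```python
def backproject_box(depth, box, fx, fy, cx, cy):
    """Back-project depth pixels inside a detection box to camera-frame points.

    depth is a row-major image (list of rows) in metres, where 0 marks a
    missing measurement. Returns a list of (x, y, z) tuples.
    """
    u_min, v_min, u_max, v_max = box
    points = []
    for v in range(v_min, v_max):
        for u in range(u_min, u_max):
            z = depth[v][u]
            if z <= 0:
                continue  # skip pixels without a valid depth reading
            # Pinhole model: u = fx * x / z + cx and v = fy * y / z + cy.
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points
```

Points obtained this way are in the camera frame. Transforming them with the camera pose estimated by SLAM would place them in the map frame, where observations of the same object from several views can be merged.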

 

Multi-Modal Trip Hazard Affordance Detection On Construction Sites

Sean McMahon, Niko Sünderhauf, Ben Upcroft, Michael J Milford. IEEE Robotics and Automation Letters (RA-L), 2017.

Trip hazards are a significant contributor to accidents on construction and manufacturing sites. We conduct a comprehensive investigation into the performance characteristics of 11 different color and depth fusion approaches, including four fusion and one nonfusion approach, using color and two types of depth images. Trained and tested on more than 600 labeled trip hazards over four floors and 2000 m² in an active construction site, this approach was able to differentiate between identical objects in different physical configurations. Outperforming a color-only detector, our multimodal trip detector fuses color and depth information to achieve a 4% absolute improvement in F1-score. These investigative results and the extensive publicly available dataset move us one step closer to assistive or fully automated safety inspection systems on construction sites.

 

Place Categorization and Semantic Mapping on a Mobile Robot

Niko Sünderhauf, Feras Dayoub, Sean McMahon, Ben Talbot, Ruth Schulz, Peter Corke, Gordon Wyeth, Ben Upcroft, Michael Milford. Proc. of IEEE International Conference on Robotics and Automation (ICRA), 2016.

In this paper we focus on the challenging problem of place categorization and semantic mapping on a robot without environment-specific training. Motivated by their ongoing success in various visual recognition tasks, we build our system upon a state-of-the-art convolutional network. We overcome its closed-set limitations by complementing the network with a series of one-vs-all classifiers that can learn to recognize new semantic classes online. Prior domain knowledge is incorporated by embedding the classification system into a Bayesian filter framework that also ensures temporal coherence. We evaluate the classification accuracy of the system on a robot that maps a variety of places on our campus in real time. We show how semantic information can boost robotic object detection performance and how the semantic map can be used to modulate the robot’s behaviour during navigation tasks. The system is made available to the community as a ROS module.
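
The role of the Bayesian filter can be illustrated with a generic discrete Bayes filter over place classes. The sketch below is not the paper's formulation: the three classes, the transition matrix and the classifier scores are hypothetical values chosen for illustration. The transition matrix stands in for prior knowledge (places rarely change from one frame to the next), which is what yields temporal coherence.

```python
def bayes_filter_step(belief, transition, likelihood):
    """One predict-and-update step of a discrete Bayes filter.

    belief[i]        : current probability of place class i
    transition[i][j] : P(class j at time t | class i at time t-1)
    likelihood[j]    : classifier score for class j on the current frame
    """
    n = len(belief)
    # Predict: propagate the belief through the transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Update: weight by the per-frame classifier evidence and renormalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("likelihood is zero for every class")
    return [p / total for p in unnormalized]


# Hypothetical classes: office, corridor, kitchen.
belief = [1 / 3, 1 / 3, 1 / 3]
transition = [[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]]
for scores in ([0.2, 0.7, 0.1], [0.3, 0.6, 0.1], [0.6, 0.3, 0.1]):
    belief = bayes_filter_step(belief, transition, scores)
```

Because the belief carries evidence from earlier frames, a single frame that favours a different class (the third set of scores above) does not immediately flip the estimate.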