We are pleased to announce the launch of the second Robotic Vision Scene Understanding Challenge. Powered by our novel BenchBot framework, this challenge provides a platform for researchers to develop robotic vision systems that can understand both the geometric and semantic aspects of their environment. The challenge consists of object-based semantic simultaneous localization and mapping (SLAM) and scene change detection (SCD) tasks. It is organised by the Australian Centre for Robotic Vision (ACRV) in association with the Queensland University of Technology Centre for Robotics (QCR), with prizes sponsored by both the ACRV and NVIDIA.
To operate in household environments, robots must understand which objects are present in a building and where they are. Our semantic SLAM task captures this directly: a robot must explore an environment, find all objects of interest, and add them to a 3D map. The SCD task takes this one step further: the robot revisits the same environment at a later point in time and must determine which objects have been added or removed since its first traversal. Robot agents are not provided a map of the environment in advance, but have access to RGB+D images, flatscan laser data, and some form of pose data.
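At its core, the SCD task asks the agent to report the difference between two object maps of the same environment. A minimal sketch of that comparison is below; the `(class, position)` descriptors and the `detect_changes` helper are illustrative assumptions, not the challenge's actual submission format.

```python
def detect_changes(first_visit, second_visit):
    """Return (added, removed) object sets between two traversals.

    Each traversal is a set of hashable object descriptors, e.g.
    (class_label, quantised_3d_position) tuples. In the real task the
    agent must first build these maps from RGB+D and laser observations.
    """
    added = second_visit - first_visit    # present now, absent before
    removed = first_visit - second_visit  # present before, absent now
    return added, removed
```

For example, if a plant seen on the first traversal is gone and a cup has appeared, `detect_changes` returns the cup as added and the plant as removed.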
These tasks are already challenging research problems, and they are even harder to evaluate given the need for consistent environments and a consistent robot setup. Our BenchBot framework addresses this by providing a single, consistent interface for operating robot agents across simulated and real-world environments. Our challenge utilises environments simulated within NVIDIA's Isaac Simulator, providing consistent environments with well-defined ground-truth labels for robust testing. High-performing participants may also be invited to test their work directly on one of QUT's mobile robot platforms, with no code changes required.
As these are complex new research tasks, we enable participation at multiple levels of robotic realism and award prizes to the best participants at every level. The lowest level (PGT) provides passive robot control and ground-truth robot pose, stripping away the exploration and self-localization aspects of the challenge. The second level (AGT) still provides ground-truth robot pose but requires active exploration through a simple AI-Gym-style API. The hardest level (ADR) requires both active exploration and self-localization, since only noisy pose data is provided. Participants can compete in, and win prizes for, any combination of tracks. The prize pool is outlined below:
- Scene Change Detection (ADR) – $900 USD, 1 Titan RTX GPU & up to 5 Jetson GPUs
- Semantic SLAM (ADR) – $800 USD, 1 Titan RTX GPU & up to 5 Jetson GPUs
- Semantic SLAM (AGT) – $500 USD
- Semantic SLAM (PGT) – $300 USD
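To give a feel for the AGT and ADR levels, here is a minimal sketch of an AI-Gym-style active-exploration loop. The `SimEnv` class is a hypothetical stand-in, not the real BenchBot API: it only mimics the kind of `reset`/`step` interface the announcement describes, with a toy observation in place of real RGB+D, laser, and pose data.

```python
class SimEnv:
    """Toy 1-D environment standing in for the simulator interface."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        self.position = 0
        return self._observation()

    def step(self, action):
        # "forward" is the only action in this toy sketch; a real agent
        # would choose among richer motion commands.
        if action == "forward":
            self.position += 1
        done = self.position >= self.length - 1
        return self._observation(), done

    def _observation(self):
        # A real agent would receive RGB+D images, laser scans, and pose
        # (ground truth at AGT, noisy at ADR).
        return {"pose": self.position}


def explore(env):
    """Drive the environment to completion, collecting observations."""
    observations = [env.reset()]
    done = False
    while not done:
        obs, done = env.step("forward")
        observations.append(obs)
    return observations
```

At the PGT level this loop disappears entirely: control is passive, so the agent simply consumes a predetermined stream of observations rather than choosing actions.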
The Robotic Vision Scene Understanding Challenge launches at the 2021 Embodied AI Workshop at the Conference on Computer Vision and Pattern Recognition (CVPR), in coordination with eight other embodied AI challenges supported by 15 academic and research organizations. The joint launch of these challenges offers the embodied AI/robotics research community an unprecedented opportunity to move toward a common framework for the field, converging around a unified set of tasks, simulation platforms, and 3D assets. The organizers will collectively share results across all these challenges at CVPR in June, providing a unique viewpoint on the state of embodied AI research and new directions for the subfield.
Please review the submission guidelines before entering and note that participants must submit their results to EvalAI. The winning team from each track will be invited to nominate a team member to share their work at a CVPR 2021 virtual event, where we will also share the challenge leaderboards.
We look forward to all the incredible new research and ideas that competing teams will produce.
Partners and embodied AI challenges at CVPR 2021:
- iGibson Challenge 2021, hosted by Stanford Vision and Learning Lab and Robotics at Google
- Habitat Challenge 2021, hosted by Facebook AI Research (FAIR) and Georgia Tech
- Navigation and Rearrangement in AI2-THOR, hosted by the Allen Institute for AI
- ALFRED: Interpreting Grounded Instructions for Everyday Tasks, hosted by the University of Washington, Carnegie Mellon University, the Allen Institute for AI, and the University of Southern California
- Room-Across-Room Habitat Challenge (RxR-Habitat), hosted by Oregon State University, Google, and Facebook AI
- SoundSpaces Challenge, hosted by the University of Texas at Austin and the University of Illinois at Urbana-Champaign
- TDW-Transport, hosted by the Massachusetts Institute of Technology
- Robotic Vision Scene Understanding, hosted by the Australian Centre for Robotic Vision in association with the Queensland University of Technology Centre for Robotics
- MultiON: Multi-Object Navigation, hosted by the Indian Institute of Technology Kanpur, the University of Illinois at Urbana-Champaign, and Simon Fraser University
- February 17th – Challenge launch
- May 7th – Results and paper submissions due
- May 19th – Notification to challenge winners
- June 19th-25th – CVPR 2021