The Scene Understanding and Modeling Challenge

The SUMO challenge encourages the development of algorithms for complete understanding of 3D indoor scenes from 360° RGB-D panoramas, with the goal of enabling social AR and VR research and experiences. The target 3D models of indoor scenes include all visible layout elements and objects, complete with pose, semantic information, and texture. Submitted algorithms are evaluated at three levels of complexity, corresponding to the three tracks of the challenge: oriented 3D bounding boxes, oriented 3D voxel grids, and oriented 3D meshes.

Figure: 360° RGB-D input (360° RGB, 360° depth) and complete 3D scene output (3D texture + pose, 3D semantic + instance).


The SUMO challenge dataset is derived by processing scenes from the SUNCG dataset to produce 360° RGB-D images, represented as cubemaps, and corresponding 3D mesh models of all visible scene elements. The mesh models are further processed into bounding box and voxel grid representations. The dataset format is described in detail here.

59K indoor scenes. The dataset provides:
  • 1024 × 1024 RGB images
  • 1024 × 1024 depth maps
  • 2D semantic information
  • 3D semantic information
  • 3D object pose
  • 3D element texture
  • 3D bounding box scene representation
  • 3D voxel grid scene representation
  • 3D mesh scene representation
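Because each cubemap face is a square image covering a 90° field of view, its depth pixels can be lifted to 3D with a simple pinhole model whose focal length is half the face width. The sketch below is a minimal illustration under assumed conventions that are not taken from the SUMO data specification: depth stored as distance along the face's viewing axis (not Euclidean range), pixel centers at integer + 0.5, the principal point at the face center, and zero depth marking missing data. The function names `face_focal_length`, `backproject_pixel`, and `backproject_face` are hypothetical. Points are returned in each face's own camera frame.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]

def face_focal_length(face_size: int) -> float:
    """Focal length in pixels of a square cube face with a 90-degree field of view."""
    # Half the face width subtends 45 degrees, so f = (W / 2) / tan(45 deg) = W / 2.
    return (face_size / 2.0) / math.tan(math.radians(45.0))

def backproject_pixel(u: int, v: int, depth: float, face_size: int) -> Point:
    """Lift pixel (u, v) with z-depth `depth` into the face's camera frame.

    Assumed (hypothetical) conventions: x right, y down, z along the viewing axis,
    pixel centers at (u + 0.5, v + 0.5), principal point at the face center.
    """
    f = face_focal_length(face_size)
    c = face_size / 2.0
    x = (u + 0.5 - c) * depth / f
    y = (v + 0.5 - c) * depth / f
    return (x, y, depth)

def backproject_face(depth_map: List[List[float]]) -> List[Point]:
    """Convert a square depth map (rows of z-depths) into a list of 3D points."""
    face_size = len(depth_map)
    points = []
    for v, row in enumerate(depth_map):
        for u, d in enumerate(row):
            if d > 0:  # assumption: zero depth marks missing data
                points.append(backproject_pixel(u, v, d, face_size))
    return points
```

Merging all six faces into a single panorama point cloud additionally requires each face's rotation, which depends on the dataset's face ordering and is not shown here.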


The SUMO Challenge is organized into three performance tracks based on the output representation of the scene. A scene is represented as a collection of elements, each of which models one object in the scene (e.g., a wall, the floor, or a chair). An element is represented in one of three increasingly descriptive representations: bounding box, voxel grid, or surface mesh. For each element in the scene, a submission contains the following outputs, depending on the track:

3D Bounding Box Track
  • 3D bounding box
  • 3D object pose
  • Semantic category of the element

3D Voxel Grid Track
  • 3D bounding box
  • 3D object pose
  • Semantic category of the element
  • Location and RGB color of occupied 3D voxels

3D Mesh Track
  • 3D bounding box
  • 3D object pose
  • Semantic category of the element
  • The element's textured mesh (in .glb format)
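All three tracks share a per-element core: semantic category, pose, and an oriented bounding box. The sketch below models that core with hypothetical `Pose` and `BoxElement` classes (not the official SUMO toolkit API) and computes the eight world-space corners of an oriented box. It assumes the pose maps element-frame points to world coordinates as p_world = R·p + t, and that the box is stored as axis-aligned bounds in the element's own frame.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix

@dataclass
class Pose:
    R: Mat3  # rotation from the element frame to the world frame (assumed convention)
    t: Vec3  # position of the element origin in the world frame

    def apply(self, p: Vec3) -> Vec3:
        """Transform an element-frame point into world coordinates: R @ p + t."""
        return tuple(
            sum(self.R[i][j] * p[j] for j in range(3)) + self.t[i] for i in range(3)
        )

@dataclass
class BoxElement:
    category: str  # semantic category, e.g. "chair"
    pose: Pose
    box_min: Vec3  # axis-aligned bounds in the element's own frame
    box_max: Vec3

    def world_corners(self) -> List[Vec3]:
        """Return the 8 corners of the oriented bounding box in world coordinates."""
        # zip pairs (min, max) per axis; product enumerates all 2^3 combinations.
        corners = product(*zip(self.box_min, self.box_max))
        return [self.pose.apply(c) for c in corners]
```

The voxel and mesh tracks would extend such a record with occupied voxel locations and colors, or with a reference to the element's .glb mesh.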


Evaluation of a 3D scene focuses on four key aspects: Geometry, Appearance, Semantic, and Perceptual (GASP). Details of the metrics for each track are provided here.
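The official GASP metric definitions are in the metrics description referenced in the paragraph above and are not reproduced here. As a hedged illustration of the kind of geometric comparison involved, the sketch below quantizes points into a shared voxel grid and computes intersection over union (IoU) between predicted and ground-truth occupied voxels. The functions `to_voxel_index`, `voxelize`, and `voxel_iou`, as well as the grid origin and voxel size, are illustrative assumptions; this is not claimed to be the SUMO geometry metric.

```python
import math
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]

def to_voxel_index(point: Iterable[float], origin: Iterable[float], voxel_size: float) -> Voxel:
    """Quantize a 3D point into integer coordinates of a grid anchored at `origin`."""
    return tuple(math.floor((p - o) / voxel_size) for p, o in zip(point, origin))

def voxelize(points, origin, voxel_size) -> Set[Voxel]:
    """Set of occupied voxels; both shapes must share the same origin and voxel size."""
    return {to_voxel_index(p, origin, voxel_size) for p in points}

def voxel_iou(pred: Set[Voxel], gt: Set[Voxel]) -> float:
    """Intersection over union of two sets of occupied voxels."""
    union = pred | gt
    if not union:
        return 1.0  # convention chosen here: two empty grids agree perfectly
    return len(pred & gt) / len(union)
```

Sharing one grid origin and voxel size between prediction and ground truth matters: the same surface quantized on offset grids would score below 1.0 even if the geometry matched exactly.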


Upload your results to the EvalAI submission page.

Detailed submission instructions
  1. Download the SUMO test data from AWS (
  2. Choose your performance track (bounding boxes, voxels, or meshes) and choose which test set to use for evaluation.
    • Quick test set - 2 scenes - Useful as an initial test for verifying your data format.
    • Dev test set - 360 scenes - Has provided ground truth - Useful for evaluating your algorithm prior to final submission.
    • Contest test set - 360 scenes - No provided ground truth - The data set on which contest performance will be evaluated. Submit results on this set for the final contest submission.
  3. Run your algorithm on the selected test set to generate a project scene for each test scene. Each project scene should be in a separate directory with the scene_id as its directory name. See the SUMO white paper for details. Compress the directory containing the output project scenes into a zip file and upload it to a publicly visible web location.
  4. Create a json submission file with the following format: { "result": "[some-public-url]/[filename].zip" } for example: { "result": "" }
  5. Go to the EvalAI web site.
  6. Log in if you already have an account, or sign up for a new account if you don't.
  7. Click on "All Challenges" in the left menu, and then click on "2018 SUMO Challenge" in the list of ongoing challenges.
  8. Click the "Participate" menu item.
  9. If you do not already have a participation team, create one in the "Create a New Team" dialog box on the right.
  10. Once you have a participation team, it will show up in the list on the left. Select your team from the list by clicking on the circle. Then click "Participate".
  11. Select the evaluation phase that corresponds to the performance track and data set you chose above. In the "Upload File" box, upload the json file you created above. Enter any of the other optional information you would like to include. Press "Submit".
  12. Be patient: evaluation can take about a minute per scene because the 3D metrics are complex to compute.
  13. Once the evaluation is complete, the results can be seen on the leaderboard page of the SUMO Challenge on EvalAI. Note that you must select the appropriate challenge phase to see the results.
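Steps 3 and 4 can be scripted. The sketch below assumes a local results directory holding one sub-directory per scene, each named by its scene_id, and zips them so that each scene_id folder sits at the top level of the archive; whether the evaluator expects exactly that layout or an extra enclosing folder is an assumption to check against the SUMO white paper. Uploading the zip to a public location is not shown, and the function names `package_submission` and `write_submission_json` are hypothetical.

```python
import json
import zipfile
from pathlib import Path
from typing import List

def package_submission(results_dir: Path, zip_path: Path) -> List[str]:
    """Zip every scene sub-directory of `results_dir` and return the scene ids included."""
    scene_dirs = sorted(p for p in results_dir.iterdir() if p.is_dir())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for scene_dir in scene_dirs:
            for f in sorted(scene_dir.rglob("*")):
                if f.is_file():
                    # Paths relative to results_dir keep each scene_id as a top-level folder.
                    zf.write(f, arcname=f.relative_to(results_dir).as_posix())
    return [d.name for d in scene_dirs]

def write_submission_json(public_zip_url: str, json_path: Path) -> None:
    """Write the submission file in the {"result": "<url>"} format from step 4."""
    json_path.write_text(json.dumps({"result": public_zip_url}))
```

For example, `package_submission(Path("results"), Path("sumo_results.zip"))` followed by `write_submission_json("https://example.com/sumo_results.zip", Path("submission.json"))` produces a file containing {"result": "https://example.com/sumo_results.zip"}, where the URL is a placeholder for wherever the zip is actually hosted.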


Winners of the 2018 pilot edition of the SUMO Challenge will be announced shortly after the challenge deadline of January 14th, 2019. See the official SUMO Challenge Contest Rules.

  • 3D Mesh Track - 1st Prize: $2500 cash prize, Titan X GPU, oral presentation
  • 3D Voxel Track - 2nd Prize: $2000 cash prize, Titan X GPU, oral presentation
  • 3D Bounding Box Track - 3rd Prize: $1500 cash prize, Titan X GPU, oral presentation


  • June 22nd, 2018: SUMO announcement at CVPR18
  • August 29th, 2018: SUMO Challenge launch
  • December 3rd, 2018: SUMO Workshop, Perth, Australia
  • January 14th, 2019: Challenge submission deadline


Daniel Huber


Lyne Tchapmi

Stanford University

Frank Dellaert


Advisory Board

Ilke Demir


T. Funkhouser

Princeton University

Leo Guibas

Stanford University

Jitendra Malik

UC Berkeley

Silvio Savarese

Stanford University

Facebook Team

Challenge Advisors

Bahram Dahi

Jay Huang

Nandita Nayak

John Princen

Ruben Sethi

Iro Armeni

Angel Chang

Kevin Chen

Christopher Choy

JunYoung Gwak

Manolis Savva

Alexander (Sasha) Sax

Richard Skarbez

Shuran Song

Amir R. Zamir