The 2019 Scene Understanding and Modeling Challenge

[Overview figure: 360° RGB-D input (360° RGB + 360° depth) → comprehensive 3D scene output (3D texture + pose, 3D semantics + instances)]

The SUMO Challenge targets the development of algorithms for comprehensive understanding of 3D indoor scenes from 360° RGB-D panoramas. The target 3D models of indoor scenes include all visible layout elements and objects, complete with pose, semantic information, and texture. Submitted algorithms are evaluated at three levels of complexity, corresponding to the three tracks of the challenge: oriented 3D bounding boxes, oriented 3D voxel grids, and oriented 3D meshes.


The SUMO Challenge dataset is derived from scenes in the SUNCG dataset, processed to produce 360° RGB-D images represented as cubemaps, along with corresponding 3D mesh models of all visible scene elements. The mesh models are further processed into bounding-box and voxel-based representations. The dataset format is described in detail here.

59K Indoor Scenes

1024 × 1024 RGB Images

1024 × 1024 Depth Maps

2D Semantic Information

3D Semantic Information

3D Object Pose

3D Element Texture

3D Bounding Box Scene Representation

3D Voxel Grid Scene Representation

3D Mesh Scene Representation
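Since each scene provides 1024 × 1024 depth maps as cubemap faces, a common first processing step is back-projecting a face's depth map into a 3D point cloud. The sketch below assumes a pinhole camera with a 90° field of view per face (standard for cubemaps, giving focal length = width / 2) and z-axis depth; the exact depth convention used by the SUMO data is defined in the dataset documentation, not here.

```python
import numpy as np

def face_depth_to_points(depth):
    """Back-project one cubemap face's depth map to 3D points.

    Assumes a pinhole model with a 90-degree field of view per face,
    so the focal length equals half the image width, and assumes
    `depth` stores distance along the camera z-axis.
    """
    h, w = depth.shape
    f = w / 2.0                              # 90° FOV -> f = (w/2) / tan(45°)
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0    # principal point at image center
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / f * depth
    y = (v - cy) / f * depth
    return np.stack([x, y, depth], axis=-1)  # shape (h, w, 3)

# toy example: a constant 2 m depth on a tiny 4x4 face
pts = face_depth_to_points(np.full((4, 4), 2.0))
```

The same routine, applied per face with each face's extrinsic rotation, yields a full panoramic point cloud.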


The SUMO Challenge is organized into three performance tracks based on the output representation of the scene. A scene is represented as a collection of elements, each of which models one object in the scene (e.g., a wall, the floor, or a chair). An element is represented in one of three increasingly descriptive representations: bounding box, voxel grid, or surface mesh. For each element in the scene, a submission contains the outputs listed below for its track. To get started, download the toolbox and training data from the links above. Visit the SUMO360 API website for documentation, example code, and additional help.

3D Bounding Box Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

3D Voxel Grid Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Location and RGB Color of Occupied 3D Voxels

3D Mesh Track

3D Bounding Box

3D Object Pose

Semantic Category of Element

Textured Mesh of Element (in .glb format)
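The three tracks form a strict hierarchy: each track's output is a superset of the previous one's. The sketch below illustrates that hierarchy with hypothetical Python dataclasses; the field names and pose convention are illustrative only, as the official element format is defined by the SUMO360 API.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Pose3D:
    translation: Tuple[float, float, float]      # element position in scene frame
    rotation: Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)

@dataclass
class BoundingBoxElement:
    # Bounding Box track: box + pose + semantic category per element
    category: str
    pose: Pose3D
    corner_min: Tuple[float, float, float]       # box corners in element frame
    corner_max: Tuple[float, float, float]

@dataclass
class VoxelElement(BoundingBoxElement):
    # Voxel Grid track adds occupied voxel locations and their RGB colors
    voxel_centers: List[Tuple[float, float, float]] = field(default_factory=list)
    voxel_colors: List[Tuple[int, int, int]] = field(default_factory=list)

@dataclass
class MeshElement(BoundingBoxElement):
    # Mesh track adds a textured surface mesh, stored as a .glb file
    glb_path: str = ""

# example: a chair element for the mesh track (values are illustrative)
chair = MeshElement(
    category="chair",
    pose=Pose3D(translation=(1.0, 0.0, 2.0), rotation=(1.0, 0.0, 0.0, 0.0)),
    corner_min=(-0.3, -0.3, 0.0),
    corner_max=(0.3, 0.3, 0.9),
    glb_path="chair.glb",
)
```

Because each track inherits the previous track's fields, a mesh-track submission can be evaluated on the bounding-box metrics without conversion.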


Evaluation of a 3D scene focuses on four key aspects: Geometry, Appearance, Semantics, and Perceptual quality (GASP). Details of the metrics for each track are provided here.
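The official GASP metrics are specified in the challenge documentation. As a simplified illustration of the geometry component only, the sketch below computes intersection-over-union for axis-aligned 3D boxes; note that the challenge tracks use *oriented* bounding boxes, which require a more involved intersection computation.

```python
def iou_3d(a_min, a_max, b_min, b_max):
    """IoU of two axis-aligned 3D boxes, each given by (min, max) corner tuples."""
    inter = vol_a = vol_b = 1.0
    for i in range(3):
        lo = max(a_min[i], b_min[i])     # overlap interval on axis i
        hi = min(a_max[i], b_max[i])
        inter *= max(0.0, hi - lo)       # zero overlap on any axis -> no intersection
        vol_a *= a_max[i] - a_min[i]
        vol_b *= b_max[i] - b_min[i]
    union = vol_a + vol_b - inter
    return inter / union if union > 0 else 0.0

# two unit cubes overlapping over half of the x-axis:
# intersection 0.5, union 1.5, IoU = 1/3
score = iou_3d((0, 0, 0), (1, 1, 1), (0.5, 0, 0), (1.5, 1, 1))
```

Thresholding such a score (e.g., IoU ≥ 0.5) is the usual way a predicted element is matched to a ground-truth element before the remaining metrics are computed.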


We are currently reconfiguring the EvalAI submission site for the 2019 Challenge. The new site will be available shortly.


Winners of the 2019 SUMO Challenge will be announced at the CVPR SUMO Challenge Workshop, which will be held June 16th or 17th. See the official SUMO Challenge Contest Rules.

3D Mesh Track - 1st Prize

$2500 cash prize

Titan X GPU

Oral Presentation

3D Voxel Track - 2nd Prize

$2000 cash prize

Titan X GPU

Oral Presentation

3D Bounding Box Track - 3rd Prize

$1500 cash prize

Titan X GPU

Oral Presentation

2019 SUMO Challenge Workshop

In computer vision, scene understanding and modeling comprise a diverse array of research problems, ranging from low-level geometric modeling (e.g., SLAM algorithms) to image-based object instance classification to 3D room layout estimation. These tasks are often addressed separately, yielding only a partial understanding and representation of the underlying scene. In contrast, comprehensive scene understanding and modeling aims to represent scene geometry, appearance, semantics, and perceptual qualities in an integrated manner.

In parallel, the recent rise in popularity of 360° cameras has encouraged the digitization of the real world into augmented and virtual realities, enabling new applications, such as virtual social interactions and semantically leveraged augmented reality. Creating a complete 3D digital scene from such imagery is a challenging task, in part due to occlusions, varying illumination conditions, and material properties of the objects in the scene. Furthermore, little existing work addresses the challenge presented by 360° input in the 3D modeling process.

The 2019 SUMO Challenge Workshop, held in conjunction with CVPR in Long Beach, California, will bring together computer vision researchers working on 3D scene understanding and modeling for a day of keynote speakers, oral presentations, posters, and panel discussions on the topic. The two primary goals of the workshop are:

  • Encourage the development of comprehensive 3D scene understanding and modeling algorithms that address the aforementioned problems in a single framework.
  • Foster research on the unique challenges of generating comprehensive digital representations from 360° imagery.

The workshop is soliciting papers covering various problems related to 3D scene understanding and modeling from RGB and RGB-D imagery. The topics mainly focus on indoor scene modeling and include, but are not limited to:

  • 360° data processing and scene understanding
  • Object detection
  • Object localization
  • Layout estimation
  • “Stuff” detection and modeling
  • Instance segmentation
  • Object completion and 3D reconstruction
  • Object pose estimation
  • Generative models
  • Articulated object modeling
  • Texture and appearance modeling
  • Material property estimation
  • Lighting recognition

Submissions must be written in English and submitted in PDF format. Each paper must be no longer than four pages, excluding references. Please refer to the CVPR author submission guidelines for instructions regarding formatting, templates, and policies. The review process is double blind: authors will not know the names of the reviewers, and reviewers will not know the names of the authors. Selected papers will be published in the IEEE CVPRW proceedings, available in IEEE Xplore and on the CVF website.


SUMO Challenge Launch and Data Release: February 5, 2019

Paper Submission Deadline: March 15, 2019

Notification to Authors: April 3, 2019

Camera-Ready Paper Deadline and Final Challenge Submissions Due: April 10, 2019

2019 SUMO Challenge Workshop at CVPR: June 16 or 17, 2019


Daniel Huber


Lyne Tchapmi

Stanford University

Frank Dellaert

Georgia Tech

Ilke Demir


Shuran Song

Columbia University

Rachel Luo

Stanford University

Advisory Board

Thomas Funkhouser

Princeton University

Leonidas Guibas

Stanford University

Jitendra Malik

UC Berkeley

Silvio Savarese

Stanford University

Challenge Advisors

Iro Armeni

Angel Chang

Kevin Chen

Christopher Choy

JunYoung Gwak

Manolis Savva

Alexander (Sasha) Sax

Richard Skarbez

Shuran Song

Amir R. Zamir