Shelf-Supervised Mesh Prediction in the Wild



We aim to infer the 3D shape and pose of an object from a single image, and propose a learning-based approach that can train from unstructured image collections, supervised only by segmentation outputs from off-the-shelf recognition systems (i.e., ‘shelf-supervised’). We first infer a volumetric representation in a canonical frame, along with the camera pose. We enforce that this representation is geometrically consistent with both the image appearance and the masks, and that novel views synthesized from it are indistinguishable from the images in the collection. The coarse volumetric prediction is then converted to a mesh-based representation, which is further refined in the predicted camera frame. Together, these two steps enable both shape-pose factorization from image collections and finer-grained per-instance reconstruction. We evaluate the method on both synthetic and real-world datasets and demonstrate its scalability on 50 categories in the wild, an order of magnitude more classes than prior work. Project page:
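The abstract does not spell out how geometric consistency with the masks is measured. As a minimal sketch of the general idea, the snippet below projects a canonical-frame occupancy grid into a predicted view and compares the result with a segmentation mask. It assumes, purely for illustration, an orthographic camera that only rotates about the vertical axis, nearest-neighbour voxel lookup, the common "probabilistic ray termination" silhouette 1 - ∏(1 - p), and a binary cross-entropy loss; none of these choices is claimed to be the paper's formulation, and `rotate_y`, `render_silhouette` and `mask_loss` are hypothetical helper names.

```python
import numpy as np


def rotate_y(points, azimuth):
    """Rotate (N, 3) points about the y (up) axis by `azimuth` radians."""
    c, s = np.cos(azimuth), np.sin(azimuth)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return points @ R.T


def render_silhouette(occ, azimuth, n_samples=None):
    """Orthographic silhouette of a canonical-frame occupancy grid.

    occ: (D, D, D) array of occupancy probabilities indexed [x, y, z],
         covering the cube [-1, 1]^3 (an assumption of this sketch).
    Each pixel (u, v) casts a ray along the camera z axis; its value is
    1 - prod(1 - p_k) over the occupancies p_k sampled along that ray.
    """
    D = occ.shape[0]
    n = n_samples or D
    lin = np.linspace(-1.0, 1.0, D)
    depth = np.linspace(-1.0, 1.0, n)
    # Camera-frame sample points (x = u, y = v, z = t) for every pixel and depth.
    u, v, t = np.meshgrid(lin, lin, depth, indexing="ij")
    cam_pts = np.stack([u, v, t], axis=-1).reshape(-1, 3)
    # Map camera-frame points back into the canonical frame (inverse rotation).
    can_pts = rotate_y(cam_pts, -azimuth)
    # Nearest-neighbour lookup; samples that fall outside the grid count as empty.
    idx = np.rint((can_pts + 1.0) / 2.0 * (D - 1)).astype(int)
    inside = np.all((idx >= 0) & (idx < D), axis=1)
    p = np.zeros(len(idx))
    p[inside] = occ[idx[inside, 0], idx[inside, 1], idx[inside, 2]]
    p = p.reshape(D, D, n)
    # Probability that at least one sample along the ray is occupied.
    return 1.0 - np.prod(1.0 - p, axis=-1)  # (D, D) image indexed [u, v]


def mask_loss(silhouette, mask, eps=1e-6):
    """Binary cross-entropy between a rendered silhouette and a 0/1 mask."""
    s = np.clip(silhouette, eps, 1.0 - eps)
    return float(-np.mean(mask * np.log(s) + (1.0 - mask) * np.log(1.0 - s)))
```

For a bar-shaped grid aligned with the x axis, the silhouette rendered at azimuth 0 matches a mask taken from that view with near-zero loss, while the same shape rendered at a 90° azimuth disagrees with it; a pose or shape predictor trained against such a loss is pushed toward the view the mask was observed from. The appearance-consistency term, the adversarial loss on novel views, and the volume-to-mesh refinement described above are not covered by this sketch.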

Related Publications

ICCV - October 11, 2021

Neural-GIF: Neural Generalized Implicit Functions for Animating People in Clothing

Garvita Tiwari, Nikolaos Sarafianos, Tony Tung, Gerard Pons-Moll

ICCV - October 11, 2021

Worldsheet: Wrapping the World in a 3D Sheet for View Synthesis from a Single Image

Ronghang Hu, Nikhila Ravi, Alexander C. Berg, Deepak Pathak

CVPR - June 21, 2021

img2pose: Face Alignment and Detection via 6DoF, Face Pose Estimation

Vítor Albiero, Xingyu Chen, Xi Yin, Guan Pang, Tal Hassner

ISMAR - July 29, 2021

Instant Visual Odometry Initialization for Mobile AR

Alejo Concha, Michael Burri, Jesus Briales, Christian Forster, Luc Oth