LASR: Learning Articulated Shape Reconstruction from a Monocular Video

CVPR 2021

Gengshan Yang¹ Deqing Sun² Varun Jampani² Daniel Vlasic² Forrester Cole²

Huiwen Chang² Deva Ramanan¹ William T. Freeman² Ce Liu²

¹Carnegie Mellon University ²Google Research

Many existing approaches to nonrigid shape reconstruction rely heavily on category-specific 3D shape templates, such as SMPL for humans and SMAL for quadrupeds. In contrast, LASR jointly recovers the object shape, articulation, and camera parameters from a monocular video without using category-specific shape templates. By combining generic shape and motion priors with differentiable rendering, LASR applies to a wide range of nonrigid shapes and obtains faithful 3D reconstructions.

Abstract

Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, it is still challenging to reconstruct nonrigid structures from RGB inputs, due to the under-constrained nature of this problem. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, their ability to handle the "open world" of novel object categories and outlier shapes is still limited. In this work, we introduce a template-free approach for 3D shape learning from a single video. It adopts an analysis-by-synthesis strategy that forward-renders object silhouettes, optical flow, and pixel intensities to compare against video observations, generating gradient signals that adjust the camera, shape, and motion parameters. Without relying on a category-specific shape template, our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes in the wild.
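The analysis-by-synthesis loop described in the abstract can be illustrated with a deliberately small toy, which is not LASR's implementation. The sketch below assumes a 2D soft disk in place of a 3D articulated mesh, uses only a silhouette term (no optical flow or pixel intensities), and substitutes central finite differences for the gradients a differentiable renderer would provide. All names (`render_silhouette`, `silhouette_loss`, `numeric_grad`, `fit`) and all constants are hypothetical choices made for this illustration.

```python
import math

def render_silhouette(cx, cy, r, size=16, sharpness=1.0):
    """Forward-render a soft disk: sigmoid of the signed distance to the circle edge."""
    image = []
    for y in range(size):
        row = []
        for x in range(size):
            signed_dist = r - math.hypot(x - cx, y - cy)  # > 0 inside the disk
            row.append(1.0 / (1.0 + math.exp(-sharpness * signed_dist)))
        image.append(row)
    return image

def silhouette_loss(params, target):
    """Mean squared difference between the rendered and the observed silhouette."""
    size = len(target)
    pred = render_silhouette(params[0], params[1], params[2], size=size)
    total = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
    return total / (size * size)

def numeric_grad(params, target, eps=1e-4):
    """Central finite differences (an assumption: stands in for autodiff)."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        diff = silhouette_loss(plus, target) - silhouette_loss(minus, target)
        grads.append(diff / (2.0 * eps))
    return grads

def fit(target, init, lr=10.0, steps=300):
    """Plain gradient descent on (cx, cy, r) to match the observed silhouette."""
    params = list(init)
    for _ in range(steps):
        grads = numeric_grad(params, target)
        params = [p - lr * g for p, g in zip(params, grads)]
    return params

if __name__ == "__main__":
    # Synthetic "observation": a disk with known parameters.
    observed = render_silhouette(9.0, 7.0, 4.0)
    start = [7.0, 8.0, 3.0]
    fitted = fit(observed, start)
    print("initial loss:", silhouette_loss(start, observed))
    print("final loss:  ", silhouette_loss(fitted, observed))
    print("fitted (cx, cy, r):", fitted)
```

The structure mirrors the abstract's description at a very small scale: render from the current parameters, compare with the observation, and use the gradient of the mismatch to update the parameters. The soft (sigmoid) edge is what makes the loss vary smoothly with the disk's position and radius, which is the role differentiable rendering plays in the full method.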

[Paper] [Supp] [Code] [Poster] [Slides]

Bibtex

@inproceedings{yang2021lasr,
  title={LASR: Learning Articulated Shape Reconstruction from a Monocular Video},
  author={Yang, Gengshan and Sun, Deqing and Jampani, Varun and Vlasic, Daniel and Cole, Forrester and Chang, Huiwen and Ramanan, Deva and Freeman, William T and Liu, Ce},
  booktitle={CVPR},
  year={2021}
}

Video

Results on DAVIS

Dance-twirl

Scooter-board

Soapbox

Car-turn

Camel

Bear

Dog

Cows

Horsejump-low

Horsejump-high

Cat (Pikachu)

Related projects on shape and motion recovery

Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. ECCV 2020.
Learning Category-Specific Mesh Reconstruction from Image Collections. ECCV 2018.
Self-supervised Single-view 3D Reconstruction via Semantic Consistency. ECCV 2020.
Shape and Viewpoints without Keypoints. ECCV 2020.
Articulation Aware Canonical Surface Mapping. CVPR 2020.
Creatures great and SMAL: Recovering the shape and motion of animals from video. ACCV 2018.
Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild". ICCV 2019.
VIBE: Video Inference for Human Body Pose and Shape Estimation. CVPR 2020.

Acknowledgments

This work was partially done during an internship at Google. Thanks to Xueting Li, Nilesh Kulkarni, and Benjamin Biggs for providing pre-trained models/implementations, Tyler Zhu for providing detailed feedback on the manuscript, and Angjoo Kanazawa, Tali Dekel, and Zhoutong Zhang for valuable suggestions.