Gengshan Yang¹, Deqing Sun², Varun Jampani², Daniel Vlasic², Forrester Cole², Huiwen Chang², Deva Ramanan¹, William T. Freeman², Ce Liu²

¹Carnegie Mellon University, ²Google Research
Remarkable progress has been made in 3D reconstruction of rigid structures from a video or a collection of images. However, reconstructing nonrigid structures from RGB inputs remains challenging because the problem is under-constrained. While template-based approaches, such as parametric shape models, have achieved great success in modeling the "closed world" of known object categories, their ability to handle the "open world" of novel object categories and outlier shapes is still limited. In this work, we introduce a template-free approach for 3D shape learning from a single video. It adopts an analysis-by-synthesis strategy that forward-renders object silhouettes, optical flow, and pixel intensities and compares them against video observations, which generates gradient signals to adjust the camera, shape, and motion parameters. Without relying on a category-specific shape template, our method faithfully reconstructs nonrigid 3D structures from videos of humans, animals, and objects of unknown classes in the wild.
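The analysis-by-synthesis loop described above can be illustrated with a deliberately tiny toy problem. The sketch below is not the LASR implementation: instead of rendering a mesh into silhouettes, flow, and texture, it assumes a "renderer" that draws a single soft Gaussian blob, and it recovers only the blob's 2D center by gradient descent on the squared silhouette error. The names `render`, `loss_and_grad`, and `fit`, as well as the grid size, blob width, learning rate, and step count, are all hypothetical choices made for illustration.

```python
import numpy as np


def render(center, size=32, sigma=4.0):
    """Toy differentiable 'renderer': a soft blob silhouette centered at `center` (x, y)."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    sq_dist = (xs - center[0]) ** 2 + (ys - center[1]) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


def loss_and_grad(center, observed, size=32, sigma=4.0):
    """Squared silhouette error and its analytic gradient w.r.t. the center."""
    ys, xs = np.mgrid[0:size, 0:size].astype(float)
    pred = render(center, size, sigma)
    resid = pred - observed
    loss = np.sum(resid ** 2)
    # d pred / d cx = pred * (xs - cx) / sigma^2, and likewise for cy.
    gx = np.sum(2.0 * resid * pred * (xs - center[0]) / sigma ** 2)
    gy = np.sum(2.0 * resid * pred * (ys - center[1]) / sigma ** 2)
    return loss, np.array([gx, gy])


def fit(observed, init, lr=0.2, steps=200):
    """Analysis-by-synthesis: render, compare with the observation, follow the gradient."""
    center = np.array(init, dtype=float)
    for _ in range(steps):
        _, grad = loss_and_grad(center, observed)
        center -= lr * grad
    return center


if __name__ == "__main__":
    # The "video observation" is synthesized here from a known center.
    observed = render((20.0, 12.0))
    estimate = fit(observed, init=(16.0, 16.0))
    print("estimated center:", estimate)
```

Starting from a guess a few pixels away, the estimate should move to the center that generated the observation. The same pattern (forward-render, compare, backpropagate into the parameters) is what the paragraph above describes, applied there to camera, shape, and motion parameters rather than a single 2D offset.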
@inproceedings{yang2021lasr,
  title={LASR: Learning Articulated Shape Reconstruction from a Monocular Video},
  author={Yang, Gengshan and Sun, Deqing and Jampani, Varun and Vlasic, Daniel and Cole, Forrester and Chang, Huiwen and Ramanan, Deva and Freeman, William T and Liu, Ce},
  booktitle={CVPR},
  year={2021}
}
Neural Dense Non-Rigid Structure from Motion with Latent Space Constraints. ECCV 2020.
Learning Category-Specific Mesh Reconstruction from Image Collections. ECCV 2018.
Self-supervised Single-view 3D Reconstruction via Semantic Consistency. ECCV 2020.
Shape and Viewpoint without Keypoints. ECCV 2020.
Articulation Aware Canonical Surface Mapping. CVPR 2020.
Creatures great and SMAL: Recovering the shape and motion of animals from video. ACCV 2018.
Three-D Safari: Learning to Estimate Zebra Pose, Shape, and Texture from Images "In the Wild". ICCV 2019.
VIBE: Video Inference for Human Body Pose and Shape Estimation. CVPR 2020.
This work was partially done during an internship at Google. Thanks to Xueting Li, Nilesh Kulkarni, and Benjamin Biggs for providing pre-trained models and implementations, Tyler Zhu for providing detailed feedback on the manuscript, and Angjoo Kanazawa, Tali Dekel, and Zhoutong Zhang for valuable suggestions.