
JBE, vol. 26, no. 1, pp. 70-78, January 2021


Recent Trends of Weakly-supervised Deep Learning for Monocular 3D Reconstruction

Seungryong Kim

C.A E-mail:


Estimating 3D information from a single image is an essential problem in numerous applications. Because a 2D image can originate from an infinite number of different 3D scenes, 3D reconstruction from a single image is notoriously challenging. Recent deep convolutional neural networks (CNNs) have addressed this challenge by modeling the mapping function between a 2D image and 3D information. However, training such deep CNNs demands massive training data, and such data is difficult or even impossible to obtain. Recent work therefore aims at deep learning techniques that can be trained in a weakly-supervised manner, using meta-data instead of ground-truth depth data. In this article, we introduce recent developments in weakly-supervised deep learning techniques, categorized into scene 3D reconstruction and object 3D reconstruction, and discuss their limitations and future directions.

Keywords: 3D reconstruction, depth ambiguity, deep learning, weakly-supervised learning
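
The depth ambiguity named in the abstract and keywords can be illustrated with a toy pinhole-camera sketch. It is not taken from the article: the function names, the focal length, and the sample points are assumptions chosen for illustration. The sketch shows one instance of the ambiguity, namely that scaling a whole scene about the camera center leaves the projected image unchanged, so a single image cannot determine absolute depth.

```python
import math


def project(point, focal_length=1.0):
    """Pinhole projection of a 3D point (X, Y, Z) with Z > 0 to image coordinates."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)


def scale_scene(points, s):
    """Scale every 3D point by s about the camera center (the origin)."""
    return [(s * X, s * Y, s * Z) for X, Y, Z in points]


# Hypothetical toy scene: three points in front of the camera.
scene = [(1.0, 2.0, 4.0), (-0.5, 0.25, 2.0), (3.0, -1.0, 8.0)]
# A different 3D scene: same layout, but twice as large and twice as far away.
bigger_scene = scale_scene(scene, 2.0)

image_a = [project(p) for p in scene]
image_b = [project(p) for p in bigger_scene]

# Both scenes produce the same 2D image (up to floating-point tolerance).
same_image = all(
    math.isclose(a[0], b[0]) and math.isclose(a[1], b[1])
    for a, b in zip(image_a, image_b)
)
print("identical projections:", same_image)
```

Any positive scale factor gives the same result, which is why a learned mapping from image to depth needs additional cues or supervision signals to resolve scale.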


