
JBE, vol. 26, no. 5, pp.519-532, September, 2021


FBX Format Animation Generation System Combined with Joint Estimation Network using RGB Images

Yujin Lee, Sangjoon Kim, and Gooman Park



In fields such as games, movies, and animation, a growing amount of content uses motion capture to build body models and create characters that are rendered in 3D space. Marker-based capture, which locates joints from markers attached to the body, has drawbacks such as high shooting costs. Studies have tried to offset these drawbacks by generating animations with RGB-D cameras, but problems of pose estimation accuracy and equipment cost remain. In this paper, we therefore propose a system that feeds RGB images into a joint estimation network and converts the results into 3D data to create FBX format animations, with the aim of reducing the equipment cost of animation production and increasing joint estimation accuracy. First, the two-dimensional joint positions are estimated from the RGB image, and the three-dimensional joint coordinates are estimated from these values. The result is then converted to quaternions, the joints are rotated accordingly, and an animation in FBX format is created. To measure the accuracy of the proposed method, we attached markers to the body, generated an animation from the 3D marker positions, and verified the operation of the system by measuring the error between that animation and the one generated by the proposed system.

Keywords: Pose estimation, Quaternion, Joint rotation, FBX, 3D animation
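
The abstract states that the estimated 3D joint coordinates are converted to quaternions before the joints are rotated, but it does not say how. As an illustration only, the following minimal sketch assumes one common approach: for each bone, take the direction from the parent joint to the child joint and compute the shortest-arc quaternion that rotates an assumed rest-pose direction onto it. The function names (`quat_between`, `rotate`, `bone_rotation`), the rest direction, and the sample coordinates are hypothetical and are not taken from the paper or from the FBX SDK.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    if n == 0.0:
        raise ValueError("zero-length vector")
    return tuple(c / n for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def quat_between(u, v):
    """Shortest-arc unit quaternion (w, x, y, z) that rotates direction u onto v."""
    u, v = normalize(u), normalize(v)
    d = dot(u, v)
    if d < -1.0 + 1e-9:
        # Opposite directions: rotate 180 degrees about any axis perpendicular to u.
        helper = (1.0, 0.0, 0.0) if abs(u[0]) < 0.9 else (0.0, 1.0, 0.0)
        axis = normalize(cross(u, helper))
        return (0.0, axis[0], axis[1], axis[2])
    axis = cross(u, v)
    w = 1.0 + d
    n = math.sqrt(w * w + dot(axis, axis))
    return (w / n, axis[0] / n, axis[1] / n, axis[2] / n)

def rotate(q, p):
    """Rotate point p by unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    qv = (x, y, z)
    t = tuple(2.0 * c for c in cross(qv, p))
    c2 = cross(qv, t)
    return tuple(p[i] + w * t[i] + c2[i] for i in range(3))

def bone_rotation(rest_dir, parent_xyz, child_xyz):
    """Quaternion taking an assumed rest-pose bone direction to the estimated one."""
    bone = tuple(c - p for p, c in zip(parent_xyz, child_xyz))
    return quat_between(rest_dir, bone)

# Hypothetical example: a bone that points along +Y in the rest pose,
# with estimated parent and child joint positions in 3D.
q = bone_rotation((0.0, 1.0, 0.0), (0.0, 1.0, 0.0), (0.3, 1.4, 0.1))
print(q)
```

A shortest-arc rotation fixes only the bone direction and leaves the twist about the bone axis undetermined, so a full pipeline would need an additional convention for twist before writing per-joint rotations into an animation file.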


