
JBE, vol. 25, no. 7, pp. 1081-1094, December 2020


Experiment on Intermediate Feature Coding for Object Detection and Segmentation

Min Hyuk Jeong, Hoe-Yong Jin, Sang-Kyun Kim, Heekyung Lee, Hyon-Gon Choo, Hanshin Lim, and Jeongil Seo



With the recent development of deep learning, most computer vision tasks are now solved with deep learning-based networks such as CNNs and RNNs. Tasks such as object detection and object segmentation use intermediate features extracted from a shared backbone, such as ResNet or FPN, for both training and inference. This paper reports an experiment that measures the compression efficiency and the effect of encoding on task inference performance when features extracted at an intermediate stage of a CNN are encoded. A feature map that combines the 256 feature channels into a single image and the original image were each encoded with HEVC, and the resulting inference performance for object detection and segmentation was compared and analyzed. Because the intermediate feature map contains five levels of feature maps (P2 to P6), its image size and resolution are larger than those of the original image. However, when the degree of compression is reduced, inference on the feature maps yields results similar to or better than inference on the original image.
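
As an illustration of how 256 feature channels might be combined into one image before video encoding, the following is a minimal sketch and not the authors' implementation. The row-major grid layout, the min-max 8-bit quantization, the function names, and the pyramid strides (4, 8, 16, 32, 64 for P2 to P6, as in a typical FPN) are all assumptions made for this example; the abstract does not specify them.

```python
import math


def quantize_8bit(values, lo, hi):
    """Min-max quantize floats in [lo, hi] to integers in [0, 255] (assumed scheme)."""
    if hi == lo:
        return [0 for _ in values]
    scale = 255.0 / (hi - lo)
    return [int(round((v - lo) * scale)) for v in values]


def tile_channels(feature, cols):
    """Pack a feature tensor into one 2D image.

    `feature` is a list of C channels, each a list of H rows of W floats.
    Channels are placed left to right, top to bottom, `cols` per row
    (a hypothetical layout). Returns the 8-bit image and the (lo, hi)
    range needed to dequantize it on the decoder side.
    """
    channels = len(feature)
    height = len(feature[0])
    width = len(feature[0][0])
    rows = math.ceil(channels / cols)

    flat = [v for ch in feature for row in ch for v in row]
    lo, hi = min(flat), max(flat)

    image = [[0] * (cols * width) for _ in range(rows * height)]
    for c, ch in enumerate(feature):
        top = (c // cols) * height
        left = (c % cols) * width
        for y, row in enumerate(ch):
            image[top + y][left:left + width] = quantize_8bit(row, lo, hi)
    return image, (lo, hi)


def sample_ratio(channels, strides):
    """Feature samples per input pixel, summed over all pyramid levels.

    A level with stride s holds channels / s**2 samples per input pixel.
    """
    return sum(channels / (s * s) for s in strides)


if __name__ == "__main__":
    # Toy tensor: 4 channels of 2x2, channel c filled with the value c.
    toy = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
    image, value_range = tile_channels(toy, cols=2)
    for row in image:
        print(row)
    print("range:", value_range)
    # Assumed FPN strides for P2..P6.
    print("samples per input pixel:", sample_ratio(256, [4, 8, 16, 32, 64]))
```

Under the assumed strides, the five levels together hold about 21 feature samples per input pixel, compared with 3 samples per pixel in an RGB image, which is consistent with the packed feature image being larger than the original image.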

Keywords: Deep learning, intermediate features, video coding for machines, object detection, object segmentation


