
JBE, vol. 23, no. 1, pp. 86-92, January 2018

DOI: https://doi.org/10.5909/JBE.2018.23.1.86

Coding History Detection of Speech Signal using Deep Neural Network

Hyo-Jin Cho, Won Jang, Seong-Hyeon Shin, and Hochong Park

Corresponding author e-mail: hcpark@kw.ac.kr

Abstract:

In this paper, we propose a method for detecting the coding history of a digital speech signal. In digital speech communication and storage, the signal is encoded to reduce the number of bits. Therefore, given a speech waveform, we need to detect its coding history to determine whether the signal is an original or a coded one and, if coded, how many times it has been coded. Specifically, we propose a coding history detection method for the 12.2 kbps AMR codec that distinguishes three cases: original, single coding, and double coding. The proposed method extracts a speech-specific feature vector from the given speech and models the feature vector using a deep neural network. We confirm that the proposed feature vector provides better coding history detection performance than a feature vector computed from the general spectrogram.
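
The classification step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the feature dimension (`FEATURE_DIM`), the hidden layer sizes, the ReLU activations, the random untrained weights, and all function names are assumptions made for illustration. Only the three class labels come from the abstract. The sketch passes one feature vector through a small feedforward network and reads the predicted coding history from a softmax output.

```python
import math
import random

# Class labels taken from the abstract; everything else below is an assumption.
CLASSES = ("original", "single coding", "double coding")


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (untrained placeholder)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: one output per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def softmax(x):
    """Numerically stable softmax."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def build_network(feature_dim, hidden_sizes, rng):
    """Stack dense layers: feature_dim -> hidden sizes -> one output per class."""
    sizes = [feature_dim, *hidden_sizes, len(CLASSES)]
    return [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def classify(features, network):
    """Forward pass: ReLU hidden layers, softmax output over the three classes."""
    h = list(features)
    for layer in network[:-1]:
        h = relu(dense(h, layer))
    probs = softmax(dense(h, network[-1]))
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs


rng = random.Random(0)
FEATURE_DIM = 20  # hypothetical feature-vector size, not taken from the paper
network = build_network(FEATURE_DIM, hidden_sizes=(32, 32), rng=rng)
# Stand-in for a real speech-specific feature vector.
features = [rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
label, probs = classify(features, network)
print(label, [round(p, 3) for p in probs])
```

Because the weights here are random, the printed label is meaningless. In practice the network would be trained on feature vectors extracted from speech with a known coding history.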

 



Keywords: coding history, feature vector, speech parameter, DNN

