
JBE, vol. 23, no. 6, pp.855-865, November, 2018


Dual CNN Structured Sound Event Detection Algorithm Based on Real Life Acoustic Dataset

Sangwon Suh, Wootaek Lim, Youngho Jeong, Taejin Lee, and Hui Yong Kim



Sound event detection is a research area that aims to model human auditory perception by recognizing events in an environment containing multiple acoustic events and determining the onset and offset times of each event. DCASE, a research community on acoustic scene classification and sound event detection, holds challenges to encourage researcher participation and stimulate sound event detection research. However, the dataset provided by the DCASE Challenge is relatively small compared to ImageNet, a representative dataset for visual object recognition, and few open acoustic datasets are available. In this study, sound events that can occur in indoor and outdoor environments are collected on a larger scale and annotated to construct a dataset. Furthermore, to improve sound event detection performance, we developed a dual-CNN sound event detection system by adding to a convolutional neural network a supplementary neural network that determines the presence of sound events. Finally, we conducted comparative experiments against the baseline systems of DCASE 2016 and 2017.
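
As a rough illustration of the idea described above, the following sketch shows one way the output of a presence network could gate the frame-wise class probabilities of a main network before onset and offset times are read off. It is not the authors' implementation: the hard gating rule, the thresholds, the hop length, and the names `gate_probabilities` and `extract_events` are all assumptions made for this example, and the two networks themselves are replaced by hand-written probability lists.

```python
from typing import List, Sequence, Tuple


def gate_probabilities(
    class_probs: Sequence[Sequence[float]],
    presence_probs: Sequence[float],
    presence_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[List[float]]:
    """Zero the class probabilities of frames judged to contain no event.

    class_probs[t][c] stands in for the main CNN output (frame t, class c);
    presence_probs[t] stands in for the supplementary network output.
    """
    gated = []
    for frame_probs, presence in zip(class_probs, presence_probs):
        if presence >= presence_threshold:
            gated.append(list(frame_probs))
        else:
            gated.append([0.0] * len(frame_probs))
    return gated


def extract_events(
    gated: Sequence[Sequence[float]],
    labels: Sequence[str],
    hop_seconds: float,
    class_threshold: float = 0.5,  # assumed value, not from the paper
) -> List[Tuple[str, float, float]]:
    """Turn each run of consecutive active frames into (label, onset, offset)."""
    events = []
    for c, label in enumerate(labels):
        start = None
        for t, frame in enumerate(gated):
            active = frame[c] >= class_threshold
            if active and start is None:
                start = t  # a run of active frames begins: onset
            elif not active and start is not None:
                events.append((label, start * hop_seconds, t * hop_seconds))
                start = None
        if start is not None:  # the run continues to the end of the clip
            events.append((label, start * hop_seconds, len(gated) * hop_seconds))
    return events


# Toy inputs: four frames, two hypothetical classes, one-second hop.
class_probs = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.9], [0.1, 0.9]]
presence_probs = [0.9, 0.2, 0.8, 0.9]
events = extract_events(
    gate_probabilities(class_probs, presence_probs),
    labels=["speech", "car"],
    hop_seconds=1.0,
)
print(events)
```

In this toy input the second frame has a high "speech" probability but a low presence score, so the gate suppresses it and the speech activity is split into two separate events. A soft gate (multiplying the two probabilities) would be an equally plausible design under the same assumptions.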

Keywords: Machine learning, Deep learning, Audio signal processing, Sound event detection, Dataset


