
JBE, vol. 22, no. 6, pp.693-701, November, 2017

DOI: https://doi.org/10.5909/JBE.2017.22.6.693

Music Genre Classification using Spikegram and Deep Neural Network

Woo-Jin Jang, Ho-Won Yun, Seong-Hyeon Shin, Hyo-Jin Cho, Won Jang and Hochong Park

Corresponding Author E-mail: hcpark@kw.ac.kr

Abstract:

In this paper, we propose a new method for music genre classification using a spikegram and a deep neural network. The human auditory system encodes an input sound in the time and frequency domains so as to maximize the amount of sound information delivered to the brain while using minimal energy and resources. A spikegram is a representation of a waveform based on this encoding behavior of the auditory system. In the proposed method, we analyze the signal using a spikegram and extract a feature vector composed of key information for genre classification, which is then used as the input to the neural network. We measure the performance of music genre classification on the GTZAN dataset, which consists of 10 music genres, and confirm that the proposed method provides good performance with a low-dimensional feature vector compared to current state-of-the-art methods.
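The spikegram analysis summarized above encodes a waveform as a sparse set of "spikes" — kernel activations localized in time and frequency — typically obtained by matching pursuit over gammatone kernels [7]. The sketch below illustrates that idea under stated assumptions: the kernel count, center frequencies, bandwidth, spike budget, and the count/amplitude feature summary are all illustrative choices, not the authors' configuration.

```python
# Illustrative spikegram-style encoding via matching pursuit (assumptions noted above).
import numpy as np

def gammatone(fc, fs=16000, duration=0.02, order=4, b=150.0):
    """Unit-norm gammatone kernel: t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t)."""
    t = np.arange(int(duration * fs)) / fs
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.linalg.norm(g)

def matching_pursuit(x, kernels, n_spikes=30):
    """Greedy decomposition: each spike is (kernel index, time offset, amplitude)."""
    residual = x.copy()
    spikes = []
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            corr = np.correlate(residual, g, mode='valid')
            i = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[i]) > abs(best[2]):
                best = (k, i, corr[i])
        k, i, a = best
        residual[i:i + len(kernels[k])] -= a * kernels[k]  # remove matched atom
        spikes.append(best)
    return spikes, residual

# Toy usage: encode a short synthetic tone with a small (assumed) kernel bank.
fs = 16000
centers = [200.0, 400.0, 800.0, 1600.0]          # assumed center frequencies
kernels = [gammatone(fc, fs) for fc in centers]
t = np.arange(fs // 4) / fs
x = np.sin(2 * np.pi * 400.0 * t)
spikes, residual = matching_pursuit(x, kernels, n_spikes=30)

# Simple low-dimensional summary: per-kernel spike counts and mean |amplitude|,
# which could then serve as the input feature vector for a classifier network.
feat = np.zeros(2 * len(kernels))
for k, _, a in spikes:
    feat[k] += 1
    feat[len(kernels) + k] += abs(a)
```

In practice, a feature vector derived from the spike statistics would be fed to a deep neural network (e.g., with dropout regularization [10]) that outputs one of the 10 genre labels; that classifier stage is omitted here.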



Keyword: music genre, genre classification, spikegram, deep neural network

References:
[1] M. Henaff, K. Jarrett, K. Kavukcuoglu and Y. LeCun, "Unsupervised Learning of Sparse Features for Scalable Audio Classification," Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp.681-686, Sep. 2011.

[2] S. H. Kim, D. S. Kim and B. W. Suh, "Music Genre Classification Using Multimodal Deep Learning," Proceedings of Human Computer Interaction Korea, pp.389-395, Jan. 2016.

[3] D. Bhalke, B. Rajesh and D. Bormane, "Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features," Archives of Acoustics, Vol.42, No.2, pp.213-222, 2017.

[4] M. Patil and U. Nemade, "Music Genre Classification Using MFCC, K-NN and SVM Classifier," International Journal of Computer Engineering In Research Trends, Vol.4, No.2, pp.43-47, Feb. 2017.

[5] P. Manzagol, T. Bertin-Mahieux and D. Eck, "On The Use of Sparse Time-Relative Auditory Codes for Music," Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), pp.603-608, Sep. 2008.

[6] G. Tzanetakis and P. Cook, "Musical Genre Classification of Audio Signals," IEEE Transactions on Speech and Audio Processing, Vol.10, No.5, pp.293-302, July 2002.

[7] E. Smith and M. Lewicki, "Efficient Auditory Coding," Nature, Vol.439, No.7079, pp.978-982, Feb. 2006.

[8] G. Mather, Foundations of Perception, Psychology Press, 2006.

[9] J. Tropp and A. Gilbert, "Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit," IEEE Transactions on Information Theory, Vol.53, No.12, Dec. 2007.

[10] N. Srivastava, G. Hinton, A. Krizhevsky and R. Salakhutdinov, "Dropout: A Simple Way to Prevent Neural Networks from Overfitting," Journal of Machine Learning Research, Vol.15, No.1, pp.1929-1958, June 2014.

Editorial Office
1108, New building, 22, Teheran-ro 7-gil, Gangnam-gu, Seoul, Korea
Homepage: www.kibme.org TEL: +82-2-568-3556 FAX: +82-2-568-3557
Copyright ⓒ 2012 The Korean Institute of Broadcast and Media Engineers
All Rights Reserved