
JBE, vol. 24, no. 4, pp.553-563, July, 2019

DOI: https://doi.org/10.5909/JBE.2019.24.4.553

Video Highlight Prediction Using Multiple Time-Interval Information of Chat and Audio

Eunyul Kim and Gyemin Lee

Corresponding author E-mail: gyemin@seoultech.ac.kr

Abstract:

As the number of videos uploaded to live streaming platforms rapidly increases, so does the demand for highlight videos that improve the viewer experience. In this paper, we present novel methods for predicting highlights using the chat logs and audio of videos. The proposed models employ bi-directional LSTMs to capture the contextual flow of a video. We also propose using features computed over multiple time intervals to capture mid- to long-term flows. Our methods are demonstrated on e-Sports and baseball videos collected from personal broadcasting platforms such as Twitch and Kakao TV. The results show that information from multiple time intervals is useful for predicting video highlights.
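The abstract combines two ideas: features summarized over several time intervals, and a bi-directional LSTM that reads the sequence in both directions. The sketch below illustrates both under stated assumptions and is not the authors' model. It assumes per-second inputs (a chat message count and an audio energy value), treats "multiple time intervals" as trailing moving averages over hypothetical window lengths of 5, 15, 30 and 60 seconds, and runs a single-layer BiLSTM written in NumPy with random, untrained weights followed by a per-second sigmoid score. The abstract does not specify the actual features, window lengths, or network sizes; all of those choices here are illustrative.

```python
import numpy as np


def multi_interval_features(per_second, window_sizes=(5, 15, 30, 60)):
    """Mean of a per-second signal over several trailing windows.

    Assumption: window_sizes are hypothetical. Column k at time t is the
    average over the last window_sizes[k] seconds ending at t (fewer at the start).
    """
    x = np.asarray(per_second, dtype=float)
    T = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))  # prefix sums give O(1) window sums
    t = np.arange(T)
    feats = np.empty((T, len(window_sizes)))
    for k, w in enumerate(window_sizes):
        start = np.maximum(0, t - w + 1)
        feats[:, k] = (csum[t + 1] - csum[start]) / (t + 1 - start)
    return feats


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_pass(xs, W, U, b):
    """One LSTM direction over xs (T, D); returns hidden states (T, H).

    W: (4H, D), U: (4H, H), b: (4H,), gates stacked as input, forget, output, candidate.
    """
    H = U.shape[1]
    h, c = np.zeros(H), np.zeros(H)
    out = np.empty((len(xs), H))
    for t, x in enumerate(xs):
        z = W @ x + U @ h + b
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])
        c = f * c + i * g          # cell state carries long-range context
        h = o * np.tanh(c)
        out[t] = h
    return out


def bilstm(xs, fwd_params, bwd_params):
    """Concatenate forward states with backward states re-aligned to time order."""
    h_fwd = lstm_pass(xs, *fwd_params)                 # sees the past up to t
    h_bwd = lstm_pass(xs[::-1], *bwd_params)[::-1]     # sees the future from t
    return np.concatenate([h_fwd, h_bwd], axis=1)


def init_params(D, H, rng):
    # Random, untrained weights: this only demonstrates shapes and data flow.
    return (rng.normal(0.0, 0.1, (4 * H, D)),
            rng.normal(0.0, 0.1, (4 * H, H)),
            np.zeros(4 * H))


# Synthetic stand-ins for one video (assumption): chat messages and audio energy per second.
rng = np.random.default_rng(0)
T, H = 120, 16
chat = rng.poisson(2.0, size=T).astype(float)
audio = rng.random(T)
xs = np.concatenate([multi_interval_features(chat),
                     multi_interval_features(audio)], axis=1)  # (T, 8)

fwd, bwd = init_params(xs.shape[1], H, rng), init_params(xs.shape[1], H, rng)
hs = bilstm(xs, fwd, bwd)                    # (T, 2H) context from both directions
w_out, b_out = rng.normal(0.0, 0.1, 2 * H), 0.0
scores = sigmoid(hs @ w_out + b_out)         # per-second highlight score in (0, 1), untrained
```

Stacking several window lengths lets each time step carry both a short burst (e.g. a sudden spike in chat) and the slower build-up around it, while the backward direction lets a score at time t react to chat that arrives a few seconds after the on-screen event.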



Keywords: Video highlight, Multiple time-interval models, Bi-directional LSTM, Chat logs, Audio



