Journal of Multimedia, Vol 7, No 2 (2012), 159-169, Apr 2012
doi:10.4304/jmm.7.2.159-169

Making Czech Historical Radio Archive Accessible and Searchable for Wide Public

Jan Nouza, Karel Blavka, Petr Cerva, Jindrich Zdansky, Jan Silovsky, Marek Bohac, Jan Prazak

Abstract


In this paper we describe a complex software platform that is being developed for the automatic transcription and indexing of the Czech Radio archive of spoken documents. The archive contains more than 100,000 hours of audio recordings covering almost ninety years of public broadcasting in the Czech Republic and the former Czechoslovakia. The platform is based on modern speech processing technology and includes modules for speech, speaker, and language recognition, as well as tools for multimodal information retrieval. The aim of the project, which is supported by the Czech Ministry of Culture, is to make the archive accessible and searchable for both researchers and the general public. After the first year of the project, the key modules have already been implemented and tested on a 27,400-hour subset of the archive. A web-based full-text search engine demonstrates the current state of the project.


Keywords


audio archive processing, spoken document transcription, speech and speaker recognition, audio search




Journal of Multimedia (JMM, ISSN 1796-2048)

Copyright @ 2006-2013 by ACADEMY PUBLISHER – All rights reserved.