Automation Technology, Telecommunication Technology
Reliable silence model based voice activity detection approach in speaker diarization
YANG Deng-zhou1,2, XU Jia-ming1,2, LIU Jia3, XIA Shan-hong1
1. Institute of Electronics, Chinese Academy of Sciences, Beijing 100190, China; 2. School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; 3. Department of Electronic Engineering, Tsinghua University, Beijing 100084, China