Universal background model reduction based efficient speaker recognition

doi:10.3785/j.issn.1008973X.2009.

2009, Vol. 43, Issue (6): 978-982

CHAN Zhen-Yu, YANG Ying-Chun

(College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract

A universal background model (UBM) reduction method is proposed to speed up speaker recognition systems based on the Gaussian mixture model-universal background model (GMM-UBM). A high-order UBM is trained with the expectation-maximization (EM) algorithm and then clustered into a new, lower-order UBM. To solve the empty mapping set problem, the Gaussian component with the shortest distance is used to replace the empty set. Three methods of initializing the low-order UBM were analyzed experimentally, and the different initialization methods converged to similar recognition results. Experiments on the NIST 2001 SRE corpus showed that the equal error rate (EER) of the system increased by only 459%, while the computation speed increased by 8 times. The UBM reduction method can considerably improve the efficiency of the GMM-UBM system while maintaining its performance.
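The clustering step can be illustrated with a minimal sketch. The abstract does not state the distance measure, the refitting rule, or the three initialization methods, so everything below is an assumption made for illustration: diagonal covariances, KL divergence as the distance between components, moment matching to merge each cluster, and random selection of high-order components as the initial low-order model. The name `reduce_ubm` is hypothetical. An empty mapping set is handled by handing it the closest high-order component, taken only from a cluster that has more than one member.

```python
import numpy as np

def kl_diag(mu_p, var_p, mu_q, var_q):
    """KL(p || q) between diagonal Gaussians; broadcasts over leading axes."""
    return 0.5 * np.sum(
        np.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0,
        axis=-1,
    )

def reduce_ubm(weights, means, variances, m, n_iter=20, seed=0):
    """Cluster an n-component diagonal GMM into m components (illustrative only)."""
    rng = np.random.default_rng(seed)
    n = len(weights)
    # Assumed initialization: m randomly chosen high-order components.
    idx = rng.choice(n, size=m, replace=False)
    mu, var = means[idx].copy(), variances[idx].copy()
    w = np.zeros(m)
    for _ in range(n_iter):
        # Regroup: map each high-order component to its nearest low-order one.
        dist = kl_diag(means[:, None, :], variances[:, None, :], mu[None], var[None])
        assign = dist.argmin(axis=1)
        # Empty mapping set: give it the closest high-order component,
        # taken only from clusters that keep at least one member afterwards.
        for j in range(m):
            if not np.any(assign == j):
                counts = np.bincount(assign, minlength=m)
                candidates = np.flatnonzero(counts[assign] > 1)
                assign[candidates[dist[candidates, j].argmin()]] = j
        # Refit: merge each cluster into one Gaussian by moment matching.
        for j in range(m):
            members = assign == j
            w[j] = weights[members].sum()
            r = weights[members] / w[j]
            mu[j] = r @ means[members]
            var[j] = r @ (variances[members] + (means[members] - mu[j]) ** 2)
    return w, mu, var
```

Because every frame is scored against every UBM component, the scoring cost of a GMM-UBM system grows roughly linearly with the model order, which is why a lower-order UBM is cheaper to evaluate.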

Published: 01 June 2009

CLC:

TP391

Cite this article:

CHAN Zhen-Yu, YANG Ying-Chun. Universal background model reduction based efficient speaker recognition. J4, 2009, 43(6): 978-982.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008973X.2009. OR http://www.zjujournals.com/eng/Y2009/V43/I6/978

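Efficient speaker recognition system based on a UBM order-reduction algorithm

To increase the computation speed of speaker recognition systems based on the Gaussian mixture model-universal background model (GMM-UBM), a universal background model (UBM) order-reduction algorithm is proposed. The method trains a high-order UBM by maximum likelihood estimation and then applies the UBM order-reduction algorithm to obtain a low-order UBM. The empty mapping set problem is solved by replacing the empty mapping set with the Gaussian component at the shortest distance. The recognition results of three methods of initializing the low-order UBM were analyzed experimentally, and the choice of initialization method was found to have little effect on the results. Experiments on the NIST 2001 SRE database show that the algorithm increases the computation speed of the GMM-UBM speaker recognition system by 8 times while the equal error rate rises by only 459%, indicating that the UBM order-reduction algorithm can greatly improve the running efficiency of the GMM-UBM system at the cost of a small decrease in recognition rate.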
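For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below is one simple way to estimate it from two lists of verification scores; the function name and the threshold sweep over observed scores are assumptions for illustration, not the evaluation procedure used in the paper.

```python
import numpy as np

def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER by sweeping the threshold over all observed scores."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    # False rejection: target trials scoring below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: impostor trials scoring at or above the threshold.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])
```

Higher scores are assumed to mean a stronger match to the claimed speaker.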
