J4  2009, Vol. 43 Issue (6): 978-982    DOI: 10.3785/j.issn.1008973X.2009.
Computer Technology, Automation Technology
Universal background model reduction based efficient speaker recognition
CHAN Zhen-Yu, YANG Ying-Chun
(College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)
Abstract:

A universal background model (UBM) reduction method is proposed to speed up speaker recognition systems based on the Gaussian mixture model-universal background model (GMM-UBM) framework. A high-order UBM is first trained by maximum likelihood estimation with the expectation-maximization (EM) algorithm and is then clustered into a lower-order UBM. The empty mapping set problem is solved by replacing each empty mapping set with the Gaussian component at the shortest distance. Three methods of initializing the low-order UBM were compared experimentally, and the choice of initialization had very little effect on the recognition results. Experiments on the NIST 2001 SRE corpus showed that the method increased the computation speed of the GMM-UBM system 8-fold, while the equal error rate (EER) rose by only 459%. UBM reduction can therefore greatly improve the efficiency of a GMM-UBM system at the cost of a small drop in recognition accuracy.
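The reduction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the distance measure, the merge rule, or the initialization, so this example assumes diagonal covariances, KL divergence as the distance, moment-matching merges, and initialization from the heaviest high-order components. The function names `kl_diag` and `reduce_ubm` are hypothetical.

```python
import numpy as np

def kl_diag(mu_f, var_f, mu_g, var_g):
    """KL(f_i || g_j) for every pair of diagonal Gaussians; returns shape (N, M)."""
    mu_f, var_f = mu_f[:, None, :], var_f[:, None, :]
    mu_g, var_g = mu_g[None, :, :], var_g[None, :, :]
    return 0.5 * np.sum(
        np.log(var_g / var_f) + (var_f + (mu_f - mu_g) ** 2) / var_g - 1.0,
        axis=2,
    )

def reduce_ubm(w, mu, var, m, n_iter=20):
    """Cluster an N-component diagonal GMM into m components (requires m <= N)."""
    n = len(w)
    # Assumed initialization: copy the m heaviest high-order components.
    idx = np.argsort(w)[::-1][:m]
    w_g, mu_g, var_g = w[idx].copy(), mu[idx].copy(), var[idx].copy()
    mapping = np.full(n, -1)
    for _ in range(n_iter):
        # Map each high-order component to its closest low-order component.
        d = kl_diag(mu, var, mu_g, var_g)
        new_mapping = d.argmin(axis=1)
        # Empty mapping set: move in the closest high-order component,
        # taken only from sets that keep at least one member afterwards.
        counts = np.bincount(new_mapping, minlength=m)
        for j in np.flatnonzero(counts == 0):
            movable = counts[new_mapping] > 1
            i = np.flatnonzero(movable)[d[movable, j].argmin()]
            counts[new_mapping[i]] -= 1
            new_mapping[i] = j
            counts[j] += 1
        if np.array_equal(new_mapping, mapping):
            break  # mapping is stable, parameters already match it
        mapping = new_mapping
        # Refit each low-order component by moment matching over its set.
        for j in range(m):
            members = mapping == j
            wj = w[members][:, None]
            w_g[j] = wj.sum()
            mu_g[j] = (wj * mu[members]).sum(axis=0) / w_g[j]
            var_g[j] = (wj * (var[members] + (mu[members] - mu_g[j]) ** 2)).sum(axis=0) / w_g[j]
    return w_g, mu_g, var_g, mapping
```

Because the merge is moment matching, the reduced model keeps the total weight and the overall mean of the high-order model. The speed-up in scoring comes from evaluating m Gaussians per frame instead of N.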

Published: 2009-06-01
CLC number: TP391
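Supported by:

National Science Fund for Distinguished Young Scholars (60525202); National Natural Science Foundation of China (60533040); Program for New Century Excellent Talents in University, Ministry of Education (NCET040545); National High Technology Research and Development Program of China ("863" Program) (2006AA01Z136); Program for Changjiang Scholars and Innovative Research Team in University (IRT0652); Zhejiang Provincial Natural Science Foundation (Y106705).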

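Corresponding author: YANG Ying-Chun, female, associate professor.     E-mail: yyc@zju.edu.cn
About the author: CHAN Zhen-Yu (born 1981), male, from Taizhou, Zhejiang; Ph.D. candidate working on speaker recognition.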

Cite this article:

CHAN Zhen-Yu, YANG Ying-Chun. Universal background model reduction based efficient speaker recognition[J]. J4, 2009, 43(6): 978-982.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008973X.2009.
http://www.zjujournals.com/eng/CN/Y2009/V43/I6/978

