后验概率图与补白模型二次融合的关键词识别

doi:10.3785/j.issn.1008-973X.2020.06.014

浙江大学学报(工学版)

2020, Vol. 54

Issue (6): 1170-1176 DOI: 10.3785/j.issn.1008-973X.2020.06.014

计算机技术

后验概率图与补白模型二次融合的关键词识别

陈太波(

),张翠芳*(

)

西南交通大学信息科学与技术学院，四川成都 611756

Keyword recognition based on twice fusion of Posteriorgram and filler model

Tai-bo CHEN(

),Cui-fang ZHANG*(

)

School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

全文: PDF(894 KB) HTML

摘要：

使用全连接神经网络结合Softmax分类器对汉语的408个音节建立音节分类器，利用等长处理后的特征向量训练Softmax分类器，将Softmax分类器输出概率作为后验概率图，与隐马尔科夫补白模型（HMM/Filler）进行第一次融合，得到子后验概率图隐马尔科夫模型（Posteriorgram-HMM）. 针对关键词训练样本较少的问题，将标注样本进行强制切分，得到HMM每个状态上的训练数据. 将隐马尔科夫最大后验概率基线模型（HMM-MAP）与Posteriorgram-HMM进行第二次融合，提出最大后验概率图隐马尔科夫模型（Posteriorgram-HMM-MAP）. 在数据集上训练模型后，使用测试数据对其进行测试. 结果表明：Posteriorgram-HMM-MAP的综合识别率相比Posteriorgram-HMM提升了3.55%，相比HMM/Filler提升了10.29%.

关键词： 关键词识别; 隐马尔可夫模型（HMM）; 补白模型; Softmax分类器; 后验概率图; 最大后验概率（MAP）

Abstract:

A fully-connected neural network, combined with Softmax classifier, was used to build a syllable classifier for 408 syllables in Chinese based on hidden Markov filler model (HMM/Filler). With the equal-length processing of the input feature vector of network, the output probability of the Softmax classifier was used as a Posteriorgram, to make first fusion with HMM/Filler model for the Posteriorgram hidden Markov model (Posteriorgram-HMM). Aiming at the problem of less keyword training samples, the Force-Align was used to obtain the training data for each state of HMM. Make second fusion of Maximum a posteriori estimation HMM (HMM-MAP) with Posteriorgram-HMM, and the Posteriorgram hidden Markov model (Posteriorgram-HMM-MAP) was obtained. After being trained on data set, the model was tested with test data. Results show that the comprehensive accuracy of the Posteriorgram-HMM-MAP was increased by 3.55% compared with Posteriorgram-HMM, and 10.29% higher than HMM/Filler.

Key words: keyword recognition hidden Markov model (HMM) filler model Softmax classifier Posteriorgram maximum a posteriori estimation (MAP)

收稿日期: 2019-05-15 出版日期: 2020-07-06

CLC:

TP 391.4

基金资助: 国家自然科学基金资助项目（61503059）；四川省科技计划资助项目（2018GZ0008）

通讯作者: 张翠芳 E-mail: booookchen@outlook.com;cfzhang_scce@swjtu.cn

作者简介: 陈太波（1995—），男，硕士生，从事语音识别研究. orcid.org/0000-0002-6646-3302. E-mail： booookchen@outlook.com

	服务
	把本文推荐给朋友
	加入引用管理器
	E-mail Alert
	作者相关文章
	陈太波
	张翠芳

引用本文:

陈太波,张翠芳. 后验概率图与补白模型二次融合的关键词识别[J]. 浙江大学学报(工学版), 2020, 54(6): 1170-1176.

Tai-bo CHEN,Cui-fang ZHANG. Keyword recognition based on twice fusion of Posteriorgram and filler model. Journal of ZheJiang University (Engineering Science), 2020, 54(6): 1170-1176.

链接本文:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.06.014 或 http://www.zjujournals.com/eng/CN/Y2020/V54/I6/1170

图 1 音节等长分段示意图

表 1 实验选取的关键词列表

图 2 408个音节的Softmax分类器模型示意图

图 3 Posteriorgram-HMM结构框图

图 4 融合参数 λ 对综合识别率影响的拟合曲线

图 5 HMM-MAP的训练流程

图 6 融合参数 β 对综合识别率影响的拟合曲线

图 7 Posteriorgram-HMM-MAP框图

表 2 不同模型的关键词识别率对比

1	孙成立. 语音关键词识别技术的研究[D]. 北京: 北京邮电大学, 2008. SUN Cheng-li. A study of speech keyword recognition technology [D]. Beijing: Beijing University of Posts and Telecommunications, 2008.
2	侯靖勇, 谢磊, 杨鹏, 等基于DTW的语音关键词检出[J]. 清华大学学报: 自然科学版, 2017, 57 (1): 18- 23 HOU Jing-yong, XlE Lei, YANG Peng, et al Spoken term detection based on DTW[J]. Journal of Tsinghua University: Science and Technology, 2017, 57 (1): 18- 23
3	汪鹏, 刘加, 刘润生基于离散HMM的非特定人关键词提取语音识别系统[J]. 吉林大学学报: 理学版, 2003, 41 (3): 347- 351 WANG Peng, LIU Jia, LIU Run-sheng Discrete HMM based speaker independent keyword spotting speech recognition system[J]. Journal of Jilin University: Science Edition, 2003, 41 (3): 347- 351
4	TOSELLI A H, VIDAL E. Fast HMM-Filler approach for key word spotting in handwritten documents [C] // Proceedings of 2013 12th International Conference on IEEE. NewYork: IEEE, 2013: 341-358.
5	LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics [C] // Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-volume. Association for Computational Linguistics, 2003.
6	孙彦楠, 夏秀渝基于深度神经网络的关键词识别系统[J]. 计算机系统应用, 2018, 27 (5): 41- 48 SUN Yan-nan, XIA Xiu-yu Keyword recognition system based on deep neural network[J]. Computer Systems and Applications, 2018, 27 (5): 41- 48
7	GRAVES A, MOHAMED A, HINTON G. Speech recognition with deep recurrent neural networks [C] // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 6645-6649.
8	王满洪, 张二华, 王明合基于双门限算法的端点检测改进研究[J]. 计算机与数字工程, 2017, 45 (11): 2223- 2228 WANG Man-hong, ZHANG Er-hua, WANG Ming-he Research and improvement on endpoint detection based on dual-threshold algorithm[J]. Computer and Digital Engineering, 2017, 45 (11): 2223- 2228 doi: 10.3969/j.issn.1672-9722.2017.11.030
9	邵明强, 徐志京基于改进MFCC特征的语音识别算法[J]. 微型机与应用, 2017, 21 (1): 52- 54 SHAO Ming-qiang, XU Zhi-jing A speech recognition algorithm based on improved MFCC[J]. Microcomputer and Its Applications, 2017, 21 (1): 52- 54
10	SHAHNAWAZUDDIN S, SINHA R, PRADHAN G Pitch-normalized acoustic features for robust children ’s speech recognition[J]. IEEE Signal Processing Letters, 2017, 24 (8): 1128- 1132 doi: 10.1109/LSP.2017.2705085
11	花静. 基于HMM/SVM混合架构的连续语音识别模型的研究[D]. 哈尔滨: 哈尔滨工业大学, 2006. HUA Jing. Research on continuous speech recognition based on a hybrid HMM/SVM framework[D]. Harbin: Harbin Institute of Technology, 2006.
12	JIANG M, LIANG Y, FENG X, et al Text classification based on deep belief network and softmax regression[J]. Neural Computing and Applications, 2016, (7): 1- 10
13	NEYSHABUR B, SALAKHUTDINOV R R, SREBRO N. Path-SGD: path-normalized optimization in deep neural networks [C] // Advances in Neural Information Processing Systems 28. Montreal: [s.n.], 2015: 2422-2430.
14	陈洁群基于Viterbi改进算法的汉语数码语音识别模型[J]. 微型机与应用, 2017, 36 (14): 11- 13 CHEN Jie-qun The research on improved Viterbi algorithm for Chinese digital speech recognition system[J]. Microcomputer and its Applications, 2017, 36 (14): 11- 13
15	GAUVAIN J L, LEE C H Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains[J]. IEEE Transactions on Speech and Audio Processing, 1994, 2 (2): 291- 298 doi: 10.1109/89.279278

[1]	宋鹏,杨德东,李畅,郭畅. 整体特征通道识别的自适应孪生网络跟踪算法[J]. 浙江大学学报(工学版), 2021, 55(5): 966-975.
[2]	郑英杰,吴松荣,韦若禹,涂振威,廖进,刘东. 基于目标图像FCM算法的地铁定位点匹配及误报排除方法[J]. 浙江大学学报(工学版), 2021, 55(3): 586-593.
[3]	晋耀,张为. 采用Anchor-Free网络结构的实时火灾检测算法[J]. 浙江大学学报(工学版), 2020, 54(12): 2430-2436.
[4]	冯毅雄,李康杰,高一聪,郑浩. 面向视觉伺服的工业机器人轮廓曲线角点识别[J]. 浙江大学学报(工学版), 2020, 54(8): 1449-1456.
[5]	沈宗礼,余建波. 基于迁移学习与深度森林的晶圆图缺陷识别[J]. 浙江大学学报(工学版), 2020, 54(6): 1228-1239.
[6]	杨冰,莫文博,姚金良. 融合局部特征与深度学习的三维掌纹识别[J]. 浙江大学学报(工学版), 2020, 54(3): 540-545.
[7]	周金海,王依川,佟京鲆,周世镒,吴翔飞. 基于慢时间分割的超宽带雷达步态识别[J]. 浙江大学学报(工学版), 2020, 54(2): 283-290.
[8]	扶建辉,王进,陆国栋,JUNGYoong-ho. 基于体素的汽车装配体漏水缝隙识别与可视化[J]. 浙江大学学报(工学版), 2020, 54(2): 357-364.
[9]	许爱东,黄文琦,明哲,陈伟亮,胡浩基,杨航. 基于级联网络和残差特征的人脸特征点定位[J]. 浙江大学学报(工学版), 2019, 53(12): 2365-2371.
[10]	尤海辉, 马增益, 唐义军, 王月兰, 郑林, 俞钟, 吉澄军. 循环流化床入炉垃圾热值软测量[J]. 浙江大学学报(工学版), 2017, 51(6): 1163-1172.
[11]	董巍, 沈会良. 双向纹理函数稀疏采集与重建[J]. 浙江大学学报(工学版), 2016, 50(12): 2364-2370.
[12]	郭浩东, 陈岭, 丁永锋, 陈根才. 运动识别中基于主题的特征构建方法[J]. 浙江大学学报(工学版), 2016, 50(6): 1149-1154.
[13]	唐有宝, 卜巍, 邬向前. 多层次MSER自然场景文本检测[J]. 浙江大学学报(工学版), 2016, 50(6): 1134-1140.
[14]	沈延斌, 陈岭, 郭浩东, 陈根才. 基于深度学习的放置方式和位置无关运动识别[J]. 浙江大学学报(工学版), 2016, 50(6): 1141-1148.
[15]	陈实, 郑楷洪, 孙凌云, 李彦. 可穿戴计算设备中振动表情的设计与应用[J]. 浙江大学学报(工学版), 2015, 49(12): 2298-2304.

Viewed

Full text

Abstract

Cited

Shared

Discussed