Keyword recognition based on twice fusion of Posteriorgram and filler model

doi:10.3785/j.issn.1008-973X.2020.06.014

Journal of ZheJiang University (Engineering Science)

2020, Vol. 54

Issue (6): 1170-1176 DOI: 10.3785/j.issn.1008-973X.2020.06.014

Computer Technology

Keyword recognition based on twice fusion of Posteriorgram and filler model

Tai-bo CHEN(

),Cui-fang ZHANG*(

)

School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

Download:

HTML

PDF(894KB) HTML
Export: BibTeX | EndNote (RIS)

Abstract

A fully-connected neural network, combined with Softmax classifier, was used to build a syllable classifier for 408 syllables in Chinese based on hidden Markov filler model (HMM/Filler). With the equal-length processing of the input feature vector of network, the output probability of the Softmax classifier was used as a Posteriorgram, to make first fusion with HMM/Filler model for the Posteriorgram hidden Markov model (Posteriorgram-HMM). Aiming at the problem of less keyword training samples, the Force-Align was used to obtain the training data for each state of HMM. Make second fusion of Maximum a posteriori estimation HMM (HMM-MAP) with Posteriorgram-HMM, and the Posteriorgram hidden Markov model (Posteriorgram-HMM-MAP) was obtained. After being trained on data set, the model was tested with test data. Results show that the comprehensive accuracy of the Posteriorgram-HMM-MAP was increased by 3.55% compared with Posteriorgram-HMM, and 10.29% higher than HMM/Filler.

Key words： keyword recognition hidden Markov model (HMM) filler model Softmax classifier Posteriorgram maximum a posteriori estimation (MAP)

Received: 15 May 2019 Published: 06 July 2020

CLC:

TP 391.4

Corresponding Authors: Cui-fang ZHANG E-mail: booookchen@outlook.com;cfzhang_scce@swjtu.cn

	Service
	E-mail this article
	Add to my bookshelf
	Add to citation manager
	E-mail Alert
	RSS
	Articles by authors
	Tai-bo CHEN
	Cui-fang ZHANG

Cite this article:

Tai-bo CHEN,Cui-fang ZHANG. Keyword recognition based on twice fusion of Posteriorgram and filler model. Journal of ZheJiang University (Engineering Science), 2020, 54(6): 1170-1176.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.06.014 OR http://www.zjujournals.com/eng/Y2020/V54/I6/1170

后验概率图与补白模型二次融合的关键词识别

使用全连接神经网络结合Softmax分类器对汉语的408个音节建立音节分类器，利用等长处理后的特征向量训练Softmax分类器，将Softmax分类器输出概率作为后验概率图，与隐马尔科夫补白模型（HMM/Filler）进行第一次融合，得到子后验概率图隐马尔科夫模型（Posteriorgram-HMM）. 针对关键词训练样本较少的问题，将标注样本进行强制切分，得到HMM每个状态上的训练数据. 将隐马尔科夫最大后验概率基线模型（HMM-MAP）与Posteriorgram-HMM进行第二次融合，提出最大后验概率图隐马尔科夫模型（Posteriorgram-HMM-MAP）. 在数据集上训练模型后，使用测试数据对其进行测试. 结果表明：Posteriorgram-HMM-MAP的综合识别率相比Posteriorgram-HMM提升了3.55%，相比HMM/Filler提升了10.29%.

关键词： 关键词识别, 隐马尔可夫模型（HMM）, 补白模型, Softmax分类器, 后验概率图, 最大后验概率（MAP）

Fig.1 Diagram of syllable equal segmentation

Tab.1 List of keywords selected in experiment

Fig.2 Diagram of Softmax classifier model for 408 syllables

Fig.3 Basic block diagram of Posteriorgram-HMM

Fig.4 Fitting curve for effect of fusion parameter λ on comprehensive recognition rate

Fig.5 Training process of HMM-MAP

Fig.6 Fitting curve for effect of fusion parameter β on comprehensive recognition rate

Fig.7 Block diagram of Posteriorgram-HMM-MAP

Tab.2 Comparison of keyword recognition rates of different models


[1]	孙成立. 语音关键词识别技术的研究[D]. 北京: 北京邮电大学, 2008. SUN Cheng-li. A study of speech keyword recognition technology [D]. Beijing: Beijing University of Posts and Telecommunications, 2008.

[2]	侯靖勇, 谢磊, 杨鹏, 等基于DTW的语音关键词检出[J]. 清华大学学报: 自然科学版, 2017, 57 (1): 18- 23 HOU Jing-yong, XlE Lei, YANG Peng, et al Spoken term detection based on DTW[J]. Journal of Tsinghua University: Science and Technology, 2017, 57 (1): 18- 23

[3]	汪鹏, 刘加, 刘润生基于离散HMM的非特定人关键词提取语音识别系统[J]. 吉林大学学报: 理学版, 2003, 41 (3): 347- 351 WANG Peng, LIU Jia, LIU Run-sheng Discrete HMM based speaker independent keyword spotting speech recognition system[J]. Journal of Jilin University: Science Edition, 2003, 41 (3): 347- 351

[4]	TOSELLI A H, VIDAL E. Fast HMM-Filler approach for key word spotting in handwritten documents [C] // Proceedings of 2013 12th International Conference on IEEE. NewYork: IEEE, 2013: 341-358.

[5]	LIN C Y, HOVY E. Automatic evaluation of summaries using n-gram co-occurrence statistics [C] // Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-volume. Association for Computational Linguistics, 2003.

[6]	孙彦楠, 夏秀渝基于深度神经网络的关键词识别系统[J]. 计算机系统应用, 2018, 27 (5): 41- 48 SUN Yan-nan, XIA Xiu-yu Keyword recognition system based on deep neural network[J]. Computer Systems and Applications, 2018, 27 (5): 41- 48

[7]	GRAVES A, MOHAMED A, HINTON G. Speech recognition with deep recurrent neural networks [C] // 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 6645-6649.

[8]	王满洪, 张二华, 王明合基于双门限算法的端点检测改进研究[J]. 计算机与数字工程, 2017, 45 (11): 2223- 2228 WANG Man-hong, ZHANG Er-hua, WANG Ming-he Research and improvement on endpoint detection based on dual-threshold algorithm[J]. Computer and Digital Engineering, 2017, 45 (11): 2223- 2228 doi: 10.3969/j.issn.1672-9722.2017.11.030

[9]	邵明强, 徐志京基于改进MFCC特征的语音识别算法[J]. 微型机与应用, 2017, 21 (1): 52- 54 SHAO Ming-qiang, XU Zhi-jing A speech recognition algorithm based on improved MFCC[J]. Microcomputer and Its Applications, 2017, 21 (1): 52- 54

[10]	SHAHNAWAZUDDIN S, SINHA R, PRADHAN G Pitch-normalized acoustic features for robust children ’s speech recognition[J]. IEEE Signal Processing Letters, 2017, 24 (8): 1128- 1132 doi: 10.1109/LSP.2017.2705085

[11]	花静. 基于HMM/SVM混合架构的连续语音识别模型的研究[D]. 哈尔滨: 哈尔滨工业大学, 2006. HUA Jing. Research on continuous speech recognition based on a hybrid HMM/SVM framework[D]. Harbin: Harbin Institute of Technology, 2006.

[12]	JIANG M, LIANG Y, FENG X, et al Text classification based on deep belief network and softmax regression[J]. Neural Computing and Applications, 2016, (7): 1- 10

[13]	NEYSHABUR B, SALAKHUTDINOV R R, SREBRO N. Path-SGD: path-normalized optimization in deep neural networks [C] // Advances in Neural Information Processing Systems 28. Montreal: [s.n.], 2015: 2422-2430.

[14]	陈洁群基于Viterbi改进算法的汉语数码语音识别模型[J]. 微型机与应用, 2017, 36 (14): 11- 13 CHEN Jie-qun The research on improved Viterbi algorithm for Chinese digital speech recognition system[J]. Microcomputer and its Applications, 2017, 36 (14): 11- 13

[15]	GAUVAIN J L, LEE C H Maximum a posteriori estimation for multivariate gaussian mixture observations of Markov chains[J]. IEEE Transactions on Speech and Audio Processing, 1994, 2 (2): 291- 298 doi: 10.1109/89.279278

[1]	WANG Hua, YING Jing, JIANG Chao. Proactive self-adaptation of software based on inspecting uncertainty[J]. Journal of ZheJiang University (Engineering Science), 2010, 44(2): 213-219.

Viewed

Full text

Abstract

Cited

Shared

Discussed