Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (6): 1170-1176    DOI: 10.3785/j.issn.1008-973X.2020.06.014
Computer Technology     
Keyword recognition based on twice fusion of Posteriorgram and filler model
Tai-bo CHEN, Cui-fang ZHANG*
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China

Abstract  

A fully connected neural network combined with a Softmax classifier was used to build a syllable classifier for the 408 syllables of Chinese. The input feature vectors of the network were processed to equal length, and the output probabilities of the Softmax classifier were used as a Posteriorgram, which was fused with the hidden Markov filler model (HMM/Filler) in a first fusion to obtain the Posteriorgram hidden Markov model (Posteriorgram-HMM). To address the shortage of keyword training samples, forced alignment was used to obtain training data for each HMM state. In a second fusion, the maximum a posteriori estimation HMM (HMM-MAP) was combined with Posteriorgram-HMM to obtain the maximum a posteriori Posteriorgram hidden Markov model (Posteriorgram-HMM-MAP). The model was trained on the data set and then evaluated on test data. The results show that the comprehensive accuracy of Posteriorgram-HMM-MAP is 3.55% higher than that of Posteriorgram-HMM and 10.29% higher than that of HMM/Filler.
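The first fusion step can be illustrated with a minimal sketch. It assumes, purely for illustration, that the Softmax outputs are stacked into a Posteriorgram (one row per input segment, one column per syllable class) and that a posterior is combined with an HMM log-likelihood through a log-linear weight. The function names `softmax`, `posteriorgram`, and `fuse_log_scores`, and the exact fusion rule with weight `lam`, are hypothetical; the abstract does not state the formula used by the authors.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one segment's class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def posteriorgram(segment_logits):
    """One row of class posteriors per segment (408 columns in the paper's setting)."""
    return [softmax(row) for row in segment_logits]

def fuse_log_scores(hmm_loglik, posterior, lam):
    """Assumed log-linear fusion, not necessarily the authors' rule:
    (1 - lam) * HMM log-likelihood + lam * log(posterior)."""
    eps = 1e-12  # guards against log(0)
    return (1.0 - lam) * hmm_loglik + lam * math.log(posterior + eps)

# Toy example with 3 classes instead of 408.
pg = posteriorgram([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
fused = fuse_log_scores(hmm_loglik=-5.0, posterior=pg[0][2], lam=0.3)
```

With `lam = 0` the score reduces to the HMM log-likelihood alone, and with `lam = 1` to the log posterior alone, so the weight can be tuned between the two models.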



Key words: keyword recognition, hidden Markov model (HMM), filler model, Softmax classifier, Posteriorgram, maximum a posteriori estimation (MAP)
Received: 15 May 2019      Published: 06 July 2020
CLC:  TP 391.4  
Corresponding Authors: Cui-fang ZHANG     E-mail: booookchen@outlook.com;cfzhang_scce@swjtu.cn
Cite this article:

Tai-bo CHEN,Cui-fang ZHANG. Keyword recognition based on twice fusion of Posteriorgram and filler model. Journal of ZheJiang University (Engineering Science), 2020, 54(6): 1170-1176.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.06.014     OR     http://www.zjujournals.com/eng/Y2020/V54/I6/1170


Keyword recognition by two-stage fusion of the posteriorgram and the filler model

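A fully connected neural network combined with a Softmax classifier is used to build a syllable classifier for the 408 syllables of Chinese. The Softmax classifier is trained on feature vectors that have been processed to equal length, and its output probabilities are taken as a posteriorgram, which is fused with the hidden Markov filler model (HMM/Filler) in a first fusion to give the posteriorgram hidden Markov model (Posteriorgram-HMM). To address the small number of keyword training samples, the labeled samples are force-aligned to obtain training data for each HMM state. The maximum a posteriori hidden Markov baseline model (HMM-MAP) is then fused with Posteriorgram-HMM in a second fusion, and the maximum a posteriori posteriorgram hidden Markov model (Posteriorgram-HMM-MAP) is proposed. After the model is trained on the data set, it is evaluated on test data. The results show that the comprehensive recognition rate of Posteriorgram-HMM-MAP is 3.55% higher than that of Posteriorgram-HMM and 10.29% higher than that of HMM/Filler.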
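The MAP step for sparse keyword data can be sketched as follows. This is a minimal illustration of the widely used MAP update for a Gaussian mean, applied to the frames that forced alignment assigns to one HMM state; it is not taken from the paper itself. The function name `map_adapt_mean` and the prior weight `tau` are assumptions introduced here.

```python
def map_adapt_mean(prior_mean, frames, tau=10.0):
    """MAP estimate of a Gaussian mean for one HMM state.

    prior_mean: mean vector of the prior (for example, a baseline model's state mean)
    frames:     feature vectors hard-assigned to this state by forced alignment
    tau:        prior weight (hypothetical value); larger tau keeps the estimate
                closer to the prior when few frames are available
    """
    n = len(frames)
    dim = len(prior_mean)
    # Per-dimension sum of the aligned frames.
    sums = [sum(frame[d] for frame in frames) for d in range(dim)]
    # Weighted combination of the prior mean and the observed data.
    return [(tau * prior_mean[d] + sums[d]) / (tau + n) for d in range(dim)]

# Two aligned frames pull the prior mean [0, 0] toward the data.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [4.0, 8.0]], tau=2.0)
```

With no aligned frames the estimate stays at the prior mean, and as the number of frames grows it approaches the sample mean, which is why this kind of update suits keywords with few training samples.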


Keywords: keyword recognition, hidden Markov model (HMM), filler model, Softmax classifier, posteriorgram, maximum a posteriori (MAP)
Fig.1 Diagram of syllable equal segmentation
No. Keyword Syllables Ntrain Ntest
kw_1 爱国 ai guo 51 12
kw_2 爱情 ai qing 42 11
kw_3 安全 an quan 30 8
kw_4 八十 ba shi 62 14
kw_5 白天 bai tian 56 17
kw_6 彬彬有礼 bin bin you li 18 4
kw_7 采访 cai fang 70 21
kw_8 藏起来 cang qi lai 46 10
kw_9 差别 cha bie 67 19
kw_10 长江 chang jiang 85 24
kw_11 成功 cheng gong 97 29
kw_12 吃饭 chi fan 175 49
kw_13 大多数 da duo shu 95 24
kw_14 大夫 dai fu 39 10
kw_15 大海 da hai 134 31
kw_16 单词 dan ci 63 14
kw_17 返回 fan hui 98 19
kw_18 犯罪分子 fan zui fen zi 55 14
kw_19 父母亲 fu mu qin 93 32
kw_20 改变 gai bian 183 46
kw_21 科学 ke xue 86 18
kw_22 老师 lao shi 112 26
kw_23 乱七八糟 luan qi ba zao 63 17
kw_24 每天 mei tian 152 30
kw_25 平均 ping jun 161 30
kw_26 四川省 si chuan sheng 43 9
kw_27 兴趣 xing qu 76 11
kw_28 眼睛 yan jing 62 15
kw_29 找不到 zhao bu dao 95 23
Tab.1 List of keywords selected in experiment
Fig.2 Diagram of Softmax classifier model for 408 syllables
Fig.3 Basic block diagram of Posteriorgram-HMM
Fig.4 Fitting curve for effect of fusion parameter λ on comprehensive recognition rate
Fig.5 Training process of HMM-MAP
Fig.6 Fitting curve for effect of fusion parameter β on comprehensive recognition rate
Fig.7 Block diagram of Posteriorgram-HMM-MAP
No. Posteriorgram-HMM (NT, ψ /%) Posteriorgram-HMM-MAP (NT, ψ /%)
kw_1 9 84.31 11 91.67
kw_2 8 78.57 10 90.90
kw_3 6 75.00 6 75.00
kw_4 12 85.71 12 85.71
kw_5 14 82.14 15 88.24
kw_6 4 100.00 4 100.00
kw_7 18 85.71 18 85.71
kw_8 9 90.00 10 100.00
kw_9 15 78.95 15 78.95
kw_10 20 83.33 22 91.67
kw_11 23 79.31 24 82.76
kw_12 44 89.79 42 85.71
kw_13 20 84.21 21 87.50
kw_14 8 80.00 10 100.00
kw_15 28 90.30 28 90.32
kw_16 11 78.57 13 92.86
kw_17 15 78.95 16 84.21
kw_18 12 90.90 13 92.86
kw_19 26 81.25 29 90.62
kw_20 43 94.54 41 89.13
kw_21 14 77.78 16 88.89
kw_22 22 84.62 23 88.46
kw_23 14 82.35 16 94.12
kw_24 26 86.67 25 83.33
kw_25 25 83.33 25 83.33
kw_26 8 90.69 9 100.00
kw_27 9 81.81 9 81.81
kw_28 12 80.00 13 86.67
kw_29 19 82.61 20 86.96
Tab.2 Comparison of keyword recognition rates of different models
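As a reading aid for Tab.2, the sketch below computes a per-keyword recognition rate under the assumption that ψ is the number of recognized test samples NT divided by the number of test samples Ntest from Tab.1. This definition is an assumption, since the page does not define ψ; it does reproduce, for example, the Posteriorgram-HMM-MAP entries for kw_1 and kw_10. The function name `recognition_rate` is hypothetical.

```python
def recognition_rate(n_recognized, n_test):
    """Recognition rate in percent, assuming psi = NT / Ntest (definition assumed)."""
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    return round(100.0 * n_recognized / n_test, 2)

# kw_1 under Posteriorgram-HMM-MAP: NT = 11, Ntest = 12
kw_1_rate = recognition_rate(11, 12)
# kw_10 under Posteriorgram-HMM-MAP: NT = 22, Ntest = 24
kw_10_rate = recognition_rate(22, 24)
```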