Journal of Zhejiang University (Engineering Science)  2020, Vol. 54 Issue (6): 1170-1176    DOI: 10.3785/j.issn.1008-973X.2020.06.014
Computer Technology
Keyword recognition based on twice fusion of Posteriorgram and filler model
Tai-bo CHEN, Cui-fang ZHANG*
School of Information Science and Technology, Southwest Jiaotong University, Chengdu 611756, China
Abstract:

A syllable classifier for the 408 syllables of Chinese was built using a fully connected neural network combined with a Softmax classifier. The Softmax classifier was trained on feature vectors after equal-length processing, and its output probabilities were used as a Posteriorgram. The Posteriorgram was fused with the hidden Markov filler model (HMM/Filler) in a first fusion step to obtain the Posteriorgram hidden Markov model (Posteriorgram-HMM). To address the shortage of keyword training samples, the labeled samples were force-aligned to obtain training data for each HMM state. The maximum a posteriori HMM baseline model (HMM-MAP) was then fused with Posteriorgram-HMM in a second fusion step, yielding the maximum a posteriori Posteriorgram hidden Markov model (Posteriorgram-HMM-MAP). The models were trained on a data set and evaluated on test data. Results show that the comprehensive recognition rate of Posteriorgram-HMM-MAP is 3.55% higher than that of Posteriorgram-HMM and 10.29% higher than that of HMM/Filler.

Key words: keyword recognition    hidden Markov model (HMM)    filler model    Softmax classifier    Posteriorgram    maximum a posteriori estimation (MAP)
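
The first fusion step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the fusion formula, so the log-linear interpolation below is an assumption, as are the function names `softmax` and `fuse_log_scores` and the weight `lam` (standing in for a fusion parameter such as λ). The toy scores are random and are not taken from the paper.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax; each row holds the network outputs for one segment."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def fuse_log_scores(hmm_loglik, posteriorgram, lam):
    """ASSUMED fusion rule (not from the paper):
    (1 - lam) * HMM log-likelihood + lam * log posterior."""
    eps = 1e-12  # guards against log(0)
    return (1.0 - lam) * hmm_loglik + lam * np.log(posteriorgram + eps)

# Hypothetical toy data: 3 segments scored against 408 syllable classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(3, 408))
posteriorgram = softmax(logits)                    # each row sums to 1
hmm_loglik = rng.normal(loc=-5.0, size=(3, 408))   # stand-in HMM scores
fused = fuse_log_scores(hmm_loglik, posteriorgram, lam=0.5)
```

Setting `lam` to 0 in this sketch recovers the plain HMM scores, so the weight controls how much the classifier posteriors influence the decision.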
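Received: 2019-05-15    Published: 2020-07-06
CLC:  TP 391.4  
Fund: National Natural Science Foundation of China (61503059); Sichuan Science and Technology Program (2018GZ0008)
Corresponding author: Cui-fang ZHANG     E-mail: booookchen@outlook.com;cfzhang_scce@swjtu.cn
About the author: Tai-bo CHEN (born 1995), male, master's student, engaged in speech recognition research. orcid.org/0000-0002-6646-3302. E-mail: booookchen@outlook.com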

Cite this article:

Tai-bo CHEN, Cui-fang ZHANG. Keyword recognition based on twice fusion of Posteriorgram and filler model[J]. Journal of Zhejiang University (Engineering Science), 2020, 54(6): 1170-1176.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.06.014
http://www.zjujournals.com/eng/CN/Y2020/V54/I6/1170

Fig. 1  Schematic diagram of equal-length segmentation of syllables
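
Equal-length processing matters because syllables differ in duration, while a fully connected network expects a fixed-size input. The sketch below is only one plausible reading: it assumes each syllable's frames are split into a fixed number of segments and each segment is averaged. The page does not spell out the procedure, so `equal_length_features`, `num_segments`, and the toy frame arrays are hypothetical.

```python
import numpy as np

def equal_length_features(frames, num_segments):
    """Map a (T, D) frame sequence to a fixed-length vector of size num_segments * D.

    ASSUMPTION: frames are split into nearly equal contiguous chunks and each
    chunk is replaced by its mean; the paper's exact procedure may differ.
    """
    frames = np.asarray(frames, dtype=float)
    if frames.shape[0] < num_segments:
        raise ValueError("need at least one frame per segment")
    chunks = np.array_split(frames, num_segments, axis=0)
    return np.concatenate([chunk.mean(axis=0) for chunk in chunks])

# Two hypothetical syllables of different duration (10 and 37 frames, 2 features each).
short_syllable = np.arange(10 * 2, dtype=float).reshape(10, 2)
long_syllable = np.arange(37 * 2, dtype=float).reshape(37, 2)

vec_short = equal_length_features(short_syllable, num_segments=5)
vec_long = equal_length_features(long_syllable, num_segments=5)
```

Both vectors have the same length regardless of the syllable's duration, which is the property a fixed-input classifier needs.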
No.     Keyword     Syllables          N_train   N_test
kw_1    爱国        ai guo             51        12
kw_2    爱情        ai qing            42        11
kw_3    安全        an quan            30        8
kw_4    八十        ba shi             62        14
kw_5    白天        bai tian           56        17
kw_6    彬彬有礼    bin bin you li     18        4
kw_7    采访        cai fang           70        21
kw_8    藏起来      cang qi lai        46        10
kw_9    差别        cha bie            67        19
kw_10   长江        chang jiang        85        24
kw_11   成功        cheng gong         97        29
kw_12   吃饭        chi fan            175       49
kw_13   大多数      da duo shu         95        24
kw_14   大夫        dai fu             39        10
kw_15   大海        da hai             134       31
kw_16   单词        dan ci             63        14
kw_17   返回        fan hui            98        19
kw_18   犯罪分子    fan zui fen zi     55        14
kw_19   父母亲      fu mu qin          93        32
kw_20   改变        gai bian           183       46
kw_21   科学        ke xue             86        18
kw_22   老师        lao shi            112       26
kw_23   乱七八糟    luan qi ba zao     63        17
kw_24   每天        mei tian           152       30
kw_25   平均        ping jun           161       30
kw_26   四川省      si chuan sheng     43        9
kw_27   兴趣        xing qu            76        11
kw_28   眼睛        yan jing           62        15
kw_29   找不到      zhao bu dao        95        23
Table 1  List of keywords selected for the experiment
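Fig. 2  Schematic diagram of the Softmax classifier model for 408 syllables
Fig. 3  Block diagram of the Posteriorgram-HMM structure
Fig. 4  Fitted curve of the effect of fusion parameter λ on the comprehensive recognition rate
Fig. 5  Training procedure of HMM-MAP
Fig. 6  Fitted curve of the effect of fusion parameter β on the comprehensive recognition rate
Fig. 7  Block diagram of Posteriorgram-HMM-MAP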
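
For the HMM-MAP model, the sketch below shows the standard MAP update of a single Gaussian mean, which interpolates between a prior mean and the adaptation data. Whether the paper adapts only the means, and which prior weight it uses, is not stated on this page, so `map_adapt_mean`, `tau`, and the toy numbers are assumptions.

```python
import numpy as np

def map_adapt_mean(prior_mean, frames, gammas, tau):
    """MAP estimate of a Gaussian mean.

    prior_mean: (D,) mean of the prior (e.g. from a model trained on more data)
    frames:     (T, D) adaptation frames assigned to this state
    gammas:     (T,) occupation probabilities of the frames for this Gaussian
    tau:        prior weight (ASSUMED hyperparameter); larger values trust the prior more
    """
    prior_mean = np.asarray(prior_mean, dtype=float)
    frames = np.asarray(frames, dtype=float)
    gammas = np.asarray(gammas, dtype=float)
    occupancy = gammas.sum()
    weighted_sum = gammas @ frames  # (T,) @ (T, D) -> (D,)
    return (tau * prior_mean + weighted_sum) / (tau + occupancy)

# Hypothetical toy example: two frames, both fully assigned to the Gaussian.
adapted = map_adapt_mean(
    prior_mean=[0.0, 0.0],
    frames=[[2.0, 2.0], [4.0, 4.0]],
    gammas=[1.0, 1.0],
    tau=2.0,
)
```

With few keyword samples, a larger `tau` keeps the estimate close to the prior; as the amount of data grows, the estimate approaches the weighted sample mean.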
        Posteriorgram-HMM     Posteriorgram-HMM-MAP
No.     NT      ψ/%           NT      ψ/%
kw_1    9       84.31         11      91.67
kw_2    8       78.57         10      90.90
kw_3    6       75.00         6       75.00
kw_4    12      85.71         12      85.71
kw_5    14      82.14         15      88.24
kw_6    4       100.00        4       100.00
kw_7    18      85.71         18      85.71
kw_8    9       90.00         10      100.00
kw_9    15      78.95         15      78.95
kw_10   20      83.33         22      91.67
kw_11   23      79.31         24      82.76
kw_12   44      89.79         42      85.71
kw_13   20      84.21         21      87.50
kw_14   8       80.00         10      100.00
kw_15   28      90.30         28      90.32
kw_16   11      78.57         13      92.86
kw_17   15      78.95         16      84.21
kw_18   12      90.90         13      92.86
kw_19   26      81.25         29      90.62
kw_20   43      94.54         41      89.13
kw_21   14      77.78         16      88.89
kw_22   22      84.62         23      88.46
kw_23   14      82.35         16      94.12
kw_24   26      86.67         25      83.33
kw_25   25      83.33         25      83.33
kw_26   8       90.69         9       100.00
kw_27   9       81.81         9       81.81
kw_28   12      80.00         13      86.67
kw_29   19      82.61         20      86.96
Table 2  Comparison of keyword recognition rates of different models