Journal of Zhejiang University (Engineering Science)
Automation Technology
Deep combination of large-scale features in statistical machine translation

LIU Yupeng, QIAO Xiuming, ZHAO Shilei, MA Chunguang

1. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
2. Software School, Harbin University of Science and Technology, Harbin 150001, China;
3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
Full text: PDF (1355 KB) | HTML
Abstract:

Deep neural networks (DNN) have seen many successful applications in statistical machine translation (SMT), addressing the lack of semantic information in translation systems. The mainstream recurrent neural network and recursive neural network models were improved, and a deep combination model of large-scale features (DCNN) for system combination in SMT was proposed, with large-scale features incorporated during training. The model has strong generalization ability, suits the current mainstream bottom-up decoding style, and fuses two classic translation models: hierarchical phrase-based grammar (HPG) and bracketing transduction grammar (BTG). The improved recurrent neural network generates phrase/rule-pair semantic vectors suited to the phrase generation process, and an autoencoder is applied during generation to improve the recurrent network's performance. The improved recursive neural network guides decoding during translation, taking into account information from the other decoder so that the two decoders influence each other and jointly improve translation quality. The proposed deep combination model is suitable not only for heterogeneous systems but also for heterogeneous corpora. Relative to classic baseline systems, the model gains 1.0-1.9 BLEU points in heterogeneous system combination and 1.05-1.58 BLEU points in heterogeneous corpus combination, and the improvements pass statistical significance tests.
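The abstract mentions using an autoencoder to build phrase-pair semantic vectors. The sketch below shows the general idea of a greedy recursive autoencoder (in the spirit of the RAE-based phrase embeddings cited by the paper), which repeatedly merges the adjacent pair of word vectors with the lowest reconstruction error until one phrase vector remains. This is an illustrative toy, not the authors' implementation; the class and method names, dimensions, and random initialization are all assumptions, and no training step is shown.

```python
import math
import random

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def affine(W, b, x):
    # y = W x + b, with W given as a list of rows
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

class RecursiveAutoencoder:
    """Greedy recursive autoencoder: repeatedly merges the adjacent pair
    of d-dimensional embeddings with the lowest reconstruction error,
    yielding a single d-dimensional vector for the whole phrase."""

    def __init__(self, d, seed=0):
        rng = random.Random(seed)
        self.d = d
        # encoder maps R^{2d} -> R^d, decoder maps R^d -> R^{2d}
        self.We = [[rng.uniform(-0.1, 0.1) for _ in range(2 * d)] for _ in range(d)]
        self.be = [0.0] * d
        self.Wd = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(2 * d)]
        self.bd = [0.0] * (2 * d)

    def encode_pair(self, c1, c2):
        # parent vector p = tanh(We [c1; c2] + be)
        return tanh_vec(affine(self.We, self.be, c1 + c2))

    def recon_error(self, c1, c2):
        # squared error between [c1; c2] and its reconstruction from p
        p = self.encode_pair(c1, c2)
        r = tanh_vec(affine(self.Wd, self.bd, p))
        return sum((a - b) ** 2 for a, b in zip(r, c1 + c2))

    def encode_phrase(self, vecs):
        vecs = [list(v) for v in vecs]
        while len(vecs) > 1:
            # merge the adjacent pair with the smallest reconstruction error
            i = min(range(len(vecs) - 1),
                    key=lambda k: self.recon_error(vecs[k], vecs[k + 1]))
            vecs[i:i + 2] = [self.encode_pair(vecs[i], vecs[i + 1])]
        return vecs[0]
```

With trained (rather than random) weights, the final vector would serve as the phrase/rule-pair semantic representation fed to the decoder; here the output merely demonstrates the merge order and dimensionality.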

Published: 2017-01-01
CLC: TP 391
Foundation items: National Natural Science Foundation of China for Young Scientists (61300115); China Postdoctoral Science Foundation (2014M561331).
About the author: LIU Yupeng (b. 1978), male, Ph.D., associate professor; research interests include natural language processing, machine translation, and dependency parsing. ORCID: 0000-0003-3089-2129. E-mail: flyeagle99@126.com

Cite this article:

LIU Yupeng, QIAO Xiuming, ZHAO Shilei, MA Chunguang. Deep combination of large-scale features in statistical machine translation. Journal of Zhejiang University (Engineering Science). DOI: 10.3785/j.issn.1008-973X.2017.01.006.

