JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
Deep combination of large-scale features in statistical machine translation
LIU Yu-peng, QIAO Xiu-ming, ZHAO Shi-lei, MA Chun-guang
1. School of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China;
2. Software School, Harbin University of Science and Technology, Harbin 150001, China;
3. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China

Abstract  

The deep neural network (DNN) has been applied successfully to statistical machine translation (SMT), where it alleviates the lack of semantic information in conventional translation systems. The mainstream recurrent neural network (RTNN) and recursive neural network (RENN) models were modified, and a deep neural network combination (DCNN) of large-scale features was proposed for system combination in SMT. The model has strong generalization ability and fits the current mainstream bottom-up decoding style; it combines hierarchical phrase-based grammar (HPG) with bracketing transduction grammar (BTG). The improved recurrent neural network was used to generate phrase-pair semantic vectors suited to the phrase generation process, and an autoencoder was employed to improve the performance of the recurrent network. The improved recursive neural network was used to guide the decoding process of the SMT task, taking into account the mutual-influence information from the other decoder. The deep neural combination model is suitable not only for heterogeneous systems but also for heterogeneous corpora. Experimental results show that DCNN significantly improves a state-of-the-art SMT baseline, yielding gains of 1.0-1.9 and 1.05-1.58 BLEU points in heterogeneous system and heterogeneous corpus combination, respectively.
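To make the first component of the abstract concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a recurrent encoder turns a source/target phrase pair into a fixed-size semantic vector, and an autoencoder-style reconstruction loss regularizes that vector, roughly mirroring the "improved recurrent network plus autoencoder" idea. All class names, dimensions, and the toy data below are illustrative assumptions.

# Hypothetical sketch of phrase-pair semantic vectors with an autoencoder term.
import torch
import torch.nn as nn

class PhrasePairEncoder(nn.Module):
    def __init__(self, vocab_size=1000, emb_dim=64, hid_dim=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.src_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.tgt_rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        # Autoencoder over the concatenated phrase-pair representation.
        self.enc = nn.Linear(2 * hid_dim, hid_dim)   # phrase-pair semantic vector
        self.dec = nn.Linear(hid_dim, 2 * hid_dim)   # reconstruction head

    def forward(self, src_ids, tgt_ids):
        _, h_src = self.src_rnn(self.emb(src_ids))   # last hidden state of the source phrase
        _, h_tgt = self.tgt_rnn(self.emb(tgt_ids))   # last hidden state of the target phrase
        pair = torch.cat([h_src[-1], h_tgt[-1]], dim=-1)
        z = torch.tanh(self.enc(pair))               # semantic vector usable as a decoding feature
        recon_loss = nn.functional.mse_loss(self.dec(z), pair)
        return z, recon_loss

# Toy usage: one phrase pair of token ids (batch size 1, lengths 3 and 2).
model = PhrasePairEncoder()
z, loss = model(torch.tensor([[4, 8, 15]]), torch.tensor([[16, 23]]))
print(z.shape, float(loss))   # torch.Size([1, 128]) and a scalar reconstruction loss

In a sketch like this, the reconstruction loss would be added to the translation training objective so the semantic vector retains information about both sides of the phrase pair; the actual feature set and training procedure of the paper are not reproduced here.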



Published: 01 January 2017
CLC:  TP 391  
Cite this article:

LIU Yu-peng, QIAO Xiu-ming, ZHAO Shi-lei, MA Chun-guang. Deep combination of large-scale features in statistical machine translation. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2017, 51(1): 46-56.


Deep combination of large-scale features in statistical machine translation (Chinese abstract, translated)

The recurrent neural network and the recursive neural network were improved, and a deeply combined neural network (DNN) model was proposed in which large-scale features are incorporated during training. The model has strong generalization ability, fits the current mainstream bottom-up decoding style, and combines two classic machine translation models: hierarchical phrase-based grammar (HPG) and bracketing transduction grammar (BTG). The improved recurrent neural network generates phrase/rule-pair semantic vectors suited to the phrase generation process, and an autoencoder is used in the generation process to improve the recurrent network's performance. The improved recursive neural network guides decoding during translation and takes into account the information produced by the other decoder, so that the two decoders influence each other and jointly improve translation quality. The proposed deep combination model is suitable not only for heterogeneous translation systems but also for heterogeneous corpora. Compared with classic baseline systems, the model achieved BLEU gains of 1.0-1.9 points on heterogeneous systems and 1.05-1.58 points on heterogeneous corpora, and statistical significance tests were performed.
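The combination step described above can also be illustrated with a small, hypothetical sketch: hypotheses from two decoders (for example an HPG-based and a BTG-based system) are re-scored with a neural combiner over their large-scale feature vectors, so that each system's information can influence the other's choice. The feature count, network sizes, and scores below are illustrative assumptions, not the paper's actual model.

# Hypothetical sketch of re-scoring hypothesis pairs from two decoders.
import torch
import torch.nn as nn

class HypothesisCombiner(nn.Module):
    def __init__(self, n_features=8, hid_dim=16):
        super().__init__()
        # One hidden layer maps the concatenated features of a hypothesis pair
        # (HPG candidate + BTG candidate) to a single combination score.
        self.score = nn.Sequential(
            nn.Linear(2 * n_features, hid_dim),
            nn.Tanh(),
            nn.Linear(hid_dim, 1),
        )

    def forward(self, hpg_feats, btg_feats):
        return self.score(torch.cat([hpg_feats, btg_feats], dim=-1)).squeeze(-1)

# Toy usage: score 3 candidate pairs, each described by 8 features per system
# (e.g. translation model, language model, length penalty, semantic-vector similarity).
combiner = HypothesisCombiner()
hpg = torch.randn(3, 8)
btg = torch.randn(3, 8)
scores = combiner(hpg, btg)
best = int(torch.argmax(scores))   # index of the preferred candidate pair
print(scores.tolist(), best)

A combiner of this shape would be applied bottom-up during decoding, which is how the abstract's claim that the model "fits the bottom-up decoding style" could be realized; the paper's exact recursive scoring structure is not reproduced here.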

