JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE)
    
Improved statistical machine translation model with topic-based paraphrase
SU Jin-song1, DONG Huai-lin1, CHEN Yi-dong2, SHI Xiao-dong2, WU Qing-qiang1
1. School of Software, Xiamen University, Xiamen 361005, China; 2. Department of Cognitive Science, Xiamen University, Xiamen 361005, China

Abstract  

Conventional paraphrase extraction from parallel corpora neglects document-level context. To address this defect, topic-model-based context was introduced into both paraphrase extraction and its application in statistical machine translation. The analysis focused on how to better learn two kinds of paraphrase probabilities: topic-insensitive and topic-sensitive. Both probabilities can be incorporated into statistical machine translation modeling using different methods. Experimental results on various test sets demonstrated the effectiveness of the approach.
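As a rough illustration of the two kinds of probability named in the abstract, the sketch below is not the authors' formulation, which the abstract does not give. It assumes that the conventional parallel-corpus method is the standard pivot approach (paraphrases share a foreign-language translation), and that a topic-sensitive score can be formed by mixing per-topic paraphrase distributions with a document's topic distribution. The function names, the toy phrase tables, and the topic labels are all hypothetical.

```python
from collections import defaultdict


def pivot_paraphrase_probs(p_f_given_e, p_e_given_f):
    """Topic-insensitive score (assumed pivot method):
    p(e2 | e1) = sum over pivots f of p(f | e1) * p(e2 | f), for e2 != e1."""
    result = {}
    for e1, pivots in p_f_given_e.items():
        scores = defaultdict(float)
        for f, p_f in pivots.items():
            for e2, p_e2 in p_e_given_f.get(f, {}).items():
                if e2 != e1:  # a phrase is not its own paraphrase
                    scores[e2] += p_f * p_e2
        result[e1] = dict(scores)
    return result


def topic_sensitive_probs(p_para_given_topic, doc_topics):
    """Topic-sensitive score (assumed mixture):
    p(e2 | e1, d) = sum over topics z of p(e2 | e1, z) * p(z | d)."""
    mixed = defaultdict(float)
    for z, p_z in doc_topics.items():
        for e2, p in p_para_given_topic.get(z, {}).items():
            mixed[e2] += p_z * p
    return dict(mixed)


# Hypothetical toy data: one English phrase with a single German pivot.
p_f_given_e = {"under control": {"unter kontrolle": 1.0}}
p_e_given_f = {"unter kontrolle": {"under control": 0.5, "in check": 0.5}}
pivot_scores = pivot_paraphrase_probs(p_f_given_e, p_e_given_f)

# Hypothetical per-topic paraphrase distributions for the phrase "bank".
bank_by_topic = {
    "finance": {"financial institution": 0.9, "riverside": 0.1},
    "geography": {"financial institution": 0.2, "riverside": 0.8},
}
doc_topics = {"finance": 0.75, "geography": 0.25}
bank_scores = topic_sensitive_probs(bank_by_topic, doc_topics)
```

In this toy setup a finance-heavy document shifts the score for "bank" toward "financial institution", while the pivot score stays the same for every document.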



Published: 01 October 2014
CLC:  TP 391  
Cite this article:

SU Jin-song, DONG Huai-lin, CHEN Yi-dong, SHI Xiao-dong, WU Qing-qiang. Improved statistical machine translation model with topic-based paraphrase. JOURNAL OF ZHEJIANG UNIVERSITY (ENGINEERING SCIENCE), 2014, 48(10): 1843-1849.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2014.10.019     OR     http://www.zjujournals.com/eng/Y2014/V48/I10/1843


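Statistical machine translation model incorporating topic-based paraphrase knowledge

Conventional paraphrase extraction methods based on bilingual parallel corpora ignore document context during both paraphrase extraction and application. To address this shortcoming, topic-model-based context information is introduced to improve paraphrase extraction, with the main focus on how to compute context-independent and context-dependent paraphrase generation probabilities. How to incorporate these two probabilities into statistical machine translation modeling, so as to improve translation system performance, is then studied. Experimental results on multiple test sets demonstrate the effectiveness of the method.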

 


 
