Journal of Zhejiang University (Engineering Science)
Computer Technology, Telecommunication Technology
Improved statistical machine translation model with topic-based paraphrase
SU Jin-song1, DONG Huai-lin1, CHEN Yi-dong2, SHI Xiao-dong2, WU Qing-qiang1
1. School of Software, Xiamen University, Xiamen 361005, China; 2. Department of Cognitive Science, Xiamen University, Xiamen 361005, China

Abstract:

Conventional paraphrase extraction methods based on bilingual parallel corpora ignore document-level context, both when paraphrases are extracted and when they are applied. To address this shortcoming, context information derived from a topic model is introduced to improve paraphrase extraction. The work focuses on how to compute two kinds of paraphrase probabilities: topic-insensitive (context-independent) and topic-sensitive (context-dependent). It then studies how to incorporate both probabilities into statistical machine translation modeling in order to improve translation performance. Experimental results on multiple test sets demonstrate the effectiveness of the approach.
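
As a rough illustration of the two probabilities the abstract distinguishes, the sketch below makes two assumptions that are not taken from the paper. First, it assumes the common pivot formulation for paraphrases extracted from a bilingual phrase table, p(e2|e1) = Σ_f p(f|e1) p(e2|f). Second, it assumes the topic-sensitive variant mixes per-topic paraphrase probabilities with a document's topic distribution, p(e2|e1,d) = Σ_z p(z|d) p(e2|e1,z). All function names, phrase tables, topic labels, and numbers are hypothetical toy values.

```python
from collections import defaultdict


def pivot_paraphrase_prob(src_to_pivot, pivot_to_src, phrase):
    """Topic-insensitive score (assumed form): sum_f p(f | e1) * p(e2 | f)."""
    probs = defaultdict(float)
    for pivot, p_f_given_e1 in src_to_pivot.get(phrase, {}).items():
        for candidate, p_e2_given_f in pivot_to_src.get(pivot, {}).items():
            if candidate != phrase:  # a phrase is not its own paraphrase
                probs[candidate] += p_f_given_e1 * p_e2_given_f
    return dict(probs)


def topic_sensitive_prob(per_topic_probs, doc_topics, candidate):
    """Topic-sensitive score (assumed form): sum_z p(z | d) * p(e2 | e1, z)."""
    return sum(
        p_z * per_topic_probs.get(topic, {}).get(candidate, 0.0)
        for topic, p_z in doc_topics.items()
    )


# Hypothetical phrase-table entries; "f1" and "f2" stand for pivot phrases
# in the other language.
src_to_pivot = {"bank": {"f1": 0.7, "f2": 0.3}}
pivot_to_src = {
    "f1": {"bank": 0.8, "financial institution": 0.2},
    "f2": {"bank": 0.5, "riverside": 0.5},
}
insensitive = pivot_paraphrase_prob(src_to_pivot, pivot_to_src, "bank")

# Hypothetical per-topic paraphrase probabilities for "bank" and a
# hypothetical topic distribution for one document.
per_topic = {
    "finance": {"financial institution": 0.9, "riverside": 0.1},
    "nature": {"financial institution": 0.1, "riverside": 0.9},
}
doc_topics = {"finance": 0.8, "nature": 0.2}
sensitive = topic_sensitive_prob(per_topic, doc_topics, "financial institution")

print(insensitive)
print(sensitive)
```

In this toy setting the topic-insensitive scores for the two candidates are nearly equal, while the document's topic distribution clearly favors one of them, which is the kind of disambiguation that document-level context is meant to provide.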

Published: 2014-10-01
CLC number: TP 391
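Funding:

National Key Technology R&D Program of China during the 12th Five-Year Plan (2012BAH14F03); National Natural Science Foundation of China (61005052, 61303082); Specialized Research Fund for the Doctoral Program of Higher Education (2012012120046); Natural Science Foundation of Fujian Province (2011J01360); Science and Technology Program of Xiamen (3502Z20103001); Shenzhen Key Laboratory of High Performance Data Mining (CXB201005250021A).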

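Corresponding author: WU Qing-qiang, male, associate professor. E-mail: wuqq@xmu.edu.cn
About the author: SU Jin-song (born 1982), male, Ph.D., assistant professor; research interests: natural language processing and machine translation. E-mail: jssu@xmu.edu.cn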
Cite this article:

SU Jin-song, DONG Huai-lin, CHEN Yi-dong, SHI Xiao-dong, WU Qing-qiang. Improved statistical machine translation model with topic-based paraphrase [J]. Journal of Zhejiang University (Engineering Science), DOI: 10.3785/j.issn.1008-973X.2014.10.019.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2014.10.019
http://www.zjujournals.com/eng/CN/Y2014/V48/I10/1843

 


 
