Front. Inform. Technol. Electron. Eng.  2014, Vol. 15 Issue (4): 241-253    DOI: 10.1631/jzus.C1300208
    
Topic-aware pivot language approach for statistical machine translation
Jin-song Su, Xiao-dong Shi, Yan-zhou Huang, Yang Liu, Qing-qiang Wu, Yi-dong Chen, Huai-lin Dong
Software School, Xiamen University, Xiamen 361005, China; Center for Digital Media Computing, Xiamen University, Xiamen 361005, China; Cognitive Science Department, Xiamen University, Xiamen 361005, China; Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract  The pivot language approach for statistical machine translation (SMT) is an effective way to overcome the resource bottleneck for language pairs that lack direct parallel data. However, conventional implementations make little use of pivot-side context information, which leads to erroneous estimates of translation probabilities. In this study, we propose two topic-aware pivot language approaches that exploit different levels of pivot-side context. The first method takes advantage of document-level context by assuming that bridged phrase pairs should have similar document-level topic distributions. The second method focuses on the effect of local context. It rests on two assumptions: that the sense of a phrase can be reflected by its local context in the form of probabilistic topics, and that bridged phrase pairs should be compatible in their latent sense distributions. We then build an interpolated model that combines the two methods to further improve system performance. Experimental results on French-Spanish and French-German translation using English as the pivot language demonstrate the effectiveness of topic-based context in pivot-based SMT.
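
The idea of bridging phrase pairs through a pivot and favoring pairs with similar topic distributions can be illustrated with a small sketch. This is not the authors' formulation, which this page does not give. It assumes the standard triangulation p(t|s) = Σ_p p(t|p) · p(p|s), and it assumes cosine similarity between topic vectors followed by renormalization as the way of weighting bridged pairs. The function names, the toy phrase tables, and the two-topic vectors are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two topic vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def bridge(src_to_pivot, pivot_to_tgt, topic_aware=False):
    """Induce p(target | source) by summing over shared pivot phrases.

    src_to_pivot: {source: [(pivot, p(pivot|source), topic_vector), ...]}
    pivot_to_tgt: {pivot: [(target, p(target|pivot), topic_vector), ...]}
    With topic_aware=True, each bridged pair is weighted by the similarity
    of the two topic vectors (an assumption made for this illustration).
    """
    scores = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src, topics_sp in pivots:
            for tgt, p_tgt_given_pivot, topics_pt in pivot_to_tgt.get(pivot, []):
                weight = cosine(topics_sp, topics_pt) if topic_aware else 1.0
                scores[src][tgt] += p_tgt_given_pivot * p_pivot_given_src * weight
    # Renormalize so that the scores for each source phrase sum to 1.
    table = {}
    for src, targets in scores.items():
        total = sum(targets.values())
        table[src] = {t: s / total for t, s in targets.items()} if total > 0 else {}
    return table


# Toy data: the English pivot "bank" is ambiguous (finance vs. river).
# Topic vectors are [finance, nature]; all numbers are invented.
fr_en = {"banque": [("bank", 1.0, [0.9, 0.1])]}
en_es = {"bank": [("banco", 0.5, [0.9, 0.1]), ("orilla", 0.5, [0.1, 0.9])]}

baseline = bridge(fr_en, en_es, topic_aware=False)
aware = bridge(fr_en, en_es, topic_aware=True)
print(baseline["banque"])
print(aware["banque"])
```

In this toy case the conventional bridge cannot distinguish the two Spanish candidates, because both are reachable through the same pivot phrase with equal probability. The topic-weighted variant shifts probability mass toward the candidate whose topic vector matches the source-pivot pair.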

Key words: Natural language processing, Pivot-based statistical machine translation, Topical context information
Received: 04 August 2013      Published: 10 April 2014
CLC:  TP391.1  
Cite this article:

Jin-song Su, Xiao-dong Shi, Yan-zhou Huang, Yang Liu, Qing-qiang Wu, Yi-dong Chen, Huai-lin Dong. Topic-aware pivot language approach for statistical machine translation. Front. Inform. Technol. Electron. Eng., 2014, 15(4): 241-253.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/jzus.C1300208     OR     http://www.zjujournals.com/xueshu/fitee/Y2014/V15/I4/241


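Topic-aware pivot language statistical machine translation

Objective: The pivot language approach is one way to address the lack of bilingual training data when building statistical machine translation models. Conventional pivot language approaches ignore the ambiguity of pivot-language text, so the translation probabilities in the resulting model are not accurate enough. To address this, we use topic models to model context information at different levels and integrate it into the modeling process of pivot-based statistical machine translation, with the aim of improving the pivot-based translation model.
Innovation: Representing context with the conventional vector space model suffers from data sparsity. We use topic models to give context at different levels a probabilistic representation, so that the context of pivot-language text can be integrated into the probability computation of the translation model and thereby improve it.
Methods: Exploiting the strengths of topic models, we use them to obtain reduced-dimensional representations of context at different levels. We modify the modeling formula of the conventional pivot language approach, treating context either as a latent variable or as a similarity measure, and re-estimate the translation model probabilities.
Conclusions: Experiments show that topic models represent context at different levels well, and that pivot-based statistical machine translation models incorporating topic-model context outperform models built with the conventional pivot language approach.

Key words: Statistical machine translation, Pivot language, Topic model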