Abstract Identifying the authors or translators of ancient texts often requires inferring the provenance of unattributed texts from the characteristics of securely attributed ones. In recent decades, questions concerning the translators and dating of early Chinese Buddhist translations have attracted widespread scholarly attention. This paper uses deep learning models, specifically BERT and RBT6, to extract features from translated texts and to conduct a comprehensive examination of the attribution issues surrounding An Shigao’s translations. The study also validates the effectiveness of language models in identifying the translatorship of Chinese Buddhist translations.
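The paper itself does not include code, but a minimal sketch of the classification setup it describes might look like the following, assuming the Hugging Face `transformers` library and the publicly released `hfl/rbt6` checkpoint (a six-layer distillation of Chinese RoBERTa-wwm-ext); the corpus paths, label scheme, and hyperparameters are illustrative assumptions, not the authors’ actual configuration.

```python
# Sketch: fine-tuning RBT6 (hfl/rbt6) as a binary classifier that labels a
# passage as An Shigao (1) or non-An Shigao (0). Data and hyperparameters
# below are assumptions for illustration only.
import torch
from torch.utils.data import Dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "hfl/rbt6"  # swap in "bert-base-chinese" for the BERT baseline

class PassageDataset(Dataset):
    """Wraps (passage, label) pairs as tokenized model inputs."""
    def __init__(self, passages, labels, tokenizer, max_len=512):
        self.enc = tokenizer(passages, truncation=True,
                             padding="max_length", max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)

# Hypothetical training data: passages from the 14 accepted An Shigao
# translations (label 1) and randomly sampled non-An Shigao passages (label 0).
train_texts = ["...", "..."]
train_labels = [1, 0]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="anshigao-rbt6", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=PassageDataset(train_texts, train_labels, tokenizer),
)
trainer.train()
```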
The study uses 14 widely accepted translations by An Shigao as positive samples and randomly selects non-An Shigao translations as negative samples. The experimental results show that the RBT6 model outperforms both the BERT and traditional support vector machine (SVM) models in precision, recall, and other metrics, demonstrating superior classification performance. As a validation, the trained model is applied to 35 translations attributed to An Shigao but widely regarded by scholars as unreliable. The model’s judgments align fully with the conclusions established through textual criticism, confirming its effectiveness in distinguishing authentic translations. Additionally, to examine whether factors such as variant texts, punctuation and segmentation, and text length affect the detection results, the study applies masking, random punctuation insertion, and random segment extraction to the same set of texts. The results of both experiments are consistent, confirming that these factors have no significant effect on the model’s detection outcomes.
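As a rough illustration of the robustness checks mentioned above, the perturbations below (character masking, random punctuation insertion, and random segment extraction) are one plausible way to implement them; the function names and parameter values are assumptions, not taken from the paper.

```python
# Illustrative text perturbations for the robustness tests described above.
# Mask rate, punctuation rate, and segment length are assumed values.
import random

MASK_TOKEN = "[MASK]"            # stand-in for variant or uncertain characters
PUNCT = ["，", "。", "、", "；"]   # common Chinese punctuation marks

def mask_characters(text: str, rate: float = 0.1) -> str:
    """Replace a random fraction of characters with [MASK] to mimic variant texts."""
    return "".join(MASK_TOKEN if random.random() < rate else c for c in text)

def insert_random_punctuation(text: str, rate: float = 0.05) -> str:
    """Insert punctuation at random positions to test sensitivity to segmentation."""
    out = []
    for c in text:
        out.append(c)
        if random.random() < rate:
            out.append(random.choice(PUNCT))
    return "".join(out)

def random_segment(text: str, length: int = 256) -> str:
    """Extract a random contiguous segment to test sensitivity to text length."""
    if len(text) <= length:
        return text
    start = random.randint(0, len(text) - length)
    return text[start:start + length]
```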
This study applies the three trained models to detect the disputed or newly discovered translations attributed to An Shigao. The models identify the following texts as translations by An Shigao: T101 Za ahan jing 杂阿含经 (excluding sutras 9 and 10), T1557 Apitan wufaxing jing 阿毗昙五法行经, T735 Siyuan jing 四愿经 (the portion at 17/537b17–c27), the Kongō-ji manuscript of Anban shouyi jing 安般守意经, Foshuo shi’ermen jing 佛说十二门经, and Foshuo jie shi’ermen jing 佛说解十二门经. In contrast, the models classify the following texts as non-An Shigao translations: T105 Wuyin piyu jing 五阴譬喻经, T109 Zhuan falun jing 转法轮经, Wushi jiaoji jing 五十校计经 (fascicles 59 and 60 of T397 Da fangdeng daji jing 大方等大集经), T605 Chanxing faxiang jing 禅行法想经, T792 Fa shouchen jing 法受尘经, the Dunhuang version of Sanshiqi pin jing 三十七品经, and sutras 9 and 10 of T101 Za ahan jing. The models’ verification results largely align with recent conclusions drawn from a linguistic perspective regarding the identification of suspect translations attributed to An Shigao.
This study offers a practical comparison between traditional identification methods and language model-based detection, reflecting on the potential pitfalls of both approaches. Traditional methods may involve selective interpretation of data, excessive reliance on documentary evidence, and an overemphasis on the uniqueness of individual linguistic features at the expense of broader tendencies. In contrast, when language models are used for identification, it is equally important to consider how the content and format of the texts affect the detection results.
This study applies deep learning language models to the identification of translated Buddhist texts, significantly enhancing the efficiency and scientific rigor of determining translator attributions. In the era of big data, language model-based detection methods not only provide effective support for author identification and the dating of ancient texts but also substantially improve the processing of questionable documents, particularly for large corpora with complex transmission histories. These methods offer scientific, rapid, and quantifiable analytical tools for related research. This approach opens up promising prospects for the advancement of philology and linguistics, while also providing valuable insights for the academic community in further exploring the application of deep learning technology across various fields.
Published: 03 February 2025