Journal of Zhejiang University (Science Edition)  2020, Vol. 47 Issue (3): 329-336    DOI: 10.3785/j.issn.1008-9497.2020.03.010
Mathematics and Computer Science     
A study of automated English essay evaluating framework based on semantic similarity and XGBoost algorithm
LYU Xin1, CHENG Yuxia2
1.School of Foreign Languages and Literatures, Hangzhou Dianzi University, Hangzhou 310018, China
2.School of Computer Science and Technology, Hangzhou Dianzi University, Hangzhou 310018, China

Abstract  Automated essay scoring and comment generation can greatly relieve expert human raters of the heavy workload of evaluating English essays, but doubts remain about the accuracy and fairness of the results. In recent years, the rapid development of machine learning and natural language processing has improved performance on tasks such as text classification and machine translation. However, many of these advances have not yet been applied to automated essay scoring. This paper proposes a semantic representation vector for essays that combines word2vec, paragraph2vec, pos2vec and LDA (latent Dirichlet allocation) features. Comment labels for an essay are then generated by a semantic similarity model based on the kNN (k nearest neighbors) algorithm, and the essay is scored by an XGBoost (extreme gradient boosting) regression model. Finally, 900 English essays written by college students are used as samples to verify the framework. The case studies show that the proposed evaluation framework achieves higher accuracy than traditional methods in both automated scoring and comment generation for English essays.
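The comment-labelling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure, the value of k, or the voting rule, so cosine similarity, k=3, and majority voting are assumptions here, and the function names and toy feature values are hypothetical. The XGBoost scoring step is not covered.

```python
import math
from collections import Counter


def concat_features(*parts):
    """Concatenate per-essay feature vectors (for example word2vec,
    paragraph2vec, pos2vec and LDA features) into one representation."""
    combined = []
    for part in parts:
        combined.extend(part)
    return combined


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_comment_label(query, labeled_essays, k=3):
    """Return the most common comment label among the k essays most similar
    to the query. Vote ties go to the label seen first in the ranking."""
    ranked = sorted(
        labeled_essays,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy data: each essay is (feature vector, comment label). The numbers are
# made up for illustration and do not come from the paper.
train = [
    (concat_features([0.9, 0.1], [0.8], [0.2]), "well organized"),
    (concat_features([0.8, 0.2], [0.7], [0.3]), "well organized"),
    (concat_features([0.1, 0.9], [0.2], [0.9]), "needs clearer structure"),
    (concat_features([0.2, 0.8], [0.1], [0.8]), "needs clearer structure"),
]
query = concat_features([0.85, 0.15], [0.75], [0.25])
label = knn_comment_label(query, train, k=3)
print(label)
```

In practice the feature vectors would come from trained embedding and topic models rather than hand-written numbers, and the differently scaled feature groups would likely need normalizing before concatenation.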

Key words: English essay, automated essay scoring, semantic representation, similarity, XGBoost
Received: 18 April 2019      Published: 25 June 2020
CLC:  TP391.6  
Cite this article:

LYU Xin, CHENG Yuxia. A study of automated English essay evaluating framework based on semantic similarity and XGBoost algorithm. Journal of Zhejiang University (Science Edition), 2020, 47(3): 329-336.



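Abstract (translated from the Chinese original)  Automated essay scoring and comment generation can greatly reduce the workload of expert raters and save labor costs. At present, the accuracy and fairness of scoring and comment results are still not high. In recent years, the rapid development of technologies such as machine learning and natural language processing has to some extent improved performance on tasks such as text classification and machine translation, but many new research results have not yet been applied to automated essay evaluation. This study combines word vectors (word2vec), paragraph vectors (paragraph2vec), part-of-speech vectors (pos2vec) and LDA (latent Dirichlet allocation) features into a semantic representation vector for an essay; uses a semantic similarity model based on the kNN (k nearest neighbors) algorithm to obtain comment labels for the essay; uses a regression model based on XGBoost (extreme gradient boosting) to compute the score of an English essay; and constructs a case study with 900 college students' English essays as samples for verification. The results show that the proposed evaluation framework achieves higher accuracy than traditional methods in both automated scoring and comment generation for English essays.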

Key words: automated scoring, similarity, semantic representation, XGBoost, English essay