Journal of Zhejiang University (Engineering Science)  2020, Vol. 54, Issue (6): 1115-1125    DOI: 10.3785/j.issn.1008-973X.2020.06.008
Computer Technology
Reading annotation generation method through analysis of visual behavior and text features
Shi-wei CHENG(),Wei GUO
School of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
Abstract:

A reading aid method was proposed. A hierarchical anchoring method was used to determine the target text, demand determination factors were constructed from the user's visual behavior and the features of the target text, and the user's degree of demand for reading aid was calculated from these factors, so as to determine whether the user needed word translation or a summary of a long, difficult sentence in the target text. When such a demand was detected, the word meaning or the sentence summary was displayed in the form of an annotation. Test results show that the average precision of the demand determination method reached 80.6% ± 6.3%, and that the automatically generated annotations improved the users' reading efficiency and subjective experience, validating the feasibility and effectiveness of the proposed method.
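As a concrete illustration of the hierarchical anchoring step, the sketch below maps a gaze fixation onto a word by narrowing from line level to word level within a page layout. All data structures, field names, and the containment rule are assumptions for illustration, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in pixels

@dataclass
class Word:
    text: str
    box: Box

@dataclass
class Line:
    box: Box
    words: List[Word]

def contains(box: Box, x: float, y: float) -> bool:
    bx, by, bw, bh = box
    return bx <= x <= bx + bw and by <= y <= by + bh

def anchor(lines: List[Line], gx: float, gy: float) -> Optional[Word]:
    """Hierarchical anchoring: locate the fixated line, then the word in it."""
    for line in lines:
        if contains(line.box, gx, gy):
            for word in line.words:
                if contains(word.box, gx, gy):
                    return word
    return None

# Toy layout: one text line with two words; a fixation lands at (70, 15).
layout = [Line((0, 10, 200, 14), [Word("eye", (0, 10, 30, 14)),
                                  Word("tracking", (40, 10, 80, 14))])]
hit = anchor(layout, 70.0, 15.0)
print(hit.text if hit else "no anchor")  # -> "tracking"
```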

Key words: eye tracking    text recognition    demand determination    automatic annotation    human-computer interaction
Received: 2020-01-01    Published: 2020-07-06
CLC:  TP 391  
Funding: National Natural Science Foundation of China (61772468); Fundamental Research Funds for the Provincial Universities of Zhejiang (RF-B2019001)
About the author: CHENG Shi-wei (1981—), male, professor and doctoral supervisor, working on human-computer interaction and ubiquitous computing. orcid.org/0000-0003-4716-4179. E-mail: swc@zjut.edu.cn
Cite this article:

Shi-wei CHENG, Wei GUO. Reading annotation generation method through analysis of visual behavior and text features. Journal of Zhejiang University (Engineering Science), 2020, 54(6): 1115-1125.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.06.008        http://www.zjujournals.com/eng/CN/Y2020/V54/I6/1115

Fig. 1  Overall framework of the reading demand determination and annotation generation method
Fig. 2  Flowchart of reading mode analysis
Fig. 3  Examples of text-image variation with the relative position between the device and the page during paper-based reading
Fig. 4  Flowchart of the text extraction method
Fig. 5  Examples of contour connection with fixed parameters at different character sizes
Fig. 6  Variation of the total contour count as the morphological opening parameter changes
Fig. 7  Histogram of total contour count over text moment deflection angles
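Figures 4 to 7 outline a text extraction stage in which the page image is binarized and a morphological opening parameter is tuned by tracking how the total contour count changes (Fig. 6). The OpenCV sketch below illustrates such a tuning loop; Otsu binarization, the kernel range, and the selection rule are assumptions rather than the authors' exact procedure.

```python
import cv2

def contour_counts(gray, kernel_sizes=range(1, 16, 2)):
    """Count text contours after binarization and morphological opening
    with increasing structuring-element sizes (cf. Figs. 5 and 6)."""
    # Otsu thresholding; invert so text pixels become foreground.
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    counts = {}
    for k in kernel_sizes:
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (k, k))
        opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
        contours, _ = cv2.findContours(opened, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        counts[k] = len(contours)
    return counts

# Usage: pick the kernel size where the contour count stabilizes (cf. Fig. 6).
# "page.png" is a placeholder for a captured page image.
gray = cv2.imread("page.png", cv2.IMREAD_GRAYSCALE)
print(contour_counts(gray))
```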
Classifier   Pc / %   Tc / s
KNN          67.3     0.036
RF           78.6     0.025
CNN          89.1     0.089
SVM          84.7     0.013
Table 1  Character classifier performance test results
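Table 1 reports per-classifier precision (Pc) and prediction time (Tc). The scikit-learn sketch below reproduces the shape of such a comparison, using the bundled digits dataset as a stand-in for the paper's character images and default hyperparameters throughout; the numbers will not match Table 1.

```python
import time
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Stand-in data: 8x8 digit images instead of the paper's character features.
X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The CNN row of Table 1 would need a separate deep-learning setup; omitted.
for name, clf in [("KNN", KNeighborsClassifier()),
                  ("RF", RandomForestClassifier(random_state=0)),
                  ("SVM", SVC())]:
    clf.fit(X_tr, y_tr)
    t0 = time.perf_counter()
    acc = clf.score(X_te, y_te)
    dt = (time.perf_counter() - t0) / len(X_te)  # per-sample time, cf. Tc
    print(f"{name}: accuracy={acc:.3f}, time/sample={dt:.6f}s")
```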
Fig. 8  Schematic diagram of the text object structure tree
① https://www.english-corpora.org/coca/
Deep model          Pre-trained model   R2
Seq2seq-Attention   Word2vec            0.1294
Seq2seq-Attention   GloVe               0.1374
Seq2seq-Attention   BERT                0.1547
Table 2  Evaluation results of automatic summarization of long, difficult sentences
② https://www-nlpir.nist.gov/projects/duc/data.html
③ https://catalog.ldc.upenn.edu/LDC2012T21
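Table 2 varies only the pretrained model that initializes the Seq2seq-Attention summarizer. A common way to wire static pretrained vectors (word2vec/GloVe) into the encoder is sketched below in PyTorch; the random tensor stands in for vectors loaded from disk, and a BERT variant would instead supply contextual features, which this sketch does not cover.

```python
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 30_000, 300, 256
# Stand-in for real word2vec/GloVe vectors loaded from disk.
pretrained = torch.randn(vocab_size, emb_dim)

class Encoder(nn.Module):
    def __init__(self):
        super().__init__()
        # Initialize the embedding table from the pretrained vectors;
        # freeze=False lets fine-tuning adjust them for summarization.
        self.emb = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True,
                          bidirectional=True)

    def forward(self, token_ids):
        out, h = self.rnn(self.emb(token_ids))
        return out, h  # `out` feeds the attention over source positions

enc = Encoder()
out, _ = enc(torch.randint(0, vocab_size, (2, 20)))  # batch of 2 sentences
print(out.shape)  # torch.Size([2, 20, 512])
```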
Parameter                       Symbol                                                                          Default
Word strength coefficient       $\lambda_{\rm w}$                                                               0.36
Sentence strength coefficient   $\lambda_{\rm s}$                                                               0.36
Per-factor coefficients         $\{v_{\rm g}, v_{\rm k}, v_{\rm d}\}$, $\{t_{\rm g}, v_{\rm s}, v_{\rm l}\}$    1.00
Table 3  Adjustable parameter settings for demand determination
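Table 3 suggests that each demand score combines several factors through unit-default coefficients and a strength coefficient. The sketch below encodes one plausible linear combination purely as an assumption; the factor meanings (gaze, knowledge, difficulty for words; gaze time, syntax, length for sentences) and the formula itself are hypothetical, not taken from the paper.

```python
def word_demand(g, k, d, v_g=1.0, v_k=1.0, v_d=1.0, lam_w=0.36):
    """Hypothetical demand degree for a word: a coefficient-weighted sum
    of normalized factors in [0, 1], scaled by lam_w (Table 3)."""
    return lam_w * (v_g * g + v_k * k + v_d * d)

def sentence_demand(t, s, l, t_g=1.0, v_s=1.0, v_l=1.0, lam_s=0.36):
    """Analogous hypothetical form for long, difficult sentences."""
    return lam_s * (t_g * t + v_s * s + v_l * l)

# A long-dwelled, difficult word: 0.36 * (0.9 + 0.2 + 0.7) = 0.648.
print(word_demand(g=0.9, k=0.2, d=0.7) > 0.5)  # -> True
```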
Fig. 9  Prototype system interface and an example of reading annotation
Fig. 10  Examples of user tests under different reading modes
Confusion matrix        0: system determines no demand   1: system determines demand
0: user has no demand   TN                               FP
1: user has demand      FN                               TP
Table 4  Definition of the confusion matrix for demand determination
Fig. 11  F-scores of demand determination for words and long, difficult sentences
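For reference, the precision, recall, and F-score behind Fig. 11 follow from the Table 4 counts by the standard definitions:

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard metrics over the Table 4 confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy counts: 43 true positives, 7 false positives, 10 false negatives.
print(precision_recall_f1(43, 7, 10))  # -> (0.86, ~0.811, ~0.835)
```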
Reading mode          Sample type   P / %   T / s
Electronic reading    Word          86.3    1.3
Electronic reading    Sentence      72.6    2.7
Paper-based reading   Word          80.4    1.5
Paper-based reading   Sentence      63.9    2.9
Table 5  Average precision of demand determination and average latency of annotation generation