Journal of Zhejiang University (Engineering Science)  2017, Vol. 51 Issue (10): 1891-1900    DOI: 10.3785/j.issn.1008-973X.2017.10.002
Automation Technology
Web accessibility sampling method based on node topology characteristics
GAO Fei1, CHEN Rong-hua2, BU Jia-jun3, YU Zhi3, WANG Ying-han4, TIAN Tian5
1. College of Information Engineering, Putian University, Putian 310011, China;
2. Jiangxi Vocational College of Finance and Economics, Jiujiang 332000, China;
3. Zhejiang Provincial Key Laboratory of Service Robot, Zhejiang University, Hangzhou 310027, China;
4. Shangrao Vocational and Technical College, Shangrao 334109, China;
5. Shanghai Agency of National Audit Office, Shanghai 200051, China
Abstract:

Existing sampling methods for web accessibility evaluation yield samples that represent the whole website poorly and fail to reflect the distribution of the site's data, which leads to large sampling errors. To address this problem, an interval sampling algorithm based on node topological characteristics is proposed, starting from the topological structure among web page nodes. Each page is treated as a node, and a similarity topology graph of the pages is constructed with a k-nearest-neighbor (KNN) graph algorithm. The importance of each node is then evaluated from its local and global topological properties, and the nodes are sorted by importance to obtain an ordered sequence of all pages. Interval sampling is applied to the sorted sequence, so that samples are drawn from different topological regions. Experimental data from real disabled persons' federation websites show that, compared with other algorithms, the proposed method achieves smaller mean error and a better distribution of samples.
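The pipeline summarized in the abstract (KNN graph, node importance, ranking, interval sampling) can be illustrated with a minimal sketch. The abstract does not specify the page features, the similarity measure, or the exact importance formula, so everything below is an assumption made for illustration: pages are hypothetical numeric feature vectors, similarity is cosine similarity, the local property is node degree, the global property is closeness centrality, and the two are mixed by a hypothetical weight `alpha`. It is not the authors' implementation.

```python
import math
from collections import deque


def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_graph(features, k):
    """Undirected KNN graph: i and j are linked if either one is among
    the other's k most similar pages."""
    n = len(features)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: cosine(features[i], features[j]),
                        reverse=True)
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def closeness(adj, src):
    """Assumed global measure: reachable nodes divided by total BFS distance."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def rank_nodes(adj, alpha=0.5):
    """Sort nodes by alpha * normalized degree + (1 - alpha) * closeness.
    Both the formula and alpha are illustrative assumptions."""
    if not adj:
        return []
    max_deg = max(len(nbrs) for nbrs in adj.values()) or 1
    score = {i: alpha * len(adj[i]) / max_deg + (1 - alpha) * closeness(adj, i)
             for i in adj}
    return sorted(adj, key=lambda i: (-score[i], i))


def interval_sample(ranked, m):
    """Pick m items at evenly spaced positions of the ranked sequence."""
    n = len(ranked)
    m = min(m, n)
    if m <= 0:
        return []
    return [ranked[(i * n) // m] for i in range(m)]


# Hypothetical feature vectors for six pages forming three similar pairs.
features = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0],
            [0, 0.9, 0.1], [0, 0, 1], [0.1, 0, 0.9]]
adj = knn_graph(features, k=2)
ranked = rank_nodes(adj)
sample = interval_sample(ranked, m=3)
print(sample)
```

Because the sample positions are spread evenly over the importance ranking, the selection covers both high-importance and low-importance pages rather than concentrating on one region of the graph, which is the intuition the abstract gives for sampling across different topological regions.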

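Received: 2016-12-28    Published: 2017-09-27
CLC:  TP391
Funding:

National Key Technology R&D Program of China (2014BAK15B02); National Natural Science Foundation of China (61173185, 61173186); Zhejiang Provincial Natural Science Foundation of China (LZ13F020001).

Corresponding author: BU Jia-jun, male, professor, Ph.D.     E-mail: bjj@cs.zju.edu.cn
About the author: GAO Fei (1982-), female, lecturer, postgraduate, engaged in research on data mining, artificial intelligence, and computer networks. ORCID: 0000-0002-4509-6718. E-mail: gaofei8237@163.com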
Cite this article:

GAO Fei, CHEN Rong-hua, BU Jia-jun, YU Zhi, WANG Ying-han, TIAN Tian. Web accessibility sampling method based on node topology characteristics[J]. Journal of Zhejiang University (Engineering Science), 2017, 51(10): 1891-1900.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2017.10.002
http://www.zjujournals.com/eng/CN/Y2017/V51/I10/1891
