Mining access patterns of Web active user based on tree structure

doi:10.3785/j.issn.1008-973X.2009.

2009, Vol. 43, Issue (6): 1005-1013

Computer Technology, Automation Technology

BEI Yi-Jun, CHEN Gang, DONG Jin-Xiang

(College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China)

Abstract:

Conventional Web mining approaches use the logs of all Web users, yet active and inactive users behave differently when visiting a Web site. An access pattern mining approach oriented to active users is therefore proposed, consisting of a session-extraction algorithm, active user session miner (AUSM), and a tree-structured access pattern mining algorithm, Web access pattern bottom up miner (WAPBUM). AUSM identifies active Web users and extracts their session information in a single scan of the log data. Building on the extracted sessions, WAPBUM uses the topology of the Web site to mine frequent access patterns as trees: exploiting the characteristics of Web log mining, it constructs equivalence classes of subtrees and generates frequent subtrees from the bottom up. Experiments on both synthetic and real datasets show that the running time of AUSM is linear in the volume of Web log data and that its memory usage stays stable during execution. WAPBUM is markedly faster than the FREQT algorithm when mining rooted subtrees, and its results can be applied effectively to Web site structure analysis.

Key words: Web usage mining; Web access pattern; Web log; active user; frequent subtree
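The abstract does not spell out how AUSM works internally. As a rough illustration only, the sketch below shows one way a single pass over a log could identify heavy users and split their requests into sessions. It uses lossy counting (an approximate frequency-counting technique for data streams) together with an inactivity timeout. The record layout, the function name `active_user_sessions`, and all parameter values are assumptions made for illustration; this is not the authors' algorithm.

```python
import math
from collections import namedtuple

# Hypothetical log record: (user id, timestamp in seconds, requested URL).
Record = namedtuple("Record", ["user", "time", "url"])


def active_user_sessions(records, support=0.2, epsilon=0.05, timeout=1800):
    """Single pass over `records` (assumed to be in time order).

    Returns {user: [session, ...]} for users whose tracked request count is
    at least (support - epsilon) * N, following the lossy counting scheme.
    """
    width = math.ceil(1 / epsilon)  # bucket width used by lossy counting
    table = {}                      # user -> [count, delta, sessions, last_time]
    n = 0
    for rec in records:
        n += 1
        bucket = math.ceil(n / width)
        entry = table.get(rec.user)
        if entry is None:
            # New user: delta bounds how many earlier requests may be uncounted.
            table[rec.user] = [1, bucket - 1, [[rec.url]], rec.time]
        else:
            entry[0] += 1
            if rec.time - entry[3] > timeout:
                entry[2].append([rec.url])    # long gap: start a new session
            else:
                entry[2][-1].append(rec.url)  # extend the current session
            entry[3] = rec.time
        if n % width == 0:
            # Bucket boundary: drop users whose count is too low to matter.
            stale = [u for u, e in table.items() if e[0] + e[1] <= bucket]
            for user in stale:
                del table[user]
    threshold = (support - epsilon) * n
    return {u: e[2] for u, e in table.items() if e[0] >= threshold}
```

Pruning at bucket boundaries keeps the number of tracked users small, although the sessions stored for surviving users still grow with their activity. A real implementation would also need the log parsing and user identification steps that this sketch takes as given.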

Published: 2009-07-01

TP309.2

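Supported by:

National Natural Science Foundation of China (60603044), Zhejiang Provincial Major Software Special Fund (2006c11108), and the Program for Changjiang Scholars and Innovative Research Team in University (IRT0652).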

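Corresponding author: CHEN Gang, male, professor. E-mail: cg@zju.edu.cn

About the author: BEI Yi-Jun (1980-), male, born in Ningbo, Zhejiang, Ph.D.; research interests include databases and data mining.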

Cite this article:

BEI Yi-Jun, CHEN Gang, DONG Jin-Xiang. Mining access patterns of Web active user based on tree structure[J]. J4, 2009, 43(6): 1005-1013.

Link to this article:

http://www.zjujournals.com/xueshu/eng/CN/10.3785/j.issn.1008-973X.2009. or http://www.zjujournals.com/xueshu/eng/CN/Y2009/V43/I6/1005

