Front. Inform. Technol. Electron. Eng.  2016, Vol. 17 Issue (2): 122-134    DOI: 10.1631/FITEE.1500187
    
A social tag clustering method based on common co-occurrence group similarity
Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan
1 School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2 School of Economics and Management, Anhui University of Science and Technology, Huainan 232001, China; 3 School of Computer, Minnan Normal University, Zhangzhou 363000, China
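Summary: Objective: Social tagging systems produce a large number of ambiguous and uncontrolled tags, which degrade user experience and limit resource retrieval efficiency. Tag clustering groups tags with similar semantics together and thus mitigates these problems. Existing social tag clustering methods mostly measure tag similarity from the binary "resource-tag" relation and cluster tags with algorithms such as K-means and hierarchical clustering, which tends to cause high dimensionality, sparsity, and loss of tag semantics. This paper proposes a tag similarity measure based on common co-occurrence groups and uses a spectral clustering algorithm to cluster the tags.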
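Innovation: The ternary annotation relation in social tagging systems is analyzed, and the form of tag co-occurrence that best preserves semantic relations within that ternary relation is identified. Building on an analysis of the individual co-occurrence similarity between tags, and drawing on the idea of groups, the common co-occurrence group similarity of tags is proposed, which characterizes the semantic similarity of tags precisely from a global perspective. A spectral clustering method for social tags based on this common co-occurrence group similarity is then proposed.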
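Methods: The common co-occurrence group similarity is used to compute the similarity between each pair of tags and to build a similarity matrix (Eq. (4)). The tags are then clustered with a spectral clustering algorithm: the similarity matrix is first normalized by a Laplacian transformation to build the normalized Laplacian matrix of the tags; the first k eigenvalues of this matrix and their corresponding eigenvectors are then computed; these k eigenvectors form a new feature space, in which the K-means algorithm groups the tags into k clusters (Algorithm 1).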
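The spectral clustering steps described above (normalized Laplacian, first k eigenvectors, K-means in the new feature space) can be sketched roughly as follows. This is a minimal illustration, not the paper's Algorithm 1: the toy matrix `S` is made up and does not come from Eq. (4), "first k eigenvalues" is assumed to mean the k smallest eigenvalues of the symmetric normalized Laplacian, and the row normalization and farthest-point K-means initialization are assumptions added here for stability. All function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(S):
    """Return L_sym = I - D^{-1/2} S D^{-1/2} for a symmetric similarity matrix S."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    # Guard against isolated tags (zero degree) to avoid division by zero.
    inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.where(d > 0, d, 1.0)), 0.0)
    return np.eye(len(S)) - inv_sqrt[:, None] * S * inv_sqrt[None, :]

def kmeans(X, k, n_iter=100):
    """Plain Lloyd's K-means with a deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = sq.argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_tag_clustering(S, k):
    """Cluster tags from a pairwise similarity matrix S into k groups."""
    L = normalized_laplacian(S)
    # eigh returns eigenvalues in ascending order, so the first k columns
    # are the eigenvectors of the k smallest eigenvalues (assumed reading).
    _, vecs = np.linalg.eigh(L)
    U = vecs[:, :k]
    # Assumption: rescale each row to unit length before K-means.
    norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.where(norms > 0, norms, 1.0)
    return kmeans(U, k)

# Made-up similarity matrix: tags 0-2 are similar to each other, as are tags 3-4.
S = np.array([
    [1.00, 0.90, 0.80, 0.05, 0.00],
    [0.90, 1.00, 0.85, 0.00, 0.05],
    [0.80, 0.85, 1.00, 0.05, 0.00],
    [0.05, 0.00, 0.05, 1.00, 0.90],
    [0.00, 0.05, 0.00, 0.90, 1.00],
])
labels = spectral_tag_clustering(S, k=2)
print(labels)
```

Working in the k-dimensional eigenvector space rather than on the raw tag vectors is what lets the K-means step sidestep the high dimensionality and sparsity of the annotation data.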
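Conclusions: The proposed tag clustering method is compared experimentally with other traditional tag clustering methods using the internal evaluation indices SC and Dunn. The spectral tag clustering method based on common co-occurrence group similarity scores better than the other traditional tag clustering methods on both SC and Dunn, and it obtains good clustering results.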
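As a rough illustration of the Dunn index mentioned above, the sketch below computes one common variant: the smallest distance between points in different clusters divided by the largest cluster diameter, so higher values mean more compact, better separated clusters. The source does not give the exact definition used in the experiments, so the Euclidean distance, the toy points, and the function name `dunn_index` are assumptions for illustration only.

```python
import math
from itertools import combinations

def dunn_index(points, labels):
    """Dunn index: min between-cluster distance / max within-cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the Dunn index needs at least two clusters")
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (math.dist(a, b)
         for pts in clusters.values()
         for a, b in combinations(pts, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        math.dist(a, b)
        for c1, c2 in combinations(clusters.values(), 2)
        for a in c1
        for b in c2
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster is a single point
    return min_separation / max_diameter

# Hypothetical 2-D points forming two tight, well separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
good = dunn_index(points, [0, 0, 1, 1])  # groups match the geometry
bad = dunn_index(points, [0, 1, 0, 1])   # groups cut across the geometry
print(good, bad)
```

A labeling that matches the natural grouping yields a larger value than one that splits it, which is the sense in which a higher Dunn score indicates a better clustering.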
Abstract: Social tagging systems are widely applied in Web 2.0. Many users use these systems to create, organize, manage, and share Internet resources freely. However, the many ambiguous and uncontrolled tags produced by social tagging systems not only worsen user experience, but also restrict resource retrieval efficiency. Tag clustering can aggregate tags with similar semantics and help mitigate these problems. In this paper, we first present an approach based on common co-occurrence group similarity, which employs the ternary relation among users, resources, and tags to measure the semantic relevance between tags. Then we propose a spectral clustering method to address the high dimensionality and sparsity of the annotation data. Finally, experimental results show that the proposed method is useful and efficient.
Key words: Social tagging systems    Tag co-occurrence    Spectral clustering    Group similarity
Received: 2015-06-11    Published: 2016-02-02
CLC:  TP311  

Cite this article:

Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan. A social tag clustering method based on common co-occurrence group similarity. Front. Inform. Technol. Electron. Eng., 2016, 17(2): 122-134.

Link to this article:

http://www.zjujournals.com/xueshu/fitee/CN/10.1631/FITEE.1500187
http://www.zjujournals.com/xueshu/fitee/CN/Y2016/V17/I2/122
