Front. Inform. Technol. Electron. Eng.  2016, Vol. 17 Issue (2): 122-134    DOI: 10.1631/FITEE.1500187
    
A social tag clustering method based on common co-occurrence group similarity
Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan
1. School of Computer and Information, Hefei University of Technology, Hefei 230009, China; 2. School of Economics and Management, Anhui University of Science and Technology, Huainan 232001, China; 3. School of Computer, Minnan Normal University, Zhangzhou 363000, China

Abstract  Social tagging systems are widely used in Web 2.0. Many users rely on them to create, organize, manage, and share Internet resources freely. However, the many ambiguous and uncontrolled tags that these systems produce not only worsen the user experience but also reduce the efficiency of resource retrieval. Tag clustering groups tags with similar semantics and helps mitigate these problems. In this paper, we first present an approach based on common co-occurrence group similarity, which uses the ternary relation among users, resources, and tags to measure the semantic relevance between tags. We then propose a spectral clustering method to address the high dimensionality and sparsity of the annotation data. Experimental results show that the proposed method is useful and efficient.

Key words: Social tagging systems, Tag co-occurrence, Spectral clustering, Group similarity
Received: 11 June 2015      Published: 02 February 2016
CLC:  TP311  
Cite this article:

Hui-zong Li, Xue-gang Hu, Yao-jin Lin, Wei He, Jian-han Pan. A social tag clustering method based on common co-occurrence group similarity. Front. Inform. Technol. Electron. Eng., 2016, 17(2): 122-134.

URL:

http://www.zjujournals.com/xueshu/fitee/10.1631/FITEE.1500187     OR     http://www.zjujournals.com/xueshu/fitee/Y2016/V17/I2/122


A social tag clustering method based on common co-occurrence group similarity

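Objective: Social tagging systems produce a large number of ambiguous and uncontrolled tags, which degrade the user experience and limit the efficiency of resource retrieval. Tag clustering groups tags with similar semantics and thereby mitigates these problems. Existing social tag clustering methods mostly measure tag similarity from the binary "resource-tag" relation and cluster the tags with algorithms such as K-means and hierarchical clustering, which tends to cause high dimensionality, sparsity, and loss of tag semantics. This paper proposes a tag similarity measure based on common co-occurrence groups and uses a spectral clustering algorithm to cluster the tags.
Innovation: The ternary annotation relation in social tagging systems is analyzed, and the form of tag co-occurrence that best preserves semantic relations within this ternary relation is identified. Building on an analysis of the individual co-occurrence similarity of tags, and applying the idea of groups, the paper proposes the common co-occurrence group similarity of tags, which characterizes the semantic similarity of tags accurately from a global perspective. It also proposes a spectral clustering method for social tags based on this similarity.
Methods: The common co-occurrence group similarity is used to compute the pairwise similarity of tags and to build a similarity matrix (Eq. (4)). Tags are then clustered with a spectral clustering algorithm. First, the similarity matrix is normalized with the Laplacian transform to build the normalized Laplacian matrix of the tags. Next, the first k eigenvalues of this matrix and their corresponding eigenvectors are computed, and the k eigenvectors form a new feature space. Finally, the K-means algorithm clusters the tags into k clusters in this space (Algorithm 1).
Conclusions: The internal evaluation indices SC and Dunn are used to compare the proposed tag clustering method experimentally with other traditional tag clustering methods. The spectral tag clustering method based on common co-occurrence group similarity scores better than the other traditional methods on both SC and Dunn, and it is able to obtain good clustering results.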
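
The spectral clustering steps summarized above (normalized Laplacian, first k eigenvectors, K-means) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' Algorithm 1. The similarity matrix is a made-up toy example, because Eq. (4) is not reproduced on this page. The function names `spectral_tag_clusters` and `kmeans` are hypothetical. The choice of the symmetric normalized Laplacian, the reading of "first k" as the k smallest eigenvalues, the row normalization of the embedding, and the farthest-point K-means initialization are all assumptions.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's K-means with a deterministic farthest-point start."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen center.
        dists = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_tag_clusters(similarity, k):
    """Cluster tags from a symmetric, non-negative similarity matrix."""
    W = np.asarray(similarity, dtype=float)
    degree = W.sum(axis=1)
    # Assumption: every tag has positive total similarity, so D^{-1/2} exists.
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order; keep the k smallest.
    _, vectors = np.linalg.eigh(L)
    embedding = vectors[:, :k]
    # Assumption: row-normalize the embedding before K-means.
    norms = np.linalg.norm(embedding, axis=1, keepdims=True)
    embedding = embedding / np.where(norms == 0, 1.0, norms)
    return kmeans(embedding, k)


# Toy similarity matrix for six tags forming two obvious groups (made-up values).
toy_similarity = np.array([
    [1.0, 0.9, 0.8, 0.0, 0.1, 0.0],
    [0.9, 1.0, 0.7, 0.1, 0.0, 0.0],
    [0.8, 0.7, 1.0, 0.0, 0.0, 0.1],
    [0.0, 0.1, 0.0, 1.0, 0.9, 0.8],
    [0.1, 0.0, 0.0, 0.9, 1.0, 0.7],
    [0.0, 0.0, 0.1, 0.8, 0.7, 1.0],
])
labels = spectral_tag_clusters(toy_similarity, k=2)
print(labels)
```

On this toy input the first three tags should receive one label and the last three the other. Which group is numbered 0 depends on the initialization and carries no meaning. Clustering in the k-dimensional eigenvector space rather than on the raw tag vectors is what lets the method sidestep the high dimensionality and sparsity of the annotation data.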

Key words: Social tagging systems, Tag co-occurrence, Spectral clustering, Group similarity