J4, 2011, Vol. 45, Issue 5: 934-940    DOI: 10.3785/j.issn.1008-973X.2011.05.027
Computer Technology, Biomedical Engineering
Semi-supervised clustering with weighted pairwise constraints projection
PAN Jun1, KONG Fan-sheng1, WANG Rui-qin2
1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China;
2. College of Physics and Electronic Information Engineering, Wenzhou University, Wenzhou 325035, China
Abstract:

To exploit pairwise constraints fully during dimensionality reduction and data clustering, a semi-supervised clustering method based on weighted pairwise constraints projection is proposed. The method builds the k-nearest-neighbor sets of the constrained instances and uses them to expand the original constraint set, analyzes the amount of information carried by each pairwise constraint to construct a weight matrix, and then solves for a projection matrix under the guidance of the weighted constraints. The projection matrix maps the sample data onto a low-dimensional space in which points of the same class lie close together and points of different classes are spread apart. In addition, a new evaluation function is introduced to improve the k-means clustering algorithm, so that clustering performance is optimized while violating as few pairwise constraints as possible. Experimental results on real-world datasets show that, compared with existing semi-supervised dimensionality-reduction clustering algorithms, the proposed method clusters high-dimensional data at lower cost.
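The pipeline summarized in the abstract can be illustrated with a small sketch. This is not the authors' formulation, which the abstract does not give. It assumes a common constraint-projection objective (maximize the scatter of cannot-link pairs minus the scatter of must-link pairs, under an orthonormality constraint), uniform weights unless the caller supplies others, and a fixed penalty per violated constraint as the clustering evaluation function. The k-nearest-neighbor expansion step is omitted. All function names, parameters, and the toy data are hypothetical.

```python
import numpy as np


def weighted_scatter(X, pairs, weights):
    """Weighted sum of (x_i - x_j)(x_i - x_j)^T, scaled by the number of pairs."""
    d = X.shape[1]
    S = np.zeros((d, d))
    for (i, j), w in zip(pairs, weights):
        diff = X[i] - X[j]
        S += w * np.outer(diff, diff)
    return S / max(len(pairs), 1)


def constraint_projection(X, must_link, cannot_link, w_ml=None, w_cl=None, dim=1):
    """Return a (d, dim) orthonormal projection matrix W.

    Assumed objective (not taken from the paper): maximize
    trace(W^T (S_cl - S_ml) W) subject to W^T W = I.
    Weights default to 1 for every constraint.
    """
    w_ml = np.ones(len(must_link)) if w_ml is None else np.asarray(w_ml, dtype=float)
    w_cl = np.ones(len(cannot_link)) if w_cl is None else np.asarray(w_cl, dtype=float)
    M = weighted_scatter(X, cannot_link, w_cl) - weighted_scatter(X, must_link, w_ml)
    _, eigvecs = np.linalg.eigh(M)      # eigenvalues are returned in ascending order
    return eigvecs[:, ::-1][:, :dim]    # keep the `dim` leading eigenvectors


def penalized_objective(Z, labels, centers, must_link, cannot_link, penalty=1.0):
    """k-means distortion plus a fixed penalty per violated constraint (assumed form)."""
    distortion = float(((Z - centers[labels]) ** 2).sum())
    violations = sum(labels[i] != labels[j] for i, j in must_link)
    violations += sum(labels[i] == labels[j] for i, j in cannot_link)
    return distortion + penalty * violations


# Toy data: the class is decided by axis 0; axis 1 is high-variance noise.
X = np.array([[0.0, 0.0], [0.0, 5.0], [0.0, -5.0],
              [1.0, 0.0], [1.0, 5.0], [1.0, -5.0]])
must_link = [(0, 1), (1, 2), (3, 4)]
cannot_link = [(0, 3), (1, 4), (2, 5)]

W = constraint_projection(X, must_link, cannot_link, dim=1)
Z = X @ W  # one-dimensional embedding of the six points
```

On this toy data the learned direction lines up with axis 0, so the noisy axis is discarded and each class collapses to a single projected value. Supplying non-uniform `w_ml` and `w_cl` is where a weighting scheme such as the one described in the abstract would plug in.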

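Published: 2011-11-24
CLC number: TP 181
Corresponding author: KONG Fan-sheng, male, professor. E-mail: kfs@cs.zju.edu.cn
About the author: PAN Jun (born 1978), male, from Wenzhou, Zhejiang; Ph.D. candidate; research interests: machine learning and semantic mining. E-mail: panjun@wzu.edu.cn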
Cite this article:

PAN Jun, KONG Fan-sheng, WANG Rui-qin. Semi-supervised clustering with weighted pairwise constraints projection [J]. J4, 2011, 45(5): 934-940.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2011.05.027
https://www.zjujournals.com/eng/CN/Y2011/V45/I5/934

[1] ZHU Xiaojin. Semisupervised learning literature survey [R].Computer Science TR 1530, USA, University of Wisconsin: Department of Computer Sciences, 2008.
[2] 李昆仑,曹铮,曹丽苹等. 半监督聚类的若干新进展[J].模式识别与人工智能, 2009, 22(5):735-742.
LI Kunlun, CAO Zheng, Cao Liping, et al. Some developments on semisupervised clustering [J]. Pattern Recognition and Artificial Intelligence, 2009, 22(5): 735-742.
[3] WAGSTAFF K, CARDIE C, ROGERS S, et al. Constrained kmeans clustering with background knowledge[C]∥Proceedings of the 18th International Conference on Machine Learning. Williamstown: Morgan Kaufmann Press, 2001: 577-584.
[4] LI Zhengguo, LIU Jianzhuang, TANG Xiaoou. Pairwise constraint propagation by semidefinite programming for semisupervised classification [C]∥Proceedings of the 25th International Conference on Machine Learning. New York: ACM Press, 2008: 576-583.
[5] XING E P, NG A Y, JORDAN M I, et al. Distance metric learning with application to clustering with sideinformation[C]∥Advances in Neural Information Processing Systems 15. Cambridge: MIT Press, 2003:505-512.
[6] 肖宇,于剑. 基于近邻传播算法的半监督聚类[J].软件学报, 2008, 19(11): 2803-2813.
XIAO Yu, YU Jian. SemiSupervised Clustering Based on Affinity Propagation [J]. Journal of Software, 2008, 19(11): 2803-2813.
[7] QI Guojun, TANG Jinhui, Zha Zhengjun, et al. An efficient sparse metric learning in highdimensional space via 1penalized logdeterminant regularization.[C]∥ Proceedings of the 26th International Conference on Machine Learning. New York: ACM Press, 2009: 841-848.
[8] BASU S, BILENKO M, MOONEY R. A probabilistic framework for semisupervised clustering. [C]∥Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Seattle: ACM Press, 2004:59-68.
[9] BILENKO M, BASU S, MOONEY R. Integrating constraints and metric learning in semisupervised clustering[C]∥Proceedings of the 21st International Conference on Machine Learning. Banff: ACM Press, 2004: 81-88.
[10] BARHILLEL A, HERTZ T, SHENTAL N, et al. Learning distance functions using equivalence relations. [C]∥Proceedings of the 20th International Conference on Machine Learning. Washington: Morgan Kaufmann Publishers, 2003:11-18.
[11] TANG Wei, XIONG Hui, ZHONG Shi, et al. Enhancing semisupervised clustering: a feature projection perspective [C]∥Proceedings of the 13th ACM SIGKDD Internal Conference on Knowledge Discovery and Data Mining. New York: ACM Press, 2007:707-716.
[12] ZHANG Daoqiang, ZHOU Zhihua, CHEN Songcan. Semisupervised dimensionality reduction [C] ∥Proceedings of the 7th SIAM International Conference on Data Mining. Minneapolis: [s. n.] 2007:629-634.
[13] 韦佳,彭宏. 基于局部与全局保持的半监督维数约减方法[J].软件学报, 2008,19(11):51-60.
WEI Jia, PENG Hong. A semisupervised dimensionality reduction method based on local and global preserving [J]. Journal of Software, 2008, 19(11): 2833-2842.
[14] 朱凤梅,张道强.张量图像上的半监督降维算法[J].模式识别与人工智能,2009,22(4):574-580.
ZHU Fengmei, ZHANG Daoqiang. Semisupervised dimensionality reduction algorithm of tensor image [J]. Pattern Recognition and Artificial Intelligence, 2009, 22(4): 574-580.
[15] BASU S. Semisupervised clustering: probabilistic models, algorithms and experiments [D]. Austin: University of Texas at Austin, 2005.
[16] WAGSTAFF K, CARDIE C. Clustering with instancelevel constraints [C]∥Proceedings of the 17th International Conference on Machine Learning. Stanford: Morgan Kaufmann Publishers, 2000:1103-1110.
[17] 邓超,郭茂祖.基于Tritraining和数据剪辑的半监督聚类算法[J] .软件学报, 2008,19(3):663-673.
DENG Chao, GUO Maozu. Tritraining and data editing based semisupervised clustering algorithm [J]. Journal of Software, 2008, 19(3): 663-673.
[18] BLAKE C, MERZ J. UCI repository of machine learning databases[DB/OL]. [2010-02-26]. http:∥archive.ics.uci.edu/ml/
[19] GEORGHIADES A S, BELHUMEUR P N, KEIEGMAN D J. Fro
