Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (2): 299-309    DOI: 10.3785/j.issn.1008-973X.2023.02.010
    
Deep clustering via high-order mutual information maximization and pseudo-label guidance
Chao LIU1, Bing KONG1,*, Guo-wang DU2, Li-hua ZHOU1, Hong-mei CHEN1, Chong-ming BAO3
1. School of Information Science and Engineering, Yunnan University, Kunming 650504, China
2. South-Western Institute For Astronomy Research, Yunnan University, Kunming 650504, China
3. School of Software, Yunnan University, Kunming 650504, China

Abstract  

A deep clustering model guided by high-order mutual information maximization and pseudo-labels, HMIPDC, was proposed to address two limitations of existing clustering methods: they fail to fully explore the topological structure and node relationships of the graph, and they cannot benefit from the imprecise labels predicted by the model itself. A high-order mutual information maximization strategy was adopted to maximize the mutual information among the global representation of the graph, the node representations, and the node attributes. Low-dimensional node representations were extracted more effectively by a self-attention mechanism combined with multi-hop proximity matrices. A deep divergence-based clustering (DDC) loss was used to iteratively optimize the clustering objective, while high-confidence predicted labels supervised the learning of the low-dimensional representations. Clustering experiments, training-time analysis and clustering visualization on four benchmark datasets show that the clustering performance of HMIPDC is consistently better than that of most deep clustering methods. The effectiveness and stability of the model were further verified by an ablation study and parameter sensitivity analysis.
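To make the two auxiliary training signals described above concrete, the following is a minimal PyTorch sketch, not the authors' implementation. The mutual-information term is a DGI-style pairwise bound [20] between a pooled global summary and the node representations, standing in for the paper's high-order objective over the global representation, node representations and node attributes; the confidence threshold of 0.9 and all function and class names are illustrative assumptions.

```python
# Sketch of two of HMIPDC's training signals; NOT the authors' code.
# The MI term is a DGI-style pairwise lower bound, a simplification of
# the paper's high-order objective; 0.9 is an assumed threshold.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilinearDiscriminator(nn.Module):
    """Scores (global summary, node representation) pairs."""
    def __init__(self, dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.W)

    def forward(self, summary: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        # summary: (dim,), h: (n, dim) -> per-node logits of shape (n,)
        return h @ (self.W @ summary)

def mi_loss(disc: BilinearDiscriminator,
            h_pos: torch.Tensor, h_neg: torch.Tensor) -> torch.Tensor:
    """BCE lower bound on mutual information: true (summary, node) pairs
    are positives; pairs built from corrupted inputs are negatives."""
    summary = torch.sigmoid(h_pos.mean(dim=0))   # mean-pooled graph summary
    logits = torch.cat([disc(summary, h_pos), disc(summary, h_neg)])
    labels = torch.cat([h_pos.new_ones(len(h_pos)),
                        h_neg.new_zeros(len(h_neg))])
    return F.binary_cross_entropy_with_logits(logits, labels)

def pseudo_label_loss(cluster_logits: torch.Tensor,
                      threshold: float = 0.9) -> torch.Tensor:
    """Self-supervision from high-confidence predictions: keep only nodes
    whose soft cluster assignment exceeds the threshold, then apply
    cross-entropy against those predicted labels."""
    probs = cluster_logits.softmax(dim=1)
    conf, labels = probs.max(dim=1)
    mask = conf > threshold
    if not mask.any():
        return cluster_logits.new_zeros(())
    return F.cross_entropy(cluster_logits[mask], labels[mask])

# Usage sketch: h_pos = encoder(X, A); h_neg = encoder(X[perm], A), where
# perm is a random permutation of the node features, as in Deep Graph Infomax.
```

In the full model these two terms would act alongside the DDC clustering loss [11]; the encoder producing h_pos and h_neg (in the paper, a self-attention network over multi-hop proximity matrices) is left abstract here.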



Key words: self-supervised learning; deep clustering; self-attention mechanism; high-order mutual information; pseudo-label
Received: 30 July 2022      Published: 28 February 2023
CLC:  TP 391  
Fund:  National Natural Science Foundation of China (62062066, 61762090, 31760152, 61966036, 62266050, 62276227); Key Project of the 2022 Yunnan Provincial Basic Research Program (202201AS070015); Yunnan Provincial Reserve Talents Program for Young and Middle-aged Academic and Technical Leaders (202205AC160033)
Corresponding Authors: Bing KONG     E-mail: chaoliu@mail.ynu.edu.cn;kongbing@ynu.edu.cn
Cite this article:

Chao LIU,Bing KONG,Guo-wang DU,Li-hua ZHOU,Hong-mei CHEN,Chong-ming BAO. Deep clustering via high-order mutual information maximization and pseudo-label guidance. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 299-309.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.02.010     OR     https://www.zjujournals.com/eng/Y2023/V57/I2/299


Fig.1 Schematic diagram of HMIPDC model
Dataset    n      K   d      ξ
ACM        3 025  3   1 870   13 128
Citeseer   3 327  6   3 703    4 552
DBLP       4 057  4     334    3 528
AMAP       7 650  8     745  119 081
Tab.1 Statistics of four benchmark datasets (n: number of nodes; K: number of clusters; d: attribute dimension; ξ: number of edges)
Dataset   Method   ACC          NMI          ARI          F1
ACM       K-means  67.31±0.71   32.44±0.46   30.60±0.69   67.57±0.74
          AE       81.83±0.08   49.30±0.16   59.64±0.16   82.01±0.08
          IDEC     85.12±0.52   56.61±1.16   62.16±1.50   85.11±0.48
          GAE      84.52±1.44   55.38±1.92   59.46±3.10   84.65±1.33
          DAEGC    88.24±0.02   63.01±0.04   68.70±0.04   88.11±0.01
          SDCN     90.45±0.18   68.31±0.25   73.91±0.40   90.42±0.19
          DFCN     90.90±0.20   64.40±0.40   74.90±0.40   90.80±0.20
          DCRN     91.93±0.20   71.56±0.61   77.56±0.52   91.94±0.20
          HMIPDC   92.12±0.15   72.18±0.52   78.04±0.39   92.13±0.14
Citeseer  K-means  39.32±3.17   16.94±3.22   13.43±3.02   36.08±3.53
          AE       57.08±0.13   27.64±0.08   29.31±0.14   53.80±0.11
          IDEC     60.49±1.42   27.17±2.40   25.70±2.65   61.62±1.39
          GAE      61.35±0.80   34.63±0.65   33.55±1.18   57.36±0.82
          DAEGC    64.90±0.07   38.71±0.08   39.21±0.09   59.56±0.06
          SDCN     65.96±0.31   38.71±0.32   40.17±0.43   63.62±0.24
          DFCN     69.50±0.20   43.90±0.20   45.50±0.30   64.30±0.20
          DCRN     70.86±0.18   45.86±0.35   47.64±0.30   65.83±0.21
          HMIPDC   71.93±0.31   46.07±0.23   48.28±0.50   66.96±0.26
DBLP      K-means  38.65±0.65   11.45±0.38    6.97±0.39   31.92±0.27
          AE       51.43±0.35   25.40±0.16   12.21±0.43   52.53±0.36
          IDEC     60.31±0.62   31.17±0.50   25.37±0.60   61.33±0.56
          GAE      61.21±1.22   30.80±0.91   22.02±1.40   61.41±2.23
          DAEGC    67.42±0.38   30.64±0.46   32.79±0.58   66.89±0.37
          SDCN     68.05±1.81   39.50±1.34   39.15±2.01   67.71±1.51
          DFCN     76.00±0.80   43.70±1.00   47.00±1.50   75.70±0.80
          DCRN     79.66±0.25   48.95±0.44   53.60±0.46   79.28±0.26
          HMIPDC   80.34±0.16   49.41±0.34   55.39±0.28   79.76±0.32
AMAP      K-means  27.22±0.76   13.23±1.33    5.50±0.44   23.96±0.51
          AE       48.25±0.08   38.76±0.30   20.80±0.47   47.87±0.20
          IDEC     47.62±0.08   37.83±0.08   19.24±0.07   47.20±0.11
          GAE      71.57±2.48   62.13±2.79   48.82±4.57   68.08±1.76
          DAEGC    75.52±0.01   63.31±0.01   59.98±0.84   70.02±0.01
          SDCN     53.44±0.81   44.85±0.83   31.21±1.23   50.66±1.49
          DFCN     76.88±0.80   69.21±1.00   59.98±0.84   71.58±0.31
          DCRN     79.94±0.13   73.70±0.24   63.69±0.20   73.82±0.12
          HMIPDC   80.83±0.78   69.47±0.46   65.23±1.64   75.87±2.08
Tab.2 Clustering results (%) of HMIPDC and eight baseline methods on four datasets
Dataset   Method     ACC          NMI          ARI          F1
ACM       AD         91.87±0.12   70.92±0.29   77.49±0.32   91.67±0.29
          AD-MI      91.94±0.15   71.50±0.33   77.57±0.36   91.95±0.15
          AD-PL      92.07±0.12   72.08±0.23   77.92±0.28   92.08±0.12
          AD-MI-PL   92.12±0.15   72.18±0.52   78.04±0.39   92.13±0.14
Citeseer  AD         64.80±0.98   39.25±0.92   40.28±0.69   62.86±0.64
          AD-MI      70.65±0.89   44.82±0.50   47.47±0.68   66.56±0.38
          AD-PL      70.67±0.58   43.90±0.73   46.08±0.91   64.57±0.78
          AD-MI-PL   71.93±0.31   46.07±0.23   48.28±0.50   66.96±0.26
DBLP      AD         78.30±0.62   46.94±0.45   52.07±0.75   77.72±0.59
          AD-MI      79.05±0.35   47.83±0.57   52.96±0.61   78.60±0.34
          AD-PL      79.21±0.80   48.66±0.71   54.24±0.53   78.83±0.57
          AD-MI-PL   80.34±0.16   49.41±0.34   55.39±0.28   79.76±0.32
AMAP      AD         72.67±1.19   61.62±1.15   54.17±1.05   66.25±2.35
          AD-MI      76.68±0.98   65.53±0.85   58.84±0.89   73.17±2.47
          AD-PL      74.36±0.91   64.11±0.95   57.23±1.36   71.64±2.16
          AD-MI-PL   80.83±0.78   69.47±0.46   65.23±1.64   75.87±2.08
Tab.3 Clustering results (%) of HMIPDC and three variants on four datasets
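Read together, the Tab.3 variants toggle the two auxiliary signals on top of the base variant AD (presumably the attention-based encoder trained with the DDC loss alone): AD-MI adds the mutual-information term, AD-PL the pseudo-label term, and AD-MI-PL both. This ablation pattern is consistent with a combined objective of the form $\mathcal{L} = \mathcal{L}_{\rm DDC} + \alpha \mathcal{L}_{\rm MI} + \beta \mathcal{L}_{\rm PL}$, where the trade-off weights $\alpha$ and $\beta$ are hypothetical placeholders rather than values reported on this page.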
Fig.2 Clustering accuracy on four datasets with different hyperparameters
Fig.3 Clustering accuracy of five methods with different training time on ACM dataset
Fig.4 2D visualization results of low-dimensional representations of different methods on ACM dataset
[1]   ZHAN Z H, LI J Y, ZHANG J. Evolutionary deep learning: a survey[J]. Neurocomputing, 2022, 483: 42-58. doi: 10.1016/j.neucom.2022.01.099
[2]   LIN Y J, GOU Y B, LIU Z T, et al. COMPLETER: incomplete multi-view clustering via contrastive prediction[C]// 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition. [s.l.]: CVPR, 2021: 11174-11183.
[3]   XIE J Y, GIRSHICK R, FARHADI A. Unsupervised deep embedding for clustering analysis[C]// Proceedings of the 33rd International Conference on Machine Learning. New York: ICML, 2016: 478-487.
[4]   GUO X F, GAO L, LIU X W, et al. Improved deep embedded clustering with local structure preservation[C]// Proceedings of the 26th International Joint Conference on Artificial Intelligence. Melbourne: IJCAI, 2017: 1753-1759.
[5]   ZHANG S S, LIU J W, ZUO X, et al. Online deep learning based on auto-encoder[J]. Applied Intelligence, 2021, 51(8): 5420-5439. doi: 10.1007/s10489-020-02058-8
[6]   WU Z H, PAN S R, CHEN F W, et al. A comprehensive survey on graph neural networks[J]. IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(1): 4-24. doi: 10.1109/TNNLS.2020.2978386
[7]   KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks[C]// 5th International Conference on Learning Representations. Toulon: ICLR, 2017: 1-14.
[8]   WANG C, PAN S R, HU R Q, et al. Attributed graph clustering: a deep attentional embedding approach[C]// Proceedings of the 28th International Joint Conference on Artificial Intelligence. Macao: IJCAI, 2019: 3670-3676.
[9]   BO D Y, WANG X, SHI C, et al. Structural deep clustering network[C]// Proceedings of the Web Conference 2020. Taipei: WWW, 2020: 1400-1410.
[10]   DU Guo-wang, ZHOU Li-hua, WANG Li-zhen, et al. Multi-view clustering based on two-level weights[J]. Journal of Computer Research and Development, 2022, 59(4): 907-921. doi: 10.7544/issn1000-1239.20200897
[11]   KAMPFFMEYER M, LØKSE S, BIANCHI F M, et al. Deep divergence-based approach to clustering[J]. Neural Networks, 2019, 113: 91-101. doi: 10.1016/j.neunet.2019.01.015
[12]   MOLAEI S, BOUSEJIN N G, ZARE H, et al. Deep node clustering based on mutual information maximization[J]. Neurocomputing, 2021, 455: 274-282. doi: 10.1016/j.neucom.2021.03.020
[13]   CHEN Yi-qi, QIAN Tie-yun, LI Wan-li, et al. Exploiting composite relation graph convolution for attributed network embedding[J]. Journal of Computer Research and Development, 2020, 57(8): 1674-1682. doi: 10.7544/issn1000-1239.2020.20200206
[14]   KIPF T N, WELLING M. Variational graph auto-encoders[C]// Bayesian Deep Learning Workshop at the 30th Conference on Neural Information Processing Systems. Barcelona: NIPS, 2016.
[15]   KOU S W, XIA W, ZHANG X D, et al. Self-supervised graph convolutional clustering by preserving latent distribution[J]. Neurocomputing, 2021, 437: 218-226. doi: 10.1016/j.neucom.2021.01.082
[16]   TU W X, ZHOU S H, LIU X W, et al. Deep fusion clustering network[C]// 35th AAAI Conference on Artificial Intelligence. [s.l.]: AAAI, 2021: 9978-9987.
[17]   LIU Y, TU W X, ZHOU S H, et al. Deep graph clustering via dual correlation reduction[C]// 36th AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2022: 7603-7611.
[18]   BELGHAZI M I, BARATIN A, RAJESWAR S, et al. MINE: mutual information neural estimation[C]// Proceedings of the 35th International Conference on Machine Learning. Stockholm: ICML, 2018: 531-540.
[19]   HJELM R D, FEDOROV A, LAVOIE-MARCHILDON S, et al. Learning deep representations by mutual information estimation and maximization[C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[20]   VELIČKOVIĆ P, FEDUS W, HAMILTON W L, et al. Deep graph infomax[C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[21]   JING B Y, PARK C Y, TONG H H. HDMI: high-order deep multiplex infomax[C]// Proceedings of the Web Conference 2021. New York: WWW, 2021: 2414-2424.
[22]   MCGILL W J. Multivariate information transmission[J]. Transactions of the IRE Professional Group on Information Theory, 1954, 4(4): 93-111. doi: 10.1109/TIT.1954.1057469
[23]   VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks[C]// 6th International Conference on Learning Representations. Vancouver: ICLR, 2018.
[24]   RIZVE M N, DUARTE K, RAWAT Y S, et al. In defense of pseudo-labeling: an uncertainty-aware pseudo-label selection framework for semi-supervised learning[C]// 9th International Conference on Learning Representations. [s.l.]: ICLR, 2021.
[25]   HARTIGAN J A, WONG M A. A K-means clustering algorithm[J]. Journal of the Royal Statistical Society: Series C (Applied Statistics), 1979, 28(1): 100-108.
[26]   ZHAO H, YANG X, WANG Z R, et al. Graph debiased contrastive learning with joint representation clustering[C]// Proceedings of the 30th International Joint Conference on Artificial Intelligence. [s.l.]: IJCAI, 2021: 3434-3440.
[27]   LV J C, KANG Z, LU X, et al. Pseudo-supervised deep subspace clustering[J]. IEEE Transactions on Image Processing, 2021, 30: 5252-5263. doi: 10.1109/TIP.2021.3079800
[28]   BOUYER A, ROGHANI H. LSMD: a fast and robust local community detection starting from low degree nodes in social networks[J]. Future Generation Computer Systems, 2020, 113: 41-57. doi: 10.1016/j.future.2020.07.011
[29]   KINGMA D P, BA J. Adam: a method for stochastic optimization[C]// 3rd International Conference on Learning Representations. San Diego: ICLR, 2015.
[30]   VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9: 2579-2605.
[31]   ZHOU Li-hua, WANG Jia-long, WANG Li-zhen, et al. Heterogeneous information network representation learning: a survey[J]. Chinese Journal of Computers, 2022, 45(1): 160-189. doi: 10.11897/SP.J.1016.2022.00160