Existing community search methods are difficult to deal with the complex and diverse search requirements of users and difficult to integrate network structure and node attributes to measure the correlation between nodes in high-dimensional and sparse heterogeneous information networks (HINs). In order to solve the problems, the community search problem of mutual information maximization over HINs was proposed, the definition of community with mutual information maximization was given, and a corresponding community search method, community search with mutual information maximization (CSMIM) was designed. The user's search requirements were defined as query constraints, and a query constraint deep graph infomax (QC-DGI) model was proposed to learn node embedding by extracting the structure, semantics and node attribute information in HINs to effectively calculate the mutual information between nodes. Then, according to the given query information, the mutual information maximization criterion was used to search the target community. In addition, an optimization strategy based on user feedback was proposed to realize the personalized calculation of mutual information from global to local, so as to improve the accuracy of search results. Finally, extensive experiments on real HINs dataset were performed, and experimental results prove that the proposed method can effectively search the community of a given node according to the search requirements, and has higher accuracy than the representative baseline methods.
Ya-feng WANG,Li-hua ZHOU,Wei CHEN,Li-zhen WANG,Hong-mei CHEN. Community search with mutual information maximization over heterogeneous information networks. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 287-298.
Fig.1Heterogeneous information networks and its schema
Fig.2Connected graph based on query constraints
Fig.3Illustration of community search with mutual information maximization
Fig.4Framework of query constraints deep graph infomax model
Fig.5Illustration of interactive community search with mutual information maximization
数据集
节点类型
Nn
边类型
Ne
Nf
ACM
Paper(P) Author(A) Subject(S)
3 025 5 835 36
Paper-Author Paper-Subject
9 744 3 025
1 870
IMDB
Movie(M) Actor(A) Director(D) Keyword(K)
4 275 5 431 2 082 7 313
Movie-Actor Movie-Director Movie-Keyword
12 838 4 280 20 529
6 334
DBLP
Author(A) Paper(P) Conference(C) Term(T)
4 057
Author-Paper Paper-Conference Paper-Term
19 645 14 328 88 420
334
14 328
20
8 789
Tab.1Parameter summary of dataset in experiments
方法
查询约束: AMA
查询约束: AMA,
$A\mathop \to \limits^2 M$
查询约束: AMA, AMDMA, AMKMA,
$A\mathop \to \limits^2 M$
FastBCore
Fionnula F, Matthew S, Elizabeth M, James E, A.J. Buckley, Frances F, Kim D等 714人
Null
Null
CSMIM
Fionnula F, Matthew S, Elizabeth M, James E, A.J. Buckley, Frances F, Kim D等 30人
Fionnula F, Matthew S, Elizabeth M, Johnny D, A.J. Buckley, Frances F, Kim D等 30人
Fionnula F, Matthew S, Elizabeth M, Johnny D, Dennis Q, Xzibit, Toby J等 30人
Tab.2Comparison of community search results under different query constraints
Fig.6Efficiency of community search methods
Fig.7Performance of optimized method ICSMIN
数据集
方法
Pre
Mod
PS
MI
ACM
FastBCore
0.330
0.90
0.044
11.490
ICS-GNN
0.938
0.43
0.237
14.195
ICSMIM
0.952
0.53
0.260
14.209
DBLP
FastBCore
0.280
0.88
0.178
7.340
ICS-GNN
0.825
0.49
0.190
16.320
ICSMIM
0.852
0.59
0.193
16.340
IMDB
FastBCore
0.345
0.93
0.007
10.960
ICS-GNN
0.515
0.35
0.046
13.257
ICSMIM
0.570
0.45
0.049
13.262
Tab.3Performance comparison of community search methods
Fig.8Influence of community size on search results
Fig.9Influence of embedding models on search results
Fig.10Influence of search algorithm on search results
[1]
石川, 王睿嘉, 王啸 异质信息网络分析与应用综述[J]. 软件学报, 2022, 33 (2): 24 SHI Chuan, WANG Rui-jia, WANG Xiao A survey of heterogeneous information networks analysis and applications[J]. Journal of Software, 2022, 33 (2): 24
doi: 10.13328/j.cnki.jos.006357
[2]
REN Y, LIU B, HUANG C, et al. Heterogeneous deep graph infomax [C]// 8th International Conference on Learning Representations. New Orleans: ICLR, 2020.
[3]
WANG C, PAN S, HU R, et al. Attributed graph clustering: a deep attentional embedding approach [C]// Proceedings of the 28th International Joint Conferences on Artificial Intelligence Organization. Macao: IJCAI, 2019: 3670-3676.
[4]
FANG Y, HUANG X, QIN L, et al A survey of community search over big graphs[J]. The VLDB Journal, 2020, 29 (1): 353- 392
doi: 10.1007/s00778-019-00556-x
[5]
FANG Y, YANG Y, ZHANG W, et al Effective and efficient community search over large heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (6): 854- 867
[6]
YANG Y, FANG Y, LIN X, et al. Effective and efficient truss computation over large heterogeneous information networks[C]// 2020 IEEE 36th International Conference on Data Engineering (ICDE). [s.l.]: IEEE, 2020: 901-912.
[7]
QIAO L, ZHANG Z, YUAN Y, et al. Keyword-centric community search over large heterogeneous information networks [C]// 26th International Conference on Database Systems for Advanced Applications. Taipei: DASFAA, 2021: 158-173.
[8]
JIANG Y, RONG Y, CHENG H, et al. Query-driven graph convolutional networks for attributed community search [C]// 48th International Conference on Very Large Databases. Sydney: PVLDB, 2022.
[9]
竺俊超, 王朝坤 复杂条件下的社区搜索方法[J]. 软件学报, 2019, 30 (3): 21 ZHU Jun-chao, WANG Chao-kun Approaches to community search under complex conditions[J]. Journal of Software, 2019, 30 (3): 21
doi: 10.13328/j.cnki.jos.005699
[10]
JIAN X, WANG Y, CHEN L Effective and efficient relational community detection and search in large dynamic heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (10): 1723- 1736
doi: 10.14778/3401960.3401969
[11]
GAO J, CHEN J, LI Z, et al ICS-GNN: lightweight interactive community search via graph neural network[J]. Proceedings of the VLDB Endowment, 2021, 14 (6): 1006- 1018
doi: 10.14778/3447689.3447704
[12]
BELGHAZI M I, BARATIN A, RAJESWAR S, et al. MINE: mutual information neural estimation [C]// Proceedings of the 35th International Conference on Machine Learning. Stockholm: ICML, 2018: 531-540.
[13]
KINNEY J B, ATWAL G S Equitability, mutual information, and the maximal information coefficient[J]. Proceedings of the National Academy of Sciences, 2014, 111 (9): 3354- 3359
doi: 10.1073/pnas.1309933111
[14]
ZHANG T, XIONG Y, ZHANG J, et al. CommDGI [C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. New York: ACM, 2020: 1843-1852.
[15]
VELIČKOVIĆ P, FEDUS W, HAMILTON W L, et al. Deep graph infomax [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[16]
HJELM R D, FEDOROV A, LAVOIE-MARCHILDON S, et al. Learning deep representations by mutual information estimation and maximization [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[17]
周丽华, 王家龙, 王丽珍,等 异质信息网络表征学习综述[J]. 计算机学报, 2022, 45 (1): 30 ZHOU Li-hua, WANG Jia-long, WANG Li-zhen, et al Heterogeneous information network representation learning: a survey[J]. Chinese Journal of Computers, 2022, 45 (1): 30
doi: 10.11897/SP.J.1016.2022.00160
[18]
KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// 5th International Conference on Learning Representations. Toulon: ICLR, 2017.
[19]
WANG X, JI H, SHI C, et al. Heterogeneous graph attention network[C]// WWW ’19: The World Wide Web Conference. [s.l.]: ACM, 2019: 2022–2032.