Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (2): 287-298    DOI: 10.3785/j.issn.1008-973X.2023.02.009
    
Community search with mutual information maximization over heterogeneous information networks
Ya-feng WANG(),Li-hua ZHOU*(),Wei CHEN,Li-zhen WANG,Hong-mei CHEN
School of Information Science and Engineering, Yunnan University, Kunming 650500, China
Download: HTML     PDF(2007KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

Existing community search methods are difficult to deal with the complex and diverse search requirements of users and difficult to integrate network structure and node attributes to measure the correlation between nodes in high-dimensional and sparse heterogeneous information networks (HINs). In order to solve the problems, the community search problem of mutual information maximization over HINs was proposed, the definition of community with mutual information maximization was given, and a corresponding community search method, community search with mutual information maximization (CSMIM) was designed. The user's search requirements were defined as query constraints, and a query constraint deep graph infomax (QC-DGI) model was proposed to learn node embedding by extracting the structure, semantics and node attribute information in HINs to effectively calculate the mutual information between nodes. Then, according to the given query information, the mutual information maximization criterion was used to search the target community. In addition, an optimization strategy based on user feedback was proposed to realize the personalized calculation of mutual information from global to local, so as to improve the accuracy of search results. Finally, extensive experiments on real HINs dataset were performed, and experimental results prove that the proposed method can effectively search the community of a given node according to the search requirements, and has higher accuracy than the representative baseline methods.



Key wordscommunity search      heterogeneous information network      network representation learning      mutual information      query constraint     
Received: 28 July 2022      Published: 28 February 2023
CLC:  TP 301  
Fund:  国家自然科学基金资助项目(62062066, 61762090, 31760152, 61966036, 62266050, 62276227);云南省基础研究计划重点资助项目(202201AS070015);云南省高校物联网技术及应用重点实验室资助项目;云南大学研究生科研创新基金资助项目(2021Y024);云南省中青年学术和技术带头人后备人才资助项目(202205AC160033)
Corresponding Authors: Li-hua ZHOU     E-mail: yfwang982@163.com;lhzhou@ynu.edu.cn
Cite this article:

Ya-feng WANG,Li-hua ZHOU,Wei CHEN,Li-zhen WANG,Hong-mei CHEN. Community search with mutual information maximization over heterogeneous information networks. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 287-298.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.02.009     OR     https://www.zjujournals.com/eng/Y2023/V57/I2/287


异质信息网络的互信息最大化社区搜索

针对现有社区搜索方法难以处理复杂多样的搜索要求及在高维稀疏的异质信息网络(HINs)中难以融合网络结构和节点属性来度量节点间相关性的不足,提出异质信息网络互信息最大化社区搜索问题,给出互信息最大化的社区定义,设计相应的搜索方法(互信息最大化社区搜索,CSMIM). 将用户的搜索要求定义为查询约束,利用带查询约束的深度图互信息最大化(QC-DGI)模型融合网络结构、语义和节点属性信息获得节点嵌入,有效地计算节点间的互信息. 根据给定的查询信息,利用互信息最大化准则搜索目标社区. 为了提高搜索结果的准确率,提出基于用户反馈的优化策略,实现互信息从全局到局部的个性化计算. 在真实数据集上进行大量实验,实验结果表明所提方法能够有效地根据搜索要求挖掘出给定节点所在的社区,相比具有代表性的基线方法有更高的准确率.


关键词: 社区搜索,  异质信息网络,  网络表示学习,  互信息,  查询约束 
Fig.1 Heterogeneous information networks and its schema
Fig.2 Connected graph based on query constraints
Fig.3 Illustration of community search with mutual information maximization
Fig.4 Framework of query constraints deep graph infomax model
Fig.5 Illustration of interactive community search with mutual information maximization
数据集 节点类型 Nn 边类型 Ne Nf
ACM Paper(P)
Author(A)
Subject(S)
3 025
5 835
36
Paper-Author
Paper-Subject
9 744
3 025
1 870
IMDB Movie(M)
Actor(A)
Director(D)
Keyword(K)
4 275
5 431
2 082
7 313
Movie-Actor
Movie-Director
Movie-Keyword
12 838
4 280
20 529
6 334
DBLP Author(A)
Paper(P)
Conference(C)
Term(T)
4 057 Author-Paper
Paper-Conference
Paper-Term
19 645
14 328
88 420
334
14 328
20
8 789
Tab.1 Parameter summary of dataset in experiments
方法 查询约束:
AMA
查询约束:
AMA

$A\mathop \to \limits^2 M$
查询约束:
AMA
AMDMA
AMKMA

$A\mathop \to \limits^2 M$
FastBCore Fionnula F,
Matthew S,
Elizabeth M,
James E,
A.J. Buckley,
Frances F,
Kim D等
714人
Null Null
CSMIM Fionnula F,
Matthew S,
Elizabeth M,
James E,
A.J. Buckley,
Frances F,
Kim D等
30人
Fionnula F,
Matthew S,
Elizabeth M,
Johnny D,
A.J. Buckley,
Frances F,
Kim D等
30人
Fionnula F,
Matthew S,
Elizabeth M,
Johnny D,
Dennis Q,
Xzibit,
Toby J等
30人
Tab.2 Comparison of community search results under different query constraints
Fig.6 Efficiency of community search methods
Fig.7 Performance of optimized method ICSMIN
数据集 方法 Pre Mod PS MI
ACM FastBCore 0.330 0.90 0.044 11.490
ICS-GNN 0.938 0.43 0.237 14.195
ICSMIM 0.952 0.53 0.260 14.209
DBLP FastBCore 0.280 0.88 0.178 7.340
ICS-GNN 0.825 0.49 0.190 16.320
ICSMIM 0.852 0.59 0.193 16.340
IMDB FastBCore 0.345 0.93 0.007 10.960
ICS-GNN 0.515 0.35 0.046 13.257
ICSMIM 0.570 0.45 0.049 13.262
Tab.3 Performance comparison of community search methods
Fig.8 Influence of community size on search results
Fig.9 Influence of embedding models on search results
Fig.10 Influence of search algorithm on search results
[1]   石川, 王睿嘉, 王啸 异质信息网络分析与应用综述[J]. 软件学报, 2022, 33 (2): 24
SHI Chuan, WANG Rui-jia, WANG Xiao A survey of heterogeneous information networks analysis and applications[J]. Journal of Software, 2022, 33 (2): 24
doi: 10.13328/j.cnki.jos.006357
[2]   REN Y, LIU B, HUANG C, et al. Heterogeneous deep graph infomax [C]// 8th International Conference on Learning Representations. New Orleans: ICLR, 2020.
[3]   WANG C, PAN S, HU R, et al. Attributed graph clustering: a deep attentional embedding approach [C]// Proceedings of the 28th International Joint Conferences on Artificial Intelligence Organization. Macao: IJCAI, 2019: 3670-3676.
[4]   FANG Y, HUANG X, QIN L, et al A survey of community search over big graphs[J]. The VLDB Journal, 2020, 29 (1): 353- 392
doi: 10.1007/s00778-019-00556-x
[5]   FANG Y, YANG Y, ZHANG W, et al Effective and efficient community search over large heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (6): 854- 867
[6]   YANG Y, FANG Y, LIN X, et al. Effective and efficient truss computation over large heterogeneous information networks[C]// 2020 IEEE 36th International Conference on Data Engineering (ICDE). [s.l.]: IEEE, 2020: 901-912.
[7]   QIAO L, ZHANG Z, YUAN Y, et al. Keyword-centric community search over large heterogeneous information networks [C]// 26th International Conference on Database Systems for Advanced Applications. Taipei: DASFAA, 2021: 158-173.
[8]   JIANG Y, RONG Y, CHENG H, et al. Query-driven graph convolutional networks for attributed community search [C]// 48th International Conference on Very Large Databases. Sydney: PVLDB, 2022.
[9]   竺俊超, 王朝坤 复杂条件下的社区搜索方法[J]. 软件学报, 2019, 30 (3): 21
ZHU Jun-chao, WANG Chao-kun Approaches to community search under complex conditions[J]. Journal of Software, 2019, 30 (3): 21
doi: 10.13328/j.cnki.jos.005699
[10]   JIAN X, WANG Y, CHEN L Effective and efficient relational community detection and search in large dynamic heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (10): 1723- 1736
doi: 10.14778/3401960.3401969
[11]   GAO J, CHEN J, LI Z, et al ICS-GNN: lightweight interactive community search via graph neural network[J]. Proceedings of the VLDB Endowment, 2021, 14 (6): 1006- 1018
doi: 10.14778/3447689.3447704
[12]   BELGHAZI M I, BARATIN A, RAJESWAR S, et al. MINE: mutual information neural estimation [C]// Proceedings of the 35th International Conference on Machine Learning. Stockholm: ICML, 2018: 531-540.
[13]   KINNEY J B, ATWAL G S Equitability, mutual information, and the maximal information coefficient[J]. Proceedings of the National Academy of Sciences, 2014, 111 (9): 3354- 3359
doi: 10.1073/pnas.1309933111
[14]   ZHANG T, XIONG Y, ZHANG J, et al. CommDGI [C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. New York: ACM, 2020: 1843-1852.
[15]   VELIČKOVIĆ P, FEDUS W, HAMILTON W L, et al. Deep graph infomax [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[16]   HJELM R D, FEDOROV A, LAVOIE-MARCHILDON S, et al. Learning deep representations by mutual information estimation and maximization [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
[17]   周丽华, 王家龙, 王丽珍,等 异质信息网络表征学习综述[J]. 计算机学报, 2022, 45 (1): 30
ZHOU Li-hua, WANG Jia-long, WANG Li-zhen, et al Heterogeneous information network representation learning: a survey[J]. Chinese Journal of Computers, 2022, 45 (1): 30
doi: 10.11897/SP.J.1016.2022.00160
[18]   KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// 5th International Conference on Learning Representations. Toulon: ICLR, 2017.
[19]   WANG X, JI H, SHI C, et al. Heterogeneous graph attention network[C]// WWW ’19: The World Wide Web Conference. [s.l.]: ACM, 2019: 2022–2032.
[1] Ting-ting ZHAO,zhe WANG,Yi-nan LU. Heterogeneous information network representation learning based on transition probability matrix (HINtpm)[J]. Journal of ZheJiang University (Engineering Science), 2019, 53(3): 548-554.
[2] YU Bao-hua , YANG Shi-xi, ZHOU Xiao-feng. Optimization of sensor allocation based on
fault-measurement point mutual information
[J]. Journal of ZheJiang University (Engineering Science), 2012, 46(1): 156-162.
[3] LAI Xiao-bo , ZHU Shi-qiang. Mutual information based non-parametric
transform stereo matching algorithm
[J]. Journal of ZheJiang University (Engineering Science), 2011, 45(9): 1636-1642.
[4] WANG Li-juan, ZHANG Hui. On parameter identifiability of linear multivariable systems under
communication access constraints
[J]. Journal of ZheJiang University (Engineering Science), 2011, 45(12): 2103-2108.
[5] XU Miao, LI Shi-Ju, YANG Zhi-Min. Binary base-band signal processing using adaptive stochastic resonance[J]. Journal of ZheJiang University (Engineering Science), 2010, 44(4): 692-695.