Please wait a minute...
浙江大学学报(工学版)  2023, Vol. 57 Issue (2): 287-298    DOI: 10.3785/j.issn.1008-973X.2023.02.009
计算机技术     
异质信息网络的互信息最大化社区搜索
王亚峰(),周丽华*(),陈伟,王丽珍,陈红梅
云南大学 信息学院,云南 昆明 650500
Community search with mutual information maximization over heterogeneous information networks
Ya-feng WANG(),Li-hua ZHOU*(),Wei CHEN,Li-zhen WANG,Hong-mei CHEN
School of Information Science and Engineering, Yunnan University, Kunming 650500, China
 全文: PDF(2007 KB)   HTML
摘要:

针对现有社区搜索方法难以处理复杂多样的搜索要求及在高维稀疏的异质信息网络(HINs)中难以融合网络结构和节点属性来度量节点间相关性的不足,提出异质信息网络互信息最大化社区搜索问题,给出互信息最大化的社区定义,设计相应的搜索方法(互信息最大化社区搜索,CSMIM). 将用户的搜索要求定义为查询约束,利用带查询约束的深度图互信息最大化(QC-DGI)模型融合网络结构、语义和节点属性信息获得节点嵌入,有效地计算节点间的互信息. 根据给定的查询信息,利用互信息最大化准则搜索目标社区. 为了提高搜索结果的准确率,提出基于用户反馈的优化策略,实现互信息从全局到局部的个性化计算. 在真实数据集上进行大量实验,实验结果表明所提方法能够有效地根据搜索要求挖掘出给定节点所在的社区,相比具有代表性的基线方法有更高的准确率.

关键词: 社区搜索异质信息网络网络表示学习互信息查询约束    
Abstract:

Existing community search methods are difficult to deal with the complex and diverse search requirements of users and difficult to integrate network structure and node attributes to measure the correlation between nodes in high-dimensional and sparse heterogeneous information networks (HINs). In order to solve the problems, the community search problem of mutual information maximization over HINs was proposed, the definition of community with mutual information maximization was given, and a corresponding community search method, community search with mutual information maximization (CSMIM) was designed. The user's search requirements were defined as query constraints, and a query constraint deep graph infomax (QC-DGI) model was proposed to learn node embedding by extracting the structure, semantics and node attribute information in HINs to effectively calculate the mutual information between nodes. Then, according to the given query information, the mutual information maximization criterion was used to search the target community. In addition, an optimization strategy based on user feedback was proposed to realize the personalized calculation of mutual information from global to local, so as to improve the accuracy of search results. Finally, extensive experiments on real HINs dataset were performed, and experimental results prove that the proposed method can effectively search the community of a given node according to the search requirements, and has higher accuracy than the representative baseline methods.

Key words: community search    heterogeneous information network    network representation learning    mutual information    query constraint
收稿日期: 2022-07-28 出版日期: 2023-02-28
CLC:  TP 301  
基金资助: 国家自然科学基金资助项目(62062066, 61762090, 31760152, 61966036, 62266050, 62276227);云南省基础研究计划重点资助项目(202201AS070015);云南省高校物联网技术及应用重点实验室资助项目;云南大学研究生科研创新基金资助项目(2021Y024);云南省中青年学术和技术带头人后备人才资助项目(202205AC160033)
通讯作者: 周丽华     E-mail: yfwang982@163.com;lhzhou@ynu.edu.cn
作者简介: 王亚峰(1998—),女,硕士生,从事数据挖掘、社区搜索研究. orcid.org/0000-0002-5944-2888. E-mail: yfwang982@163.com
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
王亚峰
周丽华
陈伟
王丽珍
陈红梅

引用本文:

王亚峰,周丽华,陈伟,王丽珍,陈红梅. 异质信息网络的互信息最大化社区搜索[J]. 浙江大学学报(工学版), 2023, 57(2): 287-298.

Ya-feng WANG,Li-hua ZHOU,Wei CHEN,Li-zhen WANG,Hong-mei CHEN. Community search with mutual information maximization over heterogeneous information networks. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 287-298.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.02.009        https://www.zjujournals.com/eng/CN/Y2023/V57/I2/287

图 1  异质信息网络及其网络模式
图 2  基于查询约束的连通图
图 3  互信息最大化社区搜索方法框架
图 4  带查询约束的深度图互信息最大化模型框架
图 5  用户交互式互信息最大化社区搜索方法框架
数据集 节点类型 Nn 边类型 Ne Nf
ACM Paper(P)
Author(A)
Subject(S)
3 025
5 835
36
Paper-Author
Paper-Subject
9 744
3 025
1 870
IMDB Movie(M)
Actor(A)
Director(D)
Keyword(K)
4 275
5 431
2 082
7 313
Movie-Actor
Movie-Director
Movie-Keyword
12 838
4 280
20 529
6 334
DBLP Author(A)
Paper(P)
Conference(C)
Term(T)
4 057 Author-Paper
Paper-Conference
Paper-Term
19 645
14 328
88 420
334
14 328
20
8 789
表 1  实验所使用的数据集参数
方法 查询约束:
AMA
查询约束:
AMA

$A\mathop \to \limits^2 M$
查询约束:
AMA
AMDMA
AMKMA

$A\mathop \to \limits^2 M$
FastBCore Fionnula F,
Matthew S,
Elizabeth M,
James E,
A.J. Buckley,
Frances F,
Kim D等
714人
Null Null
CSMIM Fionnula F,
Matthew S,
Elizabeth M,
James E,
A.J. Buckley,
Frances F,
Kim D等
30人
Fionnula F,
Matthew S,
Elizabeth M,
Johnny D,
A.J. Buckley,
Frances F,
Kim D等
30人
Fionnula F,
Matthew S,
Elizabeth M,
Johnny D,
Dennis Q,
Xzibit,
Toby J等
30人
表 2  不同查询约束下的社区搜索结果对比
图 6  社区搜索方法效率对比
图 7  优化的社区搜索方法ICSMIN性能分析
数据集 方法 Pre Mod PS MI
ACM FastBCore 0.330 0.90 0.044 11.490
ICS-GNN 0.938 0.43 0.237 14.195
ICSMIM 0.952 0.53 0.260 14.209
DBLP FastBCore 0.280 0.88 0.178 7.340
ICS-GNN 0.825 0.49 0.190 16.320
ICSMIM 0.852 0.59 0.193 16.340
IMDB FastBCore 0.345 0.93 0.007 10.960
ICS-GNN 0.515 0.35 0.046 13.257
ICSMIM 0.570 0.45 0.049 13.262
表 3  不同社区搜索方法的性能对比
图 8  社区大小对搜索结果的影响
图 9  嵌入模型对搜索结果的影响
图 10  搜索算法对搜索结果的影响
1 石川, 王睿嘉, 王啸 异质信息网络分析与应用综述[J]. 软件学报, 2022, 33 (2): 24
SHI Chuan, WANG Rui-jia, WANG Xiao A survey of heterogeneous information networks analysis and applications[J]. Journal of Software, 2022, 33 (2): 24
doi: 10.13328/j.cnki.jos.006357
2 REN Y, LIU B, HUANG C, et al. Heterogeneous deep graph infomax [C]// 8th International Conference on Learning Representations. New Orleans: ICLR, 2020.
3 WANG C, PAN S, HU R, et al. Attributed graph clustering: a deep attentional embedding approach [C]// Proceedings of the 28th International Joint Conferences on Artificial Intelligence Organization. Macao: IJCAI, 2019: 3670-3676.
4 FANG Y, HUANG X, QIN L, et al A survey of community search over big graphs[J]. The VLDB Journal, 2020, 29 (1): 353- 392
doi: 10.1007/s00778-019-00556-x
5 FANG Y, YANG Y, ZHANG W, et al Effective and efficient community search over large heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (6): 854- 867
6 YANG Y, FANG Y, LIN X, et al. Effective and efficient truss computation over large heterogeneous information networks[C]// 2020 IEEE 36th International Conference on Data Engineering (ICDE). [s.l.]: IEEE, 2020: 901-912.
7 QIAO L, ZHANG Z, YUAN Y, et al. Keyword-centric community search over large heterogeneous information networks [C]// 26th International Conference on Database Systems for Advanced Applications. Taipei: DASFAA, 2021: 158-173.
8 JIANG Y, RONG Y, CHENG H, et al. Query-driven graph convolutional networks for attributed community search [C]// 48th International Conference on Very Large Databases. Sydney: PVLDB, 2022.
9 竺俊超, 王朝坤 复杂条件下的社区搜索方法[J]. 软件学报, 2019, 30 (3): 21
ZHU Jun-chao, WANG Chao-kun Approaches to community search under complex conditions[J]. Journal of Software, 2019, 30 (3): 21
doi: 10.13328/j.cnki.jos.005699
10 JIAN X, WANG Y, CHEN L Effective and efficient relational community detection and search in large dynamic heterogeneous information networks[J]. Proceedings of the VLDB Endowment, 2020, 13 (10): 1723- 1736
doi: 10.14778/3401960.3401969
11 GAO J, CHEN J, LI Z, et al ICS-GNN: lightweight interactive community search via graph neural network[J]. Proceedings of the VLDB Endowment, 2021, 14 (6): 1006- 1018
doi: 10.14778/3447689.3447704
12 BELGHAZI M I, BARATIN A, RAJESWAR S, et al. MINE: mutual information neural estimation [C]// Proceedings of the 35th International Conference on Machine Learning. Stockholm: ICML, 2018: 531-540.
13 KINNEY J B, ATWAL G S Equitability, mutual information, and the maximal information coefficient[J]. Proceedings of the National Academy of Sciences, 2014, 111 (9): 3354- 3359
doi: 10.1073/pnas.1309933111
14 ZHANG T, XIONG Y, ZHANG J, et al. CommDGI [C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. New York: ACM, 2020: 1843-1852.
15 VELIČKOVIĆ P, FEDUS W, HAMILTON W L, et al. Deep graph infomax [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
16 HJELM R D, FEDOROV A, LAVOIE-MARCHILDON S, et al. Learning deep representations by mutual information estimation and maximization [C]// 7th International Conference on Learning Representations. New Orleans: ICLR, 2019.
17 周丽华, 王家龙, 王丽珍,等 异质信息网络表征学习综述[J]. 计算机学报, 2022, 45 (1): 30
ZHOU Li-hua, WANG Jia-long, WANG Li-zhen, et al Heterogeneous information network representation learning: a survey[J]. Chinese Journal of Computers, 2022, 45 (1): 30
doi: 10.11897/SP.J.1016.2022.00160
18 KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// 5th International Conference on Learning Representations. Toulon: ICLR, 2017.
19 WANG X, JI H, SHI C, et al. Heterogeneous graph attention network[C]// WWW ’19: The World Wide Web Conference. [s.l.]: ACM, 2019: 2022–2032.
[1] 赵廷廷,王喆,卢奕南. 基于传播概率矩阵的异构信息网络表示学习[J]. 浙江大学学报(工学版), 2019, 53(3): 548-554.
[2] 于保华, 杨世锡,周晓峰. 基于故障-测点互信息的传感器多目标优化配置[J]. J4, 2012, 46(1): 156-162.
[3] 王丽娟, 章辉. 通讯访问约束下的线性系统参数可辨识性[J]. J4, 2011, 45(12): 2103-2108.
[4] 于淼, 李式巨, 杨志敏. 自适应随机共振二进制基带信号处理[J]. J4, 2010, 44(4): 692-695.