Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (1): 70-78    DOI: 10.3785/j.issn.1008-973X.2025.01.007
    
Efficient graph stream summarization technology for periodic edge queries
Zhuo LI (1, 2, 3, 4), Shuaijun LIU (1, 4), Kaihua LIU (4, 5)
1. School of Microelectronics, Tianjin University, Tianjin 300072, China
2. Pengcheng Laboratory, Shenzhen 518000, China
3. Tianjin Microelectronics Technology Key Laboratory of Imaging and Perception, Tianjin 300072, China
4. Tianjin Digital Information Technology Research Center, Tianjin 300072, China
5. School of Information and Intelligent Engineering, Tianjin Ren’ai College, Tianjin 301636, China

Abstract  

Existing graph stream summarization techniques cannot measure graph streams efficiently and accurately under small memory budgets, and they cannot answer periodic edge queries. To address these problems, a graph stream summarization technique for periodic edge queries, named periodic interaction matrix (PIM), was proposed. PIM is a hybrid structure consisting of a two-dimensional adjacency matrix that stores heavy edges and a three-dimensional adjacency matrix that stores light edges, which improves memory efficiency. The two-dimensional adjacency matrix retains the identifiers, weights, and timestamps of heavy edges, so that various query tasks, including periodic edge queries, can be completed in real time. A weight-based and time-based replacement strategy was designed, and a shared hashing technique was used to improve query accuracy as well as insertion and query efficiency. Experimental results show that PIM completes a variety of graph stream query tasks efficiently and in real time with small memory, and accurately recalls all heavy hitter edges, heavy hitter nodes, and periodic edges. Compared with existing graph stream summarization techniques, PIM reduces the average relative error of query tasks by 91.41% to 99.54%.
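The abstract describes PIM only at the level of its components. As a rough illustration of how a heavy/light split of this kind can work, the Python sketch below keeps heavy edges in a two-dimensional matrix (identifier, weight, and timestamp per cell) and light edges in a three-dimensional array of counters (several two-dimensional layers), with every index sliced from one 64-bit hash per node. It is not the authors' implementation: the class `HybridSketch`, the weight-based rule (`lam`, a vote threshold in the spirit of vote-based eviction schemes such as Elastic Sketch), the time-based rule (`stale_after`), and the definition of a periodic edge (the same gap between consecutive arrivals, repeated) are all hypothetical choices made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple


def node_hash(node: str) -> int:
    """One 64-bit hash per node; both parts slice their indices from it (assumed shared-hash scheme)."""
    digest = hashlib.blake2b(node.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


@dataclass
class HeavyEntry:
    src: str
    dst: str
    weight: int
    last_ts: int
    in_light: bool    # True if earlier weight of this edge may sit in the light part
    neg: int = 0      # weight of colliding edges seen since this entry was placed
    interval: int = 0 # last observed gap between arrivals
    streak: int = 0   # number of consecutive arrivals with that same gap


class HybridSketch:
    """Hypothetical heavy/light graph stream sketch; NOT the PIM implementation."""

    def __init__(self, heavy_w=64, light_w=128, depth=3, lam=8, stale_after=1000):
        assert 1 <= depth <= 3, "light indices use 16-bit shifts of a 64-bit hash"
        self.heavy_w, self.light_w, self.depth = heavy_w, light_w, depth
        self.lam, self.stale_after = lam, stale_after
        # 2D heavy part: one entry per (row(src), col(dst)) cell, indexed by low hash bits.
        self.heavy = [[None] * heavy_w for _ in range(heavy_w)]
        # 3D light part: `depth` layers of light_w x light_w counters.
        self.light = [[[0] * light_w for _ in range(light_w)] for _ in range(depth)]

    def _lidx(self, h: int, k: int) -> int:
        # Layer k takes its index from a higher bit slice of the same node hash.
        return (h >> (16 * (k + 1))) % self.light_w

    def _light_add(self, hs: int, hd: int, w: int) -> None:
        for k in range(self.depth):
            self.light[k][self._lidx(hs, k)][self._lidx(hd, k)] += w

    def _light_est(self, hs: int, hd: int) -> int:
        # Count-min style estimate: the smallest counter over all layers.
        return min(self.light[k][self._lidx(hs, k)][self._lidx(hd, k)] for k in range(self.depth))

    def insert(self, src: str, dst: str, w: int, ts: int) -> None:
        hs, hd = node_hash(src), node_hash(dst)
        r, c = hs % self.heavy_w, hd % self.heavy_w
        e = self.heavy[r][c]
        if e is None:  # cells are never emptied, so this edge has not been seen before
            self.heavy[r][c] = HeavyEntry(src, dst, w, ts, in_light=False)
        elif (e.src, e.dst) == (src, dst):  # resident edge: update weight and period state
            gap = ts - e.last_ts
            e.streak = e.streak + 1 if gap == e.interval and gap > 0 else 1
            e.interval, e.last_ts = gap, ts
            e.weight += w
        else:  # collision with a different edge
            e.neg += w
            stale = ts - e.last_ts > self.stale_after  # time-based rule
            outvoted = e.neg >= self.lam * e.weight    # weight-based rule
            if stale or outvoted:
                # Move the resident's weight to the light part and take over the cell.
                self._light_add(node_hash(e.src), node_hash(e.dst), e.weight)
                self.heavy[r][c] = HeavyEntry(src, dst, w, ts, in_light=True)
            else:
                self._light_add(hs, hd, w)

    def edge_weight(self, src: str, dst: str) -> int:
        hs, hd = node_hash(src), node_hash(dst)
        e = self.heavy[hs % self.heavy_w][hd % self.heavy_w]
        if e is not None and (e.src, e.dst) == (src, dst):
            return e.weight + (self._light_est(hs, hd) if e.in_light else 0)
        return self._light_est(hs, hd)

    def periodic(self, src: str, dst: str, min_repeats: int = 3) -> Optional[Tuple[int, int]]:
        """Return (period, repeats) for a heavy edge whose arrival gap repeats, else None."""
        hs, hd = node_hash(src), node_hash(dst)
        e = self.heavy[hs % self.heavy_w][hd % self.heavy_w]
        if e is not None and (e.src, e.dst) == (src, dst) and e.streak >= min_repeats:
            return e.interval, e.streak
        return None
```

For example, inserting edge ("a", "b") with weight 1 at timestamps 0, 10, 20, and 30 into a fresh sketch gives an edge weight of 4 and a periodic result of (10, 3), a gap of 10 observed three times in a row. Keeping timestamps only in the heavy part mirrors the abstract's point that periodic queries are answered from the two-dimensional matrix, while light edges carry only aggregated counters.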



Key words: graph stream; graph stream summarization; periodic edge measurement; real-time query; adjacency matrix
Received: 17 January 2024      Published: 18 January 2025
CLC:  TP 393.0  
Fund:  National Key Research and Development Program of China (2022YFB2901100, 2022ZD0115303); Major Key Project of Pengcheng Laboratory Computing Network (PCL2023A06).
Cite this article:

Zhuo LI,Shuaijun LIU,Kaihua LIU. Efficient graph stream summarization technology for periodic edge queries. Journal of ZheJiang University (Engineering Science), 2025, 59(1): 70-78.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.01.007     OR     https://www.zjujournals.com/eng/Y2025/V59/I1/70


Fig.1 Data structure of periodic interaction matrix
Fig.2 Experiment on the ratio of the memory size of the three-dimensional adjacency matrix to the total memory size
Fig.3 Average relative error of node weight queries using different graph stream summarization techniques on three datasets
Fig.4 Average relative error of heavy hitter edge queries using different graph stream summarization techniques on three datasets
Fig.5 Average relative error of heavy hitter node queries using different graph stream summarization techniques on three datasets
Fig.6 Average relative error of periodic edge queries using different graph stream summarization techniques on three datasets
Fig.7 Harmonic mean of heavy hitter edge, heavy hitter node, and periodic edge queries using different graph stream summarization techniques on three datasets
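Figs. 3 to 6 report average relative error (ARE) and Fig. 7 reports a harmonic mean. The exact formulas are not given on this page; a common reading, assumed here, is that ARE averages |estimate - true| / true over the queried items and that the harmonic mean is the F1 score of precision and recall for the reported sets (heavy hitter edges, heavy hitter nodes, periodic edges). The helper names are illustrative, and `are_reduction` shows one assumed way a figure such as a 91.41% reduction in ARE could be computed from two ARE values.

```python
from typing import Dict, Set


def average_relative_error(true_vals: Dict[str, float], est_vals: Dict[str, float]) -> float:
    """ARE: mean of |estimate - true| / true over items whose true value is positive."""
    keys = [k for k, v in true_vals.items() if v > 0]
    if not keys:
        return 0.0
    return sum(abs(est_vals.get(k, 0) - true_vals[k]) / true_vals[k] for k in keys) / len(keys)


def f1_score(reported: Set[str], actual: Set[str]) -> float:
    """Harmonic mean of precision and recall for a reported set of items."""
    tp = len(reported & actual)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(reported), tp / len(actual)
    return 2 * precision * recall / (precision + recall)


def are_reduction(are_baseline: float, are_new: float) -> float:
    """Relative reduction of ARE against a baseline (assumed form of the quoted percentages)."""
    return (are_baseline - are_new) / are_baseline
```

With true weights {"e1": 10, "e2": 4} and estimates {"e1": 11, "e2": 4}, the ARE is (0.1 + 0) / 2 = 0.05; a reported set {"a", "b"} against the true set {"a", "c", "d"} has precision 0.5, recall 1/3, and F1 0.4.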
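| Scheme (t/s) | Insertion | Node weight query | Heavy hitter edge query | Heavy hitter node query | Periodic edge query |
|---|---|---|---|---|---|
| PIM | 1.51×10⁻⁷ | 2.41×10⁻⁵ | 3.42×10⁻³ | 6.07 | 3.47×10⁻³ |
| PDMatrix | 2.34×10⁻⁷ | 4.81×10⁻⁵ | 1.10×10⁻² | 4.44 | 2.88 |
| PTCM | 2.01×10⁻⁷ | 2.08×10⁻⁵ | 3.77 | 21.86 | 3.19 |
| PCuckoo | 5.67×10⁻⁶ | 9.12×10⁻⁶ | 3.49 | 13.19 | 4.99 |
| Periodic | 4.08×10⁻⁸ | - | - | - | 4.15×10⁻³ |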
Tab.1 Average time cost of insertion and query operations using different graph stream summarization techniques on the Network-Flow dataset
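To put the insertion column of Tab.1 in more familiar terms, the per-edge time can be inverted into a throughput. This assumes t is the average time per inserted edge, which the caption suggests but does not state outright; the values are copied from the table and `throughput_mops` is an illustrative helper.

```python
# Average insertion time per edge from Tab.1, in seconds (assumed to be per-edge averages).
insert_time_s = {
    "PIM": 1.51e-7,
    "PDMatrix": 2.34e-7,
    "PTCM": 2.01e-7,
    "PCuckoo": 5.67e-6,
    "Periodic": 4.08e-8,
}


def throughput_mops(seconds_per_op: float) -> float:
    """Convert an average time per operation into millions of operations per second."""
    return 1.0 / seconds_per_op / 1e6


for name, t in insert_time_s.items():
    print(f"{name}: {throughput_mops(t):.2f} Mops")
```

Under that assumption, PIM's 1.51×10⁻⁷ s per edge corresponds to roughly 6.6 million insertions per second, while PCuckoo's 5.67×10⁻⁶ s corresponds to fewer than 0.2 million.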