Please wait a minute...
Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (2): 289-299    DOI: 10.3785/j.issn.1008-973X.2025.02.007
    
Network superspreader measurement algorithm based on multiple filter sketch
Zhuo LI1,2,3,4(),Jinhu MENG1,4,Shengyan LIN1,Yuan GAO1,Dawei GU1,Chenzhi YOU1,Kaihua LIU4,5
1. School of Microelectronics, Tianjin University, Tianjin 300072, China
2. The Peng Cheng Laboratory, Shenzhen 518000, China
3. Tianjin Microelectronics Technology Key Laboratory of Imaging and Perception, Tianjin 300072, China
4. Tianjin Digital Information Technology Research Center, Tianjin 300072, China
5. School of Information and Intelligent Engineering, Tianjin Ren’ai College, Tianjin 301636, China
Download: HTML     PDF(1716KB) HTML
Export: BibTeX | EndNote (RIS)      

Abstract  

A novel multiple filter sketch (MFS) was proposed and analyzed for theoretical errors to address the problem that the classic sketches cannot be directly applied to superspreader measurement. The MFS removed repetitive packets by a front-end streaming filter, so that the superspreader measurement can be finished while considering the flow size estimation. The post-stage statistical filter was divided into three levels of substructures to capture flows with different sizes so as to adapt to the skewed network traffic distribution characteristics in reality. The memory utilization and the measurement accuracy of the sketch were improved. An asymmetric insertion and query algorithm (AIQ) was proposed to quickly finish the insertion and query operations by using the skewed distribution characteristics of the traffic, which can meet the requirement of link forwarding speed. The experimental results show that the proposed multiple filter sketch improves the measurement accuracy of flow size by approximately 2 to 48 times, the measurement accuracy of heavy hitters by approximately 3 to 53 times, and the measurement accuracy of superspreader by about 2.0 to 3.0 times, compared with the current state-of-the-art schemes. The measurement throughput of MFS can be comparable to that of the classical sketch.



Key wordsnetwork measurement      superspreader measurement      sketch     
Received: 03 December 2023      Published: 11 February 2025
CLC:  TP 393  
Fund:  国家重点研发计划资助项目(2022YFB2901100).
Cite this article:

Zhuo LI,Jinhu MENG,Shengyan LIN,Yuan GAO,Dawei GU,Chenzhi YOU,Kaihua LIU. Network superspreader measurement algorithm based on multiple filter sketch. Journal of ZheJiang University (Engineering Science), 2025, 59(2): 289-299.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.02.007     OR     https://www.zjujournals.com/eng/Y2025/V59/I2/289


基于多重过滤草图的网络超点测量算法

针对典型草图无法直接应用到超点测量的问题,提出新型的多重过滤草图 (MFS),给出理论误差分析. MFS以前置分流过滤器来去除重复数据包,在兼顾流频率测量任务的同时完成超点测量. 后级统计过滤器划分为3级子结构,分别用来捕捉不同大小的流,以适应现实中偏态的网络流量分布特征,提升草图的内存利用率及测量精度. 提出非对称的插入和查询算法 (AIQ),该算法利用流量的偏态分布特征,快速完成插入和查询操作,满足链路转发的速度要求. 实验结果表明,与当下的先进方案相比,所提的多重过滤草图的流频率测量精度提升了2~48倍,大象流测量精度提升了3~53倍,超点测量精度提升了2.0~3.0倍,测量吞吐量与经典草图相当.


关键词: 网络测量,  超点测量,  草图 
Fig.1 Structure of counting filter
算法 1 AIQ插入算法
Input: 流标识为$ f $的数据包
1: Procedure INSERT($ f $,1)
2: for i=1 to $ d $
3:   $ {r_i} \leftarrow {h_i}(f) $
4:   if L1[$ {r_i} $]. key =$ f $ then
5:   if L1[$ {r_i} $].$ {v_e} $<$ {\tau _1} - 1 $ then
6:    L1[$ {r_i} $].$ {v_e} $ $ \leftarrow $ L1[$ {r_i} $].$ {v_e} $+1
7:   else if L1[$ {r_i} $].$ {v_e} $=$ {\tau _1} - 1 $ then
8:    L1[$ {r_i} $].$ {v_e} $ $ \leftarrow $ L1[$ {r_i} $].$ {v_e} $+1
9:     将$ (f,{\tau _1}) $插入L2
10:   else //此时为插入前计数器溢出
11:    将$ (f,1) $插入L2
12: if $ d $个桶中均无匹配流标识
13:  将$ f $插入CU
14: function INSERT_L2($ f $,$ v $) //插入L2
15:   if $ v $=$ {\tau _1} $ then
16:  利用$ {r_1} $和遍历的方式在L2查找空桶
17:  if L2中存在空桶
18:    $ {\mathrm{key}} \leftarrow f,{v_e} \leftarrow {\tau _1} $
19:  else
20:   放弃在L2中的插入
21:  else
22:  利用$ {r_1} $和遍历的方式在L2查找匹配$ f $的桶
23:  if L2中存在匹配$ f $的桶
24:    ${v_e}\leftarrow {v_e}+1 $
25:  else
26:   放弃在L2中的插入
27: function INSERT_CU($ f $) //插入CU
28:  $ {\mathrm{cu}} $$ \leftarrow $min (CU[$ {r_i} $]) //$ i $从1到$ d $, min用于求解最小值
29:  if $ {\mathrm{cu}} \lt {\tau _0} - 1 $ then
30:  CU[j] $ \leftarrow $ CU[j]+1 //j$ d $个计数器中最小的几个计数器
31:  else if $ {\mathrm{cu}} $=$ {\tau _0} - 1 $ then
32:  CU[j] $ \leftarrow $ CU[j]+1
33:  将$ (f,{\tau _0}) $插入L1
34:  else //插入前溢出,因此确定$ f $L1$ {v_c} $部分
35:  将$ (f,1) $再次插入L1
36: function INSERT_L1($ f $,$ v $) //插入L1
37:  if $ v $=$ {\tau _0} $ then
38:  if L1[$ {r_i} $]存在空桶k then
39:   L1[$ k $].$ {\text{key}} $$ \leftarrow $$ f $, L1[$ k $].$ {v_e} $$ \leftarrow $ 1
40:  else
41:   L1[$ j $].$ {v_c} $$ \leftarrow $ L1[$ j $].$ {v_c} $+1 //$ j $$ d $$ {v_c} $ 中最小的几个$ {v_c} $
42:  else
43:   $ l \leftarrow $与min (L1[$ {r_i} $].$ {v_e} $)对应桶索引
44:   $ m \leftarrow $与min (L1[$ {r_i} $].$ {v_c} $)对应桶索引
45:  L1[$ m $].$ {v_c} $L1[$ m $].$ {v_c} $+1
46:   $ {X_{\min }} \leftarrow \min $ (L1[$ {r_i} $].$ {v_e} $)
47:   $ {Y_{\min }} \leftarrow \min $ (L1[$ {r_i} $].$ {v_c} $)
48:  if $ {Y_{\min }} \gt \lambda {X_{\min }} $ then
49:   桶$ l \leftarrow (f,{Y_{\min }}) $
50:    $ {f_m} $的值$ {X_{\min }} $插入另外$ d $$ {v_c} $
 
算法2 AIQ查询算法
Input: 流标识$ f $
Output: $ f $的频率
1: Procedure QUERY ($ f $)
2:  $ {v_0} \leftarrow $CU查询值
3:  if $ {v_0} < {\tau _0} $ then
4:   return $ {v_0} $
5:  else
6:   $ {v_1} \leftarrow $L1查询值
7:   if $ {v_1} \lt {\tau _1} $ then
8:    return $ {v_1}+{\tau _0} - 1 $
9:   else
10:  return L2查询值
11: function QUERY_CU($ f $) //返回CU查询值
12:  for $ i $=1 to d
13:   $ {r_i} \leftarrow {h_i}(f) $
14:   return min (CU [$ {r_i} $])
15: function QUERY_L1($ f $) //返回L1查询值
16: for $ i $=1 to d
17: if L1[$ {r_i} $].Key=f then
18:  return L1[$ {r_i} $].${v_e}$
19:  return min (L1[$ {r_i} $].$ {v_c} $)
20: function QUERY_L2($ f $) //返回L2查询值
21:  if L2中存在匹配$ f $的桶$ k $ then
22:  return L2[$ k $].$ {v_e} $+$ {\tau _0} - 1 $
23: else
24: return $ {\tau _1}+{\tau _0} - 1 $
 
Fig.2 Overall framework of multiple filter sketch
Fig.3 Parameter experiment of $ \lambda $ on Web page
Fig.4 Measurement accuracy on flow size
方案e
M=400 kBM=500 kBM=600 kBM=700 kBM=800 kBM=900 kB
MFS10.616.544.473.252.481.95
Elastic48.4022.9515.3311.649.468.09
Diamond30.3123.7318.4714.8512.2810.42
TowerCU13.086.613.952.571.751.29
CM81.8757.9243.7734.5127.923.36
CU55.8639.3929.6023.3018.8215.56
CS111.3781.3162.9149.9640.6734.06
Tab.1 Average relative error of flow size measurement on Web page
方案e
s=0s=0.1s=0.2s=0.3s=0.4s=0.5s=0.6s=0.7s=0.8s=0.9s=1.0
MFS0.210.220.240.260.270.290.310.330.340.350.35
Elastic0.770.770.780.800.830.870.930.991.071.211.43
Diamond9.736.335.795.705.545.445.114.513.883.242.93
TowerCU0.400.420.410.420.430.440.420.400.360.280.25
CM3.083.093.153.233.343.473.643.844.044.415.01
CU1.581.611.691.801.932.082.232.432.602.933.39
CS1.121.161.221.331.481.671.922.252.663.304.34
Tab.2 Average relative error of flow size measurement on zipf dataset
Fig.5 Measurement accuracy on elephant flow
方案e/10?3
M=200 kBM=300 kBM=400 kBM=500 kBM=600 kBM=700 kB
Elastic38.536.936.736.636.336.2
CMH17.06.904.292.751.871.53
MVSketch43.822.213.99.877.235.57
WavingSketch16810272.163.048.641.7
MFS2.971.321.040.8220.7250.689
Tab.3 Average relative error of heavy hitter on Web page
Fig.6 Measurement accuracy on superspreader
方案e/10?3
M=500 kBM=600 kBM=700 kBM=800 kBM=900 kBM=1000 kB
CMH37.323.616.112.09.457.82
CUH36.122.514.910.88.266.59
Spread85.586.783.283.886.082.8
WavingSketch32.223.314.711.97.875.04
MFS29.019.011.77.736.374.82
Tab.4 Average relative error of superspread on CAIDA
Fig.7 Update throughput in counting filter
Fig.8 Query throughput in counting filter
Fig.9 Update throughput in superspreader measurement
[1]   WANG R, DU H, SHEN Z, et al DAP-sketch: an accurate and effective network measurement sketch with deterministic admission policy[J]. Computer Networks, 2021, 194: 108155
doi: 10.1016/j.comnet.2021.108155
[2]   GENG N, XU M, YANG Y, et al. Adaptive and low-cost traffic engineering based on traffic matrix classification [C]// International Conference on Computer Communications and Networks. Honolulu: IEEE, 2020: 1-9.
[3]   YANG T, JIANG J, LIU P, et al Adaptive measurements using one elastic sketch[J]. IEEE/ACM Transactions on Networking, 2019, 27 (6): 2236- 2251
doi: 10.1109/TNET.2019.2943939
[4]   YANG K, LI Y, LIU Z, et al. SketchINT: empowering int with towersketch for per-flow per-switch measurement [C] // International Conference on Network Protocols. Dallas: IEEE, 2021: 1-12.
[5]   DING R, YANG S, CHEN X, et al. Bitsense: universal and nearly zero-error optimization for sketch counters with compressive sensing [C]// Proceedings of the ACM SIGCOMM 2023 Conference . New York: ACM, 2023: 220-238.
[6]   LU J, CHEN H, SUN P, et al OrderSketch: an unbiased and fast sketch for frequency estimation of data streams[J]. Computer Networks, 2021, 201: 108563
doi: 10.1016/j.comnet.2021.108563
[7]   ZHENG H, TIAN C, YANG T, et al. Flymon: enabling on-the-fly task reconfiguration for network measurement [C]// Proceedings of the ACM SIGCOMM 2022 Conference. Honolulu: ACM, 2022: 486-502.
[8]   DAI Y, WANG A, GUO Y, et al Elastically augmenting the control-path throughput in SDN to deal with internet DDoS attacks[J]. ACM Transactions on Internet Technology, 2023, 23 (1): 1- 25
[9]   JONATH K, LIU Z. Analysis of cyber security rebuts and its rising aims on current technologies [C]// International Conference on Computer Network Security and Software Engineering. Zhuhai: IEEE, 2022: 134-141.
[10]   SONG G, HE L, ZHAO T, et al. Which doors are open: reinforcement learning-based internet-wide port scanning [C]// International Symposium on Quality of Service. Orlando: IEEE, 2023: 1-10.
[11]   LIU L, DING T, FENG H, et al Tree sketch: an accurate and memory efficient sketch for network-wide measurement[J]. Computer Communications, 2022, 194: 148- 155
doi: 10.1016/j.comcom.2022.07.009
[12]   TANG L, XIAO Y, HUANG Q, et al A high-performance invertible sketch for network-wide superspreader detection[J]. IEEE/ACM Transactions on Networking, 2023, 31 (2): 724- 737
doi: 10.1109/TNET.2022.3198738
[13]   ZHOU Y, JIN H, LIU P, et al. Accurate per-flow measurement with bloom sketch [C] // IEEE Conference on Computer Communications Workshops. Honolulu: IEEE, 2018: 1-2.
[14]   LI Y, YU X, YANG Y, et al Pyramid family: generic frameworks for accurate and fast flow size measurement[J]. IEEE/ACM Transactions on Networking, 2021, 30 (2): 586- 600
[15]   CORMODE G, MUTHUKRISHNAN S An improved data stream summary: the count-min sketch and its applications[J]. Journal of Algorithms, 2005, 55 (1): 58- 75
doi: 10.1016/j.jalgor.2003.12.001
[16]   ESTAN C, VARGHESE G New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice[J]. ACM Transactions on Computer Systems, 2004, 21 (3): 44
[17]   CHARIKAR M, CHEN K, FARACHCOLTON M Finding frequent items in data streams[J]. Theoretical Computer Science, 2004, 312 (1): 3- 15
doi: 10.1016/S0304-3975(03)00400-6
[18]   YANG T, GAO S, SUN Z, et al Diamond sketch: accurate per-flow measurement for big streaming data[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30 (12): 2650- 2662
doi: 10.1109/TPDS.2019.2923772
[19]   LI J, LI Z, XU Y, et al. WavingSketch: an unbiased and generic sketch for finding top-k items in data streams [C] // 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event. California: ACM, 2020: 1574-1584.
[20]   ESTAN C, VARGHESE G, FISK M. Bitmap algorithms for counting active flows on high speed links [C]// Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement . San Francisco: ACM, 2003: 153-166.
[21]   COHEN S, MATIAS Y. Spectral bloom filters [C]// Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data . San Francisco: AC-M, 2003: 241-252.
[22]   EINZIGER G, FRIEDMAN R. A formal analysis of conservative update based approximate counting [C]// International Conference on Computing, Networking and Communications. Anaheim: IEEE, 2015: 255-259.
[23]   CLAUDIO L, SALVATORE O, RAFFAELE P, et al. Webdocs [EB/OL]. (2018-07-16)[2023-11-29]. http://fimi.uantwerpen.be/data/.
[24]   CAIDA. The CAIDA anonymized Internet traces 2018 dataset [EB/OL]. (2018-09-19)[2023-11-29]. https://www.caida.org/catalog/datasets/passive_dataset/.
[1] Yi-xuan ZHANG,Jian GONG. Multi-layer domain name detection and measurement based on DNS traffic[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(12): 2423-2429.
[2] LIU Zheng, LU Na, WU Jian-feng. Review of sketch based on design cognition[J]. Journal of ZheJiang University (Engineering Science), 2010, 44(12): 2376-2382.