Please wait a minute...
浙江大学学报(工学版)  2025, Vol. 59 Issue (2): 289-299    DOI: 10.3785/j.issn.1008-973X.2025.02.007
计算机技术     
基于多重过滤草图的网络超点测量算法
李卓1,2,3,4(),孟晋虎1,4,林盛彦1,高源1,谷大伟1,尤辰至1,刘开华4,5
1. 天津大学 微电子学院,天津 300072
2. 鹏城国家实验室,广东 深圳 518000
3. 天津市成像与感知微电子技术重点实验室,天津 300072
4. 天津市数字信息技术研究中心,天津 300072
5. 天津仁爱学院 信息与智能工程学院,天津 301636
Network superspreader measurement algorithm based on multiple filter sketch
Zhuo LI1,2,3,4(),Jinhu MENG1,4,Shengyan LIN1,Yuan GAO1,Dawei GU1,Chenzhi YOU1,Kaihua LIU4,5
1. School of Microelectronics, Tianjin University, Tianjin 300072, China
2. The Peng Cheng Laboratory, Shenzhen 518000, China
3. Tianjin Microelectronics Technology Key Laboratory of Imaging and Perception, Tianjin 300072, China
4. Tianjin Digital Information Technology Research Center, Tianjin 300072, China
5. School of Information and Intelligent Engineering, Tianjin Ren’ai College, Tianjin 301636, China
 全文: PDF(1716 KB)   HTML
摘要:

针对典型草图无法直接应用到超点测量的问题,提出新型的多重过滤草图 (MFS),给出理论误差分析. MFS以前置分流过滤器来去除重复数据包,在兼顾流频率测量任务的同时完成超点测量. 后级统计过滤器划分为3级子结构,分别用来捕捉不同大小的流,以适应现实中偏态的网络流量分布特征,提升草图的内存利用率及测量精度. 提出非对称的插入和查询算法 (AIQ),该算法利用流量的偏态分布特征,快速完成插入和查询操作,满足链路转发的速度要求. 实验结果表明,与当下的先进方案相比,所提的多重过滤草图的流频率测量精度提升了2~48倍,大象流测量精度提升了3~53倍,超点测量精度提升了2.0~3.0倍,测量吞吐量与经典草图相当.

关键词: 网络测量超点测量草图    
Abstract:

A novel multiple filter sketch (MFS) was proposed and analyzed for theoretical errors to address the problem that the classic sketches cannot be directly applied to superspreader measurement. The MFS removed repetitive packets by a front-end streaming filter, so that the superspreader measurement can be finished while considering the flow size estimation. The post-stage statistical filter was divided into three levels of substructures to capture flows with different sizes so as to adapt to the skewed network traffic distribution characteristics in reality. The memory utilization and the measurement accuracy of the sketch were improved. An asymmetric insertion and query algorithm (AIQ) was proposed to quickly finish the insertion and query operations by using the skewed distribution characteristics of the traffic, which can meet the requirement of link forwarding speed. The experimental results show that the proposed multiple filter sketch improves the measurement accuracy of flow size by approximately 2 to 48 times, the measurement accuracy of heavy hitters by approximately 3 to 53 times, and the measurement accuracy of superspreader by about 2.0 to 3.0 times, compared with the current state-of-the-art schemes. The measurement throughput of MFS can be comparable to that of the classical sketch.

Key words: network measurement    superspreader measurement    sketch
收稿日期: 2023-12-03 出版日期: 2025-02-11
CLC:  TP 393  
基金资助: 国家重点研发计划资助项目(2022YFB2901100).
作者简介: 李卓(1984—),男,副教授,博士,从事网络流量工程的研究. orcid.org/0000-0002-5535-5920. E-mail:zli@tju.edu.cn
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
李卓
孟晋虎
林盛彦
高源
谷大伟
尤辰至
刘开华

引用本文:

李卓,孟晋虎,林盛彦,高源,谷大伟,尤辰至,刘开华. 基于多重过滤草图的网络超点测量算法[J]. 浙江大学学报(工学版), 2025, 59(2): 289-299.

Zhuo LI,Jinhu MENG,Shengyan LIN,Yuan GAO,Dawei GU,Chenzhi YOU,Kaihua LIU. Network superspreader measurement algorithm based on multiple filter sketch. Journal of ZheJiang University (Engineering Science), 2025, 59(2): 289-299.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.02.007        https://www.zjujournals.com/eng/CN/Y2025/V59/I2/289

图 1  统计过滤器的结构
算法 1 AIQ插入算法
Input: 流标识为$ f $的数据包
1: Procedure INSERT($ f $,1)
2: for i=1 to $ d $
3:   $ {r_i} \leftarrow {h_i}(f) $
4:   if L1[$ {r_i} $]. key =$ f $ then
5:   if L1[$ {r_i} $].$ {v_e} $<$ {\tau _1} - 1 $ then
6:    L1[$ {r_i} $].$ {v_e} $ $ \leftarrow $ L1[$ {r_i} $].$ {v_e} $+1
7:   else if L1[$ {r_i} $].$ {v_e} $=$ {\tau _1} - 1 $ then
8:    L1[$ {r_i} $].$ {v_e} $ $ \leftarrow $ L1[$ {r_i} $].$ {v_e} $+1
9:     将$ (f,{\tau _1}) $插入L2
10:   else //此时为插入前计数器溢出
11:    将$ (f,1) $插入L2
12: if $ d $个桶中均无匹配流标识
13:  将$ f $插入CU
14: function INSERT_L2($ f $,$ v $) //插入L2
15:   if $ v $=$ {\tau _1} $ then
16:  利用$ {r_1} $和遍历的方式在L2查找空桶
17:  if L2中存在空桶
18:    $ {\mathrm{key}} \leftarrow f,{v_e} \leftarrow {\tau _1} $
19:  else
20:   放弃在L2中的插入
21:  else
22:  利用$ {r_1} $和遍历的方式在L2查找匹配$ f $的桶
23:  if L2中存在匹配$ f $的桶
24:    ${v_e}\leftarrow {v_e}+1 $
25:  else
26:   放弃在L2中的插入
27: function INSERT_CU($ f $) //插入CU
28:  $ {\mathrm{cu}} $$ \leftarrow $min (CU[$ {r_i} $]) //$ i $从1到$ d $, min用于求解最小值
29:  if $ {\mathrm{cu}} \lt {\tau _0} - 1 $ then
30:  CU[j] $ \leftarrow $ CU[j]+1 //j$ d $个计数器中最小的几个计数器
31:  else if $ {\mathrm{cu}} $=$ {\tau _0} - 1 $ then
32:  CU[j] $ \leftarrow $ CU[j]+1
33:  将$ (f,{\tau _0}) $插入L1
34:  else //插入前溢出,因此确定$ f $L1$ {v_c} $部分
35:  将$ (f,1) $再次插入L1
36: function INSERT_L1($ f $,$ v $) //插入L1
37:  if $ v $=$ {\tau _0} $ then
38:  if L1[$ {r_i} $]存在空桶k then
39:   L1[$ k $].$ {\text{key}} $$ \leftarrow $$ f $, L1[$ k $].$ {v_e} $$ \leftarrow $ 1
40:  else
41:   L1[$ j $].$ {v_c} $$ \leftarrow $ L1[$ j $].$ {v_c} $+1 //$ j $$ d $$ {v_c} $ 中最小的几个$ {v_c} $
42:  else
43:   $ l \leftarrow $与min (L1[$ {r_i} $].$ {v_e} $)对应桶索引
44:   $ m \leftarrow $与min (L1[$ {r_i} $].$ {v_c} $)对应桶索引
45:  L1[$ m $].$ {v_c} $L1[$ m $].$ {v_c} $+1
46:   $ {X_{\min }} \leftarrow \min $ (L1[$ {r_i} $].$ {v_e} $)
47:   $ {Y_{\min }} \leftarrow \min $ (L1[$ {r_i} $].$ {v_c} $)
48:  if $ {Y_{\min }} \gt \lambda {X_{\min }} $ then
49:   桶$ l \leftarrow (f,{Y_{\min }}) $
50:    $ {f_m} $的值$ {X_{\min }} $插入另外$ d $$ {v_c} $
  
算法2 AIQ查询算法
Input: 流标识$ f $
Output: $ f $的频率
1: Procedure QUERY ($ f $)
2:  $ {v_0} \leftarrow $CU查询值
3:  if $ {v_0} < {\tau _0} $ then
4:   return $ {v_0} $
5:  else
6:   $ {v_1} \leftarrow $L1查询值
7:   if $ {v_1} \lt {\tau _1} $ then
8:    return $ {v_1}+{\tau _0} - 1 $
9:   else
10:  return L2查询值
11: function QUERY_CU($ f $) //返回CU查询值
12:  for $ i $=1 to d
13:   $ {r_i} \leftarrow {h_i}(f) $
14:   return min (CU [$ {r_i} $])
15: function QUERY_L1($ f $) //返回L1查询值
16: for $ i $=1 to d
17: if L1[$ {r_i} $].Key=f then
18:  return L1[$ {r_i} $].${v_e}$
19:  return min (L1[$ {r_i} $].$ {v_c} $)
20: function QUERY_L2($ f $) //返回L2查询值
21:  if L2中存在匹配$ f $的桶$ k $ then
22:  return L2[$ k $].$ {v_e} $+$ {\tau _0} - 1 $
23: else
24: return $ {\tau _1}+{\tau _0} - 1 $
  
图 2  多重过滤草图的整体框架
图 3  Web数据集下$ \lambda $的参数实验
图 4  流频率的测量准确性
方案e
M=400 kBM=500 kBM=600 kBM=700 kBM=800 kBM=900 kB
MFS10.616.544.473.252.481.95
Elastic48.4022.9515.3311.649.468.09
Diamond30.3123.7318.4714.8512.2810.42
TowerCU13.086.613.952.571.751.29
CM81.8757.9243.7734.5127.923.36
CU55.8639.3929.6023.3018.8215.56
CS111.3781.3162.9149.9640.6734.06
表 1  Web数据集下流频率测量的平均相对误差
方案e
s=0s=0.1s=0.2s=0.3s=0.4s=0.5s=0.6s=0.7s=0.8s=0.9s=1.0
MFS0.210.220.240.260.270.290.310.330.340.350.35
Elastic0.770.770.780.800.830.870.930.991.071.211.43
Diamond9.736.335.795.705.545.445.114.513.883.242.93
TowerCU0.400.420.410.420.430.440.420.400.360.280.25
CM3.083.093.153.233.343.473.643.844.044.415.01
CU1.581.611.691.801.932.082.232.432.602.933.39
CS1.121.161.221.331.481.671.922.252.663.304.34
表 2  偏度数据集下流频率测量的平均相对误差
图 5  大象流的测量准确性
方案e/10?3
M=200 kBM=300 kBM=400 kBM=500 kBM=600 kBM=700 kB
Elastic38.536.936.736.636.336.2
CMH17.06.904.292.751.871.53
MVSketch43.822.213.99.877.235.57
WavingSketch16810272.163.048.641.7
MFS2.971.321.040.8220.7250.689
表 3  Web数据集下大象流测量的平均相对误差
图 6  超点的测量准确性
方案e/10?3
M=500 kBM=600 kBM=700 kBM=800 kBM=900 kBM=1000 kB
CMH37.323.616.112.09.457.82
CUH36.122.514.910.88.266.59
Spread85.586.783.283.886.082.8
WavingSketch32.223.314.711.97.875.04
MFS29.019.011.77.736.374.82
表 4  CAIDA数据集下超点测量的平均相对误差
图 7  统计过滤器的插入吞吐量
图 8  统计过滤器的查询吞吐量
图 9  超点测量中的插入吞吐量
1 WANG R, DU H, SHEN Z, et al DAP-sketch: an accurate and effective network measurement sketch with deterministic admission policy[J]. Computer Networks, 2021, 194: 108155
doi: 10.1016/j.comnet.2021.108155
2 GENG N, XU M, YANG Y, et al. Adaptive and low-cost traffic engineering based on traffic matrix classification [C]// International Conference on Computer Communications and Networks. Honolulu: IEEE, 2020: 1-9.
3 YANG T, JIANG J, LIU P, et al Adaptive measurements using one elastic sketch[J]. IEEE/ACM Transactions on Networking, 2019, 27 (6): 2236- 2251
doi: 10.1109/TNET.2019.2943939
4 YANG K, LI Y, LIU Z, et al. SketchINT: empowering int with towersketch for per-flow per-switch measurement [C] // International Conference on Network Protocols. Dallas: IEEE, 2021: 1-12.
5 DING R, YANG S, CHEN X, et al. Bitsense: universal and nearly zero-error optimization for sketch counters with compressive sensing [C]// Proceedings of the ACM SIGCOMM 2023 Conference . New York: ACM, 2023: 220-238.
6 LU J, CHEN H, SUN P, et al OrderSketch: an unbiased and fast sketch for frequency estimation of data streams[J]. Computer Networks, 2021, 201: 108563
doi: 10.1016/j.comnet.2021.108563
7 ZHENG H, TIAN C, YANG T, et al. Flymon: enabling on-the-fly task reconfiguration for network measurement [C]// Proceedings of the ACM SIGCOMM 2022 Conference. Honolulu: ACM, 2022: 486-502.
8 DAI Y, WANG A, GUO Y, et al Elastically augmenting the control-path throughput in SDN to deal with internet DDoS attacks[J]. ACM Transactions on Internet Technology, 2023, 23 (1): 1- 25
9 JONATH K, LIU Z. Analysis of cyber security rebuts and its rising aims on current technologies [C]// International Conference on Computer Network Security and Software Engineering. Zhuhai: IEEE, 2022: 134-141.
10 SONG G, HE L, ZHAO T, et al. Which doors are open: reinforcement learning-based internet-wide port scanning [C]// International Symposium on Quality of Service. Orlando: IEEE, 2023: 1-10.
11 LIU L, DING T, FENG H, et al Tree sketch: an accurate and memory efficient sketch for network-wide measurement[J]. Computer Communications, 2022, 194: 148- 155
doi: 10.1016/j.comcom.2022.07.009
12 TANG L, XIAO Y, HUANG Q, et al A high-performance invertible sketch for network-wide superspreader detection[J]. IEEE/ACM Transactions on Networking, 2023, 31 (2): 724- 737
doi: 10.1109/TNET.2022.3198738
13 ZHOU Y, JIN H, LIU P, et al. Accurate per-flow measurement with bloom sketch [C] // IEEE Conference on Computer Communications Workshops. Honolulu: IEEE, 2018: 1-2.
14 LI Y, YU X, YANG Y, et al Pyramid family: generic frameworks for accurate and fast flow size measurement[J]. IEEE/ACM Transactions on Networking, 2021, 30 (2): 586- 600
15 CORMODE G, MUTHUKRISHNAN S An improved data stream summary: the count-min sketch and its applications[J]. Journal of Algorithms, 2005, 55 (1): 58- 75
doi: 10.1016/j.jalgor.2003.12.001
16 ESTAN C, VARGHESE G New directions in traffic measurement and accounting: focusing on the elephants, ignoring the mice[J]. ACM Transactions on Computer Systems, 2004, 21 (3): 44
17 CHARIKAR M, CHEN K, FARACHCOLTON M Finding frequent items in data streams[J]. Theoretical Computer Science, 2004, 312 (1): 3- 15
doi: 10.1016/S0304-3975(03)00400-6
18 YANG T, GAO S, SUN Z, et al Diamond sketch: accurate per-flow measurement for big streaming data[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30 (12): 2650- 2662
doi: 10.1109/TPDS.2019.2923772
19 LI J, LI Z, XU Y, et al. WavingSketch: an unbiased and generic sketch for finding top-k items in data streams [C] // 26th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual Event. California: ACM, 2020: 1574-1584.
20 ESTAN C, VARGHESE G, FISK M. Bitmap algorithms for counting active flows on high speed links [C]// Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement . San Francisco: ACM, 2003: 153-166.
21 COHEN S, MATIAS Y. Spectral bloom filters [C]// Proceedings of the 2003 ACM SIGMOD International Conference on Management of Data . San Francisco: AC-M, 2003: 241-252.
22 EINZIGER G, FRIEDMAN R. A formal analysis of conservative update based approximate counting [C]// International Conference on Computing, Networking and Communications. Anaheim: IEEE, 2015: 255-259.
23 CLAUDIO L, SALVATORE O, RAFFAELE P, et al. Webdocs [EB/OL]. (2018-07-16)[2023-11-29]. http://fimi.uantwerpen.be/data/.
24 CAIDA. The CAIDA anonymized Internet traces 2018 dataset [EB/OL]. (2018-09-19)[2023-11-29]. https://www.caida.org/catalog/datasets/passive_dataset/.
[1] 张伊璇,龚俭. 基于DNS流量的多层多域名检测与测量[J]. 浙江大学学报(工学版), 2020, 54(12): 2423-2429.
[2] 刘征, 鲁娜, 吴剑锋. 基于设计认知的草图研究综述[J]. J4, 2010, 44(12): 2376-2382.
[3] 黄琦 孙守迁. 基于意象认知模型的汽车草图设计技术研究[J]. J4, 2006, 40(4): 553-559.