Journal of Zhejiang University (Engineering Science)  2020, Vol. 54 Issue (7): 1281-1288    DOI: 10.3785/j.issn.1008-973X.2020.07.005
Automation Technology, Computer Technology
Network traffic anomaly detection based on feature-based symbolic representation
Peng ZHAN1,2, Lin CHEN1,2,*, Lu-hui CAO2, Xue-qing LI1
1. School of Software, Shandong University, Jinan 250100, China
2. Informatization Office, Shandong University, Jinan 250100, China
Abstract:

A network traffic anomaly detection algorithm based on feature-based symbolic representation (NAAD-FD) is proposed to detect network traffic anomalies accurately and keep the network running normally. NAAD-FD segments the network traffic data at trend turning points and converts it into a trend-feature-based symbolic representation, so that the raw data becomes a series of subsequences, each described by seven feature values. These seven feature values feed the proposed distance measure, and a density-based algorithm then detects anomalous sequences according to a time-series-based definition of network traffic anomaly. Experiments on the algorithm parameters, on simulated data and on real network traffic data show that the algorithm is robust and verify its validity and stability. Because the symbolic representation reduces the dimensionality of the data, the time complexity of the algorithm drops significantly, which accelerates the anomaly detection process by around 40%.

Key words: network traffic anomaly    time series    trend feature    symbolic approximation    turning point
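The segmentation and symbolization step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the four per-segment features, and the breakpoints are all assumptions, since this page does not list the seven feature values or the distance measure. Here a turning point is simply a strict local extremum; the paper's maximum single-point error threshold is not modeled.

```python
from statistics import mean


def turning_points(series):
    """Return indices where the trend reverses (strict local extrema), plus both ends."""
    if len(series) < 2:
        raise ValueError("need at least two samples")
    points = [0]
    for i in range(1, len(series) - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # rise-then-fall or fall-then-rise
            points.append(i)
    points.append(len(series) - 1)
    return points


def segment_features(series, points):
    """Describe each segment between consecutive turning points.

    The four features below are placeholders chosen for illustration;
    they are NOT the seven feature values used by NAAD-FD.
    """
    features = []
    for start, end in zip(points, points[1:]):
        seg = series[start:end + 1]
        features.append({
            "start": start,
            "length": end - start,
            "mean": mean(seg),
            "slope": (seg[-1] - seg[0]) / (end - start),
        })
    return features


def to_symbol(value, breakpoints, alphabet="abc"):
    """Map a value to a letter by counting how many breakpoints it exceeds."""
    return alphabet[sum(value > b for b in breakpoints)]


series = [1, 3, 5, 4, 2, 3, 6]
points = turning_points(series)
features = segment_features(series, points)
# Hypothetical breakpoints for a three-letter alphabet.
word = "".join(to_symbol(f["mean"], [2.5, 3.5]) for f in features)
print(points, word)
```

Each segment is reduced to a handful of numbers and one letter, which is where the dimensionality reduction mentioned in the abstract comes from; a real implementation would normalize the data and derive the breakpoints from its distribution rather than hard-coding them.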
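Received: 2019-09-19    Published: 2020-07-05
CLC:  TP 391
Funding: CERNET Next Generation Internet Technology Innovation Project (NGII20190109); Shandong Province Social Science Planning Project (18CGLJ49)
Corresponding author: Lin CHEN     E-mail: zhanpeng@sdu.edu.cn; chenlin@sdu.edu.cn
About the author: Peng ZHAN (1988-), male, PhD candidate, engineer, working on data mining research. orcid.org/0000-0001-6127-1830. E-mail: zhanpeng@sdu.edu.cn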

Cite this article:

Peng ZHAN, Lin CHEN, Lu-hui CAO, Xue-qing LI. Network traffic anomaly detection based on feature-based symbolic representation[J]. Journal of Zhejiang University (Engineering Science), 2020, 54(7): 1281-1288.

Link to this article:

http://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2020.07.005
http://www.zjujournals.com/eng/CN/Y2020/V54/I7/1281

Fig. 1  Principle of the NAAD-FD anomaly detection algorithm
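Parameter                                Experimental setting
Alphabet size a                          [3, 6]
Maximum single-point error percentage    40
Nearest-neighbor index k                 [10, 30]
Buffer cs                                N
Simulated-data time variable t           1000
Table 1  Parameter settings for the experiment on the effect of the alphabet on the average local outlier factor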
Alphabet size a    Subsequence index    ζ
3                  6                    3.14019
3                  155                  1.76207
3                  170                  1.52615
3                  154                  1.50826
3                  90                   1.50275
4                  6                    3.13691
4                  155                  1.74924
4                  170                  1.51572
4                  154                  1.49819
4                  90                   1.48792
5                  6                    3.13669
5                  155                  1.75135
5                  170                  1.51545
5                  154                  1.49819
5                  90                   1.49082
6                  6                    3.13431
6                  155                  1.75434
6                  170                  1.50886
6                  154                  1.49829
6                  90                   1.48371
Table 2  Effect of the alphabet on the average local outlier factor
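The scores in Table 2 appear to be local outlier factors (LOF, reference 15) averaged over the nearest-neighbor range k in [10, 30] given in Table 1. The sketch below shows one way such an averaged score could be computed. It is an assumption-laden illustration: it uses plain Euclidean distance on generic feature vectors rather than the paper's own distance measure, it reads "average" as the arithmetic mean over every k in the range, and the function names are hypothetical.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point for one neighborhood size k.

    Assumes k < len(points) and fewer than k exact duplicates of any point,
    so that no reachability sum is zero.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbor
        k_dist.append(kd)
        # Keep every point within the k-distance, so ties are included.
        neighbours.append([j for j in order if dist[i][j] <= kd])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbors divided by the point's own density.
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def average_lof(points, k_min, k_max):
    """Mean LOF of every point over all k in [k_min, k_max]."""
    ks = range(k_min, k_max + 1)
    totals = [0.0] * len(points)
    for k in ks:
        for i, score in enumerate(lof_scores(points, k)):
            totals[i] += score
    return [t / len(ks) for t in totals]


# Ten evenly spaced 1-D points plus one far-away point.
points = [(float(i),) for i in range(10)] + [(50.0,)]
scores = average_lof(points, 2, 4)
most_anomalous = max(range(len(points)), key=lambda i: scores[i])
print(most_anomalous)
```

Points inside a uniformly dense region score close to 1, while isolated points score well above 1, which matches the pattern in Table 2 where subsequence 6 stands out from the rest regardless of the alphabet size.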
Fig. 2  Experiment on selecting the range of the nearest-neighbor index k
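Parameter                                      Experimental setting
Alphabet size a                                3
Maximum single-point error percentage          40
k                                              [10, 30]
Buffer cs                                      N
Simulated-data time variable t                 1000
Simulated-data anomalous sequence length AL    100
Table 3  Parameter settings for the Gaussian-distribution-based anomaly detection simulation experiment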
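A simulated series with the dimensions from Table 3 (t = 1000 samples, anomalous sequence length AL = 100) could be generated along the following lines. How the authors actually construct the anomaly is not stated on this page, so the mean shift, the anomaly position, the noise parameters, and the function name are all assumptions made purely for illustration.

```python
import random


def simulate_series(t=1000, anomaly_len=100, anomaly_start=450, shift=5.0, seed=0):
    """Gaussian noise of length t with one mean-shifted window of length anomaly_len.

    The shift-based anomaly and its position are assumptions, not the
    paper's construction.
    """
    if anomaly_start < 0 or anomaly_start + anomaly_len > t:
        raise ValueError("anomalous window must lie inside the series")
    rng = random.Random(seed)  # fixed seed keeps the example reproducible
    series = [rng.gauss(0.0, 1.0) for _ in range(t)]
    for i in range(anomaly_start, anomaly_start + anomaly_len):
        series[i] += shift
    return series


series = simulate_series()
window = series[450:550]
rest = series[:450] + series[550:]
print(len(series), sum(window) / len(window) > sum(rest) / len(rest))
```

Knowing exactly where the anomalous window sits makes it straightforward to check whether a detector flags the right subsequences before moving on to real traffic data.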
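Fig. 3  Anomaly detection on simulated data
Fig. 4  Daily inbound traffic of Shandong University from October 2018 to June 2019
Fig. 5  Anomalous traffic detection results on real data
Fig. 6  Comparison of average anomaly detection running time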