A network traffic anomaly detection algorithm based on feature-based symbolic representation (NAAD-FD) was proposed to detect network traffic anomalies accurately and to safeguard network quality. The network traffic data series were segmented at traffic turning points and transformed into a feature-based symbolic representation. Seven characteristic values were then extracted from each subsequence for use in the proposed distance measure. Anomalous traffic sequences were detected with a density-based algorithm, following a definition of network traffic anomaly based on time series. Experiments on algorithm parameters, simulation data and real network traffic data show that the proposed algorithm is strongly robust, and they verify its validity and stability. The proposed feature-based symbolic representation significantly reduces the time complexity of the algorithm and accelerates network traffic anomaly detection by around 40%.
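The segmentation step described in the abstract can be illustrated with a minimal sketch. It assumes a turning point is simply a local peak or trough and does not filter turning points by importance, which the published method may do. The function names and the six features computed here are hypothetical placeholders; the abstract does not list the seven characteristic values used by NAAD-FD.

```python
from statistics import mean, pstdev


def turning_points(series):
    """Return indices of local peaks and troughs, plus both end points.

    Assumption: a turning point is a strict change of direction.
    Flat plateaus (zero difference on either side) are not detected.
    """
    if len(series) < 2:
        raise ValueError("need at least two samples")
    idx = [0]
    for i in range(1, len(series) - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # sign change: peak or trough
            idx.append(i)
    idx.append(len(series) - 1)
    return idx


def segment_features(series, start, end):
    """Hypothetical features for series[start..end]; NOT the paper's seven values."""
    seg = series[start:end + 1]
    return {
        "length": end - start,
        "mean": mean(seg),
        "std": pstdev(seg),
        "min": min(seg),
        "max": max(seg),
        "slope": (seg[-1] - seg[0]) / (end - start),
    }


def represent(series):
    """Split the series at turning points and describe each subsequence."""
    tps = turning_points(series)
    return [segment_features(series, a, b) for a, b in zip(tps, tps[1:])]


# Toy traffic volumes, invented for illustration only.
traffic = [1, 2, 4, 3, 2, 5, 9, 8]
points = turning_points(traffic)
representation = represent(traffic)
for seg in representation:
    print(seg)
```

Each subsequence is reduced to a short feature vector, so later distance computations work on a handful of values per segment instead of every raw sample. This is the general reason a feature-based representation can lower running time.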
Tab.1 Parameter settings for experiments comparing the influence of alphabet size on the mean local outlier factor
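Tab.2 Influence of alphabet size on mean local outlier factor

| Alphabet size a | Subsequence index | ζ |
|---|---|---|
| 3 | 6 | 3.14019 |
| 3 | 155 | 1.76207 |
| 3 | 170 | 1.52615 |
| 3 | 154 | 1.50826 |
| 3 | 90 | 1.50275 |
| 4 | 6 | 3.13691 |
| 4 | 155 | 1.74924 |
| 4 | 170 | 1.51572 |
| 4 | 154 | 1.49819 |
| 4 | 90 | 1.48792 |
| 5 | 6 | 3.13669 |
| 5 | 155 | 1.75135 |
| 5 | 170 | 1.51545 |
| 5 | 154 | 1.49819 |
| 5 | 90 | 1.49082 |
| 6 | 6 | 3.13431 |
| 6 | 155 | 1.75434 |
| 6 | 170 | 1.50886 |
| 6 | 154 | 1.49829 |
| 6 | 90 | 1.48371 |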
Fig.2 Experiments on selection of k
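Tab.3 Parameter settings for anomaly detection simulation experiments based on Gaussian distribution

| Parameter | Experimental setting |
|---|---|
| Alphabet size a | 3 |
| Maximum single-point error percentage | 40 |
| k | [10, 30] |
| Buffer cs | N |
| Simulation data time variable t | 1000 |
| Simulation data anomaly sequence length AL | 100 |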
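Tab.3 gives k as the range [10, 30] rather than a single value, and the table captions refer to a mean local outlier factor. One plausible reading, sketched below, is to compute the local outlier factor (LOF) of each object for every k in the range and average the scores. This is an assumption, not a confirmed description of NAAD-FD. The names `lof_scores` and `mean_lof` are illustrative, and Euclidean distance stands in for the paper's proposed distance measure.

```python
import math


def lof_scores(points, k):
    """Local outlier factor of every point for one neighbourhood size k.

    Assumption: Euclidean distance between feature vectors.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]  # distance to the k-th nearest neighbour
        neighbours.append([j for j in order if dist[i][j] <= kd])  # keeps ties
        k_dist.append(kd)
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(len(reach) / max(sum(reach), 1e-12))  # guard for duplicates
    return [
        sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


def mean_lof(points, k_min, k_max):
    """Average the LOF of each point over k = k_min .. k_max (inclusive)."""
    totals = [0.0] * len(points)
    for k in range(k_min, k_max + 1):
        for i, score in enumerate(lof_scores(points, k)):
            totals[i] += score
    return [t / (k_max - k_min + 1) for t in totals]


# Toy data, invented for illustration: a 6 x 6 grid plus one distant point.
grid = [(float(x), float(y)) for x in range(6) for y in range(6)]
data = grid + [(20.0, 20.0)]
scores = mean_lof(data, 10, 30)
most_anomalous = max(range(len(data)), key=lambda i: scores[i])
print(most_anomalous, round(scores[most_anomalous], 2))
```

Scores close to 1 indicate an object whose local density matches that of its neighbours, while clearly larger scores mark candidates for anomalies. Averaging over a range of k reduces the sensitivity of the result to any single choice of neighbourhood size.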
Fig.3 Anomaly sequences detected by the NAAD-FD algorithm
Fig.4 Network traffic data between October 2018 and June 2019 at Shandong University
Fig.5 Anomaly detection in real network traffic data
Fig.6 Average running time of different anomaly detection methods