Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (7): 1281-1288    DOI: 10.3785/j.issn.1008-973X.2020.07.005
    
Network traffic anomaly detection based on feature-based symbolic representation
Peng ZHAN1,2, Lin CHEN1,2,*, Lu-hui CAO2, Xue-qing LI1
1. School of Software, Shandong University, Jinan 250100, China
2. Informatization Office, Shandong University, Jinan 250100, China

Abstract  

A network traffic anomaly detection algorithm based on feature-based symbolic representation (NAAD-FD) is proposed to detect network traffic anomalies accurately and keep the network running normally. The network traffic series is segmented at its trend turning points and converted into a feature-based symbolic representation. Seven feature values are then extracted from each subsequence and used in the proposed distance measure. Anomalous traffic sequences are detected with a density-based algorithm, following a time-series-based definition of network traffic anomaly. Experiments on the algorithm parameters, on simulated data, and on real network traffic data show that the algorithm is robust and verify its validity and stability. Because the feature-based symbolic representation reduces the dimensionality of the data, the time complexity of the algorithm drops significantly, and anomaly detection runs about 40% faster.
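The segmentation step described in the abstract can be sketched as follows. This is a minimal illustration, not the NAAD-FD implementation: it assumes that a turning point is any sample where the slope changes sign, and the five features returned by the hypothetical `features` helper are placeholders, since the abstract does not list the seven feature values the paper uses.

```python
from statistics import mean


def turning_points(series):
    """Return indices where the series switches between rising and falling."""
    points = []
    for i in range(1, len(series) - 1):
        left = series[i] - series[i - 1]
        right = series[i + 1] - series[i]
        if left * right < 0:  # slope changes sign: local peak or valley
            points.append(i)
    return points


def segment(series):
    """Split a non-empty series into subsequences that meet at the turning points."""
    cuts = [0] + turning_points(series) + [len(series) - 1]
    return [series[a:b + 1] for a, b in zip(cuts, cuts[1:])]


def features(sub):
    """Illustrative per-subsequence features (assumed, not the paper's seven)."""
    slope = (sub[-1] - sub[0]) / (len(sub) - 1) if len(sub) > 1 else 0.0
    return {
        "length": len(sub),
        "mean": mean(sub),
        "min": min(sub),
        "max": max(sub),
        "slope": slope,
    }


traffic = [1, 3, 5, 4, 2, 6, 9]
segments = segment(traffic)
feature_rows = [features(s) for s in segments]
```

Here `traffic` rises, falls, and rises again, so it is cut into three subsequences that share their end points. A real implementation would also need some tolerance for noise, so that small fluctuations do not each create a turning point.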



Key words: network traffic anomaly, time series, trend feature, symbolic approximation, turning point
Received: 19 September 2019      Published: 05 July 2020
CLC:  TP 391  
Corresponding Authors: Lin CHEN     E-mail: zhanpeng@sdu.edu.cn;chenlin@sdu.edu.cn
Cite this article:

Peng ZHAN,Lin CHEN,Lu-hui CAO,Xue-qing LI. Network traffic anomaly detection based on feature-based symbolic representation. Journal of ZheJiang University (Engineering Science), 2020, 54(7): 1281-1288.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.07.005     OR     http://www.zjujournals.com/eng/Y2020/V54/I7/1281


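Network anomalous traffic detection algorithm based on feature-based symbolic representation

A network anomalous traffic detection algorithm based on feature-based symbolic representation (NAAD-FD) is proposed to accurately detect traffic anomalies in a network and keep the network running normally. NAAD-FD uses trend turning points to convert network traffic data into a symbolic representation based on trend features. According to this representation, the original data are converted into subsequences that each carry seven feature values, and the seven feature values are used in the proposed distance measure. Combined with a density-based algorithm, anomaly detection is carried out according to the time-series definition of network anomalous traffic. Experiments and analysis on the algorithm parameters, simulated data, and real network traffic data show that the algorithm is robust and verify its validity and stability. By simplifying the representation through dimensionality reduction, the algorithm significantly lowers its time complexity and speeds up the anomaly detection process by about 40%.


Keywords: network traffic anomaly, time series, trend feature, symbolic approximation, turning point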
Fig.1 Basic idea of NAAD-FD
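Parameter                                 Experimental setting
Alphabet size a                           [3, 6]
Maximum single-point error percentage     40
Nearest-neighbor number k                 [10, 30]
Buffer size cs                            N
Time variable t of simulated data         1000
Tab.1 Parameter settings for experiments comparing the influence of alphabet size on mean local outlier factor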
Alphabet size a    Subsequence index    ζ
3                  6                    3.14019
3                  155                  1.76207
3                  170                  1.52615
3                  154                  1.50826
3                  90                   1.50275
4                  6                    3.13691
4                  155                  1.74924
4                  170                  1.51572
4                  154                  1.49819
4                  90                   1.48792
5                  6                    3.13669
5                  155                  1.75135
5                  170                  1.51545
5                  154                  1.49819
5                  90                   1.49082
6                  6                    3.13431
6                  155                  1.75434
6                  170                  1.50886
6                  154                  1.49829
6                  90                   1.48371
Tab.2 Influence of alphabet size on mean local outlier factor
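The local outlier factor behind these values comes from the density-based LOF method. The sketch below shows how such a score can be computed for a set of feature vectors. It rests on several assumptions: it uses Euclidean distance rather than the paper's own distance measure, it takes exactly k neighbours and ignores ties, it assumes no duplicate points, and the hypothetical `mean_lof` helper reads "mean local outlier factor" as an average over several values of k, which the page does not confirm.

```python
import math


def k_nearest(points, i, k):
    """Return the k nearest neighbours of points[i] as (distance, index) pairs."""
    others = [(math.dist(points[i], p), j) for j, p in enumerate(points) if j != i]
    return sorted(others)[:k]


def lof_scores(points, k):
    """Local outlier factor of every point (ties beyond the k-th neighbour are ignored)."""
    n = len(points)
    neigh = [k_nearest(points, i, k) for i in range(n)]
    k_dist = [nb[-1][0] for nb in neigh]  # distance to the k-th neighbour

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], d) for d, j in neigh[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: mean density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neigh[i]) / (len(neigh[i]) * lrd[i])
        for i in range(n)
    ]


def mean_lof(points, ks):
    """Average the LOF of each point over several values of k (assumed reading)."""
    runs = [lof_scores(points, k) for k in ks]
    return [sum(col) / len(col) for col in zip(*runs)]


vectors = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(vectors, k=2)
averaged = mean_lof(vectors, ks=range(2, 4))
```

In this toy data set the four corners of the unit square score 1, while the isolated point `(5, 5)` scores about 6. The table has the same shape: the top-ranked subsequence has ζ above 3, while the next four all fall below 1.8.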
Fig.2 Experiments on selection of k
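Parameter                                           Experimental setting
Alphabet size a                                     3
Maximum single-point error percentage               40
k                                                   [10, 30]
Buffer size cs                                      N
Time variable t of simulated data                   1000
Anomaly sequence length AL of simulated data        100
Tab.3 Parameter settings for anomaly detection simulation experiments based on Gaussian distribution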
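A simulated series matching these settings can be generated along the following lines. Only the series length (t = 1000) and the anomaly length (AL = 100) come from the table. The Gaussian parameters, the start position of the anomaly, and the choice of a mean shift as the anomaly are all assumptions made for illustration, and `simulate` is a hypothetical helper.

```python
import random
from statistics import mean


def simulate(t=1000, anomaly_len=100, anomaly_start=450, seed=0):
    """Gaussian background series with one injected anomalous stretch."""
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    series = [rng.gauss(0.0, 1.0) for _ in range(t)]  # assumed N(0, 1) background
    for i in range(anomaly_start, anomaly_start + anomaly_len):
        series[i] = rng.gauss(5.0, 1.0)  # assumed anomaly: mean shifted to 5
    return series


series = simulate()
anomaly = series[450:550]
normal = series[:450] + series[550:]
shift = mean(anomaly) - mean(normal)
```

The `shift` value measures how far the injected stretch sits above the background, which gives a quick check that the anomaly is present before a detector is run on the series.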
Fig.3 Anomaly sequences detected by the NAAD-FD algorithm
Fig.4 Network traffic data between October 2018 and June 2019 in Shandong University
Fig.5 Anomaly detection in real network flow data
Fig.6 Average running time of different anomaly detection methods