Network traffic anomaly detection based on feature-based symbolic representation

doi:10.3785/j.issn.1008-973X.2020.07.005

Journal of ZheJiang University (Engineering Science)

2020, Vol. 54, Issue 7: 1281-1288


Peng ZHAN1,2, Lin CHEN1,2,*, Lu-hui CAO2, Xue-qing LI1

1. School of Software, Shandong University, Jinan 250100, China
2. Informatization Office, Shandong University, Jinan 250100, China


Abstract

A network traffic anomaly detection algorithm based on feature-based symbolic representation (NAAD-FD) is proposed to detect network traffic anomalies accurately and to ensure normal network operation. The traffic series is segmented at its trend turning points and converted into a feature-based symbolic representation in which each subsequence is described by seven characteristic values. These values are then used in the proposed distance measure. Anomalous traffic sequences are detected with a density-based algorithm, following a time-series-based definition of network traffic anomaly. Experiments on algorithm parameters, simulated data, and real network traffic data indicate that the algorithm is robust and verify its validity and stability. Because the symbolic representation reduces the dimensionality of the data, it significantly lowers the time complexity of the algorithm and speeds up anomaly detection by about 40%.
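The pipeline in the abstract (turning-point segmentation, per-segment features, a distance measure, density-based detection) can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: the abstract does not list the seven characteristic values, so `segment_features` uses five hypothetical ones; plain Euclidean distance stands in for the proposed distance measure; and the local outlier factor (LOF), which the table captions mention, is assumed as the density-based step. All function names are illustrative.

```python
import math


def turning_points(series):
    """Return indices where the trend changes direction, plus both endpoints.

    Assumes at least two samples; flat runs are not treated as turning points.
    """
    idx = [0]
    for i in range(1, len(series) - 1):
        before = series[i] - series[i - 1]
        after = series[i + 1] - series[i]
        if before * after < 0:  # rise followed by fall, or fall followed by rise
            idx.append(i)
    idx.append(len(series) - 1)
    return idx


def segment_features(series, start, end):
    """Illustrative features of series[start..end]; NOT the paper's seven values."""
    seg = series[start:end + 1]
    span = end - start
    mean = sum(seg) / len(seg)
    slope = (seg[-1] - seg[0]) / span
    return (mean, max(seg), min(seg), slope, float(span))


def lof_scores(points, k):
    """Local outlier factor of each point, using its k nearest neighbours.

    Euclidean distance is an assumption standing in for the paper's measure.
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neigh = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    k_dist = [dist[i][neigh[i][-1]] for i in range(n)]

    def lrd(i):
        # Local reachability density: inverse of the mean reachability distance.
        reach = [max(k_dist[j], dist[i][j]) for j in neigh[i]]
        return len(reach) / max(sum(reach), 1e-12)

    density = [lrd(i) for i in range(n)]
    return [
        sum(density[j] for j in neigh[i]) / (len(neigh[i]) * density[i])
        for i in range(n)
    ]


def detect(series, k=3):
    """Score every turning-point segment; higher scores are more anomalous."""
    cuts = turning_points(series)
    feats = [segment_features(series, a, b) for a, b in zip(cuts, cuts[1:])]
    return lof_scores(feats, k)


if __name__ == "__main__":
    # Synthetic zigzag traffic with one spike (made-up values, not paper data).
    traffic = [0.0, 1.0, 0.1, 1.1, 0.0, 0.9, 0.2, 9.0,
               0.1, 1.0, 0.0, 1.2, 0.1, 1.0, 0.0]
    scores = detect(traffic, k=3)
    ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
    print(ranked[:2])  # indices of the two highest-scoring segments
```

Working on one feature vector per segment rather than on every raw sample is what shrinks the input to the density-based step, which matches the abstract's claim that the representation lowers time complexity. The features here are not normalized, so a real implementation would likely need scaling or the paper's own distance measure.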

Key words: network traffic anomaly, time series, trend feature, symbolic approximation, turning point

Received: 19 September 2019 Published: 05 July 2020

CLC: TP 391

Corresponding author: Lin CHEN. E-mail: zhanpeng@sdu.edu.cn; chenlin@sdu.edu.cn


Cite this article:

Peng ZHAN, Lin CHEN, Lu-hui CAO, Xue-qing LI. Network traffic anomaly detection based on feature-based symbolic representation. Journal of ZheJiang University (Engineering Science), 2020, 54(7): 1281-1288.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.07.005 OR http://www.zjujournals.com/eng/Y2020/V54/I7/1281

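Chinese title: 基于特征符号表示的网络异常流量检测算法 (Network anomalous traffic detection algorithm based on feature symbolic representation)

Chinese abstract, translated: To accurately detect traffic anomalies in a network and keep the network running normally, a network anomalous traffic detection algorithm based on feature symbolic representation (NAAD-FD) is proposed. NAAD-FD uses trend turning points to convert network traffic data with a symbolic representation based on trend features, turning the raw data into subsequences that each carry seven feature values, and applies these seven values in the proposed distance calculation. Combined with a density-based algorithm, it performs anomaly detection according to a time-series definition of anomalous network traffic. Experiments and analysis on algorithm parameters, simulated data, and real network traffic data show that the algorithm is robust and confirm its effectiveness and stability. By simplifying the representation through dimensionality reduction, the algorithm significantly lowers its time complexity and speeds up anomaly detection by about 40%.

Chinese key words, translated: network traffic anomaly, time series, trend feature, symbolic approximation, turning point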

Fig.1 Basic idea of NAAD-FD

Tab.1 Parameter settings for experiments comparing the influence of alphabet size on mean local outlier factor

Tab.2 Influence of alphabet size on mean local outlier factor

Fig.2 Experiments on selection of k

Tab.3 Parameter settings for anomaly detection simulation experiments based on Gaussian distribution

Fig.3 NAAD-FD algorithm detects anomaly sequences

Fig.4 Network traffic data between October 2018 and June 2019 in Shandong University

Fig.5 Anomaly detection in real network flow data

Fig.6 Average running time of different anomaly detection methods


