1. Information Network Center, Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Beijing Wrdtech Co., Ltd., Beijing 100876, China
Existing template extraction algorithms cannot correctly parse composite logs. To solve this problem, a new template extraction algorithm for composite logs, named the composite-log extraction algorithm (CLEA), was designed. All logs are first divided into clusters according to the symbols they contain, and the log template of each cluster is extracted with the Drain extraction method. The extraction results are stored in a cache, and each cached template is updated together with its cluster. A difference calculation is then introduced into the simple common-word algorithm, which makes the algorithm more sensitive to the words that differ between templates, and the resulting measure is used to calculate the similarity between templates. Finally, the BMerge algorithm is designed to merge templates whose similarity exceeds a threshold, and the merged logs are output, without loss, as the final result. On the DNS, DHCP, Dataflow and Huawei switch logs, CLEA achieved final accuracies of 92%, 100%, 88% and 96% respectively, higher than those of IPLoM, SHISO, Spell and Drain, with the shortest processing time of all five algorithms on every log. The proposed method is therefore suitable for processing composite logs with high accuracy.
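The abstract does not give the exact similarity formula or the BMerge procedure, so the following is only a minimal sketch under stated assumptions. It assumes templates are whitespace-separated token lists, takes the "common words" of two templates as their multiset intersection and the "difference" as the tokens that occur in only one of them, and scores similarity as common / (common + difference). The greedy routine `merge_templates` and its threshold of 0.6 are hypothetical stand-ins for BMerge: only templates with the same token count are merged, position by position, with differing positions replaced by the wildcard `<*>`.

```python
from collections import Counter

WILDCARD = "<*>"


def similarity(a: list[str], b: list[str]) -> float:
    """Hypothetical common-word similarity with a difference term.

    common: tokens shared by both templates (multiset intersection)
    diff:   tokens that occur in only one of them (multiset symmetric difference)
    """
    ca, cb = Counter(a), Counter(b)
    common = sum((ca & cb).values())
    diff = sum(((ca - cb) + (cb - ca)).values())
    total = common + diff
    return common / total if total else 1.0  # two empty templates count as identical


def merge_pair(a: list[str], b: list[str]) -> list[str]:
    """Position-wise merge of two equal-length templates; differing tokens become <*>."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def merge_templates(templates: list[list[str]], threshold: float = 0.6) -> list[list[str]]:
    """Greedy threshold merge, a hypothetical stand-in for BMerge (not the paper's algorithm)."""
    merged: list[list[str]] = []
    for t in templates:
        for i, m in enumerate(merged):
            # Only equal-length templates are merged in this sketch.
            if len(m) == len(t) and similarity(m, t) >= threshold:
                merged[i] = merge_pair(m, t)
                break
        else:  # no sufficiently similar template found: keep t as a new template
            merged.append(list(t))
    return merged


templates = [
    "session opened for user root".split(),
    "session opened for user alice".split(),
    "connection closed by peer".split(),
]
for t in merge_templates(templates):
    print(" ".join(t))
```

For the two "session opened" templates, a plain ratio of common words to template length would give 4/5 = 0.8, whereas the difference-aware score is 4/6 ≈ 0.67, which illustrates how counting the differing words makes the measure stricter. The third template has a different length and is kept separate.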
Fig.1 Example of the log classification tree structure in the CLEA algorithm
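As a rough illustration of symbol-based classification, the sketch below assumes that the first level of the tree is keyed on the ordered sequence of symbol characters (anything that is neither a word character nor whitespace) in a log line, and the second level on the token count. The per-cluster template update is a simplified Drain-style step in which positions where a new log differs from the cached template become `<*>`; real Drain additionally routes logs by their leading tokens and compares them against a similarity threshold, which is omitted here. The class `SymbolClusterer` and its key layout are hypothetical, not taken from the paper.

```python
import re
from collections import defaultdict

# Assumption: a "symbol" is any character that is neither a word character nor whitespace.
SYMBOL_RE = re.compile(r"[^\w\s]")
WILDCARD = "<*>"


def symbol_key(line: str) -> str:
    """Hypothetical first tree level: the symbols of a line, in order of appearance."""
    return "".join(SYMBOL_RE.findall(line))


class SymbolClusterer:
    """Groups logs by (symbol signature, token count) and caches one template per group."""

    def __init__(self) -> None:
        self.templates: dict[tuple[str, int], list[str]] = {}  # template cache
        self.members: dict[tuple[str, int], list[str]] = defaultdict(list)

    def add(self, line: str) -> list[str]:
        tokens = line.split()
        key = (symbol_key(line), len(tokens))
        self.members[key].append(line)
        cached = self.templates.get(key)
        if cached is None:
            # First log of a new cluster: the log itself is the template.
            self.templates[key] = tokens
        else:
            # Update the cached template together with the cluster:
            # any position where the new log differs becomes a wildcard.
            self.templates[key] = [
                c if c == t else WILDCARD for c, t in zip(cached, tokens)
            ]
        return self.templates[key]


logs = [
    "Accepted password for root from 10.0.0.1 port 22",
    "Accepted password for alice from 10.0.0.2 port 2222",
    "dhcpd: DHCPACK on 10.0.0.5 to aa:bb:cc:dd:ee:ff",
]
clusterer = SymbolClusterer()
for line in logs:
    clusterer.add(line)
for key, template in clusterer.templates.items():
    print(key, " ".join(template))
```

The two SSH-style lines share the symbol key `...` and eight tokens, so they fall into one cluster whose cached template becomes `Accepted password for <*> from <*> port <*>`; the DHCP line has a different symbol key and forms its own cluster.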
Tab.1 Experimental environment for verifying the CLEA log template extraction algorithm

| Item | Configuration |
|---|---|
| OS | CentOS release 6.8 (Final) |
| CPU | Intel(R) Xeon(R) CPU 5110 |
| Memory | 8 GB (4 × 2 GB DDR2 667 MHz) |
| SSD | Samsung 850 EVO SATA III 120 GB |
| HDD | Seagate 2 TB SATA3, 64 MB cache |
| NIC | Intel e1000e, 1000 Mbps full duplex |
Tab.2 Partition accuracy (A/%) of the CLEA and Drain algorithms on different logs

| Algorithm | DNS | DHCP | Dataflow | Huawei |
|---|---|---|---|---|
| CLEA | 34 | 45 | 40 | 55 |
| Drain | 27 | 35 | 31 | 40 |
Tab.3 Processing time (t/s) of multiple log template extraction algorithms on different logs

| Algorithm | DNS | DHCP | Dataflow | Huawei |
|---|---|---|---|---|
| CLEA | 31.81 | 45.43 | 2.55 | 27.70 |
| IPLoM | 50.88 | 73.67 | 5.17 | 34.03 |
| SHISO | 1957.14 | 2879.23 | 75.10 | 1292.95 |
| Spell | 236.44 | 311.58 | 23.49 | 94.64 |
| Drain | 35.17 | 51.01 | 3.01 | 29.78 |
Fig.2 Accuracy of multiple log template extraction algorithms on the DNS log
Fig.3 Accuracy of multiple log template extraction algorithms on the Huawei switch log
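Tab.4 Final accuracy (A/%) of multiple log template extraction algorithms on different logs

| Algorithm | DNS | DHCP | Dataflow | Huawei |
|---|---|---|---|---|
| CLEA | 92 | 100 | 88 | 96 |
| IPLoM | 64 | 80 | 18 | 69 |
| SHISO | 64 | 80 | 25 | 72 |
| Spell | 71 | 90 | 11 | 69 |
| Drain | 71 | 80 | 18 | 75 |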
[1] CUI Yuan, ZHANG Zhuo. Research on template extraction based on large-scale network log [J]. Computer Science, 2017, (Suppl.2): 458-462. (in Chinese)
[2] FAN Jing. Research on high precision program log analysis technology [D]. Shanghai: Shanghai Jiao Tong University, 2013. (in Chinese)
[3] ZHANG Xiao-jing. Research and implementation of software system anomaly detection technology based on massive log messages [D]. Xi'an: Xidian University, 2015. (in Chinese)
[4] KOBAYASHI S, FUKUDA K, ESAKI H. Towards an NLP-based log template generation algorithm for system log analysis [C]// Proceedings of the 9th International Conference on Future Internet Technologies. Tokyo: ACM, 2014: 11.
[5] MIZUTANI M. Incremental mining of system log format [C]// 2013 IEEE International Conference on Services Computing (SCC). Santa Clara: IEEE, 2013: 595-602.
[6] SHIMA K. Length matters: clustering system log messages using length of words [J/OL]. [2019-09-20]. https://arxiv.org/abs/1611.03213.
[7] DU M, LI F. Spell: streaming parsing of system event logs [C]// 2016 IEEE 16th International Conference on Data Mining (ICDM). Barcelona: IEEE, 2016: 859-864.
[8] HE P, ZHU J, ZHENG Z, et al. Drain: an online log parsing approach with fixed depth tree [C]// 2017 IEEE International Conference on Web Services (ICWS). Honolulu: IEEE, 2017: 33-40.
[9] MESSAOUDI S, PANICHELLA A, BIANCULLI D, et al. A search-based approach for accurate identification of log message formats [C]// Proceedings of the 26th IEEE/ACM International Conference on Program Comprehension (ICPC'18). Gothenburg: ACM, 2018.
[10] ZHANG S, MENG W, BU J, et al. Syslog processing for switch failure diagnosis and prediction in datacenter networks [C]// 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS). Vilanova i la Geltrú: IEEE, 2017: 1-10.
[11] POGGI N, MUTHUSAMY V, CARRERA D, et al. Business process mining from e-commerce web logs [C]// Business Process Management (LNCS). Berlin, Heidelberg: Springer, 2013: 65-80.
[12] LOU J G, FU Q, YANG S, et al. Mining program workflow from interleaved traces [C]// Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Washington DC: ACM, 2010: 613-622.
[13] MAKANJU A, ZINCIR-HEYWOOD A N, MILIOS E E. A lightweight algorithm for message type extraction in system application logs [J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 24(11): 1921-1936.
[14] TANG L, LI T. LogTree: a framework for generating system events from raw textual logs [C]// 2010 IEEE International Conference on Data Mining. Sydney: IEEE, 2010: 491-500.