Journal of ZheJiang University (Engineering Science)  2020, Vol. 54 Issue (8): 1557-1561    DOI: 10.3785/j.issn.1008-973X.2020.08.014
    
A template extraction method for composite log
Qi WU, Xiao-hong HUANG, Yan MA, Qun CONG
1. Information Network Center, Institute of Network Technology, Beijing University of Posts and Telecommunications, Beijing 100876, China
2. Beijing Wrdtech Co. Ltd, Beijing 100876, China

Abstract  

To address the problem that existing template extraction algorithms cannot correctly parse composite logs, a new template extraction algorithm, the composite-log extraction algorithm (CLEA), was designed. Symbols are used to divide all logs into clusters, and the log template of each cluster is extracted with the Drain extraction method. Template extraction results are stored and cached, and the cached templates are updated whenever the clusters are updated. A difference calculation is introduced into the simple common word algorithm to make it more sensitive to differing words in templates, and the resulting measure is used to calculate the similarity between templates. The BMerge algorithm is designed to merge templates whose similarity exceeds a threshold, and the merged, lossless log is output as the final result. The proposed method is suitable for processing composite logs and achieves high accuracy.
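
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract gives neither the formula for the difference-aware similarity nor the details of BMerge, and the Drain extraction step is omitted entirely. The function names, the `<*>` wildcard, the `penalty` weight and the `threshold` value are all assumptions made for illustration.

```python
import re
from collections import defaultdict

WILDCARD = "<*>"  # assumed placeholder for variable parts of a template


def symbol_signature(line):
    """Return the sequence of punctuation symbols that appear in a log line."""
    return "".join(re.findall(r"[^\w\s]", line))


def cluster_by_symbols(lines):
    """Group log lines whose symbol sequences are identical."""
    clusters = defaultdict(list)
    for line in lines:
        clusters[symbol_signature(line)].append(line)
    return dict(clusters)


def similarity(a, b, penalty=0.5):
    """Common-word ratio reduced by a penalty per differing position.

    Hypothetical formula: the paper's actual difference calculation
    is not given in the abstract.
    """
    if len(a) != len(b):
        return 0.0
    common = sum(x == y for x, y in zip(a, b))
    diff = len(a) - common
    return max(0.0, (common - penalty * diff) / len(a))


def merge_pair(a, b):
    """Keep tokens that agree and replace differing tokens with a wildcard."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def merge_templates(templates, threshold=0.5):
    """Greedily merge each template into the first sufficiently similar one."""
    merged = []
    for tokens in templates:
        for i, existing in enumerate(merged):
            if similarity(tokens, existing) > threshold:
                merged[i] = merge_pair(tokens, existing)
                break
        else:
            merged.append(list(tokens))
    return merged


t1 = "query from 10.0.0.1 type A".split()
t2 = "query from 10.0.0.2 type A".split()
t3 = "lease granted to host 1".split()
print(merge_templates([t1, t2, t3]))
```

In this sketch, two templates that differ in a single token are merged into one template with a wildcard at that position, while an unrelated template is kept separate. Raising `penalty` makes the measure more sensitive to differing words, which is the effect the abstract attributes to the difference calculation.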



Key words: template extraction, composite log, simple common word, similarity, Json, log extraction
Received: 24 September 2019      Published: 28 August 2020
CLC:  TP 301  
Corresponding Authors: Yan MA     E-mail: njwuqi123@126.com;mayan@bupt.edu.cn
Cite this article:

Qi WU,Xiao-hong HUANG,Yan MA,Qun CONG. A template extraction method for composite log. Journal of ZheJiang University (Engineering Science), 2020, 54(8): 1557-1561.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2020.08.014     OR     http://www.zjujournals.com/eng/Y2020/V54/I8/1557


Fig.1 Log classification tree structure example of CLEA algorithm
Item          Configuration
OS            CentOS release 6.8 (Final)
CPU           Intel(R) Xeon(R) CPU 5110
Memory        8 GB (4 × 2 GB DDR2 667 MHz)
SSD           Samsung 850 EVO SATA III 120 GB
HDD           Seagate 2 TB SATA3 64 MB Cache
Network card  Intel e1000e 1000 Mbps Full Duplex
Tab.1 Experimental environment for verifying CLEA log template extraction algorithm
Algorithm   A/%
            DNS     DHCP    Dataflow   Huawei
CLEA        34      45      40         55
Drain       27      35      31         40
Tab.2 Partition accuracy of CLEA and Drain algorithms on different logs
Algorithm   t/s
            DNS       DHCP      Dataflow   Huawei
CLEA        31.81     45.43     2.55       27.70
IPLoM       50.88     73.67     5.17       34.03
SHISO       1957.14   2879.23   75.10      1292.95
Spell       236.44    311.58    23.49      94.64
Drain       35.17     51.01     3.01       29.78
Tab.3 Processing time of multiple log template extraction algorithms on different logs
Fig.2 Accuracy of multiple log template extraction algorithms on DNS log
Fig.3 Accuracy of multiple log template extraction algorithms on Huawei switch log
Algorithm   A/%
            DNS     DHCP    Dataflow   Huawei
CLEA        92      100     88         96
IPLoM       64      80      18         69
SHISO       64      80      25         72
Spell       71      90      11         69
Drain       71      80      18         75
Tab.4 Final accuracy of multiple log template algorithms on different logs