Source code vulnerability detection method based on heterogeneous graph representation
Xuejun ZHANG1, Shubin LIANG1, Wanrong BAI2, Fenghe ZHANG1, Haiyan HUANG1, Meifeng GUO1, Zhuo CHEN1
1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Electric Power Research Institute, State Grid Gansu Electric Power Company, Lanzhou 730070, China
Abstract: Existing source code vulnerability detection models extract too little heterogeneous and low-level information, which lowers their detection accuracy. To address this problem, a source code vulnerability detection method based on heterogeneous graph representation is proposed. Eight instruction-level features are extracted from the intermediate code representation (IR) and used as node embeddings of the program dependence graph, which supplies the missing low-level information. Attention-based aggregation modules are built at the node level and at the dependency level to extract heterogeneous features from the graph data, and the information of key nodes is captured by adjusting the attention coefficients. The aggregated graph representation is then classified to predict whether a vulnerability is present. Experiments on synthetic datasets and two real-project datasets show that, compared with existing methods, the proposed method extracts heterogeneous features more effectively and achieves better overall vulnerability detection performance.
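The two-level aggregation summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the model equations, so the code below assumes a generic attention scheme in which each dependency type is aggregated separately at the node level and the per-type results are then fused at the dependency level. The edge type names `control` and `data`, the helper names, the mean readout, the logistic classifier, and the random (untrained) weights are all assumptions made for illustration, and the eight node features are placeholder values because the abstract does not list them.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def node_level(features, edges, attn_vec):
    """Node-level attention: aggregate each node's in-neighbours (plus itself)
    under a single dependency type."""
    out = []
    for i, h_i in enumerate(features):
        nbrs = [i] + [src for src, dst in edges if dst == i]
        # Score each neighbour from the concatenation [h_i || h_j].
        weights = softmax([leaky_relu(dot(attn_vec, h_i + features[j])) for j in nbrs])
        out.append([sum(w * features[j][k] for w, j in zip(weights, nbrs))
                    for k in range(len(h_i))])
    return out

def dependency_level(per_type, query):
    """Dependency-level attention: weight each dependency type's embeddings
    by an importance score, then fuse them per node."""
    types = sorted(per_type)
    scores = [sum(dot(query, [math.tanh(x) for x in z]) for z in per_type[t]) / len(per_type[t])
              for t in types]
    beta = dict(zip(types, softmax(scores)))
    n, d = len(per_type[types[0]]), len(query)
    fused = [[sum(beta[t] * per_type[t][i][k] for t in types) for k in range(d)]
             for i in range(n)]
    return fused, beta

def predict(features, typed_edges, params):
    per_type = {t: node_level(features, e, params["attn"][t]) for t, e in typed_edges.items()}
    fused, beta = dependency_level(per_type, params["query"])
    d = len(fused[0])
    graph_vec = [sum(z[k] for z in fused) / len(fused) for k in range(d)]  # mean readout (assumed)
    logit = dot(params["w"], graph_vec) + params["b"]
    return 1.0 / (1.0 + math.exp(-logit)), beta

rng = random.Random(0)
DIM = 8  # eight instruction-level features per node; values here are placeholders
features = [[rng.random() for _ in range(DIM)] for _ in range(4)]
typed_edges = {"control": [(0, 1), (1, 2), (2, 3)], "data": [(0, 2), (1, 3)]}
params = {
    "attn": {t: [rng.uniform(-1, 1) for _ in range(2 * DIM)] for t in typed_edges},
    "query": [rng.uniform(-1, 1) for _ in range(DIM)],
    "w": [rng.uniform(-1, 1) for _ in range(DIM)],
    "b": 0.0,
}
prob, beta = predict(features, typed_edges, params)
print("vulnerability score:", prob)
print("dependency-type weights:", beta)
```

Because the weights are random, the printed score carries no meaning; the sketch only shows how attention coefficients at the two levels could decide which neighbours and which dependency types contribute most to the graph-level representation that is classified.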
Received: 21 August 2024
Published: 28 July 2025
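Fund: National Natural Science Foundation of China (61762058, 62461032); Industrial Support Project of the Gansu Provincial Department of Education (2022CYZC-38); Gansu Provincial Key Research and Development Program (25YEFA089); State Grid Science and Technology Project (W32KJ2722010, 522722220013).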
Keywords: vulnerability detection, graph representation, attention mechanism, heterogeneous feature, intermediate code representation