Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (8): 1644-1652    DOI: 10.3785/j.issn.1008-973X.2025.08.011
Computer Technology, Control Engineering, Communication Technology
Source code vulnerability detection method based on heterogeneous graph representation
Xuejun ZHANG1, Shubin LIANG1, Wanrong BAI2, Fenghe ZHANG1, Haiyan HUANG1, Meifeng GUO1, Zhuo CHEN1
1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Electric Power Research Institute, State Grid Gansu Electric Power Company, Lanzhou 730070, China
Abstract:

Existing source code vulnerability detection models extract heterogeneous features and low-level information insufficiently, which limits their detection accuracy. To address this problem, a source code vulnerability detection method based on heterogeneous graph representation is proposed. Eight instruction-level features are extracted from the intermediate code representation (IR) and used as node embeddings of the program dependence graph, which compensates for the missing low-level information. Attention-based aggregation modules are built at the node level and at the dependency level to extract heterogeneous features from the graph representation, and key node information is captured by adjusting the attention coefficients. The aggregated graph representation is then classified to predict whether a vulnerability is present. Experiments on synthetic datasets and two real-world project datasets show that, compared with existing methods, the proposed method extracts heterogeneous features more effectively and achieves higher overall vulnerability detection performance.

Key words: vulnerability detection    graph representation    attention mechanism    heterogeneous feature    intermediate code representation
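Received: 2024-08-21    Published: 2025-07-28
CLC number: TP 391
Funding: National Natural Science Foundation of China (61762058, 62461032); Industrial Support Project of the Education Department of Gansu Province (2022CYZC-38); Key Research and Development Program of Gansu Province (25YEFA089); State Grid Science and Technology Project (W32KJ2722010, 522722220013).
About the author: ZHANG Xuejun (born 1977), male, professor, doctoral supervisor, works on network security, data privacy and machine learning. orcid.org/0000-0002-0350-359X. E-mail: xuejunzhang@mail.lzjtu.cn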
Cite this article:

Xuejun ZHANG, Shubin LIANG, Wanrong BAI, Fenghe ZHANG, Haiyan HUANG, Meifeng GUO, Zhuo CHEN. Source code vulnerability detection method based on heterogeneous graph representation. Journal of Zhejiang University (Engineering Science), 2025, 59(8): 1644-1652.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.08.011        https://www.zjujournals.com/eng/CN/Y2025/V59/I8/1644

Fig. 1  Overall architecture of the VulHetG method
Type                  LLVM-IR          Count   Source code (example)
Arithmetic            add, sub…        12      +, -, *, /, %
Bitwise               shl, lshr…       6       <<, >>, &, |, ^
Conversion            trunc, zext…     9       char b = 97
Memory                load, store…     3       free *ptr
Comparison            icmp, fcmp…      2       >, <, ==
Branch                call, ret, br…   3       goto label
Exception handling    landingpad…      2       std::exception
Vector                llvm.vector…     3       vec[3] = 5
Atomic                atomicrmw…       3       fetch_add()
Aggregate             insertvalue…     4       struct S{float x}
Other                 select, phi…     3       condition ? a : b
Table 1  Comparison between "operation instructions" and source code information
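Table 1 groups LLVM-IR opcodes into instruction categories. A minimal sketch of how such a grouping could be turned into a per-node feature vector is given below. It is an illustration under stated assumptions, not the authors' implementation: the count-based encoding, the names `CATEGORIES`, `OPCODE_TO_CATEGORY`, `categorize` and `node_feature`, and the choice to send unlisted opcodes to "other" are all hypothetical. Only the opcodes that Table 1 names explicitly are mapped.

```python
from collections import Counter

# Category order follows the rows of Table 1.
CATEGORIES = [
    "arithmetic", "bitwise", "conversion", "memory", "comparison",
    "branch", "exception", "vector", "atomic", "aggregate", "other",
]

# Only opcodes listed in Table 1 appear here; the table elides the rest.
OPCODE_TO_CATEGORY = {
    "add": "arithmetic", "sub": "arithmetic",
    "shl": "bitwise", "lshr": "bitwise",
    "trunc": "conversion", "zext": "conversion",
    "load": "memory", "store": "memory",
    "icmp": "comparison", "fcmp": "comparison",
    "call": "branch", "ret": "branch", "br": "branch",
    "landingpad": "exception",
    "atomicrmw": "atomic",
    "insertvalue": "aggregate",
    "select": "other", "phi": "other",
}


def categorize(opcode):
    """Return the Table 1 category for an opcode (assumption: unknown -> 'other')."""
    if opcode.startswith("llvm.vector"):
        return "vector"
    return OPCODE_TO_CATEGORY.get(opcode, "other")


def node_feature(opcodes):
    """Count how many opcodes of each category a graph node contains."""
    counts = Counter(categorize(op) for op in opcodes)
    return [counts[c] for c in CATEGORIES]


# A node whose IR contains a load, an add, a comparison and a branch.
print(node_feature(["load", "add", "icmp", "br"]))
# -> [1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0]
```

A count vector is only one possible encoding; a learned embedding per category would fit the same mapping.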
Fig. 2  Structure of the vulnerability detection model
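The abstract describes attention-based aggregation at the node level and at the dependency level. The sketch below illustrates that two-stage idea in plain Python. It is a simplified stand-in, not the model in Fig. 2: dot-product attention with fixed query vectors replaces whatever learned scoring the paper uses, "control" and "data" are assumed dependency types of a program dependence graph, and every name (`attention_pool`, `aggregate_node`, `node_query`, `dep_query`) is hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_pool(vectors, query):
    """Weighted sum of vectors; weights are softmax(dot(query, vector))."""
    weights = softmax([dot(query, v) for v in vectors])
    dim = len(vectors[0])
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]
    return pooled, weights


def aggregate_node(node, features, neighbors_by_dep, node_query, dep_query):
    """Node-level attention inside each dependency type,
    then dependency-level attention across the per-type results."""
    per_dep = []
    for dep_type in sorted(neighbors_by_dep):
        neighbors = neighbors_by_dep[dep_type].get(node, [])
        # The node itself takes part, so an isolated node still has a vector.
        vectors = [features[node]] + [features[n] for n in neighbors]
        pooled, _ = attention_pool(vectors, node_query)
        per_dep.append(pooled)
    return attention_pool(per_dep, dep_query)


# Toy graph: node "a" has one control-dependent and one data-dependent neighbor.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
neighbors_by_dep = {"control": {"a": ["b"]}, "data": {"a": ["c"]}}
fused, dep_weights = aggregate_node(
    "a", features, neighbors_by_dep, node_query=[0.5, -0.5], dep_query=[1.0, 0.0]
)
print(fused, dep_weights)
```

In a trained model the two query vectors would be learned parameters, and the fused node vectors would be pooled over the graph before classification.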
CWE ID     Acc     P       F1      R
CWE-476    96.87   97.52   96.97   96.43
CWE-706    95.73   95.81   95.55   94.85
CWE-119    96.41   96.49   97.35   98.23
CWE-404    96.55   94.83   96.49   98.21
CWE-665    96.92   97.06   97.06   97.06
CWE-074    95.38   96.97   95.52   94.11
CWE-020    93.75   93.85   93.88   93.91
CWE-400    95.45   93.75   95.74   97.82
CWE-311    86.66   90.59   86.20   80.64
CWE-704    95.83   96.02   96.02   96.02
Table 2  Detection results for 10 vulnerability types
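The F1 column in Table 2 sits between P and R, which is what a harmonic mean of precision and recall gives. As a quick sanity check, the short sketch below recomputes F1 from P and R for two rows using the standard definition F1 = 2PR / (P + R). The function name `f1_score` is illustrative. The page does not state how the metrics were averaged, so other rows need not reproduce to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    return 2 * precision * recall / (precision + recall)


# (P, R) pairs taken from Table 2.
rows = {
    "CWE-476": (97.52, 96.43),
    "CWE-119": (96.49, 98.23),
}

for cwe, (p, r) in rows.items():
    print(cwe, round(f1_score(p, r), 2))
# -> CWE-476 96.97
# -> CWE-119 97.35
```

Both recomputed values agree with the F1 entries listed for those rows.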
               CWE-399                           CWE-119
Model          Acc/%   P/%     F1/%    DT/s      Acc/%   P/%     F1/%    DT/s
VulDeePecker   71.29   75.18   71.31   0.885     73.06   76.24   73.31   0.891
SySeVR         75.94   80.56   78.24   1.092     79.80   84.62   79.20   1.105
mVulSniffer    92.73   86.35   85.90   1.380     95.32   93.09   88.36   1.378
FUNDED         93.28   93.70   93.41   2.156     93.80   94.40   94.38   2.108
VDoTR          94.12   94.27   94.25   2.361     94.75   96.81   95.65   2.258
VulHetG        94.05   94.16   94.62   2.173     96.01   96.13   95.87   2.112
Table 3  Comparison of detection performance on CWE-399 and CWE-119
               AE                      FC                      AU                      PU
Model          Acc     P       R       Acc     P       R       Acc     P       R       Acc     P       R
VulDeePecker   66.30   70.32   68.54   65.60   71.14   69.38   64.68   69.03   70.20   70.06   75.19   70.31
SySeVR         78.74   81.12   77.68   75.69   83.11   80.03   74.14   79.27   77.77   79.43   81.21   79.45
mVulSniffer    92.60   93.02   88.30   92.74   90.69   83.17   92.10   74.77   83.17   91.50   91.07   92.69
FUNDED         93.81   94.40   92.13   93.73   95.74   94.30   94.22   90.67   92.45   92.41   91.89   92.62
VDoTR          94.04   95.14   94.05   95.17   95.33   90.15   94.35   94.79   94.64   92.05   90.48   87.83
VulHetG        95.12   96.26   95.17   96.05   96.18   93.45   94.33   94.61   94.70   93.18   92.14   92.45
Table 4  Comparison of detection performance on vulnerabilities of 4 syntax rule types
Model           Acc     P       R       F1      1-FNR   1-FPR
μVulDeePecker   35.89   36.01   35.79   36.22   42.79   52.94
SySeVR          45.32   52.40   45.55   48.74   45.55   52.41
mVulSniffer     81.25   79.25   80.31   80.28   79.25   72.25
VulHetG         80.47   82.87   84.76   81.06   89.14   83.87
Table 5  Detection results on the real-world dataset Devign