Journal of ZheJiang University (Engineering Science)  2025, Vol. 59 Issue (8): 1644-1652    DOI: 10.3785/j.issn.1008-973X.2025.08.011
    
Source code vulnerability detection method based on heterogeneous graph representation
Xuejun ZHANG1, Shubin LIANG1, Wanrong BAI2, Fenghe ZHANG1, Haiyan HUANG1, Meifeng GUO1, Zhuo CHEN1
1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Electric Power Research Institute, State Grid Gansu Electric Power Company, Lanzhou 730070, China

Abstract  

Existing source code vulnerability detection models extract heterogeneous features and low-level information insufficiently, which lowers detection accuracy. To address this problem, a source code vulnerability detection method based on heterogeneous graph representation was proposed. Eight instruction-level features were extracted from the intermediate code representation (IR) and used as node embeddings of the program dependence graph, which supplied the missing low-level information. Attention-based aggregation modules were constructed at the node level and at the dependency level to extract heterogeneous features from the graph representation, and key-node information was captured by adjusting the attention coefficients. The aggregated graph representation was then classified to predict whether a vulnerability was present. Experiments on synthetic datasets and two real-project datasets showed that, compared with existing methods, the proposed method extracted heterogeneous features more effectively and achieved higher overall vulnerability detection performance.



Key words: vulnerability detection; graph representation; attention mechanism; heterogeneous feature; intermediate code representation
Received: 21 August 2024      Published: 28 July 2025
CLC:  TP 391  
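Fund:  National Natural Science Foundation of China (61762058, 62461032); Industrial Support Project of the Education Department of Gansu Province (2022CYZC-38); Key Research and Development Program of Gansu Province (25YEFA089); State Grid Science and Technology Project (W32KJ2722010, 522722220013).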
Cite this article:

Xuejun ZHANG,Shubin LIANG,Wanrong BAI,Fenghe ZHANG,Haiyan HUANG,Meifeng GUO,Zhuo CHEN. Source code vulnerability detection method based on heterogeneous graph representation. Journal of ZheJiang University (Engineering Science), 2025, 59(8): 1644-1652.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.08.011     OR     https://www.zjujournals.com/eng/Y2025/V59/I8/1644


Fig.1 Framework of VulHetG method
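Type                      LLVM-IR          Count   Source code (example)
Arithmetic instructions   add, sub…        12      +, -, *, /, %
Bitwise instructions      shl, lshr…       6       <<, >>, &, |, ^
Conversion instructions   trunc, zext…     9       char b = 97
Memory instructions       load, store…     3       free *ptr
Comparison instructions   icmp, fcmp…      2       >, <, ==
Branch instructions       call, ret, br…   3       goto label
Exception handling        landingpad…      2       std::exception
Vector instructions       llvm.vector…     3       vec[3] = 5
Atomic instructions       atomicrmw…       3       fetch_add()
Aggregate instructions    insertvalue…     4       struct S{float x}
Other instructions        select, phi…     3       condition ? a : b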
Tab.1 Comparison of “operation instructions” and source code
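Tab.1 can be read as a mapping from LLVM-IR opcodes to instruction categories. The following Python sketch counts, for a fragment of textual LLVM-IR, how many instructions fall into each category. It is an illustration only and not the authors' feature extractor: the names `OPCODE_CATEGORY`, `opcode_of`, and `category_counts` are hypothetical, the map covers only the opcodes named in Tab.1, and counting calls to `@llvm.vector.` intrinsics as vector instructions is an assumption. How the eight instruction-level features mentioned in the abstract are derived is not specified here.

```python
# Hypothetical opcode-to-category map, limited to the opcodes named in Tab.1.
OPCODE_CATEGORY = {
    "add": "arithmetic", "sub": "arithmetic",
    "shl": "bitwise", "lshr": "bitwise",
    "trunc": "conversion", "zext": "conversion",
    "load": "memory", "store": "memory",
    "icmp": "comparison", "fcmp": "comparison",
    "call": "branch", "ret": "branch", "br": "branch",
    "landingpad": "exception",
    "atomicrmw": "atomic",
    "insertvalue": "aggregate",
    "select": "other", "phi": "other",
}
CATEGORIES = ["arithmetic", "bitwise", "conversion", "memory", "comparison",
              "branch", "exception", "vector", "atomic", "aggregate", "other"]


def opcode_of(line):
    """Return the opcode of one textual LLVM-IR line, or None if it has none."""
    text = line.split(";", 1)[0].strip()   # drop a trailing comment
    if not text or text.endswith(":"):     # blank line or basic-block label
        return None
    if " = " in text:                      # "%1 = add ..." -> "add ..."
        text = text.split(" = ", 1)[1]
    return text.split()[0]


def category_counts(ir_text):
    """Count instructions per category, in the order given by CATEGORIES."""
    counts = dict.fromkeys(CATEGORIES, 0)
    for line in ir_text.splitlines():
        op = opcode_of(line)
        if op == "call" and "@llvm.vector." in line:
            counts["vector"] += 1          # assumption: intrinsic counted as vector
        elif op in OPCODE_CATEGORY:
            counts[OPCODE_CATEGORY[op]] += 1
    return [counts[c] for c in CATEGORIES]


SAMPLE_IR = """
define i32 @f(i32 %a, i32* %p) {
entry:
  %old = load i32, i32* %p
  %sum = add nsw i32 %a, %old
  %cmp = icmp sgt i32 %sum, 0
  br i1 %cmp, label %then, label %end
then:
  store i32 %sum, i32* %p
  br label %end
end:
  ret i32 %sum
}
"""

print(category_counts(SAMPLE_IR))  # [1, 0, 0, 2, 1, 3, 0, 0, 0, 0, 0]
```

A vector of this kind could be attached to the program dependence graph node that owns the instructions, which is one plausible way to carry low-level information into the graph.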
Fig.2 Structure of vulnerability detection model
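The abstract describes attention aggregation at the node level and at the dependency level. The sketch below shows one common way to arrange such a two-level scheme, in the style of hierarchical attention networks for heterogeneous graphs. It is a swapped-in illustration, not the authors' model: the function names, the fixed (unlearned) attention vectors, the concatenation-based scoring, and the two dependency types "data" and "control" are all assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def node_level(features, neighbors, attn_vec):
    """For one dependency type, replace each node feature by an
    attention-weighted sum over its neighbours and itself."""
    out = []
    for i, h_i in enumerate(features):
        nbrs = neighbors.get(i, []) + [i]                  # add a self-loop
        # Score each (node, neighbour) pair from the concatenated features.
        scores = [dot(attn_vec, h_i + features[j]) for j in nbrs]
        alphas = softmax(scores)                           # attention coefficients
        out.append([sum(a * features[j][d] for a, j in zip(alphas, nbrs))
                    for d in range(len(h_i))])
    return out


def dependency_level(per_type, query):
    """Fuse per-dependency-type node embeddings with one weight per type."""
    types = list(per_type)
    scores = [sum(dot(query, h) for h in per_type[t]) / len(per_type[t])
              for t in types]
    betas = softmax(scores)                                # one weight per type
    n, dim = len(per_type[types[0]]), len(query)
    fused = [[sum(b * per_type[t][i][d] for b, t in zip(betas, types))
              for d in range(dim)] for i in range(n)]
    return fused, dict(zip(types, betas))


# Toy graph: 3 nodes with 2-dimensional features and two edge types.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = {"data": {0: [1], 1: [2]}, "control": {0: [2], 2: [0]}}

per_type = {t: node_level(feats, nb, [0.5, -0.5, 1.0, 0.2])
            for t, nb in edges.items()}
fused, betas = dependency_level(per_type, [1.0, 1.0])
graph_vec = [sum(col) / len(fused) for col in zip(*fused)]  # mean readout
print(betas, graph_vec)
```

In a trained model the attention vectors would be learned parameters, and the graph-level vector would be passed to a classifier that predicts whether a vulnerability is present.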
CWE ID     Acc     P       F1      R
CWE-476    96.87   97.52   96.97   96.43
CWE-706    95.73   95.81   95.55   94.85
CWE-119    96.41   96.49   97.35   98.23
CWE-404    96.55   94.83   96.49   98.21
CWE-665    96.92   97.06   97.06   97.06
CWE-074    95.38   96.97   95.52   94.11
CWE-020    93.75   93.85   93.88   93.91
CWE-400    95.45   93.75   95.74   97.82
CWE-311    86.66   90.59   86.20   80.64
CWE-704    95.83   96.02   96.02   96.02
Tab.2 Detection of 10 types of vulnerabilities (%)
               CWE-399                           CWE-119
Model          Acc/%   P/%     F1/%    DT/s      Acc/%   P/%     F1/%    DT/s
VulDeePecker   71.29   75.18   71.31   0.885     73.06   76.24   73.31   0.891
SySeVR         75.94   80.56   78.24   1.092     79.80   84.62   79.20   1.105
mVulSniffer    92.73   86.35   85.90   1.380     95.32   93.09   88.36   1.378
FUNDED         93.28   93.70   93.41   2.156     93.80   94.40   94.38   2.108
VDoTR          94.12   94.27   94.25   2.361     94.75   96.81   95.65   2.258
VulHetG        94.05   94.16   94.62   2.173     96.01   96.13   95.87   2.112
Tab.3 Comparison of detection for CWE-399 and CWE-119
               AE                      FC                      AU                      PU
Model          Acc     P       R       Acc     P       R       Acc     P       R       Acc     P       R
VulDeePecker   66.30   70.32   68.54   65.60   71.14   69.38   64.68   69.03   70.20   70.06   75.19   70.31
SySeVR         78.74   81.12   77.68   75.69   83.11   80.03   74.14   79.27   77.77   79.43   81.21   79.45
mVulSniffer    92.60   93.02   88.30   92.74   90.69   83.17   92.10   74.77   83.17   91.50   91.07   92.69
FUNDED         93.81   94.40   92.13   93.73   95.74   94.30   94.22   90.67   92.45   92.41   91.89   92.62
VDoTR          94.04   95.14   94.05   95.17   95.33   90.15   94.35   94.79   94.64   92.05   90.48   87.83
VulHetG        95.12   96.26   95.17   96.05   96.18   93.45   94.33   94.61   94.70   93.18   92.14   92.45
Tab.4 Detection performance comparison for 4 types of syntax rule vulnerabilities (%)
Model           Acc     P       R       F1      1-FNR   1-FPR
μVulDeePecker   35.89   36.01   35.79   36.22   42.79   52.94
SySeVR          45.32   52.40   45.55   48.74   45.55   52.41
mVulSniffer     81.25   79.25   80.31   80.28   79.25   72.25
VulHetG         80.47   82.87   84.76   81.06   89.14   83.87
Tab.5 Detection on real-world dataset Devign (%)
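The metrics reported in the tables (Acc, P, R, F1, and the complements of the false negative and false positive rates) have standard definitions for binary classification. The sketch below computes them from confusion-matrix counts; the function name and the counts are hypothetical. The averaging scheme behind the tabulated values is not stated here, so this sketch is not claimed to reproduce them.

```python
def detection_metrics(tp, fp, fn, tn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "Acc": (tp + tn) / (tp + fp + fn + tn),
        "P": precision,
        "R": recall,
        "F1": 2 * precision * recall / (precision + recall),
        "1-FNR": 1 - fn / (fn + tp),   # complement of the false negative rate
        "1-FPR": 1 - fp / (fp + tn),   # complement of the false positive rate
    }


# Hypothetical counts: 40 true positives, 10 false positives,
# 20 false negatives, 30 true negatives.
for name, value in detection_metrics(tp=40, fp=10, fn=20, tn=30).items():
    print(f"{name}: {100 * value:.2f}%")
```

Under these definitions, 1-FNR coincides with recall and 1-FPR with the true negative rate.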