Source code vulnerability detection method based on heterogeneous graph representation

doi:10.3785/j.issn.1008-973X.2025.08.011

Journal of ZheJiang University (Engineering Science)

2025, Vol. 59, Issue 8: 1644-1652

Xuejun ZHANG¹, Shubin LIANG¹, Wanrong BAI², Fenghe ZHANG¹, Haiyan HUANG¹, Meifeng GUO¹, Zhuo CHEN¹

1. College of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, China
2. Electric Power Research Institute, State Grid Gansu Electric Power Company, Lanzhou 730070, China

Abstract

Existing source code vulnerability detection models extract heterogeneous features and low-level information insufficiently, which leads to low detection accuracy. To address this problem, a source code vulnerability detection method based on heterogeneous graph representation was proposed. Eight instruction-level features were extracted from the intermediate code representation (IR) and used as node embeddings for the program dependence graph, which compensates for the missing low-level information. Attention-based aggregation mechanisms were constructed at the node level and at the dependency level to extract heterogeneous features, and the information of key nodes was captured by adjusting the attention coefficients. The aggregated graph representation was then classified to predict whether a vulnerability is present. Experiments on synthetic datasets and two real-world project datasets show that, compared with existing methods, the proposed method extracts heterogeneous features more effectively and achieves higher overall vulnerability detection performance.
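The two-level aggregation described in the abstract can be illustrated with a minimal sketch. It assumes a program dependence graph whose edges are typed as data or control dependences, scores neighbours with a plain dot product, and uses a fixed `query` vector and classifier weights. All of these names and scoring choices are hypothetical placeholders for illustration; the paper's actual attention functions, learned parameters, and eight instruction-level features are not reproduced here.

```python
import math
from collections import defaultdict

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, vectors):
    """Convex combination of equally sized vectors."""
    return [sum(w * vec[i] for w, vec in zip(weights, vectors))
            for i in range(len(vectors[0]))]

def node_level(features, edges, dep_type):
    """Each node attends over itself and its in-neighbours of one dependency type."""
    neighbours = defaultdict(list)
    for src, dst, kind in edges:
        if kind == dep_type:
            neighbours[dst].append(src)
    out = {}
    for node, feat in features.items():
        candidates = [node] + neighbours[node]
        # Placeholder score: dot product instead of a learned attention function.
        alpha = softmax([dot(feat, features[c]) for c in candidates])
        out[node] = weighted_sum(alpha, [features[c] for c in candidates])
    return out

def dependency_level(per_type, query):
    """Fuse the per-dependency-type embeddings with one attention weight per type."""
    types = sorted(per_type)
    nodes = list(per_type[types[0]])
    scores = [sum(dot(query, per_type[t][n]) for n in nodes) / len(nodes)
              for t in types]
    beta = softmax(scores)
    fused = {n: weighted_sum(beta, [per_type[t][n] for t in types]) for n in nodes}
    return fused, dict(zip(types, beta))

def readout(fused):
    """Mean-pool node embeddings into a single graph vector."""
    vectors = list(fused.values())
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Toy graph: 3 nodes with 2-dimensional features (stand-ins for IR-derived features).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
edges = [(0, 1, "data"), (1, 2, "control"), (0, 2, "data")]

per_type = {t: node_level(features, edges, t) for t in ("data", "control")}
fused, beta = dependency_level(per_type, query=[0.5, 0.5])
graph_vec = readout(fused)

# Hypothetical fixed classifier weights; a real model would learn them.
logit = dot([0.8, -0.3], graph_vec) + 0.1
prob = 1.0 / (1.0 + math.exp(-logit))
print(beta, graph_vec, prob)
```

The node-level step weights individual neighbours, while the dependency-level step weights whole edge types. Keeping the two apart is what allows a model of this kind to treat data and control dependences differently instead of mixing them in a single homogeneous aggregation.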

Key words: vulnerability detection, graph representation, attention mechanism, heterogeneous feature, intermediate code representation

Received: 21 August 2024 Published: 28 July 2025

CLC: TP 391

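Fund: National Natural Science Foundation of China (61762058, 62461032); Industrial Support Project of the Gansu Provincial Department of Education (2022CYZC-38); Key Research and Development Program of Gansu Province (25YEFA089); Science and Technology Project of State Grid Corporation of China (W32KJ2722010, 522722220013).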

Cite this article:

Xuejun ZHANG, Shubin LIANG, Wanrong BAI, Fenghe ZHANG, Haiyan HUANG, Meifeng GUO, Zhuo CHEN. Source code vulnerability detection method based on heterogeneous graph representation. Journal of ZheJiang University (Engineering Science), 2025, 59(8): 1644-1652.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2025.08.011 OR https://www.zjujournals.com/eng/Y2025/V59/I8/1644

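Source code vulnerability detection method based on heterogeneous graph representation

To address the low detection accuracy caused by existing source code vulnerability detection models extracting heterogeneous features and low-level information insufficiently, a source code vulnerability detection method based on heterogeneous graph representation is proposed. Eight instruction-level features are extracted from the intermediate code representation (IR) as node embeddings of the program dependence graph, which resolves the insufficient extraction of low-level information. Attention-based aggregation modules are built at the node layer and the dependency layer to extract the heterogeneous features in the graph representation data, and key node information is captured by adjusting the attention coefficients. The aggregated graph data are classified to predict whether a vulnerability exists. Experiments on synthetic datasets and 2 real-world project datasets show that, compared with existing methods, the proposed method has a stronger heterogeneous feature extraction capability and higher overall vulnerability detection performance.

Keywords: vulnerability detection, graph representation, attention mechanism, heterogeneous feature, intermediate code representation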

Fig.1 Framework of VulHetG method

Tab.1 Comparison of “operation instructions” and source code

Fig.2 Structure of vulnerability detection model

Tab.2 Detection of 10 types of vulnerabilities (%)

Tab.3 Comparison of detection for CWE-399 and CWE-119

Tab.4 Detection performance comparison for 4 types of syntax rule vulnerabilities (%)

Tab.5 Detection on real-world dataset Devign (%)

