Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (1): 155-169    DOI: 10.3785/j.issn.1008-973X.2023.01.016
Computer Technology and Communication Engineering
Survey on program representation learning
Jun-chi MA(),Xiao-xin DI,Zong-tao DUAN,Lei TANG
College of Information Engineering, Chang’an University, Xi’an 710064, China
Abstract:

There is a trend toward intelligent software development driven by artificial intelligence, aimed at improving development efficiency, and understanding program semantics is a key problem for such intelligent development. A body of research on program representation learning has emerged to address this problem. Program representation learning automatically learns useful features from programs and represents them as low-dimensional dense vectors, efficiently extracting program semantics for use in downstream tasks. A comprehensive review was provided to categorize and analyze existing work on program representation learning. The mainstream models were introduced, including frameworks based on graph structures and on token sequences. The applications of program representation learning to defect detection, defect localization, code completion and other tasks were then described, and common toolsets and benchmarks were summarized. Finally, the challenges that program representation learning will face were analyzed.

Key words: software engineering; representation learning; program semantics; neural network; deep learning
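The pipeline the abstract describes, mapping a program to a low-dimensional dense vector that downstream tasks consume, can be sketched minimally as follows. This is an illustrative toy, not any model surveyed here: the hash-based pseudo-embeddings and the 8-dimensional size are our own assumptions, standing in for vectors a real model would learn.

```python
import hashlib

DIM = 8  # toy embedding dimension (assumed; real models use hundreds of dims)

def token_vector(token: str) -> list[float]:
    # Deterministic pseudo-embedding: hash the token into DIM floats in [0, 1).
    # A trained model would learn these vectors instead (e.g. word2vec-style).
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:DIM]]

def program_vector(tokens: list[str]) -> list[float]:
    # Mean-pooling readout: average token vectors into one dense program vector.
    vecs = [token_vector(t) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

tokens = ["def", "add", "(", "a", ",", "b", ")", ":", "return", "a", "+", "b"]
v = program_vector(tokens)
print(len(v))  # 8
```

The dense vector `v` is what a downstream classifier or decoder would consume.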
Received: 2022-03-08    Published: 2023-01-17
CLC:  TP 391  
Funding: Youth Program of the National Natural Science Foundation of China (62002030); Key Research and Development Program of Shaanxi Province (2019ZDLGY17-08, 2019ZDLGY03-09-01, 2019GY-006, 2020GY-013)
About the author: MA Jun-chi (b. 1988), male, lecturer, engaged in research on program representation learning and traffic data mining. orcid.org/0000-0001-6944-7511. E-mail: majunchi@chd.edu.cn
Cite this article:


Jun-chi MA,Xiao-xin DI,Zong-tao DUAN,Lei TANG. Survey on program representation learning. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 155-169.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.01.016        https://www.zjujournals.com/eng/CN/Y2023/V57/I1/155

Fig. 1  Workflow of program representation learning
Research approach | Related work
Approach 1: AST-path-based representation learning | Code2vec[8], code2seq[9], Ref. [10], mocktail[11]
Approach 2: subtree-based representation learning | CAST[12], ASTNN[13], InferCode[14], TreeCaps[15], Ref. [16]
Approach 3: flattened AST | Ref. [17], DeepCom[18], Refs. [19-21]
Approach 4: network-representation-learning based | Ref. [22], CCAG[23], Ref. [24]
Table 1  AST-based models
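Approach 1 above (AST-path-based learning, as in Code2vec[8]) starts from path-contexts: pairs of AST leaves joined by the syntactic path through their lowest common ancestor. A minimal sketch of path-context extraction using Python's standard `ast` module follows; the function names are ours, and real models additionally embed and attend over these contexts rather than just listing them.

```python
import ast

def _leaf_paths(tree):
    # Collect (leaf label, root-to-leaf list of AST nodes) for identifier/constant leaves.
    out = []
    def walk(node, path):
        path = path + [node]
        if isinstance(node, ast.Name):
            out.append((node.id, path))
        elif isinstance(node, ast.Constant):
            out.append((repr(node.value), path))
        for child in ast.iter_child_nodes(node):
            walk(child, path)
    walk(tree, [])
    return out

def ast_path_contexts(code):
    # For every pair of leaves, emit (leaf_a, syntactic path through the lowest
    # common ancestor, leaf_b): the "path-context" used by code2vec-style models.
    leaves = _leaf_paths(ast.parse(code))
    contexts = []
    for i in range(len(leaves)):
        for j in range(i + 1, len(leaves)):
            (a, pa), (b, pb) = leaves[i], leaves[j]
            k = 0  # length of the shared root prefix (compared by node identity)
            while k < min(len(pa), len(pb)) and pa[k] is pb[k]:
                k += 1
            nodes = list(reversed(pa[k:])) + [pa[k - 1]] + pb[k:]
            contexts.append((a, "^".join(type(n).__name__ for n in nodes), b))
    return contexts

print(ast_path_contexts("def add(a, b): return a + b"))
# [('a', 'Name^BinOp^Name', 'b')]
```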
Model | Related work
n-gram language model | Refs. [36-38]
RNN | Refs. [6, 39]
LSTM | Refs. [40, 41]
GRU | Refs. [42, 43]
Table 2  Token-sequence-based models
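The n-gram language models in Table 2 (Refs. [36-38]) treat code as a token stream and predict each token from its predecessors. A minimal bigram sketch, with our own toy corpus and function names; production models add smoothing and larger contexts:

```python
from collections import Counter, defaultdict

def train_bigram(token_stream):
    # Count how often each token follows each previous token.
    bigrams = defaultdict(Counter)
    for prev, nxt in zip(token_stream, token_stream[1:]):
        bigrams[prev][nxt] += 1
    return bigrams

def predict_next(bigrams, prev_token):
    # Most likely next token given the previous one (argmax of P(next | prev)).
    if prev_token not in bigrams:
        return None
    return bigrams[prev_token].most_common(1)[0][0]

corpus = "for i in range ( n ) : total += i".split()
model = train_bigram(corpus)
print(predict_next(model, "range"))  # (
```

This captures the "naturalness" of code exploited by Ref. [36]: after `range`, an opening parenthesis is by far the most likely token.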
Related work: POEM[50], GraphCode2Vec[51], Ref. [10], mocktail[11], Devign[52], Ref. [53], Ref. [47]
Syntax models: code tokens, AST
Semantic models: CDFG, DFG, CFG, PDG
Fusion methods: vector concatenation, vector averaging, multilayer perceptron, matrix multiplication
Table 3  Models fusing syntax and semantics at the readout layer
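Of the fusion methods listed in Table 3, the two simplest, vector concatenation and element-wise averaging, can be written directly. This is a sketch over hypothetical embeddings; real readout layers operate on learned syntax and semantic vectors.

```python
def fuse_concat(syntax_vec, semantic_vec):
    # Fusion by vector concatenation: output dimension is the sum of both.
    return syntax_vec + semantic_vec

def fuse_mean(syntax_vec, semantic_vec):
    # Fusion by element-wise averaging: both vectors must share a dimension.
    assert len(syntax_vec) == len(semantic_vec)
    return [(s + t) / 2 for s, t in zip(syntax_vec, semantic_vec)]

syn = [0.2, 0.8, 0.1]   # hypothetical AST-derived embedding
sem = [0.4, 0.0, 0.5]   # hypothetical PDG/CFG-derived embedding
print(fuse_concat(syn, sem))  # 6-dimensional fused vector
print(fuse_mean(syn, sem))    # 3-dimensional element-wise average
```

The multilayer-perceptron and matrix-multiplication variants in Table 3 replace these fixed operations with learned ones.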
Application | Problem type | Common techniques
(Common graph structures across applications: AST, dependency graphs)
Defect detection | classification | CNN, RNN, LSTM, GRU, GNN
Defect localization | classification | CNN, RNN, LSTM, GRU, GNN
Heterogeneous device mapping | classification | CNN, RNN, LSTM, GRU, GNN
Reliability assessment | classification | CNN, RNN, LSTM, GRU, GNN
Defect repair | sequence generation | sequence-to-sequence models, NMT frameworks
Name recommendation | sequence generation | sequence-to-sequence models, NMT frameworks
Code completion | sequence generation | sequence-to-sequence models, NMT frameworks
Comment generation | sequence generation | sequence-to-sequence models, NMT frameworks
Code clone detection | similarity comparison | Siamese networks
Binary code applications | similarity comparison | Siamese networks
Table 4  Applications of program representation learning
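For the similarity-comparison applications in Table 4, a Siamese setup encodes two programs with shared weights and compares the resulting embeddings, typically by cosine similarity. A sketch of the comparison step over hypothetical embedding vectors:

```python
import math

def cosine_similarity(u, v):
    # Similarity score between two program embeddings, as used at the output
    # of Siamese-network-style clone detectors.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

emb_a = [0.1, 0.9, 0.3]  # hypothetical embedding of program A
emb_b = [0.1, 0.9, 0.3]  # identical embedding: maximal similarity
print(round(cosine_similarity(emb_a, emb_b), 6))  # 1.0
```

A clone detector thresholds this score; training pushes clone pairs toward 1 and non-clones toward 0.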
Application | Benchmarks
Defect detection | SARD, Defects4J, POJ-104[101]
Defect localization | dataset of Ref. [102]
Defect repair | DeepFix, ManySStuBs4J
Name recommendation | CodeSearchNet
Code completion | WikiSQL, CodeXGLUE, CoNaLa
Comment generation | CodeSearchNet, PyTorrent, DeepCom
Code clone detection | POJ-104, BigCloneBench
Heterogeneous device mapping | dataset of Ref. [103]
Table 5  Common benchmarks for program representation learning
1 PEROZZI B, ALRFOU R, SKIENA S. Deepwalk: Online learning of social representations [C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 701-710.
2 VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [C]// International Conference on Learning Representations. Vancouver: IEEE, 2018: 164-175.
3 KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 12-26.
4 FERRANTE J, OTTENSTEIN K J, WARREN J D. The program dependence graph and its use in optimization [J]. ACM Transactions on Programming Languages and Systems, 1987, 9(3): 319-349. doi: 10.1145/24039.24041
5 MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C]// International Conference on Learning Representations. Scottsdale: IEEE, 2013: 1-12.
6 WHITE M, VENDOME C, LINARES-VASQUEZ M, et al. Toward deep learning software repositories [C]// IEEE/ACM 12th Working Conference on Mining Software Repositories. Florence: IEEE, 2015: 334-345.
7 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// 31st Conference on Neural Information Processing Systems. Long Beach: IEEE. 2017: 5998-6008.
8 ALON U, ZILBERSTEIN M, LEVY O, et al. code2vec: learning distributed representations of code [C]// Proceedings of the ACM on Programming Languages. Phoenix: ACM, 2019: 1-29.
9 ALON U, BRODY S, LEVY O, et al. code2seq: generating sequences from structured representations of code [C]// International Conference on Learning Representations. New Orleans: IEEE, 2019: 1-22.
10 LI Y, WANG S, NGUYEN T N, et al. Improving bug detection via context-based code representation learning and attention-based neural networks [C]// Proceedings of the ACM on Programming Languages. Phoenix: ACM, 2019: 1-30.
11 VAGAVOLU D, SWARNA K C, CHIMALAKONDA S. A mocktail of source code representations [C]// International Conference on Automated Software Engineering. Melbourne: IEEE, 2021: 1269-1300.
12 SHI E, WANG Y, DU L, et al. Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees [C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: IEEE, 2021: 4053-4062.
13 ZHANG J, WANG X, ZHANG H, et al. A novel neural source code representation based on abstract syntax tree [C]// 41st International Conference on Software Engineering. Montreal: IEEE, 2019: 783-794.
14 BUI N D Q, YU Y, JIANG L. InferCode: self-supervised learning of code representations by predicting subtrees [C]// 43rd International Conference on Software Engineering. Madrid: IEEE, 2021: 1186-1197.
15 JAYASUNDARA M H V Y, BUI D Q N, JIANG L, et al. TreeCaps: tree-structured capsule networks for program source code processing [C]// Workshop on Machine Learning for Systems at the Conference on Neural Information Processing Systems. Vancouver: IEEE, 2019: 8-14.
16 BUCH L, ANDRZEJAK A. Learning-based recursive aggregation of abstract syntax trees for code clone detection [C]// International Conference on Software Analysis, Evolution and Reengineering. Hangzhou: IEEE, 2019: 95-104.
17 LIU C, WANG X, SHIN R, et al. Neural code completion [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 1-14.
18 HU X, LI G, XIA X, et al. Deep code comment generation [C]// International Conference on Program Comprehension. Gothenburg: IEEE, 2018: 200-210.
19 LECLAIR A, JIANG S, MCMILLANCM C. A neural model for generating natural language summaries of program subroutines [C]// 41st International Conference on Software Engineering. Montreal: IEEE, 2019: 795-806.
20 HAQUE S, LECLAIR A, WU L, et al. Improved automatic summarization of subroutines via attention to file context [C]// International Conference on Mining Software Repositories. New York: ACM, 2020: 300-310.
21 JIANG H, SONG L, GE Y, et al. An AST structure enhanced decoder for code generation [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 468-476.
22 ALLAMANIS M, BROCKSCHMIDT M, KHADEMI M. Learning to represent programs with graphs [C]// International Conference on Learning Representations. Vancouver: IEEE, 2018: 1-17.
23 WANG Y, LI H. Code completion by modeling flattened abstract syntax trees as graphs [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2021: 14015-14023.
24 YANG K, YU H, FAN G, et al. A graph sequence neural architecture for code completion with semantic structure features [J]. Journal of Software: Evolution and Process, 2022, 34(1): 1-22.
25 BEN-NUN T, JAKOBOVITS A S, HOEFLER T. Neural code comprehension: a learnable representation of code semantics [J]. Advances in Neural Information Processing Systems, 2018, 31(1): 3589-3601.
26 LATTNER C, ADVE V. LLVM: a compilation framework for lifelong program analysis and transformation [C]// International Symposium on Code Generation and Optimization. San Jose: IEEE, 2004: 75-86.
27 WANG Z, YU L, WANG S, et al. Spotting silent buffer overflows in execution trace through graph neural network assisted data flow analysis [EB/OL]. (2021-02-20). https://arxiv.org/abs/2102.10452.
28 SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks [C]// European Semantic Web Conference. Heraklion: IEEE, 2018: 593-607.
29 WANG W, ZHANG K, LI G, et al. Learning to represent programs with heterogeneous graphs [EB/OL]. (2020-12-08). https://arxiv.org/abs/2012.04188.
30 CUMMINS C, FISCHES Z V, BEN-NUN T, et al. PROGRAML: a graph-based program representation for data flow analysis and compiler optimizations [C]// International Conference on Machine Learning. Vienna: IEEE, 2021: 2244-2253.
31 TUFANO M, WATSON C, BAVOTA G, et al. Deep learning similarities from different representations of source code [C]// International Conference on Mining Software Repositories. Gothenburg: IEEE, 2018: 542-553.
32 OU M, WATSON C, PEI J, et al. Asymmetric transitivity preserving graph embedding [C]// International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1105-1114.
33 SUI Y, CHENG X, ZHANG G, et al. Flow2vec: Value-flow-based precise code embedding [C]// Proceedings of the ACM on Programming Languages. [S. l.]: ACM, 2020: 1-27.
34 MEHROTRA N, AGARWAL N, GUPTA P, et al. Modeling functional similarity in source code with graph-based Siamese networks [J]. IEEE Transactions on Software Engineering, 2021, 48(10): 1-22.
35 KARMAKAR A, ROBBES R. What do pre-trained code models know about code? [C]// International Conference on Automated Software Engineering. Melbourne: IEEE, 2021: 1332-1336.
36 HINDLE A, BARR E T, SU Z, et al. On the naturalness of software [C]// Proceedings of the 34th International Conference on Software Engineering. Zurich: IEEE, 2012: 837-847.
37 BIELIK P, RAYCHEV V, VECHEV M. Program synthesis for character level language modeling [C]// 5th International Conference on Learning Representations. Toulon: IEEE, 2017: 1-17.
38 HELLENDOORN V, DEVANBU P. Are deep neural networks the best choice for modeling source code? [C]// Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: IEEE, 2017: 763-773.
39 DAM H K, TRAN T, PHAM T T M. A deep language model for software code [C]// Proceedings of the Foundations Software Engineering International Symposium. Seattle: ACM, 2016: 1-4.
40 BHOOPCHAND A, ROCKTASCHEL T, BARR E, et al. Learning python code suggestion with a sparse pointer network [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 1-11.
41 LIU F, ZHANG L, JIN Z. Modeling programs hierarchically with stack-augmented LSTM [J]. Journal of Systems and Software, 2020, 164(11): 1-16.
42 LI B, YAN M, XIA X, et al. DeepCommenter: a deep code comment generation tool with hybrid lexical and syntactical information [C]// Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2020: 1571-1575.
43 LECLAIR A, HAQUE S, WU L, et al. Improved code summarization via a graph neural network [C]// International Conference on Program Comprehension. Seoul: ACM, 2020: 184-195.
44 PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation [C]// Conference on Empirical Methods in Natural Language Processing. Doha: ACM, 2014: 1532-1543.
45 DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2018-10-11). https://arxiv.org/abs/1810.04805.
46 FENG Z, GUO D, TANG D, et al. Codebert: a pre-trained model for programming and natural languages [C]. // Conference on Empirical Methods in Natural Language Processing. [S. l.]: ACM, 2020: 1536-1547.
47 JIANG N, LUTELLIER T, TAN L. CURE: code-aware neural machine translation for automatic program repair [C]// International Conference on Software Engineering. Madrid: IEEE, 2021: 1161-1173.
48 GAO S, CHEN C, XING Z, et al. A neural model for method name generation from functional description [C]// 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. Hangzhou: IEEE, 2019: 414-421.
49 KARAMPATSIS R M, SUTTON C. Maybe deep neural networks are the best choice for modeling source code [EB/OL]. (2019-03-13). https://arxiv.org/abs/1903.05734.
50 YE G, TANG Z, WANG H, et al. Deep program structure modeling through multi-relational graph-based learning [C]// Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. Georgia: ACM, 2020: 111-123.
51 MA W, ZHAO M, SOREMEKUN E, et al. GraphCode2Vec: generic code embedding via lexical and program dependence analyses [EB/OL]. (2021-12-02). https://arxiv.org/abs/2112.01218.
52 ZHOU Y, LIU S, SIOW J K, et al. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks [C]// Advances in Neural Information Processing Systems. Vancouver: IEEE, 2019: 1-11.
53 FANG C, LIU Z, SHI Y, et al. Functional code clone detection with syntax and semantics fusion learning [C]// Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2020: 516-527.
54 WANG H, YE G, TANG Z, et al. Combining graph-based learning with automated data collection for code vulnerability detection [J]. IEEE Transactions on Information Forensics and Security, 2020, 16(1): 1943-1958.
55 WU H, ZHAO H, ZHANG M. SIT3: code summarization with structure-induced transformer [C]// Annual Meeting of the Association for Computational Linguistics. [S. l.]: IEEE, 2021: 1078-1090.
56 GUO D, REN S, LU S, et al. GraphCodeBERT: pre-training code representations with data flow [C]// International Conference on Learning Representations. [S. l. ]: IEEE, 2021: 1-18.
57 GAO S, GAO C, HE Y, et al. Code structure guided transformer for source code summarization [EB/OL]. (2021-04-19). https://arxiv.org/abs/2104.09340.
58 RAY B, HELLENDOORN V, GODHANE S, et al. On the naturalness of buggy code [C]// 38th IEEE/ACM International Conference on Software Engineering. Austin: IEEE, 2016: 428-439.
59 ZHANG Xian, BEN Ke-rong, ZENG Jie. Slice granularity defect prediction method based on code naturalness [J]. Journal of Software, 2021, 32(7): 2219-2241. (in Chinese)
60 CHEN Hao, YI Ping. Code vulnerability detection method based on graph neural network [J]. Journal of Network and Information Security, 2021, 7(3): 37-40. (in Chinese)
61 PHAN A V, LE NGUYEN M, BUI L T. Convolutional neural networks over control flow graphs for software defect prediction [C]// 29th International Conference on Tools with Artificial Intelligence. Boston: IEEE, 2017: 45-52.
62 WANG S, LIU T, NAM J, et al. Deep semantic feature learning for software defect prediction [J]. IEEE Transactions on Software Engineering, 2018, 46(12): 1267-1293.
63 XU J, WANG F, AI J. Defect prediction with semantics and context features of codes based on graph representation learning [J]. IEEE Transactions on Reliability, 2020, 70(2): 613-625.
64 WANG H, ZHUANG W, ZHANG X. Software defect prediction based on gated hierarchical LSTMs [J]. IEEE Transactions on Reliability, 2021, 70(2): 711-727.
65 GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks [C]// 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM, 2016: 855-864.
66 HUO X, THUNG F, LI M, et al. Deep transfer bug localization [J]. IEEE Transactions on Software Engineering, 2019, 47(7): 1368-1380.
67 ZHU Z, LI Y, TONG H, et al. CooBa: cross-project bug localization via adversarial transfer learning [C]// 29th International Joint Conference on Artificial Intelligence. Yokohama: IEEE, 2020: 3565-3571.
68 YANG S, CAO J, ZENG H, et al. Locating faulty methods with a mixed RNN and attention model [C]// 29th International Conference on Program Comprehension. Madrid: IEEE, 2021: 207-218.
69 LUTELLIER T, PHAM H V, PANG L, et al. Coconut: combining context-aware neural translation models using ensemble for program repair [C]// Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2020: 101-114.
70 LI Y, WANG S, NGUYEN T N. Dlfix: context-based code transformation learning for automated program repair [C]// Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. Seoul: IEEE, 2020: 602-614.
71 YANG Bo, ZHANG Neng, LI Shan-ping, et al. Review of intelligent code completion [J]. Journal of Software, 2020, 31(5): 1435-1453. (in Chinese)
72 LI J, WANG Y, LYU M R, et al. Code completion with neural attention and pointer networks [C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: AAAI, 2018: 4159-4225.
73 LIU F, LI G, WEI B, et al. A self-attentional neural architecture for code completion with multi-task learning [C]// Proceedings of the 28th International Conference on Program Comprehension. Seoul: ACM, 2020: 37-47.
74 BROCKSCHMIDT M, ALLAMANIS M, GAUNT A L, et al. Generative code modeling with graphs [C]// International Conference on Learning Representations. New Orleans: IEEE, 2019: 1-24.
75 YONAI H, HAYASE Y, KITAGAWA H. Mercem: method name recommendation based on call graph embedding [C]// 26th Asia-Pacific Software Engineering Conference. Putrajaya: IEEE, 2019: 134-141.
76 ALLAMANIS M, PENG H, SUTTON C. A convolutional attention network for extreme summarization of source code [C]// International Conference on Machine Learning. New York: IEEE, 2016: 2091-2100.
77 ZHANG F, CHEN B, LI R, et al. A hybrid code representation learning approach for predicting method names [J]. Journal of Systems and Software, 2021, 180(16): 110-111.
78 IYER S, KONSTAS I, CHEUNG A, et al. Summarizing source code using a neural attention model [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin: ACM, 2016: 2073-2083.
79 YANG Z, KEUNG J, YU X, et al. A multi-modal transformer-based code summarization approach for smart contracts [C]// 29th International Conference on Program Comprehension. Madrid: IEEE, 2021: 1-12.
80 CHEN Qiu-yuan, LI Shan-ping, YAN Meng, et al. Research progress of code clone detection [J]. Journal of Software, 2019, 30(4): 962-980. (in Chinese) doi: 10.13328/j.cnki.jos.005711
81 BARCHI F, PARISI E, URGESE G, et al. Exploration of convolutional neural network models for source code classification [J]. Engineering Applications of Artificial Intelligence, 2021, 97(20): 104-175.
82 PARISI E, BARCHI F, BARTOLINI A, et al. Making the most of scarce input data in deep learning-based source code classification for heterogeneous device mapping [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021, 41(6): 1-12.
83 XIAO Y, MA G, AHMED N K, et al. Deep graph learning for program analysis and system optimization [C]// Graph Neural Networks and Systems Workshop. [S. l. ]: IEEE, 2021: 1-8.
84 BRAUCKMANN A, GOENS A, ERTEL S, et al. Compiler-based graph representations for deep learning models of code [C]// Proceedings of the 29th International Conference on Compiler Construction. San Diego: ACM, 2020: 201-211.
85 JIAO J, PAL D, DENG C, et al. GLAIVE: graph learning assisted instruction vulnerability estimation [C]// Design, Automation and Test in Europe Conference and Exhibition. Grenoble: IEEE, 2021: 82-87.
86 HAMILTON W L, YING R, LESKOVEC J. Inductive representation learning on large graphs [J]. Advances in Neural Information Processing Systems, 2017, 30(1): 128-156.
87 MA J, DUAN Z, TANG L. GATPS: an attention-based graph neural network for predicting SDC-causing instructions [C]// 39th IEEE VLSI Test Symposium. San Diego: IEEE, 2021: 1-7.
88 WANG J, ZHANG C. Software reliability prediction using a deep learning model based on the RNN encoder–decoder [J]. Reliability Engineering and System Safety, 2018, 170(20): 73-82.
89 NIU W, ZHANG X, DU X, et al. A deep learning based static taint analysis approach for IoT software vulnerability location [J]. Measurement, 2020, 32(152): 107-139.
90 XU X, LIU C, FENG Q, et al. Neural network-based graph embedding for cross-platform binary code similarity detection [C]// ACM SIGSAC Conference on Computer and Communications Security. Dallas: ACM, 2017: 363-376.
91 DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data [C]// International Conference on Machine Learning. Dallas: IEEE, 2016: 2702-2711.
92 YU Z, CAO R, TANG Q, et al. Order matters: semantic-aware neural networks for binary code similarity detection [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020, 34: 1145-1152.
93 DING S H H, FUNG B C M, CHARLAND P. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization [C]// IEEE Symposium on Security and Privacy. San Francisco: IEEE, 2019: 472-489.
94 LE Q, MIKOLOV T. Distributed representations of sentences and documents [C]// International Conference on Machine Learning. Dallas: IEEE, 2014: 1188-1196.
95 YANG J, FU C, LIU X Y, et al. Codee: a tensor embedding scheme for binary code search [J]. IEEE Transactions on Software Engineering, 2021, 48(7): 1-20.
96 DUAN Y, LI X, WANG J, et al. Deepbindiff: learning program-wide code representations for binary diffing [C]// Network and Distributed System Security Symposium. San Diego: IEEE, 2020: 1-12.
97 TANG J, QU M, WANG M, et al. Line: large-scale information network embedding [C]// Proceedings of the 24th International Conference on World Wide Web. Florence: ACM, 2015: 1067-1077.
98 YANG C, LIU Z, ZHAO D, et al. Network representation learning with rich text information [C]// International Joint Conference on Artificial Intelligence. Buenos Aires: AAAI, 2015: 2111-2117.
99 BRAUCKMANN A, GOENS A, CASTRILLON J. ComPy-Learn: a toolbox for exploring machine learning representations for compilers [C]// 2020 Forum for Specification and Design Languages. Kiel: IEEE, 2020: 1-4.
100 CUMMINS C, WASTI B, GUO J, et al. CompilerGym: robust, performant compiler optimization environments for AI research [EB/OL]. (2021-09-17). https://arxiv.org/abs/2109.08267.
101 MOU L, LI G, ZHANG L, et al. Convolutional neural networks over tree structures for programming language processing [C]// AAAI Conference on Artificial Intelligence. Washington: AAAI, 2016: 1287-1293.
102 YE X, BUNESCU R, LIU C. Learning to rank relevant files for bug reports using domain knowledge [C]// Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. Hong Kong: ACM, 2014: 689-699.
103 CUMMINS C, PETOUMENOS P, WANG Z, et al. End-to-end deep learning of optimization heuristics [C]// 26th International Conference on Parallel Architectures and Compilation Techniques. Portland: IEEE, 2017: 219-232.
104 KANG H J, BISSYANDE T F, LO D. Assessing the generalizability of code2vec token embeddings [C]// 34th IEEE/ACM International Conference on Automated Software Engineering. San Diego: IEEE, 2019: 1-12.
105 VENKATAKEERTHY S, AGGARWAL R, JAIN S, et al. IR2Vec: LLVM IR based scalable program embeddings [J]. ACM Transactions on Architecture and Code Optimization, 2020, 17(4): 1-27.