Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (1): 155-169    DOI: 10.3785/j.issn.1008-973X.2023.01.016
    
Survey on program representation learning
Jun-chi MA(),Xiao-xin DI,Zong-tao DUAN,Lei TANG
College of Information Engineering, Chang’an University, Xi’an 710064, China

Abstract  

To improve the efficiency of software development, there has been a trend toward intelligent development using artificial intelligence technology. Understanding program semantics is a key problem in supporting intelligent development, and a series of studies on program representation learning has emerged to address it. Program representation learning automatically learns useful features from programs and represents them as low-dimensional dense vectors, so that program semantics can be extracted efficiently and applied to downstream tasks. A comprehensive review was provided to categorize and analyze existing work on program representation learning. The mainstream models for program representation learning were introduced, including frameworks based on graph structures and on token sequences. The applications of program representation learning in defect detection, defect localization, code completion and other tasks were then described. The common toolsets and benchmarks for program representation learning were summarized, and the challenges facing program representation learning in the future were analyzed.
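To make the abstract's core idea concrete, here is a minimal sketch (not from the paper) of mapping a program's token stream to a low-dimensional dense vector; the hash-based `token_vector` stands in for a learned embedding table and is purely illustrative:

```python
import hashlib

DIM = 8  # toy embedding dimension

def token_vector(token: str) -> list[float]:
    """Deterministic pseudo-embedding: hash the token into DIM floats in [0, 1)."""
    h = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in h[:DIM]]

def embed_program(tokens: list[str]) -> list[float]:
    """Mean-pool the token vectors into one dense program vector."""
    vecs = [token_vector(t) for t in tokens]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

# A program becomes one fixed-size vector regardless of its length.
v = embed_program(["int", "main", "(", ")", "{", "return", "0", ";", "}"])
```

A learned model would replace both the embedding table and the pooling with trained components; the fixed-size output vector is what downstream tasks consume.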



Key words: software engineering; representation learning; program semantics; neural network; deep learning
Received: 08 March 2022      Published: 17 January 2023
CLC:  TP 391  
Fund: Youth Program of the National Natural Science Foundation of China (62002030); Key Research and Development Program of Shaanxi Province (2019ZDLGY17-08, 2019ZDLGY03-09-01, 2019GY-006, 2020GY-013)
Cite this article:

Jun-chi MA,Xiao-xin DI,Zong-tao DUAN,Lei TANG. Survey on program representation learning. Journal of ZheJiang University (Engineering Science), 2023, 57(1): 155-169.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.01.016     OR     https://www.zjujournals.com/eng/Y2023/V57/I1/155


Fig.1 Flow chart of program representation learning
Approach | Related work
Approach 1: AST-path-based representation learning | Code2vec[8], code2seq[9], Ref. [10], mocktail[11]
Approach 2: subtree-based representation learning | CAST[12], ASTNN[13], InferCode[14], TreeCaps[15], Ref. [16]
Approach 3: flattened AST | Ref. [17], DeepCom[18], Refs. [19-21]
Approach 4: network representation learning | Ref. [22], CCAG[23], Ref. [24]
Tab.1 AST-based models
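As a hedged illustration of the AST-path idea behind code2vec-style models (Approach 1), the sketch below extracts root-to-leaf node-type paths with Python's standard `ast` module; real systems use leaf-to-leaf path contexts with learned path embeddings, and `node_paths` is a hypothetical helper:

```python
import ast

def node_paths(tree: ast.AST) -> list[list[str]]:
    """Collect root-to-leaf paths of AST node type names (a simplification
    of the leaf-to-leaf path contexts used by code2vec-style models)."""
    paths = []

    def walk(node, prefix):
        label = type(node).__name__
        children = list(ast.iter_child_nodes(node))
        if not children:
            paths.append(prefix + [label])  # reached a leaf node
        for child in children:
            walk(child, prefix + [label])

    walk(tree, [])
    return paths

# Each path is a structural feature of the program fragment.
paths = node_paths(ast.parse("x = y + 1"))
```

Each extracted path would then be embedded and aggregated (e.g. with attention) into a single code vector.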
Model | Related work
n-gram language model | Refs. [36-38]
RNN | Refs. [6, 39]
LSTM | Refs. [40, 41]
GRU | Refs. [42, 43]
Tab.2 Models based on token sequences
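A minimal sketch of the simplest entry in Tab.2: an n-gram (here, bigram) language model over code tokens. The names `train_bigram` and `predict_next` are illustrative, and real code language models smooth the counts rather than predicting from raw frequencies:

```python
from collections import Counter, defaultdict

def train_bigram(tokens: list[str]) -> dict:
    """Count bigram transitions over a token stream."""
    counts = defaultdict(Counter)
    for prev, cur in zip(tokens, tokens[1:]):
        counts[prev][cur] += 1
    return counts

def predict_next(counts: dict, prev: str):
    """Most frequent next token after `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

# Train on one token stream and predict the likeliest continuation.
toks = ["for", "(", "i", "=", "0", ";", "i", "<", "n", ";", "i", "++", ")"]
model = train_bigram(toks)
```

The RNN/LSTM/GRU models in Tab.2 replace these counts with a recurrent state that conditions on the full token history.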
Related work: POEM[50], GraphCode2Vec[51], Ref. [10], mocktail[11], Devign[52], Ref. [53], Ref. [47]
Syntax models used: code tokens, AST
Semantic models used: CDFG, DFG, CFG, PDG
Fusion methods used: vector concatenation, vector averaging, multilayer perceptron, matrix multiplication
Tab.3 Models based on the fusion of syntax and semantics
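The simpler fusion methods named in Tab.3 can be sketched as plain vector operations; `fuse` is a hypothetical helper showing concatenation and element-wise averaging (an MLP or matrix-multiplication fusion would replace these branches with learned layers):

```python
def fuse(syntax_vec: list[float], semantic_vec: list[float],
         method: str = "concat") -> list[float]:
    """Fuse a syntax-model vector with a semantics-model vector."""
    if method == "concat":
        # Vector concatenation: output dimension is the sum of the inputs'.
        return syntax_vec + semantic_vec
    if method == "mean":
        # Element-wise average: both inputs must share a dimension.
        return [(a + b) / 2 for a, b in zip(syntax_vec, semantic_vec)]
    raise ValueError(f"unknown fusion method: {method}")

syn = [1.0, 0.0, 2.0]   # e.g. from an AST encoder
sem = [0.0, 4.0, 2.0]   # e.g. from a CFG/DFG encoder
```

Concatenation preserves both views at the cost of a larger vector; averaging keeps the dimension fixed but can blur view-specific signals.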
Common graph structures: AST, dependency graph
Application | Problem type | Common techniques
Defect detection | classification | CNN, RNN, LSTM, GRU, GNN
Defect localization | classification | CNN, RNN, LSTM, GRU, GNN
Heterogeneous device mapping | classification | CNN, RNN, LSTM, GRU, GNN
Reliability assessment | classification | CNN, RNN, LSTM, GRU, GNN
Defect repair | sequence generation | sequence-to-sequence models, NMT frameworks
Name recommendation | sequence generation | sequence-to-sequence models, NMT frameworks
Code completion | sequence generation | sequence-to-sequence models, NMT frameworks
Comment generation | sequence generation | sequence-to-sequence models, NMT frameworks
Code clone detection | similarity comparison | Siamese networks
Binary code applications | similarity comparison | Siamese networks
Tab.4 Typical applications of program representation learning
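For the similarity-comparison rows of Tab.4: a Siamese network embeds both code fragments with shared weights and then compares the two embeddings, e.g. by cosine similarity as sketched below (the vectors here are placeholders for learned embeddings, not the output of any model in the survey):

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embeddings; Siamese clone detectors
    threshold a score like this to decide whether two fragments match."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Identical embeddings score 1.0 and orthogonal ones score 0.0, so a fixed threshold on the score yields a clone/non-clone decision.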
Application | Benchmarks
Defect detection | SARD, Defects4J, POJ-104[101]
Defect localization | benchmark of Ref. [102]
Defect repair | DeepFix, ManySStuBs4J
Name recommendation | CodeSearchNet
Code completion | WikiSQL, CodeXGLUE, CoNaLa
Comment generation | CodeSearchNet, PyTorrent, DeepCom
Code clone detection | POJ-104, BigCloneBench
Heterogeneous device mapping | benchmark of Ref. [103]
Tab.5 Benchmarks for program representation learning
[1]   PEROZZI B, ALRFOU R, SKIENA S. Deepwalk: Online learning of social representations [C]// Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2014: 701-710.
[2]   VELIČKOVIĆ P, CUCURULL G, CASANOVA A, et al. Graph attention networks [C]// International Conference on Learning Representations. Vancouver: IEEE, 2018: 164-175.
[3]   KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 12-26.
[4]   FERRANTE J, OTTENSTEIN K J, WARREN J D. The program dependence graph and its use in optimization [J]. ACM Transactions on Programming Languages and Systems, 1987, 9(3): 319-349. doi: 10.1145/24039.24041
[5]   MIKOLOV T, CHEN K, CORRADO G, et al. Efficient estimation of word representations in vector space [C]// International Conference on Learning Representations. Scottsdale: IEEE, 2013: 1-12.
[6]   WHITE M, VENDOME C, LINARES-VASQUEZ M, et al. Toward deep learning software repositories [C]// IEEE/ACM 12th Working Conference on Mining Software Repositories. Florence: IEEE, 2015: 334-345.
[7]   VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// 31st Conference on Neural Information Processing Systems. Long Beach: IEEE. 2017: 5998-6008.
[8]   ALON U, ZILBERSTEIN M, LEVY O, et al. code2vec: learning distributed representations of code [C]// Proceedings of the ACM on Programming Languages. Phoenix: ACM, 2019: 1-29.
[9]   ALON U, BRODY S, LEVY O, et al. code2seq: generating sequences from structured representations of code [C]// International Conference on Learning Representations. New Orleans: IEEE, 2019: 1-22.
[10]   LI Y, WANG S, NGUYEN T N, et al. Improving bug detection via context-based code representation learning and attention-based neural networks [C]// Proceedings of the ACM on Programming Languages. Phoenix: ACM, 2019: 1-30.
[11]   VAGAVOLU D, SWARNA K C, CHIMALAKONDA S. A mocktail of source code representations [C]// International Conference on Automated Software Engineering. Melbourne: IEEE, 2021: 1269-1300.
[12]   SHI E, WANG Y, DU L, et al. Enhancing code summarization with hierarchical splitting and reconstruction of abstract syntax trees [C]// Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Punta Cana: IEEE, 2021: 4053-4062.
[13]   ZHANG J, WANG X, ZHANG H, et al. A novel neural source code representation based on abstract syntax tree [C]// 41st International Conference on Software Engineering. Montreal: IEEE, 2019: 783-794.
[14]   BUI N D Q, YU Y, JIANG L. InferCode: self-supervised learning of code representations by predicting subtrees [C]// 43rd International Conference on Software Engineering. Madrid: IEEE, 2021: 1186-1197.
[15]   JAYASUNDARA M H V Y, BUI D Q N, JIANG L, et al. TreeCaps: tree-structured capsule networks for program source code processing [C]// Workshop on Machine Learning for Systems at the Conference on Neural Information Processing Systems. Vancouver: IEEE, 2019: 8-14.
[16]   BUCH L, ANDRZEJAK A. Learning-based recursive aggregation of abstract syntax trees for code clone detection [C]// International Conference on Software Analysis, Evolution and Reengineering. Hangzhou: IEEE, 2019: 95-104.
[17]   LIU C, WANG X, SHIN R, et al. Neural code completion [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 1-14.
[18]   HU X, LI G, XIA X, et al. Deep code comment generation [C]// International Conference on Program Comprehension. Gothenburg: IEEE, 2018: 200-210.
[19]   LECLAIR A, JIANG S, MCMILLAN C. A neural model for generating natural language summaries of program subroutines [C]// 41st International Conference on Software Engineering. Montreal: IEEE, 2019: 795-806.
[20]   HAQUE S, LECLAIR A, WU L, et al. Improved automatic summarization of subroutines via attention to file context [C]// International Conference on Mining Software Repositories. New York: ACM, 2020: 300-310.
[21]   JIANG H, SONG L, GE Y, et al. An AST structure enhanced decoder for code generation [J]. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2021, 30: 468-476.
[22]   ALLAMANIS M, BROCKSCHMIDT M, KHADEMI M. Learning to represent programs with graphs [C]// International Conference on Learning Representations. Vancouver: IEEE, 2018: 1-17.
[23]   WANG Y, LI H. Code completion by modeling flattened abstract syntax trees as graphs [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2021: 14015-14023.
[24]   YANG K, YU H, FAN G, et al. A graph sequence neural architecture for code completion with semantic structure features [J]. Journal of Software: Evolution and Process, 2022, 34(1): 1-22.
[25]   BEN-NUN T, JAKOBOVITS A S, HOEFLER T. Neural code comprehension: a learnable representation of code semantics [J]. Advances in Neural Information Processing Systems, 2018, 31(1): 3589-3601.
[26]   LATTNER C, ADVE V. LLVM: a compilation framework for lifelong program analysis and transformation [C]// International Symposium on Code Generation and Optimization. San Jose: IEEE, 2004: 75-86.
[27]   WANG Z, YU L, WANG S, et al. Spotting silent buffer overflows in execution trace through graph neural network assisted data flow analysis [EB/OL]. (2021-02-20). https://arxiv.org/abs/2102.10452.
[28]   SCHLICHTKRULL M, KIPF T N, BLOEM P, et al. Modeling relational data with graph convolutional networks [C]// European Semantic Web Conference. Heraklion: IEEE, 2018: 593-607.
[29]   WANG W, ZHANG K, LI G, et al. Learning to represent programs with heterogeneous graphs [EB/OL]. (2020-12-08). https://arxiv.org/abs/2012.04188.
[30]   CUMMINS C, FISCHES Z V, BEN-NUN T, et al. PROGRAML: a graph-based program representation for data flow analysis and compiler optimizations [C]// International Conference on Machine Learning. Vienna: IEEE, 2021: 2244-2253.
[31]   TUFANO M, WATSON C, BAVOTA G, et al. Deep learning similarities from different representations of source code [C]// International Conference on Mining Software Repositories. Gothenburg: IEEE, 2018: 542-553.
[32]   OU M, WATSON C, PEI J, et al. Asymmetric transitivity preserving graph embedding [C]// International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2016: 1105-1114.
[33]   SUI Y, CHENG X, ZHANG G, et al. Flow2vec: Value-flow-based precise code embedding [C]// Proceedings of the ACM on Programming Languages. [S. l.]: ACM, 2020: 1-27.
[34]   MEHROTRA N, AGARWAL N, GUPTA P, et al. Modeling functional similarity in source code with graph-based Siamese networks [J]. IEEE Transactions on Software Engineering, 2021, 48(10): 1-22.
[35]   KARMAKAR A, ROBBES R. What do pre-trained code models know about code? [C]// International Conference on Automated Software Engineering. Melbourne: IEEE, 2021: 1332-1336.
[36]   HINDLE A, BARR E T, SU Z, et al. On the naturalness of software [C]// Proceedings of the 34th International Conference on Software Engineering. Zurich: IEEE, 2012: 837-847.
[37]   BIELIK P, RAYCHEV V, VECHEV M. Program synthesis for character level language modeling [C]// 5th International Conference on Learning Representations. Toulon: IEEE, 2017: 1-17.
[38]   HELLENDOORN V, DEVANBU P. Are deep neural networks the best choice for modeling source code? [C]// Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering. New York: IEEE, 2017: 763-773.
[39]   DAM H K, TRAN T, PHAM T T M. A deep language model for software code [C]// Proceedings of the Foundations Software Engineering International Symposium. Seattle: ACM, 2016: 1-4.
[40]   BHOOPCHAND A, ROCKTASCHEL T, BARR E, et al. Learning python code suggestion with a sparse pointer network [C]// International Conference on Learning Representations. Toulon: IEEE, 2017: 1-11.
[41]   LIU F, ZHANG L, JIN Z. Modeling programs hierarchically with stack-augmented LSTM [J]. Journal of Systems and Software, 2020, 164(11): 1-16.
[42]   LI B, YAN M, XIA X, et al. DeepCommenter: a deep code comment generation tool with hybrid lexical and syntactical information [C]// Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. New York: ACM, 2020: 1571-1575.
[43]   LECLAIR A, HAQUE S, WU L, et al. Improved code summarization via a graph neural network [C]// International Conference on Program Comprehension. Seoul: ACM, 2020: 184-195.
[44]   PENNINGTON J, SOCHER R, MANNING C D. Glove: global vectors for word representation [C]// Conference on Empirical Methods in Natural Language Processing. Doha: ACM, 2014: 1532-1543.
[45]   DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding [EB/OL]. (2018-10-11). https://arxiv.org/abs/1810.04805.
[46]   FENG Z, GUO D, TANG D, et al. Codebert: a pre-trained model for programming and natural languages [C]. // Conference on Empirical Methods in Natural Language Processing. [S. l.]: ACM, 2020: 1536-1547.
[47]   JIANG N, LUTELLIER T, TAN L. CURE: code-aware neural machine translation for automatic program repair [C]// International Conference on Software Engineering. Madrid: IEEE, 2021: 1161-1173.
[48]   GAO S, CHEN C, XING Z, et al. A neural model for method name generation from functional description [C]// 26th IEEE International Conference on Software Analysis, Evolution and Reengineering. Hangzhou: IEEE, 2019: 414-421.
[49]   KARAMPATSIS R M, SUTTON C. Maybe deep neural networks are the best choice for modeling source code [EB/OL]. (2019-03-13). https://arxiv.org/abs/1903.05734.
[50]   YE G, TANG Z, WANG H, et al. Deep program structure modeling through multi-relational graph-based learning [C]// Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques. Georgia: ACM, 2020: 111-123.
[51]   MA W, ZHAO M, SOREMEKUN E, et al. GraphCode2Vec: generic code embedding via lexical and program dependence analyses [EB/OL]. (2021-12-02). https://arxiv.org/abs/2112.01218.
[52]   ZHOU Y, LIU S, SIOW J K, et al. Devign: effective vulnerability identification by learning comprehensive program semantics via graph neural networks [C]// Advances in Neural Information Processing Systems. Vancouver: IEEE, 2019: 1-11.
[53]   FANG C, LIU Z, SHI Y, et al. Functional code clone detection with syntax and semantics fusion learning [C]// Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2020: 516-527.
[54]   WANG H, YE G, TANG Z, et al. Combining graph-based learning with automated data collection for code vulnerability detection [J]. IEEE Transactions on Information Forensics and Security, 2020, 16(1): 1943-1958.
[55]   WU H, ZHAO H, ZHANG M. SIT3: code summarization with structure-induced transformer [C]// Annual Meeting of the Association for Computational Linguistics. [S. l.]: IEEE, 2021: 1078-1090.
[56]   GUO D, REN S, LU S, et al. GraphCodeBERT: pre-training code representations with data flow [C]// International Conference on Learning Representations. [S. l. ]: IEEE, 2021: 1-18.
[57]   GAO S, GAO C, HE Y, et al. Code structure guided transformer for source code summarization [EB/OL]. (2021-04-19). https://arxiv.org/abs/2104.09340.
[58]   RAY B, HELLENDOORN V, GODHANE S, et al. On the naturalness of buggy code [C]// 38th IEEE/ACM International Conference on Software Engineering. Austin: IEEE, 2016: 428-439.
[59]   ZHANG Xian, BEN Ke-rong, ZENG Jie. Slice granularity defect prediction method based on code naturalness [J]. Journal of Software, 2021, 32(7): 2219-2241. (in Chinese)
[60]   CHEN Hao, YI Ping. Code vulnerability detection method based on graph neural network [J]. Journal of Network and Information Security, 2021, 7(3): 37-40. (in Chinese)
[61]   PHAN A V, LE NGUYEN M, BUI L T. Convolutional neural networks over control flow graphs for software defect prediction [C]// 29th International Conference on Tools with Artificial Intelligence. Boston: IEEE, 2017: 45-52.
[62]   WANG S, LIU T, NAM J, et al. Deep semantic feature learning for software defect prediction [J]. IEEE Transactions on Software Engineering, 2018, 46(12): 1267-1293.
[63]   XU J, WANG F, AI J. Defect prediction with semantics and context features of codes based on graph representation learning [J]. IEEE Transactions on Reliability, 2020, 70(2): 613-625.
[64]   WANG H, ZHUANG W, ZHANG X. Software defect prediction based on gated hierarchical LSTMs [J]. IEEE Transactions on Reliability, 2021, 70(2): 711-727.
[65]   GROVER A, LESKOVEC J. node2vec: scalable feature learning for networks [C]// 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco: ACM, 2016: 855-864.
[66]   HUO X, THUNG F, LI M, et al. Deep transfer bug localization [J]. IEEE Transactions on Software Engineering, 2019, 47(7): 1368-1380.
[67]   ZHU Z, LI Y, TONG H, et al. CooBa: cross-project bug localization via adversarial transfer learning [C]// 29th International Joint Conference on Artificial Intelligence. Yokohama: IEEE, 2020: 3565-3571.
[68]   YANG S, CAO J, ZENG H, et al. Locating faulty methods with a mixed RNN and attention model [C]// 29th International Conference on Program Comprehension. Madrid: IEEE, 2021: 207-218.
[69]   LUTELLIER T, PHAM H V, PANG L, et al. Coconut: combining context-aware neural translation models using ensemble for program repair [C]// Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis. New York: ACM, 2020: 101-114.
[70]   LI Y, WANG S, NGUYEN T N. Dlfix: context-based code transformation learning for automated program repair [C]// Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering. Seoul: IEEE, 2020: 602-614.
[71]   YANG Bo, ZHANG Neng, LI Shan-ping, et al. Review of intelligent code completion [J]. Journal of Software, 2020, 31(5): 1435-1453. (in Chinese)
[72]   LI J, WANG Y, LYU M R, et al. Code completion with neural attention and pointer networks [C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: AAAI, 2018: 4159-4225.
[73]   LIU F, LI G, WEI B, et al. A self-attentional neural architecture for code completion with multi-task learning [C]// Proceedings of the 28th International Conference on Program Comprehension. Seoul: ACM, 2020: 37-47.
[74]   BROCKSCHMIDT M, ALLAMANIS M, GAUNT A L, et al. Generative code modeling with graphs [C]// International Conference on Learning Representations. New Orleans: IEEE, 2019: 1-24.
[75]   YONAI H, HAYASE Y, KITAGAWA H. Mercem: method name recommendation based on call graph embedding [C]// 26th Asia-Pacific Software Engineering Conference. Putrajaya: IEEE, 2019: 134-141.
[76]   ALLAMANIS M, PENG H, SUTTON C. A convolutional attention network for extreme summarization of source code [C]// International Conference on Machine Learning. New York: IEEE, 2016: 2091-2100.
[77]   ZHANG F, CHEN B, LI R, et al. A hybrid code representation learning approach for predicting method names [J]. Journal of Systems and Software, 2021, 180(16): 110-111.
[78]   IYER S, KONSTAS I, CHEUNG A, et al. Summarizing source code using a neural attention model [C]// Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Berlin: ACM, 2016: 2073-2083.
[79]   YANG Z, KEUNG J, YU X, et al. A multi-modal transformer-based code summarization approach for smart contracts [C]// 29th International Conference on Program Comprehension. Madrid: IEEE, 2021: 1-12.
[80]   CHEN Qiu-yuan, LI Shan-ping, YAN Meng, et al. Research progress of code clone detection [J]. Journal of Software, 2019, 30(4): 962-980. doi: 10.13328/j.cnki.jos.005711 (in Chinese)
[81]   BARCHI F, PARISI E, URGESE G, et al. Exploration of convolutional neural network models for source code classification [J]. Engineering Applications of Artificial Intelligence, 2021, 97(20): 104-175.
[82]   PARISI E, BARCHI F, BARTOLINI A, et al. Making the most of scarce input data in deep learning-based source code classification for heterogeneous device mapping [J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2021, 41(6): 1-12.
[83]   XIAO Y, MA G, AHMED N K, et al. Deep graph learning for program analysis and system optimization [C]// Graph Neural Networks and Systems Workshop. [S. l. ]: IEEE, 2021: 1-8.
[84]   BRAUCKMANN A, GOENS A, ERTEL S, et al. Compiler-based graph representations for deep learning models of code [C]// Proceedings of the 29th International Conference on Compiler Construction. San Diego: ACM, 2020: 201-211.
[85]   JIAO J, PAL D, DENG C, et al. GLAIVE: graph learning assisted instruction vulnerability estimation [C]// Design, Automation and Test in Europe Conference and Exhibition. Grenoble: IEEE, 2021: 82-87.
[86]   HAMILTON W L, YING R, LESKOVEC J. Inductive representation learning on large graphs [J]. Advances in Neural Information Processing Systems, 2017, 30(1): 128-156.
[87]   MA J, DUAN Z, TANG L. GATPS: an attention-based graph neural network for predicting SDC-causing instructions [C]// 39th IEEE VLSI Test Symposium. San Diego: IEEE, 2021: 1-7.
[88]   WANG J, ZHANG C. Software reliability prediction using a deep learning model based on the RNN encoder–decoder [J]. Reliability Engineering and System Safety, 2018, 170(20): 73-82.
[89]   NIU W, ZHANG X, DU X, et al. A deep learning based static taint analysis approach for IoT software vulnerability location [J]. Measurement, 2020, 32(152): 107-139.
[90]   XU X, LIU C, FENG Q, et al. Neural network-based graph embedding for cross-platform binary code similarity detection [C]// ACM SIGSAC Conference on Computer and Communications Security. Dallas: ACM, 2017: 363-376.
[91]   DAI H, DAI B, SONG L. Discriminative embeddings of latent variable models for structured data [C]// International Conference on Machine Learning. Dallas: IEEE, 2016: 2702-2711.
[92]   YU Z, CAO R, TANG Q, et al. Order matters: semantic-aware neural networks for binary code similarity detection [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2020, 34: 1145-1152.
[93]   DING S H H, FUNG B C M, CHARLAND P. Asm2vec: boosting static representation robustness for binary clone search against code obfuscation and compiler optimization [C]// IEEE Symposium on Security and Privacy. San Francisco: IEEE, 2019: 472-489.
[94]   LE Q, MIKOLOV T. Distributed representations of sentences and documents [C]// International Conference on Machine Learning. Dallas: IEEE, 2014: 1188-1196.
[95]   YANG J, FU C, LIU X Y, et al. Codee: a tensor embedding scheme for binary code search [J]. IEEE Transactions on Software Engineering, 2021, 48(7): 1-20.
[96]   DUAN Y, LI X, WANG J, et al. Deepbindiff: learning program-wide code representations for binary diffing [C]// Network and Distributed System Security Symposium. San Diego: IEEE, 2020: 1-12.
[97]   TANG J, QU M, WANG M, et al. Line: large-scale information network embedding [C]// Proceedings of the 24th International Conference on World Wide Web. Florence: ACM, 2015: 1067-1077.
[98]   YANG C, LIU Z, ZHAO D, et al. Network representation learning with rich text information [C]// International Joint Conference on Artificial Intelligence. Buenos Aires: AAAI, 2015: 2111-2117.
[99]   BRAUCKMANN A, GOENS A, CASTRILLON J. ComPy-Learn: a toolbox for exploring machine learning representations for compilers [C]// 2020 Forum for Specification and Design Languages. Kiel: IEEE, 2020: 1-4.
[100]   CUMMINS C, WASTI B, GUO J, et al. CompilerGym: robust, performant compiler optimization environments for AI research [EB/OL]. (2021-09-17). https://arxiv.org/abs/2109.08267.
[101]   MOU L, LI G, ZHANG L, et al. Convolutional neural networks over tree structures for programming language processing [C]// AAAI Conference on Artificial Intelligence. Washington: AAAI, 2016: 1287-1293.
[102]   YE X, BUNESCU R, LIU C. Learning to rank relevant files for bug reports using domain knowledge [C]// Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. Hong Kong: ACM, 2014: 689-699.
[103]   CUMMINS C, PETOUMENOS P, WANG Z, et al. End-to-end deep learning of optimization heuristics [C]// 26th International Conference on Parallel Architectures and Compilation Techniques. Portland: IEEE, 2017: 219-232.
[104]   KANG H J, BISSYANDE T F, LO D. Assessing the generalizability of code2vec token embeddings [C]// 34th IEEE/ACM International Conference on Automated Software Engineering. San Diego: IEEE, 2019: 1-12.
[105]   VENKATAKEERTHY S, AGGARWAL R, JAIN S, et al. IR2Vec: LLVM IR based scalable program embeddings [J]. ACM Transactions on Architecture and Code Optimization, 2020, 17(4): 1-27.