Journal of Zhejiang University (Engineering Science), 2022, Vol. 56, Issue 3: 494-502    DOI: 10.3785/j.issn.1008-973X.2022.03.008
Computer and Control Engineering
API recommendation method based on natural nearest neighbors and collaborative filtering
Huang-he ZHENG, Zhi-qiu HUANG*, Wei-wei LI, Yao-shen YU, Yong-chao WANG
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
Abstract:

An API recommendation method based on natural nearest neighbors and collaborative filtering, named N-APIRec, was proposed to address the degradation in recommendation performance caused by improper neighbor selection. The BM25 algorithm was used to transform projects into vectors. The natural nearest neighbor algorithm was then used to select similar projects from the dataset, which reduces the search scope, and similar method declarations were selected from those projects. Finally, APIs were recommended through collaborative filtering. N-APIRec was compared with state-of-the-art approaches on the MV and SH datasets. The results verified the effectiveness of N-APIRec. Its recommendation success rates on MV and SH were 77.38% and 30.00% respectively, which is better than the existing methods.
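The abstract says BM25 turns each project into a vector before similar projects are searched. This page does not give the exact formulation. The sketch below is therefore a minimal illustration under several assumptions:

- Each project is treated as a document whose terms are the API names it calls.
- Each term gets the standard Okapi BM25 weight. The values k1 = 1.2 and b = 0.75 are conventional defaults, not values taken from the paper.
- Projects are compared with cosine similarity.

The function names and toy projects are hypothetical.

```python
import math
from collections import Counter

def bm25_vectors(docs, k1=1.2, b=0.75):
    """Map each document (a list of terms) to a sparse {term: BM25 weight} dict."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    # Non-negative IDF variant: the +1 inside the log keeps every weight >= 0.
    idf = {t: math.log((n_docs - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}
    vectors = []
    for d in docs:
        tf = Counter(d)
        dl = len(d)
        vectors.append({
            t: idf[t] * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
            for t, f in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical toy projects: each one is the list of APIs its methods call.
projects = {
    "p1": ["List.add", "List.get", "File.read"],
    "p2": ["List.add", "List.size", "File.read"],
    "p3": ["Socket.connect", "Socket.close"],
}
names = list(projects)
vecs = dict(zip(names, bm25_vectors(list(projects.values()))))
print(cosine(vecs["p1"], vecs["p2"]), cosine(vecs["p1"], vecs["p3"]))
```

Projects p1 and p2 share two APIs and get a positive similarity. Projects p1 and p3 share none and score 0.0. The similarity ranking in Table 1 can be read as the output of a step like this one.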

Key words: code reuse; API recommendation; natural nearest neighbors; BM25; collaborative filtering
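Received: 2021-08-19    Published: 2022-03-29
CLC: TP 391
Fund program: National Key Research and Development Program of China (2018YFB1003900)
Corresponding author: Zhi-qiu HUANG    E-mail: sz1916053@nuaa.edu.cn; zqhuang@nuaa.edu.cn
About the first author: Huang-he ZHENG (1996-), male, master's student, working on intelligent software development. ORCID: orcid.org/0000-0001-9934-9453. E-mail: sz1916053@nuaa.edu.cn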

Cite this article:

Huang-he ZHENG, Zhi-qiu HUANG, Wei-wei LI, Yao-shen YU, Yong-chao WANG. API recommendation method based on natural nearest neighbors and collaborative filtering[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(3): 494-502.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.03.008        https://www.zjujournals.com/eng/CN/Y2022/V56/I3/494

Fig. 1  Workflow of N-APIRec
Fig. 2  Structure of API calls
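Fig. 2 shows the hierarchy the recommender works on: projects contain method declarations, and method declarations contain API calls. The abstract says similar method declarations are taken from the similar projects and APIs are then recommended through collaborative filtering. The sketch below illustrates that step in a neighborhood-based collaborative filtering style, as a rough illustration only:

- Each method declaration is modeled as the set of APIs it calls.
- Candidate declarations are weighted by their Jaccard similarity to the incomplete active declaration.
- Each API the active declaration does not yet call receives the summed weight of the top-k declarations that call it.

The similarity measure, k, the scoring rule and all names are assumptions, not the paper's exact formulas.

```python
from collections import defaultdict

def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def recommend_apis(active_calls, candidate_decls, k=2, top_n=3):
    """Rank APIs for an active method declaration (hypothetical helper).

    active_calls:    set of APIs already called in the active declaration.
    candidate_decls: list of API sets, one per method declaration taken
                     from the similar (neighbor) projects.
    """
    # Keep the k declarations most similar to the active one.
    scored = sorted(((jaccard(active_calls, d), d) for d in candidate_decls),
                    key=lambda pair: pair[0], reverse=True)[:k]
    scores = defaultdict(float)
    for sim, decl in scored:
        for api in decl - active_calls:  # only recommend APIs not used yet
            scores[api] += sim
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [api for api, _ in ranked[:top_n]]

# Hypothetical method declarations mined from neighbor projects.
decls = [
    {"FileReader.<init>", "BufferedReader.<init>",
     "BufferedReader.readLine", "BufferedReader.close"},
    {"FileReader.<init>", "BufferedReader.<init>", "BufferedReader.readLine"},
    {"Socket.<init>", "Socket.getInputStream"},
]
print(recommend_apis({"FileReader.<init>", "BufferedReader.<init>"}, decls))
```

Here the two file-reading declarations are the nearest neighbors. `BufferedReader.readLine` is called by both, so it ranks above `BufferedReader.close`. The unrelated socket declaration contributes nothing.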
Project name    Similarity sp
g_xmlgraphics-commons-1.5 0.418
g_batik-awt-util-1.7 0.284
g_filters-2.0.235 0.107
g_fop-2.0 0.054
g_jcaptcha-2.0-alpha-1-SNA 0.031
g_jwi-2.2.3 0.029
g_pdfxstream-3.5.0 0.026
g_objenesis-1.2 0.026
g_apache-mime4j-core-0.7.2 0.025
g_spring-security-crypto-3.2 0.023
g_batik-svggen-1.7 0.022
g_itextpdf-5.5.13 0.021
Table 1  Similarity list of project g_batik-codec-1.8
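Table 1 lists one project's similarity to other projects. The abstract states that the natural nearest neighbor algorithm filters these similar projects instead of keeping a fixed number of them. The sketch below illustrates the natural-neighbor search of Zhu et al. [21], written from the general idea rather than from this paper's pseudocode:

- The neighborhood size r grows until the number of points that are nobody's neighbor reaches zero or stops changing.
- The natural neighbors of a point are then taken to be its mutual r-nearest neighbors.

Using 1 - similarity as the distance and the toy matrix are assumptions.

```python
def natural_neighbors(dist):
    """Natural-neighbor search over a symmetric distance matrix.

    Returns (lam, nan): lam is the stable neighborhood size and nan[i]
    is the set of mutual lam-nearest neighbors of point i.
    """
    n = len(dist)
    # For each point, all other points ordered from nearest to farthest.
    order = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: dist[i][j])
        order.append(others)
    reverse_count = [0] * n  # how many points list i among their r nearest
    prev_orphans = None
    r = 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i][r - 1]] += 1  # add i's r-th nearest neighbor
        orphans = sum(1 for c in reverse_count if c == 0)
        # Stop once every point is someone's neighbor or the count stabilizes.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    knn = [set(order[i][:r]) for i in range(n)]
    nan = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, nan

# Hypothetical distances (1 - similarity) between five projects:
# projects 0-2 form one group and projects 3-4 another.
dist = [
    [0.00, 0.10, 0.20, 0.90, 0.95],
    [0.10, 0.00, 0.15, 0.85, 0.90],
    [0.20, 0.15, 0.00, 0.80, 0.85],
    [0.90, 0.85, 0.80, 0.00, 0.10],
    [0.95, 0.90, 0.85, 0.10, 0.00],
]
lam, nan = natural_neighbors(dist)
print(lam, nan)
```

The search settles at r = 2. Each project keeps only the partners that also choose it, so the two groups separate without a hand-tuned k. That property is what the method relies on to avoid improper neighbor selection.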
Fig. 3  Line chart of project similarities
Dataset  Projects  Methods  APIs    API calls
MV       1,600     97,255   30,442  939,645
SH       200       4,530    5,351   27,312
Table 2  Statistics of the experimental datasets
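The counts in Table 2 imply quite different densities for the two datasets. The arithmetic below only derives per-project and per-method averages from those counts. It adds no data beyond Table 2.

```python
# Counts copied from Table 2: (projects, methods, APIs, API calls).
stats = {"MV": (1600, 97255, 30442, 939645), "SH": (200, 4530, 5351, 27312)}

def averages(projects, methods, apis, calls):
    """Average methods per project and API calls per method, to 2 decimals."""
    return round(methods / projects, 2), round(calls / methods, 2)

for name, row in stats.items():
    print(name, averages(*row))
```

This gives about 60.8 methods per project and 9.7 API calls per method for MV, versus about 22.7 and 6.0 for SH.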
n  MV SUCn/%  MV PREn  MV RECn  SH SUCn/%  SH PREn  SH RECn
1 77.4 0.774 0.107 30.0 0.300 0.098
5 87.5 0.569 0.366 42.0 0.143 0.185
10 90.3 0.394 0.481 47.5 0.099 0.234
15 91.9 0.297 0.529 50.5 0.078 0.269
20 92.7 0.238 0.555 52.5 0.062 0.287
Table 3  Performance of N-APIRec on the MV and SH datasets
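Table 3 reports SUCn, PREn and RECn at cut-off n, but the page does not define them. The sketch below assumes the definitions common in this line of work, for example in FOCUS [15], each averaged over queries:

- SUCn is the share of queries with at least one correct API among the top n.
- PREn is the fraction of the top n recommendations that is correct.
- RECn is the fraction of the ground-truth APIs retrieved in the top n.

Under these definitions, PRE1 equals SUC1 written as a fraction. This matches the first row of Table 3 (77.4% and 0.774 on MV, 30.0% and 0.300 on SH).

```python
def metrics_at_n(results, n):
    """Average SUC@n, PRE@n and REC@n over queries.

    results: list of (ranked_recommendations, ground_truth_set) pairs.
    """
    suc = pre = rec = 0.0
    for ranked, truth in results:
        hits = len(set(ranked[:n]) & truth)
        suc += 1.0 if hits > 0 else 0.0
        pre += hits / n
        rec += hits / len(truth) if truth else 0.0
    q = len(results)
    return suc / q, pre / q, rec / q

# Two hypothetical queries.
results = [
    (["a", "b", "c"], {"a", "d"}),  # one correct API in the top 3
    (["x", "y", "z"], {"w"}),       # no correct API
]
print(metrics_at_n(results, 1))
```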
Unit: %
Method  SUC1  SUC5  SUC10  SUC15  SUC20
PAM 8.0 15.0 27.4 29.5 33.5
FOCUS 19.0 31.5 37.0 37.5 41.0
N-APIRec-S 25.5 33.0 40.5 44.0 46.5
N-APIRec-B 29.0 42.0 48.5 49.5 51.5
N-APIRec-N 25.0 40.0 45.0 49.0 51.5
N-APIRec 30.0 42.0 47.5 50.5 52.5
Table 4  Success rates of different methods on the SH dataset
Fig. 4  Precision-recall curves on different datasets
Unit: %
Method  SUC1  SUC5  SUC10  SUC15  SUC20
FOCUS 65.3 81.3 85.8 87.7 88.8
N-APIRec-S 74.9 85.5 89.1 91.0 91.9
N-APIRec-B 75.1 86.2 89.9 91.6 92.6
N-APIRec-N 74.0 85.8 88.6 89.9 90.9
N-APIRec 77.4 87.5 90.3 91.9 92.7
Table 5  Success rates of different methods on the MV dataset
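Tables 4 and 5 can be summarized as N-APIRec's gain over FOCUS at each cut-off. The snippet only subtracts values copied from the two tables, in percentage points.

```python
cutoffs = [1, 5, 10, 15, 20]
# SUC values (%) copied from Table 4 (SH) and Table 5 (MV).
focus = {"SH": [19.0, 31.5, 37.0, 37.5, 41.0], "MV": [65.3, 81.3, 85.8, 87.7, 88.8]}
napirec = {"SH": [30.0, 42.0, 47.5, 50.5, 52.5], "MV": [77.4, 87.5, 90.3, 91.9, 92.7]}

def gains(dataset):
    """Percentage-point gain of N-APIRec over FOCUS at each cut-off."""
    return {n: round(a - b, 1)
            for n, a, b in zip(cutoffs, napirec[dataset], focus[dataset])}

for ds in ("SH", "MV"):
    print(ds, gains(ds))
```

At n = 1 the gain is 11.0 points on SH and 12.1 points on MV. On MV the gain narrows as n grows, to 3.9 points at n = 20.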
Unit: %
Cp SUC1 SUC5 SUC10 SUC15 SUC20
20 21.0 26.5 29.5 31.5 32.0
40 22.5 26.5 29.5 32.0 32.0
60 22.5 27.0 30.0 33.0 33.5
80 22.5 27.5 30.5 33.0 35.5
Table 6  Success rates on the SH dataset at different project completeness levels
Cd REC1 REC5 REC10 REC15 REC20
1 0.030 0.082 0.110 0.123 0.134
2 0.054 0.135 0.175 0.206 0.220
3 0.074 0.160 0.208 0.237 0.251
4 0.098 0.185 0.234 0.269 0.287
Table 7  Recall on the SH dataset at different method declaration completeness levels
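Tables 6 and 7 vary how much context is visible: Cp for the project and Cd for the active method declaration. The page does not describe the protocol. The sketch below follows the kind of setup used by FOCUS and should be read as an assumption:

- Cp is treated as the percentage of a project's other method declarations kept as context.
- Cd is treated as the number of leading API calls of the active declaration that remain visible.
- The active declaration's remaining calls serve as the ground truth.

The helper name and the "keep the first Cp%" rule are hypothetical.

```python
import math

def build_query(project_decls, active_index, cp, cd):
    """Split a project into visible context and hidden ground truth (hypothetical).

    project_decls: list of method declarations, each a list of API calls in order.
    active_index:  index of the declaration being completed.
    cp:            percentage (0-100) of the other declarations kept as context.
    cd:            number of leading API calls of the active declaration kept.
    """
    others = [d for i, d in enumerate(project_decls) if i != active_index]
    keep = math.floor(len(others) * cp / 100)
    context = others[:keep]                # assumption: keep the first Cp% in order
    active = project_decls[active_index]
    visible_calls = active[:cd]            # what the recommender sees
    ground_truth = set(active[cd:]) - set(visible_calls)  # what it should predict
    return context, visible_calls, ground_truth

decls = [["a", "b"], ["c"], ["d", "e", "f", "g"], ["h"], ["i"], ["j"]]
context, visible, truth = build_query(decls, active_index=2, cp=40, cd=1)
print(context, visible, truth)
```

Raising Cp or Cd gives the recommender more evidence. In Tables 6 and 7, SUCn and RECn rise as Cp and Cd increase.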
References:
[1] NIE L, JIANG H, REN Z, et al. Query expansion based on crowd knowledge for code search[J]. IEEE Transactions on Services Computing, 2016, 9(5): 771-783. DOI: 10.1109/TSC.2016.2560165.
[2] JIANG H, NIE L, SUN Z, et al. ROSF: leveraging information retrieval and supervised learning for recommending code snippets[J]. IEEE Transactions on Services Computing, 2019, 12(1): 34-46. DOI: 10.1109/TSC.2016.2592909.
[3] RAGHOTHAMAN M, WEI Y, HAMADI Y. SWIM: synthesizing what I mean - code search and idiomatic snippet synthesis[C]// Proceedings of the 38th International Conference on Software Engineering. Austin: ACM, 2016: 357-367.
[4] GU X, ZHANG H, ZHANG D, et al. Deep API learning[C]// Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Seattle: ACM, 2016: 631-642.
[5] CAI L, WANG H, HUANG Q, et al. BIKER: a tool for Bi-information source based API method recommendation[C]// Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Tallinn: ACM, 2019: 1075-1079.
[6] ZHOU Y, YANG X, CHEN T, et al. Boosting API recommendation with implicit feedback[J]. IEEE Transactions on Software Engineering, 2021, 1(1): 1.
[7] XIE W, PENG X, LIU M, et al. API method recommendation via explicit matching of functionality verb phrases[C]// Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. Sacramento: ACM, 2020: 1015-1026.
[8] LECUN Y, BENGIO Y. Convolutional networks for images, speech, and time series[J]. The Handbook of Brain Theory and Neural Networks, 1995, 3361(10): 1995.
[9] HOCHREITER S, SCHMIDHUBER J. Long short-term memory[J]. Neural Computation, 1997, 9(8): 1735-1780. DOI: 10.1162/neco.1997.9.8.1735.
[10] SCARSELLI F, GORI M, TSOI A, et al. The graph neural network model[J]. IEEE Transactions on Neural Networks, 2008, 20(1): 61-80.
[11] LING C, ZOU Y, XIE B. Graph neural network based collaborative filtering for API usage recommendation[C]// 2021 IEEE International Conference on Software Analysis, Evolution and Reengineering. Tokyo: IEEE, 2021: 36-47.
[12] ZHONG H, XIE T, ZHANG L, et al. MAPO: mining and recommending API usage patterns[C]// European Conference on Object-Oriented Programming. Genoa: Springer, 2009: 318-343.
[13] WANG J, DANG Y, ZHANG H, et al. Mining succinct and high-coverage API usage patterns from source code[C]// 2013 10th Working Conference on Mining Software Repositories. San Francisco: IEEE, 2013: 319-328.
[14] FOWKES J, SUTTON C. Parameter-free probabilistic API mining across GitHub[C]// Proceedings of the 2016 24th ACM SIGSOFT International Symposium on Foundations of Software Engineering. Seattle: ACM, 2016: 254-265.
[15] NGUYEN P T, ROCCO J D, RUSCIO D D, et al. FOCUS: a recommender system for mining API function calls and usage patterns[C]// International Conference on Software Engineering. Montreal: IEEE, 2019: 1050-1060.
[16] CHEN A. Context-aware collaborative filtering system: predicting the user's preference in the ubiquitous computing environment[C]// International Symposium on Location- and Context-Awareness. Berlin: Springer, 2005: 244-253.
[17] GUO G, WANG H, BELL D, et al. KNN model-based approach in classification[C]// On the Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. Montpellier: Springer, 2003: 986-996.
[18] NGUYEN T, NGUYEN A, PHAN H, et al. Combining Word2Vec with revised vector space model for better code retrieval[C]// 2017 IEEE/ACM 39th International Conference on Software Engineering Companion. Buenos Aires: IEEE, 2017: 183-185.
[19] NIU J, ZHAO Q, WANG L, et al. OnSeS: a novel online short text summarization based on BM25 and neural network[C]// 2016 IEEE Global Communications Conference. Washington: IEEE, 2016: 1-6.
[20] RAMOS J. Using TF-IDF to determine word relevance in document queries[C]// Proceedings of the First Instructional Conference on Machine Learning. Moscow: Citeseer, 2003: 29-48.
[21] ZHU Q, HUANG J, FENG J, et al. A clustering algorithm based on natural nearest neighbor[J]. Journal of Computational Information Systems, 2014, 10(13): 5473-5480.
[22] XIE R, KONG X, WANG L, et al. HiRec: API recommendation using hierarchical context[C]// 2019 IEEE 30th International Symposium on Software Reliability Engineering. Berlin: IEEE, 2019: 369-379.
[23] BASTEN B, HILLS M, KLINT P, et al. M3: a general model for code analytics in rascal[C]// 2015 IEEE 1st International Workshop on Software Analytics. Montreal: IEEE, 2015: 25-28.
[24] JACCARD P. The distribution of the flora in the alpine zone[J]. New Phytologist, 1912, 11(2): 37-50.
[25] NAH F. A study on tolerable waiting time: how long are web users willing to wait?[J]. Behaviour & Information Technology, 2004, 23(3): 153-163. DOI: 10.1080/01449290410001669914.