Journal of ZheJiang University (Engineering Science)  2022, Vol. 56 Issue (3): 494-502    DOI: 10.3785/j.issn.1008-973X.2022.03.008
    
API recommendation method based on natural nearest neighbors and collaborative filtering
Huang-he ZHENG, Zhi-qiu HUANG*, Wei-wei LI, Yao-shen YU, Yong-chao WANG
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China

Abstract  

An API recommendation method based on natural nearest neighbors and collaborative filtering, named N-APIRec, was proposed to address the degradation in recommendation performance caused by improper neighbor selection. The BM25 algorithm was used to transform projects into vectors. The natural nearest neighbor algorithm was then used to select similar projects from the dataset, which narrowed the search scope, and similar method declarations were selected from those projects. Finally, APIs were recommended through collaborative filtering. N-APIRec was compared with state-of-the-art approaches on the MV and SH data sets. The results verified the effectiveness of N-APIRec: the recommendation success rates on the MV and SH data sets were 77.38% and 30.00%, respectively, which were better than those of the existing methods.
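
As an illustration of the vectorization step, the following minimal sketch weights the API calls of each project with BM25 and compares two projects by cosine similarity. It is not the authors' implementation: treating each project as a bag of API names, the parameter values k1 = 1.2 and b = 0.75, the smoothed IDF variant, the cosine measure, the function names `bm25_vectors` and `cosine`, and the toy projects are all assumptions made for the example.

```python
import math
from collections import Counter


def bm25_vectors(projects, k1=1.2, b=0.75):
    """Map each project (a list of API names) to a sparse {api: BM25 weight} dict.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n = len(projects)
    if n == 0:
        return []
    # Average project length; fall back to 1.0 if every project is empty.
    avg_len = sum(len(p) for p in projects) / n or 1.0
    # Number of projects in which each API occurs at least once.
    doc_freq = Counter(api for p in projects for api in set(p))
    vectors = []
    for p in projects:
        tf = Counter(p)
        length_norm = k1 * (1 - b + b * len(p) / avg_len)
        vec = {}
        for api, f in tf.items():
            # Smoothed IDF, always positive (an assumed variant).
            idf = math.log((n - doc_freq[api] + 0.5) / (doc_freq[api] + 0.5) + 1.0)
            vec[api] = idf * f * (k1 + 1) / (f + length_norm)
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical toy projects, each listed as the APIs it calls.
toy_projects = [
    ["List.add", "List.size", "Map.put"],
    ["List.add", "Map.put", "Map.get"],
    ["File.open", "File.close"],
]
toy_vectors = bm25_vectors(toy_projects)
sim_01 = cosine(toy_vectors[0], toy_vectors[1])  # projects sharing two APIs
sim_02 = cosine(toy_vectors[0], toy_vectors[2])  # projects sharing no API
```

In this toy input the first two projects share APIs and obtain a positive similarity, whereas the third shares none with the first and obtains zero.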



Key words: code reuse, API recommendation, natural nearest neighbors, BM25, collaborative filtering
Received: 19 August 2021      Published: 29 March 2022
CLC:  TP 391  
Fund:  National Key Research and Development Program of China (2018YFB1003900)
Corresponding Authors: Zhi-qiu HUANG     E-mail: sz1916053@nuaa.edu.cn;zqhuang@nuaa.edu.cn
Cite this article:

Huang-he ZHENG,Zhi-qiu HUANG,Wei-wei LI,Yao-shen YU,Yong-chao WANG. API recommendation method based on natural nearest neighbors and collaborative filtering. Journal of ZheJiang University (Engineering Science), 2022, 56(3): 494-502.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2022.03.008     OR     https://www.zjujournals.com/eng/Y2022/V56/I3/494


Fig.1 Workflow of N-APIRec
Fig.2 Graph of API call structure
Project name                    sp
g_xmlgraphics-commons-1.5       0.418
g_batik-awt-util-1.7            0.284
g_filters-2.0.235               0.107
g_fop-2.0                       0.054
g_jcaptcha-2.0-alpha-1-SNA      0.031
g_jwi-2.2.3                     0.029
g_pdfxstream-3.5.0              0.026
g_objenesis-1.2                 0.026
g_apache-mime4j-core-0.7.2      0.025
g_spring-security-crypto-3.2    0.023
g_batik-svggen-1.7              0.022
g_itextpdf-5.5.13               0.021
Tab.1 Similarity list of project g_batik-codec-1.8
Fig.3 Similarity curve of project
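
Tab.1 shows that similarity falls off quickly after the first few projects. A natural nearest neighbor search can select neighbors without a preset k. The following is a simplified sketch of one common formulation, in which two items are natural neighbors when each appears among the other's r nearest neighbors, and r is increased until every item has a reverse neighbor or the number of items without one stops changing. The stopping rule, the function name `natural_neighbors`, and the toy similarity matrix are assumptions for illustration and may differ from the procedure used in N-APIRec.

```python
import numpy as np


def natural_neighbors(sim, max_r=None):
    """Return (r, neighbors) for a square similarity matrix.

    neighbors[i] is the set of items j such that i and j are in each
    other's r-nearest-neighbor sets (simplified formulation, assumed).
    """
    n = sim.shape[0]
    s = sim.astype(float).copy()
    np.fill_diagonal(s, -np.inf)  # an item is never its own neighbor
    # Column r of `order` holds the (r+1)-th most similar item of each row.
    order = np.argsort(-s, axis=1, kind="stable")
    limit = n - 1 if max_r is None else min(max_r, n - 1)
    reverse_count = np.zeros(n, dtype=int)  # how often each item is chosen
    knn = [set() for _ in range(n)]
    prev_orphans = None
    r = 0
    while r < limit:
        for i in range(n):
            j = int(order[i, r])
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = int(np.sum(reverse_count == 0))
        # Stop when nobody is left without a reverse neighbor,
        # or when the number of such items no longer changes.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


# Hypothetical similarity matrix: items 0-2 form one group, items 3-4 another.
toy_sim = np.array([
    [1.0, 0.9, 0.8, 0.1, 0.1],
    [0.9, 1.0, 0.7, 0.1, 0.2],
    [0.8, 0.7, 1.0, 0.2, 0.1],
    [0.1, 0.1, 0.2, 1.0, 0.9],
    [0.1, 0.2, 0.1, 0.9, 1.0],
])
toy_r, toy_neighbors = natural_neighbors(toy_sim)
```

On this toy matrix the search stops at r = 2, and each item keeps only the members of its own group as neighbors, so the groups of three and two items end up with different neighbor counts.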
Data set    Projects    Methods    APIs      API calls
MV          1,600       97,255     30,442    939,645
SH          200         4,530      5,351     27,312
Tab.2 Statistics of experimental data sets
n     MV SUCn/%    MV PREn    MV RECn    SH SUCn/%    SH PREn    SH RECn
1     77.4         0.774      0.107      30.0         0.300      0.098
5     87.5         0.569      0.366      42.0         0.143      0.185
10    90.3         0.394      0.481      47.5         0.099      0.234
15    91.9         0.297      0.529      50.5         0.078      0.269
20    92.7         0.238      0.555      52.5         0.062      0.287
Tab.3 Performance of N-APIRec on MV and SH data sets
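
The columns of Tab.3 can be read with the following sketch, which assumes the usual top-n definitions: a query counts as a success if at least one of the first n recommended APIs is in the ground truth, precision is the number of hits divided by n, and recall is the number of hits divided by the size of the ground truth. These definitions are an assumption rather than something stated on this page; they are consistent with Tab.3, where PRE1 equals SUC1/100 on both data sets. The function names and the toy queries are hypothetical.

```python
def metrics_at_n(recommended, ground_truth, n):
    """Return (success, precision, recall) for one query at cutoff n."""
    top = recommended[:n]
    hits = len(set(top) & set(ground_truth))
    success = 1.0 if hits > 0 else 0.0
    precision = hits / n  # divided by n even if fewer than n items were returned
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return success, precision, recall


def average_metrics(queries, n):
    """Average (success, precision, recall) over (recommended, ground_truth) pairs."""
    if not queries:
        raise ValueError("at least one query is required")
    rows = [metrics_at_n(rec, truth, n) for rec, truth in queries]
    return tuple(sum(col) / len(rows) for col in zip(*rows))


# Hypothetical queries: a ranked recommendation list and the APIs actually used.
toy_queries = [
    (["a", "b", "c", "d"], {"b", "d", "e"}),
    (["x", "y", "z"], {"x"}),
]
at_1 = average_metrics(toy_queries, 1)
at_2 = average_metrics(toy_queries, 2)
```

Multiplying the averaged success value by 100 gives a percentage comparable to the SUCn/% columns.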
Method        SUC1    SUC5    SUC10    SUC15    SUC20
PAM           8.0     15.0    27.4     29.5     33.5
FOCUS         19.0    31.5    37.0     37.5     41.0
N-APIRec-S    25.5    33.0    40.5     44.0     46.5
N-APIRec-B    29.0    42.0    48.5     49.5     51.5
N-APIRec-N    25.0    40.0    45.0     49.0     51.5
N-APIRec      30.0    42.0    47.5     50.5     52.5
Tab.4 Success rates (%) of different approaches on SH data set
Fig.4 Precision-recall curves on different data sets
Method        SUC1    SUC5    SUC10    SUC15    SUC20
FOCUS         65.3    81.3    85.8     87.7     88.8
N-APIRec-S    74.9    85.5    89.1     91.0     91.9
N-APIRec-B    75.1    86.2    89.9     91.6     92.6
N-APIRec-N    74.0    85.8    88.6     89.9     90.9
N-APIRec      77.4    87.5    90.3     91.9     92.7
Tab.5 Success rates (%) of different approaches on MV data set
Cp    SUC1    SUC5    SUC10    SUC15    SUC20
20    21.0    26.5    29.5     31.5     32.0
40    22.5    26.5    29.5     32.0     32.0
60    22.5    27.0    30.0     33.0     33.5
80    22.5    27.5    30.5     33.0     35.5
Tab.6 Success rates (%) for different levels of project completeness on SH data set
Cd    REC1     REC5     REC10    REC15    REC20
1     0.030    0.082    0.110    0.123    0.134
2     0.054    0.135    0.175    0.206    0.220
3     0.074    0.160    0.208    0.237    0.251
4     0.098    0.185    0.234    0.269    0.287
Tab.7 Recall for different levels of method declaration completeness on SH data set