Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (3): 494-502    DOI: 10.3785/j.issn.1008-973X.2022.03.008
API recommendation method based on natural nearest neighbors and collaborative filtering
Huang-he ZHENG, Zhi-qiu HUANG*, Wei-wei LI, Yao-shen YU, Yong-chao WANG
College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China
To address the degradation in recommendation performance caused by improper neighbor selection, an API recommendation method based on natural nearest neighbors and collaborative filtering, named N-APIRec, is proposed. The method uses the BM25 algorithm to convert projects into vectors, applies the natural nearest neighbor algorithm to select similar projects from the dataset and thereby narrow the search scope, selects similar method declarations from those similar projects, and finally recommends APIs through collaborative filtering. N-APIRec was compared with state-of-the-art methods on the MV and SH datasets. The results verify its effectiveness: the recommendation success rates on the MV and SH datasets were 77.38% and 30.00%, respectively, outperforming existing methods.

Key words: code reuse    API recommendation    natural nearest neighbors    BM25    collaborative filtering
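
The first step of the pipeline, turning projects into BM25 vectors, can be illustrated with a minimal sketch. It assumes each project is represented as a list of API-name tokens, uses the common Okapi BM25 defaults k1 = 1.5 and b = 0.75, and compares vectors with cosine similarity. None of these choices is taken from the paper; the function names `bm25_vectors` and `cosine` and the toy API names are hypothetical.

```python
import math
from collections import Counter


def bm25_vectors(projects, k1=1.5, b=0.75):
    """Return one sparse BM25-weighted vector (dict: term -> weight) per project.

    `projects` is a list of token lists; each token stands for an API name.
    k1 and b are common Okapi BM25 defaults, assumed here for illustration.
    """
    n = len(projects)
    avg_len = sum(len(p) for p in projects) / n
    # Document frequency: number of projects that contain each term.
    df = Counter(term for p in projects for term in set(p))
    vectors = []
    for p in projects:
        tf = Counter(p)
        vec = {}
        for term, f in tf.items():
            # Non-negative IDF variant (note the "+ 1.0" inside the log).
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with length normalization.
            norm = f * (k1 + 1) / (f + k1 * (1 - b + b * len(p) / avg_len))
            vec[term] = idf * norm
        vectors.append(vec)
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Toy data: three projects described by the APIs they call (hypothetical names).
projects = [
    ["List.add", "List.size", "File.open"],
    ["List.add", "List.size"],
    ["Socket.connect", "Socket.close"],
]
vecs = bm25_vectors(projects)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

In this toy data the first two projects share API calls and therefore get a positive similarity, while the first and third share none and get 0.0.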
Received: 2021-08-19    Published: 2022-03-29
CLC:  TP 391
Funding: National Key Research and Development Program of China (2018YFB1003900)
Corresponding author: Zhi-qiu HUANG
About the author: Huang-he ZHENG (born 1996), male, master's student, engaged in research on intelligent software development. E-mail:
Huang-he ZHENG, Zhi-qiu HUANG, Wei-wei LI, Yao-shen YU, Yong-chao WANG. API recommendation method based on natural nearest neighbors and collaborative filtering[J]. Journal of Zhejiang University (Engineering Science), 2022, 56(3): 494-502.


Fig. 1  Workflow of N-APIRec
Fig. 2  API call structure diagram
Project name                   sp
g_xmlgraphics-commons-1.5      0.418
g_batik-awt-util-1.7           0.284
g_filters-2.0.235              0.107
g_fop-2.0                      0.054
g_jcaptcha-2.0-alpha-1-SNA     0.031
g_jwi-2.2.3                    0.029
g_pdfxstream-3.5.0             0.026
g_objenesis-1.2                0.026
g_apache-mime4j-core-0.7.2     0.025
g_spring-security-crypto-3.2   0.023
g_batik-svggen-1.7             0.022
g_itextpdf-5.5.13              0.021
Table 1  Similarity list for project g_batik-codec-1.8
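
A similarity list like the one above is the kind of input that neighbor selection works on. The following is a simplified sketch of a natural-neighbor search over a symmetric similarity matrix: the neighborhood size r grows round by round until every item has been chosen as a neighbor by at least one other item, or the number of never-chosen items stops changing. How N-APIRec applies the algorithm in detail is not stated in this excerpt, so the stopping rule, the function name `natural_neighbors`, and the toy matrix are assumptions for illustration.

```python
def natural_neighbors(sim):
    """Simplified natural-neighbor search over a symmetric similarity matrix.

    sim[i][j] is the similarity between items i and j (higher means closer).
    Returns (r, neighbors), where neighbors[i] is the set of items j such that
    i and j appear in each other's r-nearest-neighbor lists.
    """
    n = len(sim)
    # Rank every other item by descending similarity, once per item.
    ranked = [
        sorted((j for j in range(n) if j != i), key=lambda j: -sim[i][j])
        for i in range(n)
    ]
    knn = [set() for _ in range(n)]   # nearest neighbors collected so far
    reverse_count = [0] * n           # how often each item was chosen by others
    prev_orphans = None
    r = 0
    while r < n - 1:
        for i in range(n):
            j = ranked[i][r]          # the (r + 1)-th nearest neighbor of i
            knn[i].add(j)
            reverse_count[j] += 1
        r += 1
        orphans = sum(1 for c in reverse_count if c == 0)
        # Stop when no item is left out, or the count of left-out items stalls.
        if orphans == 0 or orphans == prev_orphans:
            break
        prev_orphans = orphans
    # Keep only mutual (symmetric) neighbor relations.
    neighbors = [{j for j in knn[i] if i in knn[j]} for i in range(n)]
    return r, neighbors


# Toy matrix: items 0 and 1 are close to each other, as are items 2 and 3.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
r, nbrs = natural_neighbors(sim)
print(r, nbrs)
```

Unlike k-nearest neighbors, this search needs no preset k: the neighborhood size is derived from the data, which is the property that makes it attractive when a fixed k would select unsuitable neighbors.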
Fig. 3  Line chart of project similarity
Dataset   Projects   Methods   APIs     API calls
MV        1600       97255     30442    939645
SH        200        4530      5351     27312
Table 2  Statistics of the experimental datasets
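      MV                                       SH
N     Success rate (%)   Precision   Recall    Success rate (%)   Precision   Recall
1     77.4               0.774       0.107     30.0               0.300       0.098
5     87.5               0.569       0.366     42.0               0.143       0.185
10    90.3               0.394       0.481     47.5               0.099       0.234
15    91.9               0.297       0.529     50.5               0.078       0.269
20    92.7               0.238       0.555     52.5               0.062       0.287
Table 3  Performance of N-APIRec on the MV and SH datasets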
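
The three metrics reported in this table can be computed as in the sketch below, which assumes the usual top-N definitions: a query counts as a success if at least one of its top N recommendations is actually used, precision is the number of hits divided by N, and recall is the number of hits divided by the number of APIs actually used. The paper's exact definitions are not given in this excerpt, and the function name `evaluate` and the toy data are hypothetical.

```python
def evaluate(recommendations, ground_truth, n):
    """Average success rate, precision and recall at cutoff n.

    recommendations: list of ranked API lists, one per query.
    ground_truth: list of sets holding the APIs actually used, one per query.
    """
    success = precision = recall = 0.0
    for ranked, truth in zip(recommendations, ground_truth):
        hits = len(set(ranked[:n]) & truth)
        success += 1.0 if hits > 0 else 0.0
        precision += hits / n
        recall += hits / len(truth) if truth else 0.0
    q = len(recommendations)
    return success / q, precision / q, recall / q


# Toy data: two queries with ranked recommendations and the APIs really used.
recs = [["a", "b", "c"], ["x", "y", "z"]]
truth = [{"a", "c", "d"}, {"q"}]
s1, p1, r1 = evaluate(recs, truth, 1)
s3, p3, r3 = evaluate(recs, truth, 3)
print(s1, p1, r1)
print(s3, p3, r3)
```

Under these definitions the success rate and precision coincide at N = 1, which is consistent with the first row of the table (77.4% and 0.774 on MV, 30.0% and 0.300 on SH).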
Method       SUC1   SUC5   SUC10   SUC15   SUC20
PAM          8.0    15.0   27.4    29.5    33.5
FOCUS        19.0   31.5   37.0    37.5    41.0
N-APIRec-S   25.5   33.0   40.5    44.0    46.5
N-APIRec-B   29.0   42.0   48.5    49.5    51.5
N-APIRec-N   25.0   40.0   45.0    49.0    51.5
N-APIRec     30.0   42.0   47.5    50.5    52.5
Table 4  Success rates (%) of different methods on the SH dataset
Fig. 4  Precision-recall curves on different datasets
Method       SUC1   SUC5   SUC10   SUC15   SUC20
FOCUS        65.3   81.3   85.8    87.7    88.8
N-APIRec-S   74.9   85.5   89.1    91.0    91.9
N-APIRec-B   75.1   86.2   89.9    91.6    92.6
N-APIRec-N   74.0   85.8   88.6    89.9    90.9
N-APIRec     77.4   87.5   90.3    91.9    92.7
Table 5  Success rates (%) of different methods on the MV dataset
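Project completeness   SUC1   SUC5   SUC10   SUC15   SUC20
20                     21.0   26.5   29.5    31.5    32.0
40                     22.5   26.5   29.5    32.0    32.0
60                     22.5   27.0   30.0    33.0    33.5
80                     22.5   27.5   30.5    33.0    35.5
Table 6  Success rates on the SH dataset at different levels of project completeness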
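Method declaration completeness   N=1     N=5     N=10    N=15    N=20
1                                 0.030   0.082   0.110   0.123   0.134
2                                 0.054   0.135   0.175   0.206   0.220
3                                 0.074   0.160   0.208   0.237   0.251
4                                 0.098   0.185   0.234   0.269   0.287
Table 7  Recall on the SH dataset at different levels of method declaration completeness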
