Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1317-1325    DOI: 10.3785/j.issn.1008-973X.2023.07.006
Automation Technology
Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm
Zhe YANG 1,2, Hong-wei GE 1,2,*, Ting LI 1,2
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China
Abstract:

To address the problems of parameter sharing and high computational cost in click-through rate (CTR) prediction, a mixture-of-experts parallel recommendation algorithm framework based on feature fusion and distribution is proposed. The framework improves the ability of parallel architectures to distinguish different types of features and learns more expressive feature inputs. It also shares parameters between explicit and implicit features, which moderates the gradients during backpropagation and improves model performance. The framework is lightweight and model-agnostic, so it can be applied to many mainstream parallel-architecture recommendation algorithms. Extensive experiments on three public datasets show that the framework effectively improves the performance of SOTA models.

Key words: recommender system; click-through rate prediction; deep learning; mixture of experts
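The abstract names a mixture of experts but this page does not give the formulation. The following is a minimal sketch of a generic softmax-gated mixture of experts, not the authors' Broker or Fusion modules. The names `softmax`, `mixture_of_experts`, `gate_weights`, and the toy experts are assumptions made for illustration only.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> Vector:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(
    x: Vector,
    experts: Sequence[Callable[[Vector], Vector]],
    gate_weights: Sequence[Vector],
) -> Vector:
    """Combine expert outputs with input-dependent softmax gate weights.

    gate_weights holds one row per expert (hypothetical parameters);
    the gate logit of expert k is the dot product of row k with x.
    """
    logits = [sum(w * v for w, v in zip(row, x)) for row in gate_weights]
    gates = softmax(logits)
    outputs = [expert(x) for expert in experts]
    dim = len(outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, outputs)) for i in range(dim)]


# Toy experts (assumptions): one returns the input, one doubles it.
def identity_expert(x: Vector) -> Vector:
    return list(x)


def doubling_expert(x: Vector) -> Vector:
    return [2.0 * v for v in x]
```

With all-zero gate rows the gate is uniform, so the result is the plain average of the expert outputs; a trained gate would instead weight experts differently per input.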
Received: 2022-07-25    Published: 2023-07-17
CLC:  TP 391  
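Funding: National Natural Science Foundation of China (61806006); Priority Academic Program Development of Jiangsu Higher Education Institutions; 111 Project (B12018)
Corresponding author: Hong-wei GE     E-mail: yz9909@qq.com; ghw8601@163.com
About the author: Zhe YANG (born 1998), male, master's student, working on recommender systems. orcid.org/0000-0002-8252-6625. E-mail: yz9909@qq.com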

Cite this article:

Zhe YANG, Hong-wei GE, Ting LI. Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1317-1325.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.07.006
https://www.zjujournals.com/eng/CN/Y2023/V57/I7/1317

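Fig. 1  Schematic of the serial and parallel architectures
Fig. 2  Overall schematic of the mixture-of-experts parallel recommendation algorithm framework
Fig. 3  Internal structure of the Broker module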
Dataset        $M/10^6$   $F$   $C/10^6$
Criteo         45         39    33
Avazu          40         23    9.4
MovieLens-1M   0.74       7     0.013
Table 1  Parameters of the three experimental datasets
Model             Criteo             Avazu              MovieLens-1M
                  AUC      LogLoss   AUC      LogLoss   AUC      LogLoss
DeepFM            0.8007   0.4508    0.7852   0.3780    0.8932   0.3202
DCN               0.8099   0.4419    0.7905   0.3744    0.8935   0.3197
xDeepFM           0.8052   0.4418    0.7894   0.3794    0.8923   0.3251
AutoInt+          0.8083   0.4434    0.7774   0.3811    0.8488   0.3753
DCN-v2            0.8115   0.4406    0.7907   0.3742    0.8964   0.3160
EDCN              0.8001   0.5415    0.7793   0.3803    0.8722   0.3469
CowClip           0.8097   0.4420    0.7906   0.3740    0.8961   0.3174
Proposed method   0.8122   0.4398    0.7928   0.3732    0.8970   0.3163
Table 2  Performance comparison of ME-DCN and other SOTA models on the three datasets
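Table 2 reports AUC and LogLoss for each model. As a reminder of what those two columns measure, here is a minimal pure-Python sketch of the standard definitions; it is not the authors' evaluation code, and the function names `auc` and `log_loss` are chosen here for illustration. The pairwise AUC loop is quadratic and only meant for small examples.

```python
import math
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def log_loss(labels: Sequence[int], probs: Sequence[float], eps: float = 1e-15) -> float:
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

A higher AUC and a lower LogLoss both indicate a better CTR model, which is the direction in which the table should be read.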
Model             $N_{\rm{p}}/10^6$
DeepFM            1.4
DCN               3.1
xDeepFM           4.2
AutoInt+          3.7
DCN-v2            7.2
EDCN              11
Proposed method   5.7
Table 3  Comparison of the number of parameters of ME-DCN and other models (Criteo)
Model         Criteo             Avazu              MovieLens-1M
              AUC      LogLoss   AUC      LogLoss   AUC      LogLoss
DCN           0.8099   0.4419    0.7905   0.3744    0.8935   0.3197
DCN_ME        0.8116   0.4403    0.7919   0.3731    0.8962   0.3174
AutoInt+      0.8083   0.4434    0.7774   0.3811    0.8488   0.3753
AutoInt+_ME   0.8104   0.4414    0.7899   0.3737    0.8928   0.3250
DCN-v2        0.8115   0.4406    0.7907   0.3742    0.8964   0.3160
DCN-v2_ME     0.8122   0.4398    0.7928   0.3732    0.8970   0.3163
Table 4  Performance of SOTA parallel-architecture models with ME-PRAF applied, on the three datasets
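The effect of applying the framework in Table 4 is easiest to read as a per-model AUC gain. The short sketch below does that arithmetic for the Criteo column, using only values copied from the table; the variable and function names are illustrative.

```python
# (baseline AUC, AUC with the framework applied) on Criteo, copied from Table 4.
criteo_auc = {
    "DCN": (0.8099, 0.8116),
    "AutoInt+": (0.8083, 0.8104),
    "DCN-v2": (0.8115, 0.8122),
}


def auc_gain(baseline: float, enhanced: float) -> float:
    """Absolute AUC improvement, rounded to the table's four decimals."""
    return round(enhanced - baseline, 4)


gains = {name: auc_gain(*pair) for name, pair in criteo_auc.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.4f} AUC")
```

On Criteo the three gains are 0.0017, 0.0021 and 0.0007, so every baseline in this column improves and the smallest gain is on DCN-v2, the strongest baseline of the three.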
Model             AUC      LogLoss
ME-DCN -w/o FB    0.8117   0.4403
ME-DCN -w/o EB    0.8113   0.4407
ME-DCN            0.8122   0.4398
Table 5  Ablation study of the Broker module on the ME-DCN model (Criteo)
Model                 AUC      LogLoss
ME-DCN -w/ concat     0.8122   0.4398
ME-DCN -w/ add        0.8105   0.4418
ME-DCN -w/ Hadamard   0.8107   0.4416
Table 6  Performance comparison of different fusion methods in the Fusion module on the ME-DCN model (Criteo)
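Table 6 compares three ways of fusing two feature vectors: concatenation, element-wise addition, and the Hadamard (element-wise) product. The sketch below shows the three operations on plain Python lists. It illustrates only the operators themselves; how the Fusion module wires them into ME-DCN is not described on this page, and the function names are assumptions.

```python
from typing import List

Vector = List[float]


def _check_same_length(a: Vector, b: Vector) -> None:
    if len(a) != len(b):
        raise ValueError("element-wise fusion needs vectors of equal length")


def fuse_concat(a: Vector, b: Vector) -> Vector:
    """Concatenation: keeps both inputs intact, output length is len(a) + len(b)."""
    return list(a) + list(b)


def fuse_add(a: Vector, b: Vector) -> Vector:
    """Element-wise sum: output has the same length as each input."""
    _check_same_length(a, b)
    return [x + y for x, y in zip(a, b)]


def fuse_hadamard(a: Vector, b: Vector) -> Vector:
    """Hadamard product: element-wise multiplication of the two inputs."""
    _check_same_length(a, b)
    return [x * y for x, y in zip(a, b)]
```

Concatenation is the only one of the three that does not merge coordinates, so the following layer sees both inputs unchanged, at the cost of a wider input.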
Fig. 4  Analysis of the differences in feature weights of the Broker module