Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm
Zhe YANG1,2, Hong-wei GE1,2,*, Ting LI1,2
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China
Abstract A mixture-of-experts parallel recommendation algorithm framework that combines feature fusion and distribution was proposed to address the problems of parameter sharing and high computational cost in click-through rate prediction. The method improves the ability of parallel architectures to distinguish different types of features and to learn more expressive feature inputs. It also shares parameters between explicit and implicit features, which moderates the gradients during backpropagation and improves model performance. The framework is lightweight and model-agnostic, and can be generalized to many mainstream parallel recommendation algorithms. Extensive experiments on three public datasets demonstrate that the framework effectively improves the performance of SOTA models.
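The abstract does not give the layer definitions, so the following is only a minimal sketch of one common way mixture-of-experts gating can feed two parallel branches from shared experts. It is not the authors' implementation. The class name `GatedExperts`, the linear experts, the softmax gates, and all dimensions are assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_matrix(rows, cols, rng):
    """Small random matrix; the initialisation scheme is an assumption."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class GatedExperts:
    """Hypothetical layer: shared experts, one softmax gate per branch."""

    def __init__(self, in_dim, out_dim, n_experts, n_branches, seed=0):
        rng = random.Random(seed)
        # Experts are shared by every branch (parameter sharing).
        self.experts = [make_matrix(out_dim, in_dim, rng) for _ in range(n_experts)]
        # Each branch owns a gate that scores the experts for a given input.
        self.gates = [make_matrix(n_experts, in_dim, rng) for _ in range(n_branches)]

    def forward(self, x):
        # Fusion step: every expert transforms the same input features.
        expert_outs = [linear(w, x) for w in self.experts]
        out_dim = len(expert_outs[0])
        branch_inputs = []
        # Distribution step: each branch receives its own weighted mixture.
        for gate in self.gates:
            weights = softmax(linear(gate, x))
            mixed = [
                sum(g * out[j] for g, out in zip(weights, expert_outs))
                for j in range(out_dim)
            ]
            branch_inputs.append(mixed)
        return branch_inputs


# Usage: two branches, e.g. an explicit and an implicit interaction network.
layer = GatedExperts(in_dim=4, out_dim=3, n_experts=2, n_branches=2)
explicit_in, implicit_in = layer.forward([0.5, -1.0, 0.25, 2.0])
```

In this sketch the two branches draw on the same expert parameters but mix them with different gate weights, which is one way a parallel model could receive branch-specific inputs at a small additional parameter cost.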
Received: 25 July 2022
Published: 17 July 2023
Fund: National Natural Science Foundation of China (61806006); Priority Academic Program Development of Jiangsu Higher Education Institutions; 111 Project (B12018)
Corresponding Author: Hong-wei GE
E-mail: yz9909@qq.com; ghw8601@163.com
Keywords: recommender system, click-through rate prediction, deep learning, mixture of experts