Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1317-1325    DOI: 10.3785/j.issn.1008-973X.2023.07.006
    
Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm
Zhe YANG1,2, Hong-wei GE1,2,*, Ting LI1,2
1. School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China
2. Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Wuxi 214122, China

Abstract  

A mixture-of-experts framework for parallel recommendation algorithms, combining feature fusion and distribution, was proposed to address the problems of parameter sharing and high computational cost in click-through rate prediction. The framework improves the ability of a parallel architecture to distinguish different types of features and to learn more expressive feature inputs. It also shares parameters between explicit and implicit features, which moderates the gradients during backpropagation and improves model performance. The framework is lightweight and model-agnostic, and can be applied to a variety of mainstream parallel recommendation algorithms. Extensive experiments on three public datasets show that the framework effectively improves the performance of state-of-the-art (SOTA) models.



Key words: recommender system, click-through rate prediction, deep learning, mixture of experts
Received: 25 July 2022      Published: 17 July 2023
CLC:  TP 391  
Fund: National Natural Science Foundation of China (61806006); Priority Academic Program Development of Jiangsu Higher Education Institutions; 111 Project (B12018)
Corresponding Authors: Hong-wei GE     E-mail: yz9909@qq.com;ghw8601@163.com
Cite this article:

Zhe YANG,Hong-wei GE,Ting LI. Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1317-1325.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.07.006     OR     https://www.zjujournals.com/eng/Y2023/V57/I7/1317


Fig.1 Illustration of sequential and parallel architecture
Fig.2 Overall architecture of ME-PRAF
Fig.3 Internal structure of Broker module
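The internal structure of the Broker module is not recoverable from this page. As a rough illustration of the general mixture-of-experts idea named in the title, the sketch below blends the outputs of several experts with softmax gate weights. It is a generic example, not the paper's Broker: the linear experts, the gate, the dimensions and all names (`moe_forward`, `expert_weights`, `gate_weights`) are assumptions made for illustration.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(weights, x):
    """Matrix-vector product; `weights` is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def moe_forward(x, expert_weights, gate_weights):
    """Generic mixture of experts (hypothetical shapes, not the paper's Broker).

    Each expert is a linear map of x; the gate yields one softmax weight
    per expert, and the output is the gate-weighted sum of expert outputs.
    """
    gate = softmax(linear(gate_weights, x))
    outputs = [linear(w, x) for w in expert_weights]
    dim = len(outputs[0])
    mixed = [sum(g * out[i] for g, out in zip(gate, outputs)) for i in range(dim)]
    return mixed, gate

# Toy usage with random weights (illustrative values only).
random.seed(0)
in_dim, out_dim, n_experts = 4, 3, 2
x = [random.uniform(-1, 1) for _ in range(in_dim)]
expert_weights = [
    [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]
    for _ in range(n_experts)
]
gate_weights = [[random.uniform(-1, 1) for _ in range(in_dim)] for _ in range(n_experts)]
mixed, gate = moe_forward(x, expert_weights, gate_weights)
print(gate, mixed)
```

Because the gate weights sum to 1, the mixed output is a convex combination of the expert outputs; if all experts were identical, the gate would have no effect on the result.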
| Dataset | $M/10^6$ | $F$ | $C/10^6$ |
| --- | --- | --- | --- |
| Criteo | 45 | 39 | 33 |
| Avazu | 40 | 23 | 9.4 |
| MovieLens-1M | 0.74 | 7 | 0.013 |
Tab.1 Parameters of three datasets in experiment
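| Model | Criteo AUC | Criteo LogLoss | Avazu AUC | Avazu LogLoss | MovieLens-1M AUC | MovieLens-1M LogLoss |
| --- | --- | --- | --- | --- | --- | --- |
| DeepFM | 0.8007 | 0.4508 | 0.7852 | 0.3780 | 0.8932 | 0.3202 |
| DCN | 0.8099 | 0.4419 | 0.7905 | 0.3744 | 0.8935 | 0.3197 |
| xDeepFM | 0.8052 | 0.4418 | 0.7894 | 0.3794 | 0.8923 | 0.3251 |
| AutoInt+ | 0.8083 | 0.4434 | 0.7774 | 0.3811 | 0.8488 | 0.3753 |
| DCN-v2 | 0.8115 | 0.4406 | 0.7907 | 0.3742 | 0.8964 | 0.3160 |
| EDCN | 0.8001 | 0.5415 | 0.7793 | 0.3803 | 0.8722 | 0.3469 |
| CowClip | 0.8097 | 0.4420 | 0.7906 | 0.3740 | 0.8961 | 0.3174 |
| Proposed method (ME-DCN) | 0.8122 | 0.4398 | 0.7928 | 0.3732 | 0.8970 | 0.3163 |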
Tab.2 Performance comparisons between ME-DCN and other SOTA models in three datasets
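The two metrics reported in the tables, AUC and LogLoss, can be computed as in the following minimal sketch. It uses the standard definitions (AUC as the fraction of positive-negative pairs ranked correctly, with ties counted as 0.5; LogLoss as mean binary cross-entropy). The function names and the toy labels and scores are illustrative assumptions, and the pairwise AUC loop is quadratic, so it is only suitable for small examples.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

def auc(labels, probs):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a correct pair. O(n_pos * n_neg), for illustration only.
    """
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = 0.0
    for pp in pos:
        for pn in neg:
            if pp > pn:
                wins += 1.0
            elif pp == pn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy click labels and predicted click probabilities (made-up values).
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.4, 0.6]
print(auc(labels, probs))       # 0.75: three of the four pairs are ordered correctly
print(log_loss(labels, probs))
```

A higher AUC and a lower LogLoss are better, which is the direction of the comparisons in the tables.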
| Model | $N_{\rm{p}}/10^6$ |
| --- | --- |
| DeepFM | 1.4 |
| DCN | 3.1 |
| xDeepFM | 4.2 |
| AutoInt+ | 3.7 |
| DCN-v2 | 7.2 |
| EDCN | 11 |
| Proposed method (ME-DCN) | 5.7 |
Tab.3 Number of parameters comparison between ME-DCN and other models (Criteo)
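| Model | Criteo AUC | Criteo LogLoss | Avazu AUC | Avazu LogLoss | MovieLens-1M AUC | MovieLens-1M LogLoss |
| --- | --- | --- | --- | --- | --- | --- |
| DCN | 0.8099 | 0.4419 | 0.7905 | 0.3744 | 0.8935 | 0.3197 |
| DCN$_{\rm{ME}}$ | 0.8116 | 0.4403 | 0.7919 | 0.3731 | 0.8962 | 0.3174 |
| AutoInt+ | 0.8083 | 0.4434 | 0.7774 | 0.3811 | 0.8488 | 0.3753 |
| AutoInt+$_{\rm{ME}}$ | 0.8104 | 0.4414 | 0.7899 | 0.3737 | 0.8928 | 0.3250 |
| DCN-v2 | 0.8115 | 0.4406 | 0.7907 | 0.3742 | 0.8964 | 0.3160 |
| DCN-v2$_{\rm{ME}}$ | 0.8122 | 0.4398 | 0.7928 | 0.3732 | 0.8970 | 0.3163 |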
Tab.4 Performance comparison of SOTA parallel architecture models after using ME-PRAF on three datasets
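To make the size of the gains in Tab.4 easier to read, the short script below computes the absolute AUC change for each base model after applying ME-PRAF. The AUC values are copied from the table; the dictionary layout and variable names are illustrative choices.

```python
# AUC values from Tab.4: (without ME-PRAF, with ME-PRAF) per dataset.
auc_pairs = {
    "DCN":      {"Criteo": (0.8099, 0.8116), "Avazu": (0.7905, 0.7919), "MovieLens-1M": (0.8935, 0.8962)},
    "AutoInt+": {"Criteo": (0.8083, 0.8104), "Avazu": (0.7774, 0.7899), "MovieLens-1M": (0.8488, 0.8928)},
    "DCN-v2":   {"Criteo": (0.8115, 0.8122), "Avazu": (0.7907, 0.7928), "MovieLens-1M": (0.8964, 0.8970)},
}

# Absolute AUC gain, rounded to the four decimals used in the table.
auc_gain = {
    model: {ds: round(after - before, 4) for ds, (before, after) in per_ds.items()}
    for model, per_ds in auc_pairs.items()
}

for model, gains in auc_gain.items():
    print(model, gains)
```

The AUC improves in all nine model-dataset combinations, with the largest change for AutoInt+ on MovieLens-1M (0.8488 to 0.8928).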
| Model | AUC | LogLoss |
| --- | --- | --- |
| ME-DCN -w/o FB | 0.8117 | 0.4403 |
| ME-DCN -w/o EB | 0.8113 | 0.4407 |
| ME-DCN | 0.8122 | 0.4398 |
Tab.5 Ablation study of Broker modules in ME-DCN (Criteo)
| Model | AUC | LogLoss |
| --- | --- | --- |
| ME-DCN -w/ concat | 0.8122 | 0.4398 |
| ME-DCN -w/ add | 0.8105 | 0.4418 |
| ME-DCN -w/ Hadamard | 0.8107 | 0.4416 |
Tab.6 Performance comparison of various fusion types in Fusion module in ME-DCN (Criteo)
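The three fusion types compared in Tab.6 correspond to standard vector operations: concatenation, element-wise addition, and the Hadamard (element-wise) product. The sketch below shows each on two toy vectors. What exactly the Fusion module combines is not stated on this page, so the inputs `branch_a` and `branch_b` and the function names are hypothetical.

```python
def fuse_concat(a, b):
    """Concatenation: output length is len(a) + len(b)."""
    return list(a) + list(b)

def fuse_add(a, b):
    """Element-wise sum: inputs must have equal length."""
    if len(a) != len(b):
        raise ValueError("add fusion needs vectors of equal length")
    return [x + y for x, y in zip(a, b)]

def fuse_hadamard(a, b):
    """Hadamard (element-wise) product: inputs must have equal length."""
    if len(a) != len(b):
        raise ValueError("Hadamard fusion needs vectors of equal length")
    return [x * y for x, y in zip(a, b)]

# Hypothetical outputs of two branches (made-up values).
branch_a = [1.0, 2.0, 3.0]
branch_b = [0.5, -1.0, 2.0]

print(fuse_concat(branch_a, branch_b))    # [1.0, 2.0, 3.0, 0.5, -1.0, 2.0]
print(fuse_add(branch_a, branch_b))       # [1.5, 1.0, 5.0]
print(fuse_hadamard(branch_a, branch_b))  # [0.5, -2.0, 6.0]
```

Concatenation keeps both inputs intact at the cost of a wider output, whereas addition and the Hadamard product keep the dimension fixed but merge the two inputs into a single value per position.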
Fig.4 Analysis of diversity factor of feature weight of Broker module