Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (6): 1142-1152    DOI: 10.3785/j.issn.1008-973X.2024.06.005
    
Multi-modal information augmented model for micro-video recommendation
Yufu HUO (1), Beihong JIN (1, 2, *), Zhaoyi LIAO (1)
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Abstract  

A multi-modal augmented model for click-through rate prediction (MMa4CTR), tailored to micro-video recommendation, was proposed. Multi-modal data from user interactions with micro-videos were used to construct user embeddings and to capture users' interests across modalities. Features from different modalities were combined and crossed to uncover the latent semantics that the modalities share. Two training strategies, automatic learning rate adjustment and validation interruption, were introduced to improve overall recommendation performance. To cope with the computational cost brought by the large volume of multi-modal data, a computationally efficient multi-layer perceptron architecture was employed. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets showed that MMa4CTR outperformed the baseline models while incurring low computational overhead. Ablation studies on both datasets further confirmed that the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies each contribute to recommendation performance.
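The abstract attributes MMa4CTR's low cost to a multi-layer perceptron, but this excerpt gives neither its layer widths nor its loss. The following is a minimal sketch under the assumption of a standard CTR setup: one fused multi-modal feature vector per user and micro-video pair passes through ReLU hidden layers to a sigmoid output, scored with binary cross-entropy. The function names and the widths [96, 64, 32, 1] are hypothetical, and the random input stands in for real features.

```python
import math
import random

# Hypothetical sketch of an MLP click-through-rate scorer; not the authors' code.

def init_mlp(layer_dims, seed=0):
    """Randomly initialize weights for layer widths such as [96, 64, 32, 1]."""
    rnd = random.Random(seed)
    layers = []
    for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]):
        std = math.sqrt(2.0 / d_in)  # He initialization, suited to ReLU
        weights = [[rnd.gauss(0.0, std) for _ in range(d_out)] for _ in range(d_in)]
        layers.append((weights, [0.0] * d_out))
    return layers

def dense(x, weights, bias):
    """Affine map x @ weights + bias for one input vector x."""
    return [sum(x[i] * weights[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

def predict_ctr(layers, x):
    """Return the estimated click probability for one fused feature vector x."""
    h = x
    for weights, bias in layers[:-1]:
        h = [max(v, 0.0) for v in dense(h, weights, bias)]  # ReLU hidden layer
    weights, bias = layers[-1]
    logit = dense(h, weights, bias)[0]  # single output unit
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid

def bce(p, y, eps=1e-7):
    """Binary cross-entropy for prediction p and click label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

rnd = random.Random(1)
layers = init_mlp([96, 64, 32, 1])
x = [rnd.gauss(0.0, 1.0) for _ in range(96)]  # stand-in for real multi-modal features
p = predict_ctr(layers, x)
loss = bce(p, 1)
```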



Key words: recommender system; click-through rate; multi-modal; micro-video; machine learning
Received: 24 December 2023      Published: 25 May 2024
CLC:  TP 393  
Fund: Supported by the National Natural Science Foundation of China (62072450).
Corresponding author: Beihong JIN     E-mail: huoyufu19@mails.ucas.ac.cn; beihong@iscas.ac.cn
Cite this article:

Yufu HUO, Beihong JIN, Zhaoyi LIAO. Multi-modal information augmented model for micro-video recommendation. Journal of ZheJiang University (Engineering Science), 2024, 58(6): 1142-1152.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.06.005     OR     https://www.zjujournals.com/eng/Y2024/V58/I6/1142


Fig.1 Flowchart of validation interruption mechanism
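Only the caption of Fig.1 survives in this excerpt, so the exact validation interruption rule is not available. The following minimal sketch assumes it behaves like patience-based early stopping: evaluate on the validation set after every epoch, remember the best AUC, and interrupt training once that best has not improved for a fixed number of epochs. The function name, callbacks, and the `patience` parameter are hypothetical.

```python
# Hypothetical validation-interruption loop, modelled on patience-based early stopping.

def train_with_validation_interruption(train_one_epoch, evaluate, max_epochs=50, patience=3):
    """Train until validation AUC fails to improve for `patience` consecutive epochs.

    train_one_epoch(): runs one training epoch (hypothetical callback).
    evaluate(): returns the current validation AUC (hypothetical callback).
    Returns (best_epoch, best_auc, epochs_run).
    """
    best_auc, best_epoch, stale, epochs_run = float("-inf"), -1, 0, 0
    for epoch in range(max_epochs):
        train_one_epoch()
        auc = evaluate()
        epochs_run += 1
        if auc > best_auc:
            best_auc, best_epoch, stale = auc, epoch, 0
            # A real trainer would save a checkpoint of the model here.
        else:
            stale += 1
            if stale >= patience:
                break  # interrupt training: validation AUC has stopped improving
    return best_epoch, best_auc, epochs_run

# Replay a scripted sequence of validation AUCs (illustrative values, not from the paper).
scripted_aucs = iter([0.80, 0.86, 0.90, 0.89, 0.895, 0.88, 0.91])
result = train_with_validation_interruption(lambda: None, lambda: next(scripted_aucs))
print(result)  # (2, 0.9, 6)
```

Training stops after the sixth epoch, so the later 0.91 is never reached; `patience` trades wasted epochs against the risk of stopping before a late improvement.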
Model      AUC@10  AUC@20  AUC@30  AUC@50
BPR        0.5113  0.5981  0.6386  0.6716
FPMC       0.5204  0.6338  0.6761  0.7038
NGCF       0.6112  0.6159  0.6310  0.6612
LightGCN   0.5068  0.5663  0.6110  0.6744
BERT4Rec   0.7478  0.7566  0.7716  0.7714
GCSAN      0.7445  0.7508  0.7534  0.7834
DIN        0.8324  0.8436  0.8498  0.8494
DIEN       0.7972  0.7987  0.8028  0.8027
MMa4CTR    0.9527  0.9421  0.9406  0.9468
I/%        14.45   11.68   10.68   11.47
Tab.1 Rec-performance comparison on WeChat
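The I/% row is not defined in this excerpt, but its Tab.1 values match the relative improvement of MMa4CTR over the strongest baseline in each column (DIN throughout), rounded to two decimals. The following sketch reproduces that row from the table's AUC values; treating I as (ours minus best baseline) divided by best baseline is our reading of the numbers, not a definition quoted from the paper. Tab.2 agrees with the same formula to within about 0.01 percentage points, presumably because its AUCs are themselves rounded.

```python
# Tab.1 (WeChat) AUC values, columns AUC@10, AUC@20, AUC@30, AUC@50.
COLUMNS = ["AUC@10", "AUC@20", "AUC@30", "AUC@50"]
BASELINES = {
    "BPR": [0.5113, 0.5981, 0.6386, 0.6716],
    "FPMC": [0.5204, 0.6338, 0.6761, 0.7038],
    "NGCF": [0.6112, 0.6159, 0.6310, 0.6612],
    "LightGCN": [0.5068, 0.5663, 0.6110, 0.6744],
    "BERT4Rec": [0.7478, 0.7566, 0.7716, 0.7714],
    "GCSAN": [0.7445, 0.7508, 0.7534, 0.7834],
    "DIN": [0.8324, 0.8436, 0.8498, 0.8494],
    "DIEN": [0.7972, 0.7987, 0.8028, 0.8027],
}
MMA4CTR = [0.9527, 0.9421, 0.9406, 0.9468]

def relative_improvement(ours, baselines):
    """Percent gain of `ours` over the best baseline, computed column by column."""
    gains = []
    for j, value in enumerate(ours):
        best = max(scores[j] for scores in baselines.values())
        gains.append(round((value - best) / best * 100.0, 2))
    return gains

improvement = dict(zip(COLUMNS, relative_improvement(MMA4CTR, BASELINES)))
```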
Model      AUC@10  AUC@20  AUC@30  AUC@50
BPR        0.3805  0.3805  0.3828  0.3897
FPMC       0.4015  0.4070  0.4133  0.4273
NGCF       0.4401  0.4382  0.4475  0.4556
LightGCN   0.3863  0.3862  0.3862  0.3862
BERT4Rec   0.5844  0.5834  0.6295  0.5359
GCSAN      0.5869  0.4829  0.4550  0.4736
DIN        0.5087  0.5469  0.5480  0.5619
DIEN       0.5788  0.5787  0.5787  0.5787
MMa4CTR    0.8895  0.8902  0.8910  0.8884
I/%        51.55   52.58   41.54   53.51
Tab.2 Rec-performance comparison on TikTok
Model      T_t/s    T_i/s  N
BPR        54.77    45     7,460,224
FPMC       62.43    48     19,820,544
NGCF       126.54   109    7,485,184
LightGCN   91.54    79     7,460,224
BERT4Rec   471.41   317    6,915,904
GCSAN      4756.09  4385   6,908,032
DIN        170.44   108    1,449,310
DIEN       643.73   243    1,587,847
MMa4CTR    180.78   17     29,225
Tab.3 Compute-performance comparison on WeChat
Model      T_t/s     T_i/s  N
BPR        619.19    601    195,060,416
FPMC       549.91    399    584,966,336
NGCF       1332.28   799    195,085,376
LightGCN   1929.59   1908   195,060,416
BERT4Rec   71639.71  5060   195,056,384
GCSAN      44006.24  37142  195,048,512
DIN        643.46    574    30,761,840
DIEN       2315.56   1464   30,900,377
MMa4CTR    163.80    23     26,025
Tab.4 Compute-performance comparison on TikTok
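In Tab.3 and Tab.4, MMa4CTR's N is two to four orders of magnitude smaller than every baseline's. Reading N as the number of trainable parameters is an assumption, since the column is not defined in this excerpt. BPR and LightGCN report identical N on both datasets (7,460,224 on WeChat and 195,060,416 on TikTok), which fits models whose parameters are mostly user and item ID-embedding tables; such tables grow with the number of users and items, whereas a network that scores fixed-length feature vectors grows only with its layer widths. The following sketch contrasts the two counts; every size in it is hypothetical.

```python
def embedding_table_params(num_users, num_items, dim):
    """Trainable entries in one user table and one item table of width `dim`."""
    return (num_users + num_items) * dim

def mlp_params(layer_dims):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(d_in * d_out + d_out for d_in, d_out in zip(layer_dims[:-1], layer_dims[1:]))

# Hypothetical sizes, chosen only to illustrate how each count scales.
small = embedding_table_params(num_users=20_000, num_items=100_000, dim=64)
large = embedding_table_params(num_users=200_000, num_items=1_000_000, dim=64)
mlp = mlp_params([96, 64, 32, 1])  # unaffected by how many users or items exist
print(small, large, mlp)  # 7680000 76800000 8321
```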
Modality pair   WeChat AUC@10  WeChat AUC@30  TikTok AUC@10  TikTok AUC@30
Visual + Audio  0.9448         0.9497         0.8886         0.8880
Visual + Text   0.9520         0.9517         0.8889         0.8880
Audio + Text    0.9513         0.9469         0.8886         0.8907
Tab.5 Rec-performance via pairwise combination of multi-modal information
Modality pair   WeChat AUC@10  WeChat AUC@30  TikTok AUC@10  TikTok AUC@30
Visual + Audio  0.9459         0.9439         0.8915         0.8921
Visual + Text   0.9516         0.9503         0.8897         0.8912
Audio + Text    0.9516         0.9525         0.8938         0.8932
Tab.6 Rec-performance via pairwise cross of multi-modal information
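Tab.5 and Tab.6 compare pairwise combination and pairwise cross of the visual, audio, and text features, but this excerpt does not define either operation. A common choice, assumed here rather than taken from the paper, is concatenation for combination and an element-wise (Hadamard) product for crossing. The following sketch enumerates the three modality pairs in the same order as the table rows; the toy 3-dimensional vectors are made up.

```python
from itertools import combinations

def combine(a, b):
    """Pairwise combination, assumed here to be concatenation."""
    return a + b

def cross(a, b):
    """Pairwise cross, assumed here to be an element-wise (Hadamard) product."""
    if len(a) != len(b):
        raise ValueError("crossed modality vectors must have the same length")
    return [x * y for x, y in zip(a, b)]

# Toy per-modality features for one micro-video (illustrative values only).
modalities = {
    "visual": [0.5, -1.0, 2.0],
    "audio": [1.0, 0.5, 0.0],
    "text": [2.0, 2.0, -0.5],
}

# Dict order makes combinations() yield visual+audio, visual+text, audio+text.
pairs = {
    f"{m1}+{m2}": (combine(modalities[m1], modalities[m2]),
                   cross(modalities[m1], modalities[m2]))
    for m1, m2 in combinations(modalities, 2)
}
```

Concatenation keeps both modalities' values side by side and doubles the width, while the product keeps the width fixed and only responds where both modalities are active in the same dimension.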
Modality  WeChat AUC@10  WeChat AUC@30  TikTok AUC@10  TikTok AUC@30
Visual    0.9472         0.9463         0.8221         0.8873
Audio     0.9442         0.9429         0.8868         0.8873
Text      0.8862         0.8935         0.8942         0.8933
Average   0.9259         0.9276         0.8677         0.8893
Tab.7 Rec-performance via single modal
l   WeChat AUC@10  WeChat AUC@30  TikTok AUC@10  TikTok AUC@30
0   0.6162         0.6213         0.5761         0.5783
5   0.9499         0.9512         0.8910         0.8902
11  0.9511         0.9418         0.8895         0.8884
15  0.9453         0.9484         0.8900         0.8880
21  0.9527         0.9406         0.8895         0.8910
30  0.9478         0.9456         0.8891         0.8875
Tab.8 Rec-performance with different length of user's multi-modal embeddings
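Tab.8 varies l, the length of the user's multi-modal embeddings, and AUC falls to about 0.6 at l = 0 on both datasets. This excerpt does not say how the embedding is built. The following minimal sketch assumes, purely for illustration, that l counts the user's most recent interacted micro-videos and that their feature vectors are mean-pooled; the function name is hypothetical. In this sketch, l = 0 yields a zero vector that carries no user-specific signal.

```python
def user_multimodal_embedding(history, l, dim):
    """Mean-pool the feature vectors of the user's l most recent interacted videos.

    history: list of equal-length feature vectors, ordered oldest to newest.
    Returns a zero vector when l == 0 or the history is empty.
    """
    # Guard l > 0 explicitly: history[-0:] would return the whole list.
    recent = history[-l:] if l > 0 else []
    if not recent:
        return [0.0] * dim
    return [sum(vec[i] for vec in recent) / len(recent) for i in range(dim)]

history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 1.0]]
last_two = user_multimodal_embedding(history, l=2, dim=2)
empty = user_multimodal_embedding(history, l=0, dim=2)
everything = user_multimodal_embedding(history, l=10, dim=2)  # l longer than history
```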
Fig.2 Impact of epochs on rec-performance on WeChat
Fig.3 Impact of epochs on rec-performance on TikTok
Learning rate  WeChat AUC@10  WeChat AUC@30  TikTok AUC@10  TikTok AUC@30
0.01000        0.5000         0.7433         0.7025         0.5000
0.00500        0.5003         0.5000         0.8518         0.5000
0.00200        0.9178         0.9329         0.8938         0.8919
0.00100        0.9521         0.9406         0.8899         0.8843
0.00050        0.9479         0.9435         0.8871         0.8888
0.00020        0.9470         0.9516         0.8898         0.8867
0.00010        0.9427         0.9485         0.8909         0.8908
0.00005        0.9473         0.9453         0.8893         0.8924
0.00002        0.9494         0.9426         0.8902         0.8892
0.00001        0.9443         0.9489         0.8926         0.8890
Tab.9 Impact of learning rate on rec-performance
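Tab.9 shows that with a fixed learning rate of 0.01 or 0.005, several columns sit at or near an AUC of 0.5000 (chance level), while rates of 0.002 and below give AUCs between about 0.88 and 0.95. The abstract names automatic learning rate adjustment as a training strategy, but this excerpt does not give its rule. As a stand-in, the following sketch uses a reduce-on-plateau schedule, similar in spirit to PyTorch's `ReduceLROnPlateau`: halve the rate whenever validation AUC has not set a new best for `patience` epochs. The class and its parameters are hypothetical.

```python
# Hypothetical reduce-on-plateau rule standing in for automatic learning rate adjustment.

class PlateauLRScheduler:
    """Multiply the learning rate by `factor` after `patience` epochs without a new best AUC."""

    def __init__(self, lr, factor=0.5, patience=2, min_lr=1e-5):
        self.lr = lr
        self.factor = factor
        self.patience = patience
        self.min_lr = min_lr
        self.best = float("-inf")
        self.stale = 0

    def step(self, val_auc):
        """Record one epoch's validation AUC and return the learning rate for the next epoch."""
        if val_auc > self.best:
            self.best, self.stale = val_auc, 0
        else:
            self.stale += 1
            if self.stale >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)  # decay, but not below min_lr
                self.stale = 0
        return self.lr

# Start from 0.001, one of the rates in Tab.9, and feed illustrative validation AUCs.
scheduler = PlateauLRScheduler(lr=0.001)
lrs = [scheduler.step(auc) for auc in [0.90, 0.92, 0.91, 0.915, 0.93, 0.92, 0.925]]
```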
Fig.4 Impact of batch size on rec-performance on WeChat
Fig.5 Impact of batch size on rec-performance on TikTok
[1] LINDEN G, SMITH B, YORK J. Amazon.com recommendations: item-to-item collaborative filtering[J]. IEEE Internet Computing, 2003, 7(1): 76–80. doi: 10.1109/MIC.2003.1167344.
[2] RICHARDSON M, DOMINOWSKA E, RAGNO R. Predicting clicks: estimating the click-through rate for new ads[C]// Proceedings of the 16th International Conference on World Wide Web. Banff, Alberta: Association for Computing Machinery, 2007: 521–530.
[3] ZHANG W, QIN J, GUO W, et al. Deep learning for click-through rate estimation[C]// Proceedings of the 30th International Joint Conference on Artificial Intelligence. [s. l.]: International Joint Conferences on Artificial Intelligence Organization, 2021: 4695–4703.
[4] SEDHAIN S, MENON A K, SANNER S, et al. AutoRec: autoencoders meet collaborative filtering[C]// Proceedings of the 24th International Conference on World Wide Web. Florence: Association for Computing Machinery, 2015: 111–112.
[5] SHAN Y, HOENS T R, JIAO J, et al. Deep crossing: web-scale modeling without manually crafted combinatorial features[C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California: Association for Computing Machinery, 2016: 255–262.
[6] HE X, LIAO L, ZHANG H, et al. Neural collaborative filtering[C]// Proceedings of the 26th International Conference on World Wide Web. Perth: International World Wide Web Conferences Steering Committee, 2017: 173–182.
[7] QU Y, FANG B, ZHANG W, et al. Product-based neural networks for user response prediction over multi-field categorical data[J]. ACM Transactions on Information Systems, 2019, 37(1): 1–35.
[8] ZHOU G, ZHU X, SONG C, et al. Deep interest network for click-through rate prediction[C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. London: Association for Computing Machinery, 2018: 1059–1068.
[9] ZHOU G, MOU N, FAN Y, et al. Deep interest evolution network for click-through rate prediction[C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence. Honolulu: AAAI Press, 2019: 5941–5948.
[10] LIN Q, XIE R, CHEN L, et al. Graph neural network for tag ranking in tag-enhanced video recommendation[C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. [s. l.]: Association for Computing Machinery, 2020: 2613–2620.
[11] HE R, MCAULEY J. VBPR: visual Bayesian personalized ranking from implicit feedback[C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence. Phoenix, Arizona: AAAI Press, 2016: 144–150.
[12] CHEN J, ZHANG H, HE X, et al. Attentive collaborative filtering: multimedia recommendation with item- and component-level attention[C]// Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. Shinjuku, Tokyo: Association for Computing Machinery, 2017: 335–344.
[13] FAN H, POOLE M S. What is personalization? Perspectives on the design and implementation of personalization in information systems[J]. Journal of Organizational Computing and Electronic Commerce, 2006, 16(3/4): 179–202.
[14] ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 technical report[R/OL]. (2023-03-15)[2023-12-24]. https://arxiv.org/abs/2303.08774.
[15] RENDLE S, FREUDENTHALER C, GANTNER Z, et al. BPR: Bayesian personalized ranking from implicit feedback[C]// Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence. Montreal, Quebec: AUAI Press, 2009: 452–461.
[16] WEI Y, WANG X, NIE L, et al. MMGCN: multi-modal graph convolution network for personalized recommendation of micro-video[C]// Proceedings of the 27th ACM International Conference on Multimedia. Nice: Association for Computing Machinery, 2019: 1437–1445.
[17] PU S, HE Y, LI Z, et al. Multi-modal topic learning for video recommendation[EB/OL]. (2020-10-26)[2023-12-24]. https://arxiv.org/abs/2010.13373.
[18] YANG M, LI S, PENG Z, et al. Multi-head multi-modal deep interest recommendation network[J]. Knowledge-Based Systems, 2023, 276(C): 110869.
[19] WEI W, HUANG C, XIA L, et al. Multi-modal self-supervised learning for recommendation[C]// Proceedings of the ACM Web Conference. Austin, Texas: Association for Computing Machinery, 2023: 790–800.
[20] SUN R, CAO X, ZHAO Y, et al. Multi-modal knowledge graphs for recommender systems[C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management. [s. l.]: Association for Computing Machinery, 2020: 1405–1414.
[21] HE L, CHEN H, WANG D, et al. Click-through rate prediction with multi-modal hypergraphs[C]// Proceedings of the 30th ACM International Conference on Information and Knowledge Management. Queensland: Association for Computing Machinery, 2021: 690–699.
[22] WEI Y, WANG X, NIE L, et al. Graph-refined convolutional network for multimedia recommendation with implicit feedback[C]// Proceedings of the 28th ACM International Conference on Multimedia. Seattle, Washington: Association for Computing Machinery, 2020: 3541–3549.
[23] ZHAO W, MU S, HOU Y, et al. RecBole: towards a unified, comprehensive and efficient framework for recommendation algorithms[C]// Proceedings of the 30th ACM International Conference on Information and Knowledge Management. Queensland: Association for Computing Machinery, 2021: 4653–4664.
[24] RENDLE S, FREUDENTHALER C, SCHMIDT-THIEME L. Factorizing personalized Markov chains for next-basket recommendation[C]// Proceedings of the 19th International Conference on World Wide Web. Raleigh, North Carolina: Association for Computing Machinery, 2010: 811–820.
[25] WANG X, HE X, WANG M, et al. Neural graph collaborative filtering[C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval. Paris: Association for Computing Machinery, 2019: 165–174.
[26] HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation[C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval. [s. l.]: Association for Computing Machinery, 2020: 639–648.
[27] SUN F, LIU J, WU J, et al. BERT4Rec: sequential recommendation with bidirectional encoder representations from transformer[C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management. Beijing: Association for Computing Machinery, 2019: 1441–1450.