Journal of Zhejiang University (Engineering Science)  2024, Vol. 58, Issue (6): 1142-1152    DOI: 10.3785/j.issn.1008-973X.2024.06.005
Computer Technology
Multi-modal information augmented model for micro-video recommendation
Yufu HUO 1, Beihong JIN 1,2,*, Zhaoyi LIAO 1
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China
Abstract:

A multi-modal augmented model for click-through rate prediction (MMa4CTR), tailored for micro-video recommendation, was proposed. Multi-modal data derived from user interactions with micro-videos were leveraged to construct user embedding representations and to capture diverse user interests across modalities. The latent semantics shared by the modalities were explored by combining and crossing features across modalities. The overall recommendation performance was boosted via two training strategies: automatic learning rate adjustment and validation interruption. A computationally efficient multi-layer perceptron architecture was employed to address the computational demands brought on by the large volume of multi-modal data. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets demonstrated that MMa4CTR outperformed baseline models, delivering superior recommendation results at low computational cost. Ablation studies on both datasets further validated the significance and efficacy of the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies in improving recommendation performance.

Key words: recommender system; click-through rate; multi-modal; micro-video; machine learning
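The abstract describes a pipeline in which multi-modal features are combined, crossed across modalities, and fed to a lightweight multi-layer perceptron that outputs a click probability. The sketch below is an editorial illustration of that general idea only, not the authors' implementation; the module name, input/embedding dimensions, and the fusion operators (concatenation for combination, element-wise products for crossing) are assumptions.

```python
import torch
import torch.nn as nn

class MMa4CTRSketch(nn.Module):
    """Illustrative MLP-based multi-modal CTR predictor (assumed structure, not the paper's code)."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # One projection per modality; real features would come from pretrained encoders.
        self.visual_proj = nn.Linear(128, dim)
        self.audio_proj = nn.Linear(128, dim)
        self.text_proj = nn.Linear(128, dim)
        # Compact MLP over combined and crossed modality features.
        self.mlp = nn.Sequential(
            nn.Linear(dim * 6, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, visual, audio, text):
        v, a, t = self.visual_proj(visual), self.audio_proj(audio), self.text_proj(text)
        combined = torch.cat([v, a, t], dim=-1)              # modality combination
        crossed = torch.cat([v * a, v * t, a * t], dim=-1)   # pairwise modality crossing
        logit = self.mlp(torch.cat([combined, crossed], dim=-1))
        return torch.sigmoid(logit).squeeze(-1)              # predicted click-through probability

model = MMa4CTRSketch()
p_click = model(torch.randn(4, 128), torch.randn(4, 128), torch.randn(4, 128))
```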
Received: 2023-12-24    Published: 2024-05-25
CLC: TP 393
Foundation item: National Natural Science Foundation of China (No. 62072450).
Corresponding author: Beihong JIN    E-mail: huoyufu19@mails.ucas.ac.cn; beihong@iscas.ac.cn
About the author: Yufu HUO (2000—), male, master's student, engaged in research on micro-video recommender systems and bioinformatics. orcid.org/0009-0001-7409-1588. E-mail: huoyufu19@mails.ucas.ac.cn
Cite this article:

Yufu HUO, Beihong JIN, Zhaoyi LIAO. Multi-modal information augmented model for micro-video recommendation [J]. Journal of Zhejiang University (Engineering Science), 2024, 58(6): 1142-1152.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.06.005        https://www.zjujournals.com/eng/CN/Y2024/V58/I6/1142

Fig. 1  Flowchart of the validation interruption mechanism
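Fig. 1 depicts the validation interruption mechanism, which, according to the abstract, halts training when validation performance stops improving, in the spirit of early stopping. The minimal sketch below illustrates that idea; the patience value and the use of validation AUC as the monitored metric are assumptions, not the paper's exact criterion.

```python
class ValidationInterruption:
    """Early-stopping-style check: interrupt training when validation AUC stops improving.

    Illustrative sketch only; patience and the monitored metric (AUC) are assumptions.
    """

    def __init__(self, patience: int = 3):
        self.patience = patience
        self.best_auc = float("-inf")
        self.bad_epochs = 0

    def should_stop(self, val_auc: float) -> bool:
        if val_auc > self.best_auc:
            self.best_auc = val_auc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage inside a training loop (toy validation scores):
stopper = ValidationInterruption(patience=3)
for epoch, val_auc in enumerate([0.91, 0.93, 0.935, 0.934, 0.933, 0.932]):
    if stopper.should_stop(val_auc):
        print(f"validation interruption at epoch {epoch}")
        break
```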
Model      AUC@10    AUC@20    AUC@30    AUC@50
BPR        0.5113    0.5981    0.6386    0.6716
FPMC       0.5204    0.6338    0.6761    0.7038
NGCF       0.6112    0.6159    0.6310    0.6612
LightGCN   0.5068    0.5663    0.6110    0.6744
BERT4Rec   0.7478    0.7566    0.7716    0.7714
GCSAN      0.7445    0.7508    0.7534    0.7834
DIN        0.8324    0.8436    0.8498    0.8494
DIEN       0.7972    0.7987    0.8028    0.8027
MMa4CTR    0.9527    0.9421    0.9406    0.9468
I/%        14.45     11.68     10.68     11.47
Table 1  Comparison of recommendation performance on the WeChat dataset
Model      AUC@10    AUC@20    AUC@30    AUC@50
BPR        0.3805    0.3805    0.3828    0.3897
FPMC       0.4015    0.4070    0.4133    0.4273
NGCF       0.4401    0.4382    0.4475    0.4556
LightGCN   0.3863    0.3862    0.3862    0.3862
BERT4Rec   0.5844    0.5834    0.6295    0.5359
GCSAN      0.5869    0.4829    0.4550    0.4736
DIN        0.5087    0.5469    0.5480    0.5619
DIEN       0.5788    0.5787    0.5787    0.5787
MMa4CTR    0.8895    0.8902    0.8910    0.8884
I/%        51.55     52.58     41.54     53.51
Table 2  Comparison of recommendation performance on the TikTok dataset
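In Tables 1 and 2 the I/% row matches the relative improvement of MMa4CTR over the strongest baseline in the same column, e.g. (0.9527 − 0.8324)/0.8324 ≈ 14.45% for AUC@10 in Table 1. The exact protocol behind the AUC@K grouping is not spelled out in this excerpt; as a rough illustration, the sketch below computes plain ROC-AUC for predicted click probabilities with scikit-learn and the relative improvement percentage. All data in it are toy values.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Toy click labels and predicted click-through probabilities.
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_model = np.array([0.92, 0.20, 0.81, 0.67, 0.35, 0.10, 0.74, 0.42])
y_baseline = np.array([0.60, 0.55, 0.58, 0.40, 0.52, 0.45, 0.62, 0.50])

auc_model = roc_auc_score(y_true, y_model)
auc_baseline = roc_auc_score(y_true, y_baseline)

# Relative improvement I/% over the best baseline, as reported in Tables 1 and 2.
improvement = (auc_model - auc_baseline) / auc_baseline * 100
print(f"AUC(model)={auc_model:.4f}, AUC(baseline)={auc_baseline:.4f}, I={improvement:.2f}%")
```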
Model      Tt/s       Ti/s     N
BPR        54.77      45       7 460 224
FPMC       62.43      48       19 820 544
NGCF       126.54     109      7 485 184
LightGCN   91.54      79       7 460 224
BERT4Rec   471.41     317      6 915 904
GCSAN      4756.09    4385     6 908 032
DIN        170.44     108      1 449 310
DIEN       643.73     243      1 587 847
MMa4CTR    180.78     17       29 225
Table 3  Comparison of computational performance on the WeChat dataset
Model      Tt/s        Ti/s     N
BPR        619.19      601      195 060 416
FPMC       549.91      399      584 966 336
NGCF       1332.28     799      195 085 376
LightGCN   1929.59     1908     195 060 416
BERT4Rec   71639.71    5060     195 056 384
GCSAN      44006.24    37142    195 048 512
DIN        643.46      574      30 761 840
DIEN       2315.56     1464     30 900 377
MMa4CTR    163.80      23       26 025
Table 4  Comparison of computational performance on the TikTok dataset
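Tables 3 and 4 compare training cost, inference cost, and model size across methods. As a small illustration, and assuming Tt and Ti denote training and inference wall-clock time in seconds and N the number of trainable parameters, the sketch below shows one way such figures could be collected for a PyTorch model; it is not the measurement code used in the paper.

```python
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters (assumed meaning of column N)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def time_inference(model: nn.Module, inputs: torch.Tensor) -> float:
    """Wall-clock inference time in seconds (assumed meaning of column Ti)."""
    model.eval()
    start = time.perf_counter()
    with torch.no_grad():
        model(inputs)
    return time.perf_counter() - start

toy_model = nn.Sequential(nn.Linear(384, 128), nn.ReLU(), nn.Linear(128, 1))
print("N  =", count_parameters(toy_model))
print("Ti =", time_inference(toy_model, torch.randn(10000, 384)), "s")
```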
Combination      WeChat AUC@10   WeChat AUC@30   TikTok AUC@10   TikTok AUC@30
Visual + audio   0.9448          0.9497          0.8886          0.8880
Visual + text    0.9520          0.9517          0.8889          0.8880
Audio + text     0.9513          0.9469          0.8886          0.8907
Table 5  Recommendation performance of pairwise combinations of multi-modal information
Combination      WeChat AUC@10   WeChat AUC@30   TikTok AUC@10   TikTok AUC@30
Visual + audio   0.9459          0.9439          0.8915          0.8921
Visual + text    0.9516          0.9503          0.8897          0.8912
Audio + text     0.9516          0.9525          0.8938          0.8932
Table 6  Recommendation performance of pairwise crossing of multi-modal information
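Tables 5 and 6 contrast pairwise combination with pairwise crossing of modality features. The precise operators are not defined in this excerpt; a common reading, assumed in the sketch below, is concatenation for combination and an element-wise interaction for crossing.

```python
import torch

def combine(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pairwise combination: concatenate two modality embeddings (assumed operator)."""
    return torch.cat([x, y], dim=-1)

def cross(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Pairwise crossing: element-wise interaction of two modality embeddings (assumed operator)."""
    return x * y

visual = torch.randn(4, 64)   # toy visual embeddings for a batch of 4 micro-videos
text = torch.randn(4, 64)     # toy text embeddings
print(combine(visual, text).shape)  # torch.Size([4, 128])
print(cross(visual, text).shape)    # torch.Size([4, 64])
```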
Modality   WeChat AUC@10   WeChat AUC@30   TikTok AUC@10   TikTok AUC@30
Visual     0.9472          0.9463          0.8221          0.8873
Audio      0.9442          0.9429          0.8868          0.8873
Text       0.8862          0.8935          0.8942          0.8933
Average    0.9259          0.9276          0.8677          0.8893
Table 7  Recommendation performance of single modalities
l     WeChat AUC@10   WeChat AUC@30   TikTok AUC@10   TikTok AUC@30
0     0.6162          0.6213          0.5761          0.5783
5     0.9499          0.9512          0.8910          0.8902
11    0.9511          0.9418          0.8895          0.8884
15    0.9453          0.9484          0.8900          0.8880
21    0.9527          0.9406          0.8895          0.8910
30    0.9478          0.9456          0.8891          0.8875
Table 8  Recommendation performance with different user multi-modal embedding lengths
Fig. 2  Effect of the number of training epochs on recommendation performance on the WeChat dataset
Fig. 3  Effect of the number of training epochs on recommendation performance on the TikTok dataset
Learning rate   WeChat AUC@10   WeChat AUC@30   TikTok AUC@10   TikTok AUC@30
0.01000         0.5000          0.7433          0.7025          0.5000
0.00500         0.5003          0.5000          0.8518          0.5000
0.00200         0.9178          0.9329          0.8938          0.8919
0.00100         0.9521          0.9406          0.8899          0.8843
0.00050         0.9479          0.9435          0.8871          0.8888
0.00020         0.9470          0.9516          0.8898          0.8867
0.00010         0.9427          0.9485          0.8909          0.8908
0.00005         0.9473          0.9453          0.8893          0.8924
0.00002         0.9494          0.9426          0.8902          0.8892
0.00001         0.9443          0.9489          0.8926          0.8890
Table 9  Effect of learning rate on recommendation performance
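Table 9 shows how sensitive recommendation quality is to the learning rate, which motivates the automatic learning rate adjustment strategy named in the abstract. The sketch below illustrates one standard way to realize such a strategy in PyTorch, shrinking the learning rate when a monitored validation AUC stops improving via ReduceLROnPlateau; whether MMa4CTR uses this particular scheduler is an assumption.

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

# Toy model and optimizer; the starting learning rate 1e-3 matches a well-performing row of Table 9.
model = torch.nn.Linear(384, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Reduce the learning rate when the monitored validation AUC (to be maximized) plateaus.
scheduler = ReduceLROnPlateau(optimizer, mode="max", factor=0.5, patience=2)

for epoch, val_auc in enumerate([0.91, 0.93, 0.935, 0.934, 0.933, 0.932, 0.931]):
    scheduler.step(val_auc)  # pass the metric being maximized
    print(epoch, optimizer.param_groups[0]["lr"])
```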
Fig. 4  Effect of batch size on recommendation performance on the WeChat dataset
Fig. 5  Effect of batch size on recommendation performance on the TikTok dataset
1 LINDEN G, SMITH B, YORK J. Amazon.com recommendations: item-to-item collaborative filtering [J]. IEEE Internet Computing, 2003, 7(1): 76–80.
doi: 10.1109/MIC.2003.1167344
2 RICHARDSON M, DOMINOWSKA E, RAGNO R. Predicting clicks: estimating the click-through rate for new ads [C]// Proceedings of the 16th International Conference on World Wide Web . Banff Alberta: Association for Computing Machinery, 2007: 521–530.
3 ZHANG W, QIN J, GUO W, et al. Deep learning for click-through rate estimation [C]// Proceedings of the 30th International Joint Conference on Artificial Intelligence . [s. l.]: International Joint Conferences on Artificial Intelligence Organization, 2021: 4695–4703.
4 SEDHAIN S, KRISHNA MENON A, SANNER S, et al. AutoRec: autoencoders meet collaborative filtering [C]// Proceedings of the 24th International Conference on World Wide Web . Florence: Association for Computing Machinery, 2015: 111–112.
5 SHAN Y, HOENS R, JIAO J, et al. Deep crossing: web-scale modeling without manually crafted combinatorial features [C]// Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . San Francisco California: Association for Computing Machinery, 2016: 255–262.
6 HE X, LIAO L, ZHANG H, et al. Neural collaborative filtering [C]// Proceedings of the 26th International Conference on World Wide Web . Perth: Republic and Canton of Geneva, 2017: 173–182.
7 QU Y, FANG B, ZHANG W, et al. Product-based neural networks for user response prediction over multi-field categorical data [J]. ACM Transactions on Information Systems, 2019, 37(1): 1–35.
8 ZHOU G, ZHU X, SONG C, et al. Deep interest network for click-through rate prediction [C]// Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining . London: Association for Computing Machinery, 2018: 1059–1068.
9 ZHOU G, MOU N, FAN Y, et al. Deep interest evolution network for click-through rate prediction [C]// Proceedings of the 33rd AAAI Conference on Artificial Intelligence and 31st Innovative Applications of Artificial Intelligence Conference and 9th AAAI Symposium on Educational Advances in Artificial Intelligence . Honolulu: AAAI Press, 2019: 5941–5948.
10 LIN Q, XIE R, CHEN L, et al. Graph neural network for tag ranking in tag-enhanced video recommendation [C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management . [s. l.]: Association for Computing Machinery, 2020: 2613–2620.
11 HE R, MCAULEY J. VBPR: visual Bayesian Personalized Ranking from implicit feedback [C]// Proceedings of the 30th AAAI Conference on Artificial Intelligence . Phoenix Arizona: AAAI Press, 2016: 144–150.
12 CHEN J, ZHANG H, HE X, et al. Attentive collaborative filtering: multimedia recommendation with item- and component-level attention [C]// Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval . Shinjuku Tokyo: Association for Computing Machinery, 2017: 335–344.
13 FAN H, POOLE M S. What is personalization? Perspectives on the design and implementation of personalization in information systems [J]. Journal of Organizational Computing and Electronic Commerce, 2006, 16(3/4): 179–202.
14 ACHIAM J, ADLER S, AGARWAL S, et al. GPT-4 Technical Report [R/OL]. (2023-03-15) [2023-12-24]. https://arxiv.org/abs/2303.08774.
15 RENDLE S, FREUDENTHALER C, GANTNER Z, et al. BPR: Bayesian personalized ranking from implicit feedback [C]// Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence . Montreal Quebec: AUAI Press, 2009: 452–461.
16 WEI Y, WANG X, NIE L, et al. MMGCN: multi-modal graph convolution network for personalized recommendation of micro-video [C]// Proceedings of the 27th ACM International Conference on Multimedia . Nice: Association for Computing Machinery, 2019: 1437–1445.
17 PU S, HE Y, LI Z, et al. Multi-modal topic learning for video recommendation [EB/OL]. (2020-10-26) [2023-12-24]. https://arxiv.org/abs/2010.13373.
18 YANG M, LI S, PENG Z, et al. Multi-head multi-modal deep interest recommendation network [J]. Knowledge-Based Systems, 2023, 276(C): 110869.
19 WEI W, HUANG C, XIA L, et al. Multi-modal self-supervised learning for recommendation [C]// Proceedings of the ACM Web Conference . Austin Texas: Association for Computing Machinery, 2023: 790−800.
20 SUN R, CAO X, ZHAO Y, et al. Multi-modal knowledge graphs for recommender systems [C]// Proceedings of the 29th ACM International Conference on Information and Knowledge Management . [s. l.]: Association for Computing Machinery, 2020: 1405–1414.
21 HE L, CHEN H, WANG D, et al. Click-through rate prediction with multi-modal hypergraphs [C]// Proceedings of the 30th ACM International Conference on Information and Knowledge Management . Queensland: Association for Computing Machinery, 2021: 690–699.
22 WEI Y, WANG X, NIE L, et al. Graph-refined convolutional network for multimedia recommendation with implicit feedback [C]// Proceedings of the 28th ACM International Conference on Multimedia . Seattle Washington: Association for Computing Machinery, 2020: 3541–3549.
23 ZHAO W, MU S, HOU Y, et al. RecBole: towards a unified, comprehensive and efficient framework for recommendation algorithms [C]// Proceedings of the 30th ACM International Conference on Information and Knowledge Management . Queensland: Association for Computing Machinery, 2021: 4653–4664.
24 RENDLE S, FREUDENTHALER C, SCHMIDT-THIEME L. Factorizing personalized Markov chains for next-basket recommendation [C]// Proceedings of the 19th International Conference on World Wide Web . Raleigh North Carolina: Association for Computing Machinery, 2010: 811–820.
25 WANG X, HE X, WANG M, et al. Neural graph collaborative filtering [C]// Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval . Paris: Association for Computing Machinery, 2019: 165–174.
26 HE X, DENG K, WANG X, et al. LightGCN: simplifying and powering graph convolution network for recommendation [C]// Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval . [s. l.]: Association for Computing Machinery, 2020: 639–648.
27 SUN F, LIU J, WU J, et al. BERT4Rec: sequential recommendation with bidirectional encoder representations from transformer [C]// Proceedings of the 28th ACM International Conference on Information and Knowledge Management . Beijing: Association for Computing Machinery, 2019: 1441–1450.