Multi-modal information augmented model for micro-video recommendation
Yufu HUO1, Beihong JIN1,2,*, Zhaoyi LIAO1
1. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China
2. Institute of Software, Chinese Academy of Sciences, Beijing 100190, China

Abstract: A multi-modal augmented model for click-through rate prediction (MMa4CTR), tailored to micro-video recommendation, is proposed. The model uses the multi-modal data from user interactions with micro-videos to build embedded user representations and to capture users' interests across modalities. By combining and crossing features from different modalities, it explores the latent semantics that the modalities share. Two training strategies, automatic learning rate adjustment and validation interruption, improve overall recommendation performance. To handle the computational cost of large volumes of multi-modal data, the model adopts a computationally efficient multi-layer perceptron architecture. Performance comparison experiments and hyperparameter sensitivity analyses on the WeChat Video Channel and TikTok datasets show that MMa4CTR outperforms the baseline models while keeping computational overhead low. Ablation studies on both datasets further confirm that the micro-video modality cross module, the user multi-modal embedding layer, and the automatic learning rate adjustment and validation interruption strategies each contribute to recommendation performance.
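
The abstract names the two training strategies without giving their details. The following is a minimal sketch of how such strategies are commonly combined, assuming that "automatic learning rate adjustment" means shrinking the learning rate when the validation metric plateaus and that "validation interruption" means stopping training after the metric has failed to improve for several epochs. The function `train_with_strategies`, the `validate` callback, and all parameter names and default values are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple


def train_with_strategies(
    validate: Callable[[int, float], float],
    max_epochs: int = 50,
    lr: float = 1e-3,
    lr_factor: float = 0.5,
    lr_patience: int = 2,
    stop_patience: int = 5,
    min_lr: float = 1e-6,
) -> Tuple[float, int, List[float]]:
    """Drive a training loop from a validation metric (higher is better).

    `validate(epoch, lr)` is a hypothetical callback that trains for one
    epoch at the given learning rate and returns a validation score,
    for example AUC. Returns (best score, best epoch, learning rate per epoch).
    """
    best_score = float("-inf")
    best_epoch = -1
    epochs_since_best = 0
    lr_history: List[float] = []

    for epoch in range(max_epochs):
        lr_history.append(lr)
        score = validate(epoch, lr)

        if score > best_score:
            # Improvement: remember it and reset the patience counter.
            best_score, best_epoch, epochs_since_best = score, epoch, 0
            continue

        epochs_since_best += 1

        # Assumed form of validation interruption: stop once the metric
        # has not improved for `stop_patience` consecutive epochs.
        if epochs_since_best >= stop_patience:
            break

        # Assumed form of automatic learning rate adjustment: shrink the
        # rate after every `lr_patience` epochs without improvement.
        if epochs_since_best % lr_patience == 0:
            lr = max(lr * lr_factor, min_lr)

    return best_score, best_epoch, lr_history


if __name__ == "__main__":
    # Synthetic validation scores standing in for a real model.
    scores = [0.60, 0.65, 0.64, 0.63, 0.62, 0.61, 0.60, 0.59, 0.70]
    result = train_with_strategies(
        lambda epoch, lr: scores[epoch],
        max_epochs=len(scores), lr=0.1, lr_patience=2, stop_patience=4,
    )
    print(result)
```

In this sketch the learning rate is reduced before training is interrupted, so the model gets a chance to recover at a smaller step size; whether MMa4CTR orders the two strategies this way is not stated in the abstract.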

Received: 24 December 2023
Published: 25 May 2024
Fund: National Natural Science Foundation of China (62072450).
Corresponding author: Beihong JIN
E-mail: huoyufu19@mails.ucas.ac.cn; beihong@iscas.ac.cn
Keywords: recommender system, click-through rate, multi-modal, micro-video, machine learning