Journal of Zhejiang University (Engineering Science)  2024, Vol. 58 Issue (1): 61-70    DOI: 10.3785/j.issn.1008-973X.2024.01.007
Computer Technology
Open-set 3D model retrieval algorithm based on multi-modal fusion
Fuxin MAO1(),Xu YANG1,Jiaqiang CHENG2,Tao PENG3
1. Engineering Training Center, Tianjin University of Technology and Education, Tianjin 300222, China
2. Tianjin Huada Technology Limited Company, Tianjin 300131, China
3. College of Automobile and Transportation, Tianjin University of Technology and Education, Tianjin 300222, China
Abstract:

An open-set 3D model retrieval algorithm was proposed to meet the need for managing and retrieving the massive amount of newly added model data in the open-set setting, making effective use of the semantic consistency of multi-modal information. Category information among the unknown samples was explored with an unsupervised algorithm and was then introduced into the parameter optimization of the network, so that the network achieved better representation and retrieval performance under open-set conditions. A hierarchical multi-modal information fusion model based on the Transformer structure was proposed, which effectively removed redundant information across modalities and produced a more robust model representation vector. Experiments conducted on the ModelNet40 dataset showed that the proposed method outperformed all compared typical algorithms on the mAP metric, verifying its effectiveness in improving retrieval performance.

Key words: machine vision; multi-modal fusion; open-set retrieval; 3D model
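The unsupervised step in the abstract — mining category information among unknown samples to supervise network optimization — can be sketched as a minimal K-means pseudo-labelling pass (Table 4 reports K-means as the best-performing clustering choice). The function name, farthest-point initialisation, feature dimensionality, and cluster count below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def kmeans_pseudo_labels(features, k, iters=50, seed=0):
    """Cluster unknown-sample embeddings and return pseudo class labels.

    features: (n, d) array of model representation vectors (illustrative).
    Returns an (n,) array of cluster indices usable as pseudo labels.
    """
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Greedy farthest-point initialisation: first centroid is random,
    # each subsequent centroid is the sample farthest from those chosen.
    centroids = [features[rng.integers(len(features))]]
    while len(centroids) < k:
        dists = np.min([np.linalg.norm(features - c, axis=1) for c in centroids], axis=0)
        centroids.append(features[dists.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Hard assignment: each sample joins its nearest centroid.
        d = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step; keep the old centroid if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = features[labels == j].mean(axis=0)
    return labels
```

The resulting labels would then stand in for class supervision when fine-tuning the representation network on open-set data.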
Received: 2022-11-22  Published: 2023-11-07
CLC: TP 37
Supported by: Key Project of the Tianjin Diversified Investment Fund (21JCZDJC00700)
About the author: MAO Fuxin (1987—), male, lecturer, engaged in research on microcontroller practical training and artificial intelligence. orcid.org/0009-0009-9894-3149. E-mail: 398341548@qq.com
Cite this article:

Fuxin MAO, Xu YANG, Jiaqiang CHENG, Tao PENG. Open-set 3D model retrieval algorithm based on multi-modal fusion. Journal of Zhejiang University (Engineering Science), 2024, 58(1): 61-70.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.01.007        https://www.zjujournals.com/eng/CN/Y2024/V58/I1/61

Fig. 1  Schematic of the open-set 3D model retrieval algorithm based on multi-modal fusion
Fig. 2  Single-modal feature extraction network
Fig. 3  Multi-modal feature fusion network
Fig. 4  3D model data from the ModelNet40 dataset
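The hierarchical fusion idea behind Fig. 3 and the fusion-order study of Table 6 — merging modalities one at a time so that redundant features of each newly added modality are down-weighted against the running fused representation — can be illustrated with a toy NumPy sketch. The single-head attention, residual combination, token counts, and mean pooling here are all simplifying assumptions, not the paper's architecture.

```python
import numpy as np

def cross_attention(query, context):
    """Single-head scaled dot-product attention: query attends to context.

    query:   (n_q, d) tokens of the running fused representation.
    context: (n_c, d) tokens of the next modality to merge in.
    """
    d = query.shape[1]
    scores = query @ context.T / np.sqrt(d)           # (n_q, n_c)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # row-wise softmax
    return weights @ context                          # (n_q, d)

def hierarchical_fuse(modalities):
    """Fuse a list of (n_tokens, d) modality features in the given order."""
    fused = modalities[0]
    for feats in modalities[1:]:
        # Residual cross-attention keeps the fused stream and mixes in
        # only the parts of the new modality that it attends to.
        fused = fused + cross_attention(fused, feats)
    return fused.mean(axis=0)   # pooled global descriptor, shape (d,)
```

Permuting the list order corresponds to the fusion-order variants compared in Table 6.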
Method  Data modality  mAP/%  NN/%  NDCG/%  ANMRR/%
3D ShapeNets[24]  Voxel  71.41  93.05  84.83  30.38
MeshNet[3]  Mesh  82.15  96.55  87.52  27.20
MVCNN[1]  Multi-view  83.86  95.97  88.75  26.55
GVCNN[25]  Multi-view  84.94  97.01  88.63  25.83
SeqViews2SeqLabels[26]  Multi-view  83.55  97.47  86.52  26.45
VoxNet[21]  Point cloud  76.86  95.32  85.12  32.55
PointNet[2]  Point cloud  81.72  94.55  85.56  29.86
PointNet++[27]  Point cloud  82.10  95.71  86.57  28.54
PointCNN[28]  Point cloud  83.33  96.29  87.28  26.75
LDGCNN[29]  Point cloud  83.98  96.15  88.92  26.25
MSIF[18]  Multi-modal  85.12  96.81  88.79  27.37
HPFN[19]  Multi-modal  85.45  97.03  89.24  26.72
SSFT[30]  Multi-modal  85.89  97.44  89.83  26.63
Proposed method  Voxel  77.25  93.85  85.01  29.81
Proposed method  Mesh  82.86  96.79  87.61  27.46
Proposed method  Multi-view  85.07  97.32  89.37  26.38
Proposed method  Point cloud  84.19  96.85  89.16  26.49
Proposed method  Multi-modal  86.23  97.82  90.13  26.17
Table 1  Retrieval performance of different algorithms on the unknown-class dataset
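As a reading aid for the metric columns in Table 1, the two simplest measures can be sketched generically: mAP averages, over all queries, the average precision of each ranked retrieval list, and NN is the fraction of queries whose top-ranked result is relevant. This is a generic illustration, not the paper's evaluation code; NDCG and ANMRR are not reproduced here.

```python
import numpy as np

def average_precision(relevance):
    """AP of one ranked retrieval list; relevance holds 0/1 per rank."""
    relevance = np.asarray(relevance, dtype=float)
    hits = np.cumsum(relevance)
    precisions = hits / np.arange(1, len(relevance) + 1)
    # Average the precision values at the ranks of relevant items.
    return (precisions * relevance).sum() / max(relevance.sum(), 1.0)

def retrieval_metrics(ranked_relevance_lists):
    """mAP and NN (precision of the first retrieved item) over all queries."""
    aps = [average_precision(r) for r in ranked_relevance_lists]
    nn = np.mean([r[0] for r in ranked_relevance_lists])
    return float(np.mean(aps)), float(nn)
```

For example, two queries with ranked relevance [1, 0, 1] and [1, 1, 0] give AP values of 5/6 and 1, so mAP is their mean and NN is 1.0.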
No.  Encoder  Decoder  Unsupervised learning  mAP/%
1 68.83
2 71.49
3 72.68
4 82.50
5 79.17
6 83.92
7 84.37
8 86.23
Table 2  Retrieval performance of the proposed algorithm with different network structures
Number of layers  mAP/%  NN/%  NDCG/%  ANMRR/%
2 84.57 95.32 86.84 27.93
3 84.88 96.35 87.43 27.07
4 84.31 97.22 86.40 26.11
5 85.57 97.38 88.91 25.79
6 85.52 97.55 89.37 25.96
8 85.49 97.65 89.32 25.85
10 86.23 97.82 90.13 26.17
Table 3  Retrieval performance of the proposed algorithm with different numbers of encoder layers
Algorithm  mAP/%  NN/%  NDCG/%  ANMRR/%
K-means[31]  86.23  97.82  90.13  26.17
Hierarchical clustering[32]  84.56  96.97  87.55  25.47
DBSCAN[33]  85.37  97.33  88.95  26.73
Canopy[34]  82.53  94.45  83.03  35.78
GMM[35]  83.42  95.98  86.27  28.70
Table 4  Retrieval performance of the proposed algorithm with different unsupervised algorithms
No.  Image  Point cloud  Mesh  Voxel  mAP/%
1 85.07
2 84.19
3 82.86
4 77.25
5 85.49
6 84.78
7 86.23
Table 5  Retrieval performance of the proposed algorithm with different modality inputs
Fusion order (Image / Point cloud / Mesh / Voxel)  mAP/%
1 2 3 4 85.46
2 3 4 1 85.79
3 4 1 2 86.10
4 1 2 3 85.77
2 4 1 3 85.14
1 4 2 3 86.23
Table 6  Retrieval performance of the proposed algorithm with different fusion orders
Fig. 5  t-SNE feature distribution
Fig. 6  Input samples and the top ten corresponding retrieval results
1 SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 945- 953.
2 QI C R, SU H, MO K, et al. Pointnet: deep learning on point sets for 3d classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 652-660.
3 FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019, 33(1): 8279-8286.
4 KLOKOV R, LEMPITSKY V. Escape from cells: deep kd-networks for the recognition of 3d point cloud models [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 863- 872.
5 HAN Z, LU H, LIU Z, et al 3D2SeqViews: aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation[J]. IEEE Transactions on Image Processing, 2019, 28 (8): 3986- 3999
6 LI B, LU Y, LI C, et al A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries[J]. Computer Vision and Image Understanding, 2015, 131: 1- 27
doi: 10.1016/j.cviu.2014.10.006
7 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Advances in Neural Information Processing Systems. Long Beach: [s.n.], 2017: 5998-6008.
8 DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Vienna: [s. n. ], 2021.
9 FENG Y, GAO Y, ZHAO X, et al SHREC’22 track: open-set 3D object retrieval[J]. Computers and Graphics, 2022, 107: 231- 240
doi: 10.1016/j.cag.2022.07.020
10 OSADA R, FUNKHOUSER T, CHAZELLE B, et al Shape distributions[J]. ACM Transactions on Graphics, 2002, 21 (4): 807- 832
doi: 10.1145/571647.571648
11 TABIA H, DAOUDI M, VANDEBORRE J P, et al A new 3D-matching method of nonrigid and partially similar models using curve analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33 (4): 852- 858
12 AVETISYAN A, DAI A, NIEßNER M. End-to-end cad model retrieval and 9dof alignment in 3d scans [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2551-2560.
13 SARKAR K, HAMPIHOLI B, VARANASI K, et al. Learning 3d shapes as multi-layered height-maps using 2d convolutional networks [C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 71-86.
14 YANG Z, WANG L. Learning relationships for multi-view 3D object recognition [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7505-7514.
15 HUANG Q, WANG Y, YIN Z View-based weight network for 3D object recognition[J]. Image and Vision Computing, 2020, 93: 103828
doi: 10.1016/j.imavis.2019.11.006
16 SFIKAS K, PRATIKAKIS I, THEOHARIS T Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval[J]. Computers and Graphics, 2018, 71: 208- 218
doi: 10.1016/j.cag.2017.12.001
17 PÉREZ-RÚA J M, VIELZEUF V, PATEUX S, et al. MFAS: multimodal fusion architecture search [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6966-6975.
18 ZHANG Q, LIU Y, BLUM R S, et al Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review[J]. Information Fusion, 2018, 40: 57- 75
doi: 10.1016/j.inffus.2017.05.006
19 HOU M, TANG J, ZHANG J, et al. Deep multimodal multilinear fusion with high-order polynomial pooling [C]// Advances in Neural Information Processing Systems. Vancouver: [s. n.], 2019.
20 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]// 3rd International Conference on Learning Representations. San Diego: IEEE, 2015.
21 MATURANA D, SCHERER S. Voxnet: a 3d convolutional neural network for real-time object recognition [C]// 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE, 2015: 922-928.
22 FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 8279-8286.
23 DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: [s.n.], 2019: 4171-4186.
24 WU Z, SONG S, KHOSLA A, et al. 3d shapenets: a deep representation for volumetric shapes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920.
25 FENG Y, ZHANG Z, ZHAO X, et al. Gvcnn: group-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 264-272.
26 HAN Z, SHANG M, LIU Z, et al SeqViews2SeqLabels: learning 3D global features via aggregating sequential views by RNN with attention[J]. IEEE Transactions on Image Processing, 2018, 28 (2): 658- 672
27 QI C R, YI L, SU H, et al. Pointnet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n. ], 2017: 5099-5108.
28 LI Y, BU R, SUN M, et al. Pointcnn: convolution on x- transformed points [C]// Advances in Neural Information Processing Systems. Montreal: [s. n. ], 2018: 828-838.
29 ZHANG K, HAO M, WANG J, et al. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2022-11-08]. https://arxiv.org/abs/1904.10014.
30 LU Y, WU Y, LIU B, et al. Cross-modality person reidentification with shared-specific feature transfer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13379-13389.
31 KRISHNA K, MURTY M N Genetic K-means algorithm[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1999, 29 (3): 433
doi: 10.1109/3477.764879
32 MURTAGH F, CONTRERAS P Algorithms for hierarchical clustering: an overview[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2 (1): 86- 97
doi: 10.1002/widm.53
33 KHAN K, REHMAN S U, AZIZ K, et al. DBSCAN: past, present and future [C]// 5th International Conference on the Applications of Digital Information and Web Technologies. New York: IEEE, 2014: 232-238.
34 GHULI P, SHUKLA A, KIRAN R, et al Multidimensional canopy clustering on iterative MapReduce framework using Elefig tool[J]. IETE Journal of Research, 2015, 61 (1): 14- 21
doi: 10.1080/03772063.2014.988760
35 LU Y, TIAN Z, PENG P, et al GMM clustering for heating load patterns in-depth identification and prediction model accuracy improvement of district heating system[J]. Energy and Buildings, 2019, 190: 49- 60
doi: 10.1016/j.enbuild.2019.02.014
36 VAN DER MAATEN L, HINTON G Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9 (11): 2579- 2605
37 WATTENBERG M, VIÉGAS F, JOHNSON I How to use t-SNE effectively[J]. Distill, 2016, 1 (10): e2