Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (1): 61-70    DOI: 10.3785/j.issn.1008-973X.2024.01.007
    
Open-set 3D model retrieval algorithm based on multi-modal fusion
Fuxin MAO1, Xu YANG1, Jiaqiang CHENG2, Tao PENG3
1. Engineering Training Center, Tianjin University of Technology and Education, Tianjin 300222, China
2. Tianjin Huada Technology Limited Company, Tianjin 300131, China
3. College of Automobile and Transportation, Tianjin University of Technology and Education, Tianjin 300222, China

Abstract  

An open-set 3D model retrieval algorithm was proposed to meet the need to manage and retrieve the massive amounts of newly added model data that arise in the open domain. The algorithm effectively exploits the semantic consistency of multi-modal information. Category information among unknown samples was mined with an unsupervised algorithm and then introduced into the parameter optimization of the network model, giving the network better representation and retrieval performance under open-domain conditions. A hierarchical multi-modal information fusion model based on the Transformer structure was proposed, which effectively removes redundant information among the modalities and yields a more robust model representation vector. Experiments were conducted on the ModelNet40 dataset and compared with other typical algorithms. The proposed method outperformed all comparison methods on the mAP metric, verifying its effectiveness in improving retrieval performance.
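To make the fusion design concrete, the following is a minimal PyTorch sketch of a hierarchical, Transformer-based multi-modal fusion of the kind the abstract describes. The module names, feature dimensions, and pairwise fusion scheme are illustrative assumptions, not the authors' exact architecture.

```python
# A minimal sketch of hierarchical multi-modal fusion with Transformer
# encoder blocks. Dimensions and the pairwise fold are assumptions.
import torch
import torch.nn as nn

class PairwiseFusionBlock(nn.Module):
    """Fuse two modality token sequences with a small Transformer encoder."""
    def __init__(self, dim=512, heads=8, layers=2):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, a, b):
        # Joint self-attention over the concatenated token sequences lets
        # redundant cross-modal information be down-weighted before the
        # fused sequence is passed up the hierarchy.
        return self.encoder(torch.cat([a, b], dim=1))

class HierarchicalFusion(nn.Module):
    """Fold modality features pairwise in a fixed fusion order."""
    def __init__(self, dim=512, num_modalities=4):
        super().__init__()
        self.blocks = nn.ModuleList(
            [PairwiseFusionBlock(dim) for _ in range(num_modalities - 1)]
        )

    def forward(self, feats):
        fused = feats[0]
        for block, nxt in zip(self.blocks, feats[1:]):
            fused = block(fused, nxt)   # hierarchical, ordered fusion
        return fused.mean(dim=1)        # pooled retrieval descriptor

# Usage: four (batch, tokens, dim) embeddings, one per modality backbone
# (multi-view images, point cloud, mesh, voxel).
feats = [torch.randn(2, 16, 512) for _ in range(4)]
descriptor = HierarchicalFusion()(feats)  # -> shape (2, 512)
```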



Key words: machine vision; multi-modal fusion; open set retrieval; 3D model
Received: 22 November 2022      Published: 07 November 2023
CLC:  TP 37  
Fund: Key Program of the Tianjin Multi-Investment Fund (21JCZDJC00700)
Cite this article:

Fuxin MAO, Xu YANG, Jiaqiang CHENG, Tao PENG. Open-set 3D model retrieval algorithm based on multi-modal fusion. Journal of ZheJiang University (Engineering Science), 2024, 58(1): 61-70.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.01.007     OR     https://www.zjujournals.com/eng/Y2024/V58/I1/61


Fig.1 Schematic diagram of open domain 3D model retrieval algorithm based on multi-modal fusion
Fig.2 Single-modality feature extraction network
Fig.3 Multi-modal feature fusion network
Fig.4 Three-dimensional model data of the ModelNet40 dataset
Method  Data modality  mAP/%  NN/%  NDCG/%  ANMRR/%
3D ShapeNets[24]  Voxel  71.41  93.05  84.83  30.38
MeshNet[3]  Mesh  82.15  96.55  87.52  27.20
MVCNN[1]  Multi-view  83.86  95.97  88.75  26.55
GVCNN[25]  Multi-view  84.94  97.01  88.63  25.83
SeqViews2SeqLabels[26]  Multi-view  83.55  97.47  86.52  26.45
VoxNet[21]  Point cloud  76.86  95.32  85.12  32.55
PointNet[2]  Point cloud  81.72  94.55  85.56  29.86
PointNet++[27]  Point cloud  82.10  95.71  86.57  28.54
PointCNN[28]  Point cloud  83.33  96.29  87.28  26.75
LDGCNN[29]  Point cloud  83.98  96.15  88.92  26.25
MSIF[18]  Multi-modal  85.12  96.81  88.79  27.37
HPFN[19]  Multi-modal  85.45  97.03  89.24  26.72
SSFT[30]  Multi-modal  85.89  97.44  89.83  26.63
Proposed  Voxel  77.25  93.85  85.01  29.81
Proposed  Mesh  82.86  96.79  87.61  27.46
Proposed  Multi-view  85.07  97.32  89.37  26.38
Proposed  Point cloud  84.19  96.85  89.16  26.49
Proposed  Multi-modal  86.23  97.82  90.13  26.17
Tab.1 Retrieval performance of various algorithms on the unknown-class dataset
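For reference, the table's first two metrics can be computed from a query-to-gallery distance matrix as in the following numpy sketch. Query and gallery sets are assumed disjoint; NDCG and ANMRR follow their standard retrieval definitions and are omitted for brevity.

```python
# Sketch of mAP (mean average precision over the ranked list) and
# NN (precision of the first retrieved neighbor) from a distance matrix.
import numpy as np

def retrieval_metrics(dist, labels_q, labels_g):
    """dist: (Q, G) query-to-gallery distances; labels: class ids."""
    aps, nn_hits = [], []
    for q in range(dist.shape[0]):
        order = np.argsort(dist[q])                    # ascending distance
        rel = (labels_g[order] == labels_q[q]).astype(float)
        nn_hits.append(rel[0])                         # NN: is top-1 relevant?
        if rel.any():
            ranks = np.flatnonzero(rel) + 1            # 1-based ranks of hits
            aps.append((np.cumsum(rel)[rel == 1] / ranks).mean())
        else:
            aps.append(0.0)
    return 100 * np.mean(aps), 100 * np.mean(nn_hits)  # mAP/%, NN/%
```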
No. Encoder Decoder Unsupervised learning mAP/%
1 68.83
2 71.49
3 72.68
4 82.50
5 79.17
6 83.92
7 84.37
8 86.23
Tab.2 Retrieval performance of proposed algorithm in different network structures
Encoder layers  mAP/%  NN/%  NDCG/%  ANMRR/%
2 84.57 95.32 86.84 27.93
3 84.88 96.35 87.43 27.07
4 84.31 97.22 86.40 26.11
5 85.57 97.38 88.91 25.79
6 85.52 97.55 89.37 25.96
8 85.49 97.65 89.32 25.85
10 86.23 97.82 90.13 26.17
Tab.3 Retrieval performance of proposed algorithm under different encoder layers
Algorithm  mAP/%  NN/%  NDCG/%  ANMRR/%
K-means[31]  86.23  97.82  90.13  26.17
Hierarchical clustering[32]  84.56  96.97  87.55  25.47
DBSCAN[33]  85.37  97.33  88.95  26.73
Canopy[34]  82.53  94.45  83.03  35.78
GMM[35]  83.42  95.98  86.27  28.70
Tab.4 Retrieval performance of proposed algorithm under different unsupervised algorithms
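Tab.4 compares unsupervised algorithms for mining category structure among unknown-class samples. A hedged scikit-learn sketch of the K-means variant is given below; the cluster count and feature source are assumptions for illustration, and the resulting cluster ids serve as pseudo-labels during network optimization.

```python
# Sketch of the pseudo-label step: cluster fused descriptors of
# unlabeled 3D models and reuse the cluster ids as training targets.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_labels(unknown_feats: np.ndarray, n_clusters: int = 10) -> np.ndarray:
    """unknown_feats: (N, D) fused descriptors of unknown-class models."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return km.fit_predict(unknown_feats)  # one cluster id per sample
```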
No.  Image  Point cloud  Mesh  Voxel  mAP/%
1 85.07
2 84.19
3 82.86
4 77.25
5 85.49
6 84.78
7 86.23
Tab.5 Retrieval performance of proposed algorithm under different input modalities
Fusion order                          mAP/%
Image  Point cloud  Mesh  Voxel
1      2            3     4          85.46
2      3            4     1          85.79
3      4            1     2          86.10
4      1            2     3          85.77
2      4            1     3          85.14
1      4            2     3          86.23
Tab.6 Retrieval performance of proposed algorithm in different fusion sequences
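Tab.6 varies the positions at which the four modalities enter the hierarchical fusion. One way such an order could be applied, reusing the HierarchicalFusion sketch above, is shown below; the permutation handling is a hypothetical illustration, not the authors' implementation.

```python
# Hypothetical illustration: order (1, 4, 2, 3) for (image, point cloud,
# mesh, voxel) means image enters first, then mesh, voxel, point cloud.
def order_features(feats, order):
    """feats: [image, point_cloud, mesh, voxel]; order: fusion position of each."""
    return [f for _, f in sorted(zip(order, feats), key=lambda p: p[0])]

ordered = order_features(feats, order=(1, 4, 2, 3))
descriptor = HierarchicalFusion()(ordered)
```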
Fig.5 Feature distribution visualized by t-SNE
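Fig.5 projects the learned descriptors to two dimensions with t-SNE [36]. A minimal scikit-learn/matplotlib sketch of that style of visualization follows; the perplexity and styling are illustrative choices, not the paper's settings.

```python
# Sketch of a Fig.5-style visualization: embed descriptors in 2D with
# t-SNE and color points by class label.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(feats, labels, perplexity=30.0):
    xy = TSNE(n_components=2, perplexity=perplexity,
              init="pca", random_state=0).fit_transform(feats)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab20")
    plt.title("t-SNE of fused model descriptors")
    plt.show()
```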
Fig.6 Input models and corresponding Top-10 ranked results
[1] SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3D shape recognition [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 945-953.
[2] QI C R, SU H, MO K, et al. PointNet: deep learning on point sets for 3D classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 652-660.
[3] FENG Y, FENG Y, YOU H, et al. MeshNet: mesh neural network for 3D shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019, 33(1): 8279-8286.
[4] KLOKOV R, LEMPITSKY V. Escape from cells: deep kd-networks for the recognition of 3D point cloud models [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 863-872.
[5] HAN Z, LU H, LIU Z, et al. 3D2SeqViews: aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation [J]. IEEE Transactions on Image Processing, 2019, 28(8): 3986-3999.
[6] LI B, LU Y, LI C, et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries [J]. Computer Vision and Image Understanding, 2015, 131: 1-27. doi: 10.1016/j.cviu.2014.10.006
[7] VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n.], 2017: 5998-6008.
[8] DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Vienna: [s. n.], 2021.
[9] FENG Y, GAO Y, ZHAO X, et al. SHREC'22 track: open-set 3D object retrieval [J]. Computers and Graphics, 2022, 107: 231-240. doi: 10.1016/j.cag.2022.07.020
[10] OSADA R, FUNKHOUSER T, CHAZELLE B, et al. Shape distributions [J]. ACM Transactions on Graphics, 2002, 21(4): 807-832. doi: 10.1145/571647.571648
[11] TABIA H, DAOUDI M, VANDEBORRE J P, et al. A new 3D-matching method of nonrigid and partially similar models using curve analysis [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33(4): 852-858.
[12] AVETISYAN A, DAI A, NIEßNER M. End-to-end CAD model retrieval and 9DoF alignment in 3D scans [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2551-2560.
[13] SARKAR K, HAMPIHOLI B, VARANASI K, et al. Learning 3D shapes as multi-layered height-maps using 2D convolutional networks [C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 71-86.
[14] YANG Z, WANG L. Learning relationships for multi-view 3D object recognition [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7505-7514.
[15] HUANG Q, WANG Y, YIN Z. View-based weight network for 3D object recognition [J]. Image and Vision Computing, 2020, 93: 103828. doi: 10.1016/j.imavis.2019.11.006
[16] SFIKAS K, PRATIKAKIS I, THEOHARIS T. Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval [J]. Computers and Graphics, 2018, 71: 208-218. doi: 10.1016/j.cag.2017.12.001
[17] PÉREZ-RÚA J M, VIELZEUF V, PATEUX S, et al. MFAS: multimodal fusion architecture search [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6966-6975.
[18] ZHANG Q, LIU Y, BLUM R S, et al. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review [J]. Information Fusion, 2018, 40: 57-75. doi: 10.1016/j.inffus.2017.05.006
[19] HOU M, TANG J, ZHANG J, et al. Deep multimodal multilinear fusion with high-order polynomial pooling [C]// Advances in Neural Information Processing Systems. Vancouver: [s. n.], 2019.
[20] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]// 3rd International Conference on Learning Representations. San Diego: [s. n.], 2015.
[21] MATURANA D, SCHERER S. VoxNet: a 3D convolutional neural network for real-time object recognition [C]// 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE, 2015: 922-928.
[22] FENG Y, FENG Y, YOU H, et al. MeshNet: mesh neural network for 3D shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 8279-8286.
[23] DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: [s. n.], 2019: 4171-4186.
[24] WU Z, SONG S, KHOSLA A, et al. 3D ShapeNets: a deep representation for volumetric shapes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920.
[25] FENG Y, ZHANG Z, ZHAO X, et al. GVCNN: group-view convolutional neural networks for 3D shape recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 264-272.
[26] HAN Z, SHANG M, LIU Z, et al. SeqViews2SeqLabels: learning 3D global features via aggregating sequential views by RNN with attention [J]. IEEE Transactions on Image Processing, 2018, 28(2): 658-672.
[27] QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n.], 2017: 5099-5108.
[28] LI Y, BU R, SUN M, et al. PointCNN: convolution on X-transformed points [C]// Advances in Neural Information Processing Systems. Montreal: [s. n.], 2018: 828-838.
[29] ZHANG K, HAO M, WANG J, et al. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2022-11-08]. https://arxiv.org/abs/1904.10014.
[30] LU Y, WU Y, LIU B, et al. Cross-modality person re-identification with shared-specific feature transfer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13379-13389.
[31] KRISHNA K, MURTY M N. Genetic K-means algorithm [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1999, 29(3): 433. doi: 10.1109/3477.764879
[32] MURTAGH F, CONTRERAS P. Algorithms for hierarchical clustering: an overview [J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2(1): 86-97. doi: 10.1002/widm.53
[33] KHAN K, REHMAN S U, AZIZ K, et al. DBSCAN: past, present and future [C]// 5th International Conference on the Applications of Digital Information and Web Technologies. New York: IEEE, 2014: 232-238.
[34] GHULI P, SHUKLA A, KIRAN R, et al. Multidimensional canopy clustering on iterative MapReduce framework using Elefig tool [J]. IETE Journal of Research, 2015, 61(1): 14-21. doi: 10.1080/03772063.2014.988760
[35] LU Y, TIAN Z, PENG P, et al. GMM clustering for heating load patterns in-depth identification and prediction model accuracy improvement of district heating system [J]. Energy and Buildings, 2019, 190: 49-60. doi: 10.1016/j.enbuild.2019.02.014
[36] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.
[37] WATTENBERG M, VIÉGAS F, JOHNSON I. How to use t-SNE effectively [J]. Distill, 2016, 1(10): e2.
[1] Siyi QIN,Shaoyan GAI,Feipeng DA. Video object detection algorithm based on multi-level feature aggregation under mixed sampler[J]. Journal of ZheJiang University (Engineering Science), 2024, 58(1): 10-19.
[2] Yu-ting SU,Rong-xuan LU,Wei ZHANG. Vehicle re-identification algorithm based on attention mechanism and adaptive weight[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(4): 712-718.
[3] Qing ZHAO,Xue-ying ZHANG,Gui-jun CHEN,Jing ZHANG. EEG and fNIRS emotion recognition based on modality attention graph convolution feature fusion[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(10): 1987-1997.
[4] Xia HUA,Xin-qing WANG,Ting RUI,Fa-ming SHAO,Dong WANG. Vision-driven end-to-end maneuvering object tracking of UAV[J]. Journal of ZheJiang University (Engineering Science), 2022, 56(7): 1464-1472.
[5] Xun CHENG,Jian-bo YU. Monitoring method for machining tool wear based on machine vision[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(5): 896-904.
[6] Zhu-ye XU,Xiao-qiang ZHAO,Hong-mei JIANG. 3D model fitting method based on point distribution model[J]. Journal of ZheJiang University (Engineering Science), 2021, 55(12): 2373-2381.
[7] Jun DU,Chen MA,Zheng-ying WEI. Dynamic response of surface morphology of aluminum (Al) deposited layers in wire and arc additive manufacturing based on visual sensing[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(8): 1481-1489.
[8] Zhuang KANG,Jie YANG,Hao-qi GUO. Automatic garbage classification system based on machine vision[J]. Journal of ZheJiang University (Engineering Science), 2020, 54(7): 1272-1280.
[9] JIANG Zhuo-hua, JIANG Huan-yu, TONG Jun-hua. Optimal design of end-effector on automatic plug seedling transplanter[J]. Journal of ZheJiang University (Engineering Science), 2017, 51(6): 1119-1125.
[10] BI Yun-bo, XU Chao, FAN Xin-tian, YAN Wei-miao. Method of countersink perpendicularity detection using vision measurement[J]. Journal of ZheJiang University (Engineering Science), 2017, 51(2): 312-318.
[11] LIU Ya-nan, NI He-peng, ZHANG Cheng-rui, WANG Yun-fei, SUN Hao-chun. PC-based open control platform design of integration of machine vision and motion control[J]. Journal of ZheJiang University (Engineering Science), 2016, 50(7): 1381-1386.
[12] YE Xiao-wei, ZHANG Xiao-ming, NI Yi-qing, WONG Kai-yuen, FAN Ke-qing. Bridge deflection measurement method based on machine vision technology[J]. Journal of ZheJiang University (Engineering Science), 2014, 48(5): 813-819.
[13] XU Song,SUN Xiu-xia,HE Yan. Iterative method of camera distortion calibration utilizing lines-imaging characteristics[J]. Journal of ZheJiang University (Engineering Science), 2014, 48(3): 404-413.
[14] LV Gu-lai,LI Jian-ping,LI Qiang,YU Li-xing,ZHU Song-ming,LOU Jian-zhong. Method for rootstock position recognition based on machine vision[J]. Journal of ZheJiang University (Engineering Science), 2011, 45(10): 1766-1770.
[15] GAO Bo-yong, YAO Fu-tian, ZHANG San-yuan. 3D model semantic classification and retrieval with Gaussian processes[J]. Journal of ZheJiang University (Engineering Science), 2010, 44(12): 2251-2256.