Open-set 3D model retrieval algorithm based on multi-modal fusion
Fuxin MAO1, Xu YANG1, Jiaqiang CHENG2, Tao PENG3
1. Engineering Training Center, Tianjin University of Technology and Education, Tianjin 300222, China
2. Tianjin Huada Technology Limited Company, Tianjin 300131, China
3. College of Automobile and Transportation, Tianjin University of Technology and Education, Tianjin 300222, China
An open-set 3D model retrieval algorithm was proposed to meet the demand for managing and retrieving massive new model data under open-set conditions. The semantic consistency of multi-modal information was exploited effectively: the category information among unknown samples was explored with the help of an unsupervised clustering algorithm, and the resulting unknown-class information was introduced into the parameter optimization of the network model, giving the model better representation and retrieval performance under open-set conditions. A hierarchical multi-modal information fusion model based on the Transformer structure was also proposed, which effectively removes redundant information among the modalities and yields a more robust model representation vector. Experiments were conducted on the ModelNet40 dataset and compared against typical existing algorithms. The proposed method outperformed all comparison methods in terms of mAP, verifying its effectiveness in improving retrieval performance.
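Where the abstract refers to exploring category information among unknown samples with an unsupervised algorithm, the step can be illustrated as below: a minimal sketch assuming K-means over the fused representation vectors (the best-performing clustering choice in Tab.4). The function and variable names are hypothetical placeholders, not the authors' code.

```python
# Sketch of the pseudo-labeling step: unknown-class samples are clustered
# with an unsupervised algorithm (K-means here, per Tab.4) and the cluster
# assignments serve as pseudo-labels during network optimization.
import numpy as np
from sklearn.cluster import KMeans

def pseudo_label_unknown(features: np.ndarray, n_clusters: int) -> np.ndarray:
    """Cluster fused feature vectors of unknown-class samples.

    features: (N, D) array of representation vectors from the fusion network.
    Returns an (N,) array of pseudo-labels in [0, n_clusters).
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    return kmeans.fit_predict(features)

# The pseudo-labels can then drive a standard classification loss on the
# unknown split, alongside the supervised loss on known classes.
```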
Fuxin MAO, Xu YANG, Jiaqiang CHENG, Tao PENG. Open-set 3D model retrieval algorithm based on multi-modal fusion. Journal of ZheJiang University (Engineering Science), 2024, 58(1): 61-70.
Fig.1 Schematic diagram of open-set 3D model retrieval algorithm based on multi-modal fusion
Fig.2 Single-modality feature extraction network
Fig.3 Multi-modal feature fusion network
Fig.4 Three-dimensional model data of ModelNet40 dataset
| Method | Data modality | mAP/% | NN/% | NDCG/% | ANMRR/% |
| --- | --- | --- | --- | --- | --- |
| 3D ShapeNets[24] | Voxel | 71.41 | 93.05 | 84.83 | 30.38 |
| MeshNet[3] | Mesh | 82.15 | 96.55 | 87.52 | 27.20 |
| MVCNN[1] | Multi-view | 83.86 | 95.97 | 88.75 | 26.55 |
| GVCNN[25] | Multi-view | 84.94 | 97.01 | 88.63 | 25.83 |
| SeqViews2SeqLabels[26] | Multi-view | 83.55 | 97.47 | 86.52 | 26.45 |
| VoxNet[21] | Point cloud | 76.86 | 95.32 | 85.12 | 32.55 |
| PointNet[2] | Point cloud | 81.72 | 94.55 | 85.56 | 29.86 |
| PointNet++[27] | Point cloud | 82.10 | 95.71 | 86.57 | 28.54 |
| PointCNN[28] | Point cloud | 83.33 | 96.29 | 87.28 | 26.75 |
| LDGCNN[29] | Point cloud | 83.98 | 96.15 | 88.92 | 26.25 |
| MSIF[18] | Multi-modal | 85.12 | 96.81 | 88.79 | 27.37 |
| HPFN[19] | Multi-modal | 85.45 | 97.03 | 89.24 | 26.72 |
| SSFT[30] | Multi-modal | 85.89 | 97.44 | 89.83 | 26.63 |
| Proposed method | Voxel | 77.25 | 93.85 | 85.01 | 29.81 |
| Proposed method | Mesh | 82.86 | 96.79 | 87.61 | 27.46 |
| Proposed method | Multi-view | 85.07 | 97.32 | 89.37 | 26.38 |
| Proposed method | Point cloud | 84.19 | 96.85 | 89.16 | 26.49 |
| Proposed method | Multi-modal | 86.23 | 97.82 | 90.13 | 26.17 |

Tab.1 Retrieval performance of various algorithms on the unknown-class dataset
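For reference, the mAP figures in Tab.1 correspond to the standard retrieval mean average precision. The sketch below is a generic computation under assumed inputs (a query-gallery similarity matrix and class labels), not the paper's evaluation code.

```python
# For each query, rank the gallery by similarity and average the precision
# observed at the rank of every correct match; mAP is the mean over queries.
import numpy as np

def mean_average_precision(sim: np.ndarray, q_labels, g_labels) -> float:
    """sim: (Q, G) similarity matrix between queries and gallery items."""
    g_labels = np.asarray(g_labels)
    aps = []
    for i, q in enumerate(q_labels):
        order = np.argsort(-sim[i])            # best match first
        hits = g_labels[order] == q            # boolean relevance vector
        if not hits.any():
            continue
        ranks = np.flatnonzero(hits) + 1       # 1-based ranks of correct hits
        precisions = np.arange(1, len(ranks) + 1) / ranks
        aps.append(precisions.mean())
    return float(np.mean(aps))
```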
| No. | Encoder | Decoder | Unsupervised learning | mAP/% |
| --- | --- | --- | --- | --- |
| 1 | — | — | — | 68.83 |
| 2 | √ | — | — | 71.49 |
| 3 | — | √ | — | 72.68 |
| 4 | — | — | √ | 82.50 |
| 5 | √ | √ | — | 79.17 |
| 6 | √ | — | √ | 83.92 |
| 7 | — | √ | √ | 84.37 |
| 8 | √ | √ | √ | 86.23 |

Tab.2 Retrieval performance of proposed algorithm with different network structures
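Tab.2 ablates three components: the encoder, the decoder, and unsupervised learning. A heavily hedged sketch of how such components could be combined in one training objective follows, assuming a reconstruction loss for the decoder branch and a pseudo-label classification loss for the unsupervised branch; the module interfaces and the equal loss weighting are assumptions, not the paper's specification.

```python
# Hypothetical combination of the three ablated components from Tab.2.
import torch
import torch.nn as nn
import torch.nn.functional as F

def training_losses(encoder: nn.Module, decoder: nn.Module,
                    classifier: nn.Module, tokens: torch.Tensor,
                    pseudo_labels: torch.Tensor) -> torch.Tensor:
    """tokens: (batch, num_modalities, d) per-modality features."""
    fused = encoder(tokens)                   # encoder branch
    recon = decoder(fused)                    # decoder branch: reconstruct
    loss_recon = F.mse_loss(recon, tokens)    # the input modality tokens
    logits = classifier(fused.mean(dim=1))    # pool tokens -> class logits
    loss_cls = F.cross_entropy(logits, pseudo_labels)  # unsupervised branch
    return loss_recon + loss_cls              # equal weighting assumed
```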
| Number of encoder layers | mAP/% | NN/% | NDCG/% | ANMRR/% |
| --- | --- | --- | --- | --- |
| 2 | 84.57 | 95.32 | 86.84 | 27.93 |
| 3 | 84.88 | 96.35 | 87.43 | 27.07 |
| 4 | 84.31 | 97.22 | 86.40 | 26.11 |
| 5 | 85.57 | 97.38 | 88.91 | 25.79 |
| 6 | 85.52 | 97.55 | 89.37 | 25.96 |
| 8 | 85.49 | 97.65 | 89.32 | 25.85 |
| 10 | 86.23 | 97.82 | 90.13 | 26.17 |

Tab.3 Retrieval performance of proposed algorithm with different numbers of encoder layers
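Tab.3 varies the depth of the Transformer encoder used for fusion, with 10 layers giving the best mAP. A hedged PyTorch sketch of this design choice follows; the model dimension and head count are illustrative assumptions, not the paper's configuration.

```python
# Stack a configurable number of Transformer encoder layers over the
# per-modality token sequence, as ablated in Tab.3.
import torch
import torch.nn as nn

def build_fusion_encoder(d_model: int = 512, n_heads: int = 8,
                         num_layers: int = 10) -> nn.TransformerEncoder:
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                       batch_first=True)
    return nn.TransformerEncoder(layer, num_layers=num_layers)

# tokens: (batch, num_modalities, d_model) — one token per modality.
encoder = build_fusion_encoder(num_layers=10)  # Tab.3's best depth
fused = encoder(torch.randn(4, 4, 512))        # -> (4, 4, 512)
```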
| Clustering algorithm | mAP/% | NN/% | NDCG/% | ANMRR/% |
| --- | --- | --- | --- | --- |
| K-means[31] | 86.23 | 97.82 | 90.13 | 26.17 |
| Hierarchical clustering[32] | 84.56 | 96.97 | 87.55 | 25.47 |
| DBSCAN[33] | 85.37 | 97.33 | 88.95 | 26.73 |
| Canopy[34] | 82.53 | 94.45 | 83.03 | 35.78 |
| GMM[35] | 83.42 | 95.98 | 86.27 | 28.70 |

Tab.4 Retrieval performance of proposed algorithm with different unsupervised clustering algorithms
| No. | Image | Point cloud | Mesh | Voxel | mAP/% |
| --- | --- | --- | --- | --- | --- |
| 1 | √ | — | — | — | 85.07 |
| 2 | — | √ | — | — | 84.19 |
| 3 | — | — | √ | — | 82.86 |
| 4 | — | — | — | √ | 77.25 |
| 5 | √ | √ | — | — | 85.49 |
| 6 | — | — | √ | √ | 84.78 |
| 7 | √ | √ | √ | √ | 86.23 |

Tab.5 Retrieval performance of proposed algorithm with different input modalities
| Image | Point cloud | Mesh | Voxel | mAP/% |
| --- | --- | --- | --- | --- |
| 1 | 2 | 3 | 4 | 85.46 |
| 2 | 3 | 4 | 1 | 85.79 |
| 3 | 4 | 1 | 2 | 86.10 |
| 4 | 1 | 2 | 3 | 85.77 |
| 2 | 4 | 1 | 3 | 85.14 |
| 1 | 4 | 2 | 3 | 86.23 |

Tab.6 Retrieval performance of proposed algorithm under different fusion orders (each entry is the modality's position in the fusion sequence; 1 = fused first)
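Tab.6 permutes the order in which the four modalities enter the hierarchical fusion; the best row fuses image first, then mesh, then voxel, then point cloud. The sketch below illustrates such order-dependent pairwise fusion; the PairFusion stage is a hypothetical stand-in for the paper's Transformer fusion block, not its actual implementation.

```python
# Hierarchical fusion in a specified modality order: each stage fuses the
# running representation with the next modality's feature.
import torch
import torch.nn as nn

class PairFusion(nn.Module):
    """One hypothetical fusion stage: concatenate and project back to d."""
    def __init__(self, d: int):
        super().__init__()
        self.proj = nn.Linear(2 * d, d)

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([a, b], dim=-1))

def hierarchical_fuse(feats: dict, order: list, stages: nn.ModuleList):
    """feats: modality name -> (batch, d) feature.
    order: e.g. ['image', 'mesh', 'voxel', 'pointcloud'],
    matching Tab.6's best row (image=1, mesh=2, voxel=3, point cloud=4)."""
    out = feats[order[0]]
    for stage, name in zip(stages, order[1:]):
        out = stage(out, feats[name])
    return out
```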
Fig.5 Feature distribution visualized by t-SNE
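Fig.5's layout can be reproduced with the standard t-SNE procedure [36]; a short sketch with assumed variable names:

```python
# Project fused representation vectors to 2-D with t-SNE and plot by class.
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels):
    emb = TSNE(n_components=2, random_state=0).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, s=5, cmap="tab20")
    plt.title("t-SNE of fused representation vectors")
    plt.show()
```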
Fig.6 Input models and corresponding Top-10 ranked results
[1]
SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 945-953.
[2]
QI C R, SU H, MO K, et al. Pointnet: deep learning on point sets for 3d classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 652-660.
[3]
FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019, 33(1): 8279-8286.
[4]
KLOKOV R, LEMPITSKY V. Escape from cells: deep kd-networks for the recognition of 3d point cloud models [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 863-872.
[5]
HAN Z, LU H, LIU Z, et al. 3D2SeqViews: aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation [J]. IEEE Transactions on Image Processing, 2019, 28(8): 3986-3999.
[6]
LI B, LU Y, LI C, et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries [J]. Computer Vision and Image Understanding, 2015, 131: 1-27.
doi: 10.1016/j.cviu.2014.10.006
[7]
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n.], 2017: 5998-6008.
[8]
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Vienna: [s. n. ], 2021.
[9]
FENG Y, GAO Y, ZHAO X, et al. SHREC’22 track: open-set 3D object retrieval [J]. Computers and Graphics, 2022, 107: 231-240.
doi: 10.1016/j.cag.2022.07.020
[10]
OSADA R, FUNKHOUSER T, CHAZELLE B, et al. Shape distributions [J]. ACM Transactions on Graphics, 2002, 21(4): 807-832.
doi: 10.1145/571647.571648
[11]
TABIA H, DAOUDI M, VANDEBORRE J P, et al. A new 3D-matching method of nonrigid and partially similar models using curve analysis [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33(4): 852-858.
[12]
AVETISYAN A, DAI A, NIEßNER M. End-to-end cad model retrieval and 9dof alignment in 3d scans [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2551-2560.
[13]
SARKAR K, HAMPIHOLI B, VARANASI K, et al. Learning 3d shapes as multi-layered height-maps using 2d convolutional networks [C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 71-86.
[14]
YANG Z, WANG L. Learning relationships for multi-view 3D object recognition [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7505-7514.
[15]
HUANG Q, WANG Y, YIN Z. View-based weight network for 3D object recognition [J]. Image and Vision Computing, 2020, 93: 103828.
doi: 10.1016/j.imavis.2019.11.006
[16]
SFIKAS K, PRATIKAKIS I, THEOHARIS T. Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval [J]. Computers and Graphics, 2018, 71: 208-218.
doi: 10.1016/j.cag.2017.12.001
[17]
PÉREZ-RÚA J M, VIELZEUF V, PATEUX S, et al. MFAS: multimodal fusion architecture search [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6966-6975.
[18]
ZHANG Q, LIU Y, BLUM R S, et al. Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review [J]. Information Fusion, 2018, 40: 57-75.
doi: 10.1016/j.inffus.2017.05.006
[19]
HOU M, TANG J, ZHANG J, et al. Deep multimodal multilinear fusion with high-order polynomial pooling [C]// Advances in Neural Information Processing Systems. Vancouver: [s. n.], 2019.
[20]
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]// 3rd International Conference on Learning Representations. San Diego: IEEE, 2015.
[21]
MATURANA D, SCHERER S. Voxnet: a 3d convolutional neural network for real-time object recognition [C]// 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE, 2015: 922-928.
[22]
FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 8279-8286.
[23]
DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: [s. n.], 2019: 4171-4186.
[24]
WU Z, SONG S, KHOSLA A, et al. 3d shapenets: a deep representation for volumetric shapes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920.
[25]
FENG Y, ZHANG Z, ZHAO X, et al. Gvcnn: group-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 264-272.
[26]
HAN Z, SHANG M, LIU Z, et al. SeqViews2SeqLabels: learning 3D global features via aggregating sequential views by RNN with attention [J]. IEEE Transactions on Image Processing, 2018, 28(2): 658-672.
[27]
QI C R, YI L, SU H, et al. Pointnet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n. ], 2017: 5099-5108.
[28]
LI Y, BU R, SUN M, et al. Pointcnn: convolution on X-transformed points [C]// Advances in Neural Information Processing Systems. Montreal: [s. n.], 2018: 828-838.
[29]
ZHANG K, HAO M, WANG J, et al. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2022-11-08]. https://arxiv.org/abs/1904.10014.
[30]
LU Y, WU Y, LIU B, et al. Cross-modality person reidentification with shared-specific feature transfer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13379-13389.
[31]
KRISHNA K, MURTY M N. Genetic K-means algorithm [J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1999, 29(3): 433.
doi: 10.1109/3477.764879
[32]
MURTAGH F, CONTRERAS P. Algorithms for hierarchical clustering: an overview [J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2(1): 86-97.
doi: 10.1002/widm.53
[33]
KHAN K, REHMAN S U, AZIZ K, et al. DBSCAN: past, present and future [C]// 5th International Conference on the Applications of Digital Information and Web Technologies. New York: IEEE, 2014: 232-238.
[34]
GHULI P, SHUKLA A, KIRAN R, et al. Multidimensional canopy clustering on iterative MapReduce framework using Elefig tool [J]. IETE Journal of Research, 2015, 61(1): 14-21.
doi: 10.1080/03772063.2014.988760
[35]
LU Y, TIAN Z, PENG P, et al. GMM clustering for heating load patterns in-depth identification and prediction model accuracy improvement of district heating system [J]. Energy and Buildings, 2019, 190: 49-60.
doi: 10.1016/j.enbuild.2019.02.014
[36]
VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE [J]. Journal of Machine Learning Research, 2008, 9(11): 2579-2605.
[37]
WATTENBERG M, VIÉGAS F, JOHNSON I. How to use t-SNE effectively [J]. Distill, 2016, 1(10): e2.