|
|
Open-set 3D model retrieval algorithm based on multi-modal fusion |
Fuxin MAO1(),Xu YANG1,Jiaqiang CHENG2,Tao PENG3 |
1. Engineering Training Center, Tianjin University of Technology and Education, Tianjin 300222, China 2. Tianjin Huada Technology Limited Company, Tianjin 300131, China 3. College of Automobile and Transportation, Tianjin University of Technology and Education, Tianjin 300222, China |
|
|
Abstract An open domain 3D model retrieval algorithm was proposed in order to meet the requirement of management and retrieval of massive new model data under the open domain. The semantic consistency of multi-modal information can be effectively used. The category information among unknown samples was explored with the help of unsupervised algorithm. Then the unknown class information was introduced into the parameter optimization process of the network model. The network model has better characterization and retrieval performance in the open domain condition. A hierarchical multi-modal information fusion model based on a Transformer structure was proposed, which could effectively remove the redundant information among the modalities and obtain a more robust model representation vector. Experiments were conducted on the dataset ModelNet40, and the experiments were compared with other typical algorithms. The proposed method outperformed all comparative methods in terms of mAP metrics, which verified the effectiveness of the method in terms of retrieval performance improvement.
|
Received: 22 November 2022
Published: 07 November 2023
|
|
Fund: 天津市多元投入基金重点资助项目(21JCZDJC00700) |
基于多模态融合的开放域三维模型检索算法
为了满足开放域下海量新增模型数据的管理和检索需求,提出开放域三维模型检索算法,可以有效地利用多模态信息的语义一致性. 借助无监督算法探寻未知样本间的类别信息,利用该类别信息实现网络模型的参数优化,使得网络模型在开放域条件下具有更好的模型表征性能及检索结果. 提出基于Transformer结构的层级化多模态信息融合模型,有效地剔除了多模态间的冗余信息,得到鲁棒性更强的模型表征向量. 在数据集ModelNet40上进行实验,通过与其他典型算法的对比实验可知,所提方法在mAP指标上优于所有的对比方法,验证了该方法在检索性能提升上的有效性.
关键词:
机器视觉,
多模态融合,
开放域检索,
三维模型
|
|
[1] |
SU H, MAJI S, KALOGERAKIS E, et al. Multi-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2015: 945- 953.
|
|
|
[2] |
QI C R, SU H, MO K, et al. Pointnet: deep learning on point sets for 3d classification and segmentation [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 652-660.
|
|
|
[3] |
FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019, 33(1): 8279-8286.
|
|
|
[4] |
KLOKOV R, LEMPITSKY V. Escape from cells: deep kd-networks for the recognition of 3d point cloud models [C]// Proceedings of the IEEE International Conference on Computer Vision. Piscataway: IEEE, 2017: 863- 872.
|
|
|
[5] |
HAN Z, LU H, LIU Z, et al 3D2SeqViews: aggregating sequential views for 3D global feature learning by CNN with hierarchical attention aggregation[J]. IEEE Transactions on Image Processing, 2019, 28 (8): 3986- 3999
|
|
|
[6] |
LI B, LU Y, LI C, et al A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries[J]. Computer Vision and Image Understanding, 2015, 131: 1- 27
doi: 10.1016/j.cviu.2014.10.006
|
|
|
[7] |
VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need[C]// Advances in Neural Information Processing Systems. Long Beach: [s. n. ], 2017: 5998--6008.
|
|
|
[8] |
DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: transformers for image recognition at scale [C]// International Conference on Learning Representations. Vienna: [s. n. ], 2021.
|
|
|
[9] |
FENG Y, GAO Y, ZHAO X, et al SHREC’22 track: open-set 3D object retrieval[J]. Computers and Graphics, 2022, 107: 231- 240
doi: 10.1016/j.cag.2022.07.020
|
|
|
[10] |
OSADA R, FUNKHOUSER T, CHAZELLE B, et al Shape distributions[J]. ACM Transactions on Graphics, 2002, 21 (4): 807- 832
doi: 10.1145/571647.571648
|
|
|
[11] |
TABIA H, DAOUDI M, VANDEBORRE J P, et al A new 3D-matching method of nonrigid and partially similar models using curve analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2010, 33 (4): 852- 858
|
|
|
[12] |
AVETISYAN A, DAI A, NIEßNER M. End-to-end cad model retrieval and 9dof alignment in 3d scans [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 2551-2560.
|
|
|
[13] |
SARKAR K, HAMPIHOLI B, VARANASI K, et al. Learning 3d shapes as multi-layered height-maps using 2d convolutional networks [C]// Proceedings of the European Conference on Computer Vision. Berlin: Springer, 2018: 71-86.
|
|
|
[14] |
YANG Z, WANG L. Learning relationships for multi-view 3D object recognition [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Piscataway: IEEE, 2019: 7505-7514.
|
|
|
[15] |
HUANG Q, WANG Y, YIN Z View-based weight network for 3D object recognition[J]. Image and Vision Computing, 2020, 93: 103828
doi: 10.1016/j.imavis.2019.11.006
|
|
|
[16] |
SFIKAS K, PRATIKAKIS I, THEOHARIS T Ensemble of PANORAMA-based convolutional neural networks for 3D model classification and retrieval[J]. Computers and Graphics, 2018, 71: 208- 218
doi: 10.1016/j.cag.2017.12.001
|
|
|
[17] |
PÉREZ-RÚA J M, VIELZEUF V, PATEUX S, et al. MFAS: multimodal fusion architecture search [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 6966-6975.
|
|
|
[18] |
ZHANG Q, LIU Y, BLUM R S, et al Sparse representation based multi-sensor image fusion for multi-focus and multi-modality images: a review[J]. Information Fusion, 2018, 40: 57- 75
doi: 10.1016/j.inffus.2017.05.006
|
|
|
[19] |
HOU M, TANG J, ZHANG J, et al. Deep multimodal multilinear fusion with high-order polynomial pooling [C]// Advances in Neural Information Processing Systems. Vancouver: [s. n.], 2019.
|
|
|
[20] |
SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]// 3rd International Conference on Learning Representations. San Diego: IEEE, 2015.
|
|
|
[21] |
MATURANA D, SCHERER S. Voxnet: a 3d convolutional neural network for real-time object recognition [C]// 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems. New York: IEEE, 2015: 922-928.
|
|
|
[22] |
FENG Y, FENG Y, YOU H, et al. Meshnet: mesh neural network for 3d shape representation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Menlo Park: AAAI, 2019: 8279-8286.
|
|
|
[23] |
DEVLIN J, CHANG M W, LEE K, et al. Bert: pre-training of deep bidirectional transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: [s. n. ], 2019: 4171—4186.
|
|
|
[24] |
WU Z, SONG S, KHOSLA A, et al. 3d shapenets: a deep representation for volumetric shapes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1912-1920.
|
|
|
[25] |
FENG Y, ZHANG Z, ZHAO X, et al. Gvcnn: group-view convolutional neural networks for 3d shape recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 264-272.
|
|
|
[26] |
HAN Z, SHANG M, LIU Z, et al SeqViews2SeqLabels: learning 3D global features via aggregating sequential views by RNN with attention[J]. IEEE Transactions on Image Processing, 2018, 28 (2): 658- 672
|
|
|
[27] |
QI C R, YI L, SU H, et al. Pointnet++: deep hierarchical feature learning on point sets in a metric space [C]// Advances in Neural Information Processing Systems. Long Beach: [s. n. ], 2017: 5099-5108.
|
|
|
[28] |
LI Y, BU R, SUN M, et al. Pointcnn: convolution on x- transformed points [C]// Advances in Neural Information Processing Systems. Montreal: [s. n. ], 2018: 828-838.
|
|
|
[29] |
ZHANG K, HAO M, WANG J, et al. Linked dynamic graph CNN: learning on point cloud via linking hierarchical features [EB/OL]. [2022-11-08]. https://arxiv.org/abs/1904.10014.
|
|
|
[30] |
LU Y, WU Y, LIU B, et al. Cross-modality person reidentification with shared-specific feature transfer [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2020: 13379-13389.
|
|
|
[31] |
KRISHNA K, MURTY M N Genetic K-means algorithm[J]. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 1999, 29 (3): 433
doi: 10.1109/3477.764879
|
|
|
[32] |
MURTAGH F, CONTRERAS P Algorithms for hierarchical clustering: an overview[J]. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2012, 2 (1): 86- 97
doi: 10.1002/widm.53
|
|
|
[33] |
KHAN K, REHMAN S U, AZIZ K, et al. DBSCAN: past, present and future [C]// 5th International Conference on the Applications of Digital Information and Web Technologies. New York: IEEE, 2014: 232-238.
|
|
|
[34] |
GHULI P, SHUKLA A, KIRAN R, et al Multidimensional canopy clustering on iterative MapReduce framework using Elefig tool[J]. IETE Journal of Research, 2015, 61 (1): 14- 21
doi: 10.1080/03772063.2014.988760
|
|
|
[35] |
LU Y, TIAN Z, PENG P, et al GMM clustering for heating load patterns in-depth identification and prediction model accuracy improvement of district heating system[J]. Energy and Buildings, 2019, 190: 49- 60
doi: 10.1016/j.enbuild.2019.02.014
|
|
|
[36] |
VAN DER MAATEN L, HINTON G Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9 (11): 2579- 2605
|
|
|
[37] |
WATTENBERG M, VIÉGAS F, JOHNSON I How to use t-SNE effectively[J]. Distill, 2016, 1 (10): e2
|
|
|
|
Viewed |
|
|
|
Full text
|
|
|
|
|
Abstract
|
|
|
|
|
Cited |
|
|
|
|
|
Shared |
|
|
|
|
|
Discussed |
|
|
|
|