Journal of Zhejiang University (Engineering Science)  2025, Vol. 59 Issue (5): 920-928    DOI: 10.3785/j.issn.1008-973X.2025.05.005
Computer Technology and Information Engineering
Point cloud Transformer adapter for dense prediction task
Dejun ZHANG1, Yanzi BAI1, Feng CAO1, Yiqi WU1, Zhanya XU2,*
1. Hubei Key Laboratory of Intelligent Geo-Information Processing, China University of Geosciences, Wuhan 430074, China
2. School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China
Abstract:

The point cloud Transformer adapter (PCT-Adapter) framework was proposed to enhance the performance of standard Transformers in point cloud dense prediction tasks. A hierarchical multi-scale prior feature extraction module was designed to improve the Transformer's ability to perceive objects at different scales and to enhance its adaptability to diverse datasets and tasks. A bidirectional feature interaction module was introduced between the Adapter and the Transformer, enabling the effective injection of prior features into the Transformer and the updating of the multi-scale point cloud feature pyramid; the standard Transformer architecture was preserved while feature representation was strengthened through repeated interactions. With the standard Transformer as its backbone, the PCT-Adapter framework supported loading various pre-trained point cloud Transformer parameters, enhancing its transfer learning capability. Experimental results on the ShapeNetPart, S3DIS, and SemanticKITTI datasets showed significant improvements in the adaptability of standard Transformers to dense prediction tasks.
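To make the dataflow described above concrete, the following is a minimal PyTorch sketch of the general pattern: standard Transformer blocks are kept architecturally unchanged, while an adapter branch extracts a multi-scale point-feature pyramid and exchanges information with the backbone over several bidirectional interaction rounds. All module names, the naive strided subsampling, and the plain cross-attention layers are illustrative assumptions for this sketch; the paper's actual PFE and BFI designs are not reproduced here.

```python
import torch
import torch.nn as nn


class PriorFeatureExtraction(nn.Module):
    """Hypothetical stand-in for the multi-scale prior feature extraction (PFE).

    Subsamples the input points at several strides and embeds each scale with
    a small MLP, then flattens the scales into a single token sequence.
    """

    def __init__(self, dim=256, strides=(1, 4, 16)):
        super().__init__()
        self.strides = strides
        self.mlps = nn.ModuleList(
            nn.Sequential(nn.Linear(3, dim), nn.GELU(), nn.Linear(dim, dim))
            for _ in strides
        )

    def forward(self, xyz):                         # xyz: (B, N, 3)
        scales = [mlp(xyz[:, ::s, :]) for s, mlp in zip(self.strides, self.mlps)]
        return torch.cat(scales, dim=1)             # (B, N + N/4 + N/16, dim)


class BidirectionalInteraction(nn.Module):
    """One adapter<->backbone exchange: A-to-T injection, then T-to-A update."""

    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.inject = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.update = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens, pyramid):
        # A-to-T: backbone tokens attend to the multi-scale prior features.
        tokens = tokens + self.inject(tokens, pyramid, pyramid)[0]
        # T-to-A: pyramid features attend to the refined backbone tokens.
        pyramid = pyramid + self.update(pyramid, tokens, tokens)[0]
        return tokens, pyramid


class PCTAdapterSketch(nn.Module):
    """Standard Transformer backbone plus the adapter branch (toy version)."""

    def __init__(self, dim=256, depth=12, n_interactions=6):
        super().__init__()
        self.embed = nn.Linear(3, dim)              # naive per-point embedding
        self.blocks = nn.ModuleList(
            nn.TransformerEncoderLayer(dim, 8, 4 * dim, batch_first=True)
            for _ in range(depth)
        )
        self.pfe = PriorFeatureExtraction(dim)
        self.interactions = nn.ModuleList(
            BidirectionalInteraction(dim) for _ in range(n_interactions)
        )
        self.stage = depth // n_interactions        # backbone blocks per round

    def forward(self, xyz):                         # xyz: (B, N, 3)
        tokens, pyramid = self.embed(xyz), self.pfe(xyz)
        for i, interact in enumerate(self.interactions):
            for blk in self.blocks[i * self.stage:(i + 1) * self.stage]:
                tokens = blk(tokens)                # unchanged standard blocks
            tokens, pyramid = interact(tokens, pyramid)
        return tokens, pyramid                      # dense per-point features


if __name__ == "__main__":
    feats, pyr = PCTAdapterSketch()(torch.randn(2, 1024, 3))
    print(feats.shape, pyr.shape)   # (2, 1024, 256) and (2, 1344, 256)
```

The default of six interaction rounds in this sketch mirrors the best-performing interaction count N in Table 5 below.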

Key words: standard Transformer    dense prediction task    adapter    feature interaction    task transfer
Received: 2024-07-08    Published: 2025-04-25
CLC:  TP 393  
Foundation item: Open Research Project of the Hubei Key Laboratory of Intelligent Geo-Information Processing (KLIGIP-2023-B12).
Corresponding author: Zhanya XU    E-mail: zhangdejun@cug.edu.cn; xuzhanya@cug.edu.cn
About the author: Dejun ZHANG (1982—), male, associate professor and master's supervisor, engaged in research on 3D scene perception and data fusion. orcid.org/0000-0001-9129-534X. E-mail: zhangdejun@cug.edu.cn

Cite this article:


Dejun ZHANG, Yanzi BAI, Feng CAO, Yiqi WU, Zhanya XU. Point cloud Transformer adapter for dense prediction task. Journal of Zhejiang University (Engineering Science), 2025, 59(5): 920-928.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2025.05.005        https://www.zjujournals.com/eng/CN/Y2025/V59/I5/920

Fig. 1  Structure of the PCT-Adapter network
Fig. 2  Structure of the prior feature extraction module
Fig. 3  Structure of the bidirectional feature interaction module
Model                    mIoU_cls/%   mIoU_ins/%
PointNet++ [26]          81.9         85.1
PointASNL [27]           —            86.1
PCT [6]                  —            86.4
PointTransformer [5]     83.7         86.6
PointCAT [28]            84.4         86.0
Point-Bert [10]          84.1         85.6
PCT-Adapter              84.5         86.0
Table 1  Part segmentation results on the ShapeNetPart dataset
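For reference, the mIoU numbers reported throughout Tables 1-6 follow the usual intersection-over-union convention. The authors' evaluation code is not shown on this page, so the snippet below is the standard textbook computation rather than their implementation:

```python
import numpy as np


def mean_iou(pred, gt, num_classes):
    """Standard mean IoU over classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:              # skip classes absent from both label sets
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0


# Toy check: 4 points, 2 classes.
pred = np.array([0, 0, 1, 1])
gt = np.array([0, 1, 1, 1])
print(mean_iou(pred, gt, 2))       # (1/2 + 2/3) / 2 ≈ 0.583
```

Conventionally on ShapeNetPart, mIoU_ins averages the per-shape IoU over all test instances, while mIoU_cls first averages within each object category and then across categories.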
Method           mAcc/%  mIoU/%  per-class IoU/%
                                 ceiling  floor  wall  beam  column  window  door  table  chair  sofa  bookcase  board  clutter
SPG [27]         66.5    58.0    89.4     96.9   78.1  0.0   42.8    48.9    61.6  84.7   75.4   69.8  52.6      2.1    52.2
PointWeb [2]     66.6    60.3    92.0     98.5   79.4  0.0   21.1    59.7    34.8  76.3   88.3   46.9  69.3      64.9   52.5
PAT [29]         70.8    60.1    93.0     98.5   72.3  1.0   41.5    85.1    38.2  57.7   83.6   48.1  67.0      61.3   33.6
PT [5]           76.5    70.4    94.0     98.5   86.3  0.0   38.0    63.4    74.3  89.1   82.4   74.3  80.2      76.0   59.3
PCT [6]          67.7    61.3    92.5     98.4   80.6  0.0   19.3    61.6    48.0  76.6   85.2   46.2  67.7      67.9   52.3
PatchF [13]      —       67.3    91.8     98.7   86.2  0.0   34.1    48.9    62.4  81.6   89.8   47.2  74.9      74.4   58.6
PointCAT [28]    71.0    64.0    94.2     98.3   80.5  0.0   18.6    55.5    58.9  77.2   88.0   64.8  72.2      68.9   55.4
SPFormer [14]    77.3    68.9    91.5     98.2   81.4  0.0   23.3    65.3    40.0  75.5   87.7   59.5  67.8      65.6   49.4
Point-Bert [10]  75.7    63.5    91.3     92.3   73.1  0.0   33.9    65.6    60.4  76.5   82.7   86.8  64.0      41.7   43.0
PCT-Adapter      80.5    69.0    91.9     96.0   81.6  0.0   52.4    66.5    67.0  82.9   90.1   70.8  72.8      69.5   54.7
Table 2  Semantic segmentation results on the S3DIS dataset (Area 5)
Fig. 4  Visualization of segmentation results on the S3DIS dataset (Area 5)
Method              Input             mIoU/%
PointNet++ [26]     50 000 points     20.1
SPG [27]                              17.4
SPLATNet [30]                         18.4
TangentConv [31]                      40.9
SqueezeSegV2 [32]   64×2 048 pixels   39.7
DarkNet21Seg [23]                     47.4
DarkNet53Seg [23]                     49.9
Point-Bert [10]     20 000 points     44.5
PCT-Adapter                           53.4
Table 3  Quantitative results of different methods on SemanticKITTI
Transformer   PFE   BFI: A-to-T   BFI: T-to-A   mIoU/%   mAcc/%
?             ?     ?             ?             55.6     65.7
?             ?     ?             ?             63.5     75.7
?             ?     ?             ?             65.5     75.8
?             ?     ?             ?             66.5     76.6
?             ?     ?             ?             69.0     80.5
Table 4  Ablation study of PCT-Adapter on the S3DIS dataset
N   mIoU/%   mAcc/%
0   63.5     75.7
1   65.9     76.1
2   66.4     76.0
4   67.0     76.8
6   69.0     80.5
8   68.9     77.6
Table 5  Quantitative comparison of the number of interactions on the S3DIS dataset
Method                 Pre-trained model weights   mIoU/%   mAcc/%
Standard Transformer   —                           62.9     72.1
                       ACT [7]                     63.1     73.2
                       Point-MAE [9]               63.1     72.1
                       MaskPoint [8]               64.4     72.7
                       Point-Bert [10]             63.5     75.7
                       ReCon [3]                   64.8     73.3
PCT-Adapter            —                           66.2     73.7
                       ACT [7]                     66.9     75.5
                       Point-MAE [9]               68.9     76.5
                       MaskPoint [8]               68.2     75.9
                       Point-Bert [10]             69.0     80.5
                       ReCon [3]                   67.4     74.7
Table 6  Quantitative comparison of loading various pre-trained model weights
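Table 6 swaps different self-supervised checkpoints into the same standard backbone. Below is a hedged sketch of what that transfer step might look like with the toy model from the earlier sketch; the checkpoint filename and key layout are hypothetical, and whether the paper freezes the backbone during fine-tuning is not stated on this page:

```python
import torch

model = PCTAdapterSketch()                   # toy model from the earlier sketch

# Load pre-trained backbone weights (e.g. a Point-Bert-style checkpoint);
# strict=False tolerates adapter parameters missing from the checkpoint.
state = torch.load("pretrained_backbone.pth", map_location="cpu")  # hypothetical file
missing, unexpected = model.load_state_dict(state, strict=False)

# One common adapter-style recipe (an assumption, not the paper's stated
# procedure): keep the backbone intact and train only the adapter branch.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith(("pfe", "interactions"))

optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```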
1 CHOE J, PARK C, RAMEAU F, et al. PointMixer: MLP-Mixer for point cloud understanding [C]// European Conference on Computer Vision. Tel Aviv: Springer, 2022: 620-640.
2 ZHAO H, JIANG L, FU C W, et al. PointWeb: enhancing local neighborhood features for point cloud processing [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5565-5573.
3 QI Z, DONG R, FAN G, et al. Contrast with reconstruct: contrastive 3D representation learning guided by generative pretraining [C]// International Conference on Machine Learning. Honolulu: PMLR, 2023: 28223-28243.
4 YANG Y Q, GUO Y X, XIONG J Y, et al. Swin3D: a pretrained Transformer backbone for 3D indoor scene understanding [EB/OL]. (2023-08-26) [2024-05-25]. https://arxiv.org/abs/2304.06906.
5 ZHAO H, JIANG L, JIA J, et al. Point Transformer [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 16259-16268.
6 GUO M H, CAI J X, LIU Z N, et al. PCT: point cloud Transformer [J]. Computational Visual Media, 2021, 7: 187-199. doi: 10.1007/s41095-021-0229-5
7 DONG R, QI Z, ZHANG L, et al. Autoencoders as cross-modal teachers: can pretrained 2D image Transformers help 3D representation learning? [EB/OL]. (2023-02-02) [2024-05-25]. https://arxiv.org/abs/2212.08320.
8 LIU H, CAI M, LEE Y J. Masked discrimination for self-supervised learning on point clouds [C]// European Conference on Computer Vision. Tel Aviv: Springer, 2022: 657-675.
9 PANG Y, WANG W, TAY F E H, et al. Masked autoencoders for point cloud self-supervised learning [C]// European Conference on Computer Vision. Tel Aviv: Springer, 2022: 604-621.
10 YU X, TANG L, RAO Y, et al. Point-BERT: pre-training 3D point cloud Transformers with masked point modeling [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 19313-19322.
11 CHANG A X, FUNKHOUSER T, GUIBAS L, et al. ShapeNet: an information-rich 3D model repository [EB/OL]. (2015-12-09) [2024-05-25]. http://arxiv.org/abs/1512.03012.
12 TANG Y, ZHANG R, GUO Z, et al. Point-PEFT: parameter-efficient fine-tuning for 3D pre-trained models [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2024: 5171-5179.
13 ZHANG C, WAN H, SHEN X, et al. PatchFormer: an efficient point Transformer with patch attention [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 11799-11808.
14 SUN J, QING C, TAN J, et al. Superpoint Transformer for 3D scene instance segmentation [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Washington DC: AAAI, 2023: 2393-2401.
15 DEVLIN J, CHANG M W, LEE K, et al. BERT: pre-training of deep bidirectional Transformers for language understanding [C]// Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Minneapolis: [s. n.], 2019: 4171-4186.
16 STICKLAND A C, MURRAY I. BERT and PALs: projected attention layers for efficient adaptation in multi-task learning [C]// International Conference on Machine Learning. Long Beach: ACM, 2019: 5986-5995.
17 CHEN Z, DUAN Y, WANG W, et al. Vision Transformer adapter for dense predictions [EB/OL]. (2023-02-13) [2024-05-25]. https://arxiv.org/abs/2205.08534.
18 VASWANI A, SHAZEER N, PARMAR N, et al. Attention is all you need [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 6000-6010.
19 WANG W, XIE E, LI X, et al. PVT v2: improved baselines with pyramid vision Transformer [J]. Computational Visual Media, 2022, 8(3): 415-424. doi: 10.1007/s41095-022-0274-8
20 WU X, TIAN Z, WEN X, et al. Towards large-scale 3D representation learning with multi-dataset point prompt training [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 19551-19562.
21 ZHANG R, WANG L, WANG Y, et al. Starting from non-parametric networks for 3D point cloud analysis [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 5344-5353.
22 MA X, QIN C, YOU H, et al. Rethinking network design and local geometry in point cloud: a simple residual MLP framework [EB/OL]. (2022-11-29) [2024-05-25]. https://arxiv.org/abs/2202.07123.
23 BEHLEY J, GARBADE M, MILIOTO A, et al. SemanticKITTI: a dataset for semantic scene understanding of LiDAR sequences [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9297-9307.
24 ARMENI I, SENER O, ZAMIR A R, et al. 3D semantic parsing of large-scale indoor spaces [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1534-1543.
25 HAN X F, HE Z Y, CHEN J, et al. 3CROSSNet: cross-level cross-scale cross-attention network for point cloud representation [J]. IEEE Robotics and Automation Letters, 2022, 7(2): 3718-3725. doi: 10.1109/LRA.2022.3147907
26 QI C R, YI L, SU H, et al. PointNet++: deep hierarchical feature learning on point sets in a metric space [C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2017: 5105-5114.
27 YAN X, ZHENG C, LI Z, et al. PointASNL: robust point clouds processing using nonlocal neural networks with adaptive sampling [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 5589-5598.
28 YANG X, JIN M, HE W, et al. PointCAT: cross-attention Transformer for point cloud [EB/OL]. (2023-04-06) [2024-05-25]. https://arxiv.org/abs/2304.03012.
29 YANG J, ZHANG Q, NI B, et al. Modeling point clouds with self-attention and Gumbel subset sampling [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3323-3332.
30 SU H, JAMPANI V, SUN D, et al. SplatNet: sparse lattice networks for point cloud processing [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 2530-2539.
31 TATARCHENKO M, PARK J, KOLTUN V, et al. Tangent convolutions for dense prediction in 3D [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 3887-3896.