Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (7): 1404-1415    DOI: 10.3785/j.issn.1008-973X.2026.07.004
Computer and Control Engineering
Small organism detection in underwater color-cast environments based on improved RT-DETR
Shaojiang DONG, Tao XIAO, Zhenming LV, Haoran XIA, Jiayuan LUO, Shizheng SUN, Xia ZHANG, Chao LIU
School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China
Abstract:

To achieve fast and accurate detection of small underwater organisms, and to address the poor performance of existing models in underwater color-cast environments, a detection method based on an improved RT-DETR (FES-DETR) was proposed. In the backbone network, an efficient multi-scale attention feature extraction (Faster-Rep-EMA) module was designed to optimize the original BasicBlock, improving both computational efficiency and the extraction of features from weak targets under color-cast interference. In the neck encoding network, the entanglement Transformer block (ETB) was integrated with the attention-based intra-scale feature interaction (AIFI) module to jointly optimize frequency-domain and spatial-domain features, which enhanced the feature representation of color-cast images. A lightweight small object enhancement pyramid (SOEP) module was designed to improve small-target detection while reducing computational redundancy. Experimental results showed that, compared with RT-DETR-r18, FES-DETR improved precision and recall by 2.6 and 2.3 percentage points, respectively, and improved mAP@0.5 and mAP@0.5:0.95 by 3.2 and 2.1 percentage points, respectively. The number of parameters and the computational cost decreased by 3.0 M and 8.5 G, respectively, and the FPS increased to 95.7 frames per second. Compared with mainstream object detection models such as the YOLO series, the model achieved superior performance, providing an efficient technical approach for the detection of small underwater organisms.

Key words: color-cast environment    small organism    target detection    RT-DETR    entanglement Transformer
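Received: 2025-06-15    Published: 2026-05-23
CLC:  TP 391.4
Funding: Chongqing Technology Innovation and Application Development Special Key Project (CSTB2024TIAD-KPX0081); Innovation and Development Joint Fund of the Chongqing Natural Science Foundation (CSTB2024NSCQ-LZX0024); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202300711).
About the author: Shaojiang DONG (born 1982), male, professor, engaged in research on special robots and mechatronics. orcid.org/0009-0006-6937-0854. E-mail: dongshaojiang100@163.com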

Cite this article:

Shaojiang DONG, Tao XIAO, Zhenming LV, Haoran XIA, Jiayuan LUO, Shizheng SUN, Xia ZHANG, Chao LIU. Small organism detection in underwater color-cast environments based on improved RT-DETR. Journal of Zhejiang University (Engineering Science), 2026, 60(7): 1404-1415.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.07.004        https://www.zjujournals.com/eng/CN/Y2026/V60/I7/1404

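Fig. 1  Structure of the small organism detection model for underwater color-cast environments (FES-DETR)
Fig. 2  Structure of the partial convolution module
Fig. 3  Structure of the RepPConv module combining partial convolution and re-parameterized convolution
Fig. 4  Structure of the efficient multi-scale attention module
Fig. 5  Structure of the efficient multi-scale attention feature extraction module
Fig. 6  Structure of the entanglement Transformer block
Fig. 7  Structure of the small object enhancement pyramid module
Fig. 8  Structure of the space-to-depth convolution module
Fig. 9  Structure of the CSPOmnikernel module
Fig. 10  Sample images from the DUO dataset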
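Fig. 8 names a space-to-depth convolution module, but this page does not show its internals. As a rough illustration only, the sketch below implements the generic space-to-depth rearrangement that such modules are usually built on: each 2×2 spatial patch is folded into the channel dimension, so resolution halves without discarding pixels. The function name `space_to_depth`, the block size of 2, the channel ordering inside each patch, and the omission of the convolution that would normally follow are all assumptions made for this example, not details taken from the paper.

```python
def space_to_depth(x, block=2):
    """Fold each block x block spatial patch of an H x W x C nested list
    into the channel axis, giving (H/block) x (W/block) x (C*block*block).

    Hypothetical helper for illustration; the patch ordering is an assumption.
    """
    h, w = len(x), len(x[0])
    if h % block or w % block:
        raise ValueError("height and width must be divisible by block")
    out = []
    for i in range(0, h, block):
        row = []
        for j in range(0, w, block):
            cell = []
            # Collect every pixel of the patch, row by row, into one channel vector.
            for di in range(block):
                for dj in range(block):
                    cell.extend(x[i + di][j + dj])
            row.append(cell)
        out.append(row)
    return out


# 4 x 4 single-channel image whose pixel value encodes its position (r * 4 + c).
image = [[[r * 4 + c] for c in range(4)] for r in range(4)]
packed = space_to_depth(image)

print(len(packed), len(packed[0]), len(packed[0][0]))  # 2 2 4
print(packed[0][0])  # [0, 1, 4, 5]
```

Every input value survives the rearrangement, which is the property that makes this kind of downsampling attractive for small targets compared with strided convolution or pooling.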
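| Parameter | Value | Parameter | Value |
| --- | --- | --- | --- |
| Training epochs | 250 | Initial learning rate | 10⁻⁴ |
| Batch size | 32 | Momentum | 0.9 |
| Input image size (pixels) | 640×640 | Weight decay coefficient | 10⁻⁴ |

Table 1  Experimental parameter settings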
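| Grouping configuration | mAP@0.5/% | Np/M | FLOPs/G | FPS/(frame·s⁻¹) |
| --- | --- | --- | --- | --- |
| EMA_no | 83.1 | 16.1 | 47.1 | 99.7 |
| EMA_16 | 83.9 | 17.0 | 47.4 | 98.6 |
| EMA_32 | 84.3 | 16.4 | 47.2 | 99.4 |
| EMA_64 | 83.6 | 16.7 | 47.4 | 98.9 |

Table 2  Experimental results with different numbers of feature groups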
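| FRE | ETB-AIFI | SOEP | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Np/M | FLOPs/G | FPS/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| × | × | × | 84.8 | 74.9 | 82.4 | 63.2 | 20.8 | 56.9 | 85.5 |
| ✓ | × | × | 86.5 | 76.1 | 84.3 | 64.6 | 16.4 | 47.2 | 99.4 |
| × | ✓ | × | 87.1 | 76.6 | 84.9 | 64.8 | 22.1 | 60.3 | 73.5 |
| × | × | ✓ | 86.9 | 76.4 | 84.2 | 64.6 | 17.7 | 50.7 | 92.6 |
| ✓ | ✓ | × | 86.4 | 76.1 | 83.9 | 64.1 | 18.3 | 54.6 | 85.3 |
| ✓ | × | ✓ | 86.2 | 75.8 | 84.1 | 64.4 | 16.9 | 51.5 | 100.0 |
| × | ✓ | ✓ | 87.2 | 76.9 | 84.4 | 64.9 | 20.4 | 56.7 | 80.2 |
| ✓ | ✓ | ✓ | 87.4 | 77.2 | 85.6 | 65.3 | 17.8 | 48.4 | 95.7 |

Table 3  Ablation experiment results for each module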
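| r | mAP@0.5/% | Np/M | FLOPs/G | FPS/(frame·s⁻¹) |
| --- | --- | --- | --- | --- |
| 1/2 | 85.4 | 18.1 | 48.6 | 95.4 |
| 1/4 | 85.6 | 17.8 | 48.4 | 95.7 |
| 1/8 | 85.0 | 17.6 | 48.3 | 96.3 |

Table 4  Comparison experiment results for the number of channels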
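| Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Np/M | FLOPs/G | FPS/(frame·s⁻¹) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Faster R-CNN | 75.8 | 70.4 | 73.1 | 57.2 | 41.1 | 126.7 | 46.5 |
| YOLOv5s | 81.0 | 71.2 | 77.6 | 61.5 | 10.1 | 23.2 | 123.7 |
| YOLOv8n | 83.7 | 72.6 | 79.3 | 61.7 | 7.9 | 17.6 | 128.4 |
| YOLOv9t | 84.5 | 73.1 | 80.4 | 62.1 | 9.8 | 20.3 | 97.2 |
| YOLOv10n | 84.7 | 73.5 | 82.3 | 62.8 | 7.7 | 18.4 | 133.4 |
| YOLOv11n | 85.2 | 75.2 | 82.7 | 63.4 | 9.4 | 19.1 | 148.2 |
| Deformable-DETR | 84.2 | 73.8 | 81.7 | 62.3 | 40.6 | 88.2 | 68.6 |
| RT-DETR-r50 | 85.3 | 75.3 | 82.9 | 63.6 | 41.9 | 130.8 | 63.7 |
| RT-DETR-r34 | 84.4 | 74.1 | 82.2 | 62.7 | 31.1 | 74.5 | 82.3 |
| RT-DETR-r18 | 84.8 | 74.9 | 82.4 | 63.2 | 20.8 | 56.9 | 85.5 |
| FES-DETR | 87.4 | 77.2 | 85.6 | 65.3 | 17.8 | 48.4 | 95.7 |

Table 5  Comparison experiment results of FES-DETR and mainstream object detection algorithms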
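The gains quoted in the abstract can be re-derived from the RT-DETR-r18 and FES-DETR rows of the comparison table. The short check below copies those two rows and subtracts them; the dictionary keys and the `deltas` helper are names chosen here for illustration and do not come from the paper.

```python
# Values copied from the RT-DETR-r18 and FES-DETR rows of the comparison table.
# Keys are illustrative labels: percentages for P, R, mAP; millions of
# parameters; GFLOPs; frames per second.
baseline = {"P": 84.8, "R": 74.9, "mAP50": 82.4, "mAP50_95": 63.2,
            "params_M": 20.8, "flops_G": 56.9, "fps": 85.5}
improved = {"P": 87.4, "R": 77.2, "mAP50": 85.6, "mAP50_95": 65.3,
            "params_M": 17.8, "flops_G": 48.4, "fps": 95.7}


def deltas(base, new):
    """Return new - base for every metric, rounded to one decimal place."""
    return {key: round(new[key] - base[key], 1) for key in base}


gains = deltas(baseline, improved)
for name, value in gains.items():
    print(f"{name}: {value:+.1f}")
```

The differences come out to +2.6 and +2.3 percentage points for precision and recall, +3.2 and +2.1 for mAP@0.5 and mAP@0.5:0.95, and reductions of 3.0 M parameters and 8.5 GFLOPs, matching the abstract; the frame rate rises by 10.2 frames per second over the baseline.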
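Fig. 11  Comparison of experimental metrics across algorithms
Fig. 12  Visual comparison of detection results across algorithms
Fig. 13  Visual comparison of heat maps across algorithms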