Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (7): 1404-1415    DOI: 10.3785/j.issn.1008-973X.2026.07.004
    
Small organism detection in underwater color-cast environments based on improved RT-DETR
Shaojiang DONG, Tao XIAO, Zhenming LV, Haoran XIA, Jiayuan LUO, Shizheng SUN, Xia ZHANG, Chao LIU
School of Mechatronics and Vehicle Engineering, Chongqing Jiaotong University, Chongqing 400074, China

Abstract  

A detection method based on an improved RT-DETR, named FES-DETR, was proposed to detect small underwater organisms quickly and accurately and to address the poor detection performance of existing models in underwater color-cast environments. An efficient multi-scale attention feature extraction module (Faster-Rep-EMA) was designed to improve on the original BasicBlock in the backbone network, raising both the feature extraction ability for weak targets under color-cast interference and the computational efficiency. The entanglement Transformer block (ETB) was integrated with the attention-based intra-scale feature interaction (AIFI) module in the neck encoder to jointly optimize frequency-domain and spatial-domain features, which enhanced the feature representation of color-cast images. A lightweight small object enhancement pyramid (SOEP) module was designed to improve small-target detection while reducing computational redundancy. Experimental results showed that FES-DETR significantly improved detection performance. Compared with RT-DETR-r18, precision and recall increased by 2.6 and 2.3 percentage points, respectively; mAP@0.5 and mAP@0.5:0.95 increased by 3.2 and 2.1 percentage points, respectively; the number of parameters and the computational cost decreased by 3.0 M and 8.5 GFLOPs, respectively; and the inference speed rose to 95.7 frames per second (FPS). Compared with mainstream object detection models such as the YOLO series, the model achieved higher detection accuracy, providing an effective technical approach for detecting small underwater organisms.
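The abstract does not spell out what the "Rep" (re-parameterized) part of Faster-Rep-EMA, or the RepPConv module of Fig.3, computes. The sketch below illustrates structural re-parameterization in the RepVGG style as one plausible reading, not as the authors' design. A 3×3 branch, a 1×1 branch and an identity branch are summed during training and merged into a single 3×3 kernel for inference, so the deployed layer costs one convolution. It works on one channel in plain Python and omits batch normalization and biases. All function names are hypothetical.

```python
def conv2d_same(img, kernel):
    """Single-channel 2D cross-correlation with zero padding ('same' output size)."""
    h, w = len(img), len(img[0])
    k = len(kernel)
    pad = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # zero padding outside the map
                        acc += img[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D maps (branch outputs are added)."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pad_1x1_to_3x3(w1):
    """Embed a 1x1 weight at the centre of a 3x3 kernel."""
    return [[0.0, 0.0, 0.0], [0.0, w1, 0.0], [0.0, 0.0, 0.0]]


def fuse(k3, w1, identity=True):
    """Merge 3x3 + 1x1 (+ optional identity) branches into one 3x3 kernel.

    For a single channel the identity branch is a 1x1 kernel of weight 1.
    """
    k1 = pad_1x1_to_3x3(w1 + (1.0 if identity else 0.0))
    return [[a + b for a, b in zip(r3, r1)] for r3, r1 in zip(k3, k1)]


if __name__ == "__main__":
    img = [[1, 2, 3, 0], [0, 1, 0, 2], [3, 1, 2, 1], [0, 0, 1, 4]]
    k3 = [[0.1, 0.2, 0.0], [0.0, 0.5, -0.3], [0.2, 0.0, 0.1]]
    w1 = 0.7
    # Training-time form: three parallel branches, outputs summed.
    multi_branch = add_maps(add_maps(conv2d_same(img, k3), conv2d_same(img, [[w1]])), img)
    # Inference-time form: one convolution with the fused kernel.
    fused = conv2d_same(img, fuse(k3, w1))
    print(max(abs(a - b) for ra, rb in zip(multi_branch, fused) for a, b in zip(ra, rb)))
```

Because convolution is linear, the two forms agree up to floating-point rounding. In a real network, each branch's batch normalization would first be folded into its kernel and bias before the kernels are added.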



Key words: color-cast environment, small organism, target detection, RT-DETR, entanglement Transformer
Received: 15 June 2025      Published: 23 May 2026
CLC:  TP 391.4  
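Fund: Key Project of the Chongqing Special Program for Technological Innovation and Application Development (CSTB2024TIAD-KPX0081); Innovation and Development Joint Fund of the Chongqing Natural Science Foundation (CSTB2024NSCQ-LZX0024); Science and Technology Research Program of the Chongqing Municipal Education Commission (KJZD-K202300711).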
Cite this article:

Shaojiang DONG,Tao XIAO,Zhenming LV,Haoran XIA,Jiayuan LUO,Shizheng SUN,Xia ZHANG,Chao LIU. Small organism detection in underwater color-cast environments based on improved RT-DETR. Journal of ZheJiang University (Engineering Science), 2026, 60(7): 1404-1415.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.07.004     OR     https://www.zjujournals.com/eng/Y2026/V60/I7/1404


Fig.1 Structure diagram of small organism detection model (FES-DETR) targeting underwater color-cast environments
Fig.2 Structure diagram of PConv module
Fig.3 Structure diagram of RepPConv module fusing partial convolution and re-parameterized convolution
Fig.4 Structure diagram of EMA module
Fig.5 Structure diagram of Faster-Rep-EMA module
Fig.6 Structure diagram of ETB
Fig.7 Structure diagram of SOEP module
Fig.8 Structure diagram of SPDConv
Fig.9 Structure diagram of CSPOmnikernel module
Fig.10 Sample images from the DUO dataset
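Fig.8 shows SPDConv, which the SOEP module builds on. In the SPD-Conv design of Sunkara and Luo, a strided convolution is replaced by a space-to-depth (SPD) rearrangement followed by a non-strided convolution, so downsampling moves pixels into channels instead of discarding them, which matters for small objects. The sketch below implements only the SPD rearrangement in plain Python, taking sub-maps in the order f(0,0), f(1,0), f(0,1), f(1,1) used in the SPD-Conv paper (the order is a convention and does not change what is kept). The following convolution, and how SOEP wires SPDConv together with CSPOmnikernel, are not shown. The helper names are hypothetical.

```python
def space_to_depth(x, scale=2):
    """Rearrange a (C, H, W) nested list into (C * scale**2, H // scale, W // scale).

    f(i, j) keeps rows i, i+scale, ... and columns j, j+scale, ...
    Every input value lands in exactly one output position: no pixels are dropped.
    """
    h, w = len(x[0]), len(x[0][0])
    if h % scale or w % scale:
        raise ValueError("H and W must be divisible by scale")
    out = []
    for j in range(scale):          # column offset
        for i in range(scale):      # row offset
            for channel in x:       # each sub-map contributes C channels
                out.append([row[j::scale] for row in channel[i::scale]])
    return out


def strided_subsample(x, stride=2):
    """Plain stride-2 subsampling (e.g. a 1x1 conv with stride 2): keeps only f(0,0)."""
    return [[row[::stride] for row in channel[::stride]] for channel in x]


if __name__ == "__main__":
    feat = [[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]]
    spd = space_to_depth(feat)
    print(len(spd), spd)
    print(strided_subsample(feat))
```

On a 4×4 map, the SPD output keeps all 16 values in four 2×2 channels, while plain stride-2 subsampling keeps only 4 of them.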
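Parameter | Value | Parameter | Value
Training epochs | 250 | Initial learning rate | 10⁻⁴
Batch size | 32 | Momentum | 0.9
Input image size (pixels) | 640×640 | Weight decay coefficient | 10⁻⁴
Tab.1 Settings of experimental parameters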
Grouping configuration | mAP@0.5/% | Np/M | FLOPs/G | FPS/(frames·s⁻¹)
EMA_no | 83.1 | 16.1 | 47.1 | 99.7
EMA_16 | 83.9 | 17.0 | 47.4 | 98.6
EMA_32 | 84.3 | 16.4 | 47.2 | 99.4
EMA_64 | 83.6 | 16.7 | 47.4 | 98.9
Tab.2 Experimental results with different numbers of feature groups
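Tab.2 varies the number of feature groups G in EMA. In the EMA module of Ouyang et al., the channels of a feature map are divided into G groups processed in parallel, and one branch applies average pooling along each spatial direction (one value per row and one per column) before re-weighting. The plain-Python sketch below shows only these two steps; the convolutions, sigmoid gating and cross-spatial learning are omitted. The stage widths 64, 128, 256 and 512 are those of a standard ResNet-18 and are only assumed to be the widths at which Faster-Rep-EMA applies EMA. `group_channels` and `directional_pools` are hypothetical names.

```python
def group_channels(x, groups):
    """Split a (C, H, W) nested list into `groups` lists of C // groups channels."""
    c = len(x)
    if c % groups:
        raise ValueError(f"{c} channels cannot be split evenly into {groups} groups")
    size = c // groups
    return [x[g * size:(g + 1) * size] for g in range(groups)]


def directional_pools(channel):
    """Average over the width (one value per row) and over the height (one per column)."""
    h, w = len(channel), len(channel[0])
    per_row = [sum(row) / w for row in channel]
    per_col = [sum(channel[i][j] for i in range(h)) / h for j in range(w)]
    return per_row, per_col


if __name__ == "__main__":
    # Channels per group for the assumed ResNet-18 stage widths and the G values in Tab.2.
    for g in (16, 32, 64):
        print(g, [c // g for c in (64, 128, 256, 512)])
    print(directional_pools([[1, 2, 3], [4, 5, 6]]))
```

Under this assumption, G = 64 would leave a single channel per group in a 64-channel stage.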
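FRE | ETB-AIFI | SOEP | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Np/M | FLOPs/G | FPS/(frames·s⁻¹)
× | × | × | 84.8 | 74.9 | 82.4 | 63.2 | 20.8 | 56.9 | 85.5
✓ | × | × | 86.5 | 76.1 | 84.3 | 64.6 | 16.4 | 47.2 | 99.4
× | ✓ | × | 87.1 | 76.6 | 84.9 | 64.8 | 22.1 | 60.3 | 73.5
× | × | ✓ | 86.9 | 76.4 | 84.2 | 64.6 | 17.7 | 50.7 | 92.6
✓ | ✓ | × | 86.4 | 76.1 | 83.9 | 64.1 | 18.3 | 54.6 | 85.3
✓ | × | ✓ | 86.2 | 75.8 | 84.1 | 64.4 | 16.9 | 51.5 | 100.0
× | ✓ | ✓ | 87.2 | 76.9 | 84.4 | 64.9 | 20.4 | 56.7 | 80.2
✓ | ✓ | ✓ | 87.4 | 77.2 | 85.6 | 65.3 | 17.8 | 48.4 | 95.7
Tab.3 Ablation results for each module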
r | mAP@0.5/% | Np/M | FLOPs/G | FPS/(frames·s⁻¹)
1/2 | 85.4 | 18.1 | 48.6 | 95.4
1/4 | 85.6 | 17.8 | 48.4 | 95.7
1/8 | 85.0 | 17.6 | 48.3 | 96.3
Tab.4 Comparative results for different channel ratios r
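Tab.4 does not say which channels r refers to. If r is the partial-channel ratio c_p/c of the PConv in Fig.2 (an assumption; FasterNet, where Chen et al. introduced PConv, uses r = 1/4 by default), then the FasterNet cost formula h·w·k²·c_p² versus h·w·k²·c² for a regular convolution gives a FLOPs ratio of r². The sketch below computes that ratio exactly. The 80×80 map, 3×3 kernel and 256 channels are arbitrary illustration values, and the function names are hypothetical. Because the formula covers only the PConv layers, it is consistent with the small whole-model differences in Tab.4 (48.3 to 48.6 GFLOPs) but does not by itself explain those numbers.

```python
from fractions import Fraction


def conv_flops(h, w, k, c_in, c_out):
    """FLOPs (counted as multiply-accumulates) of a dense k x k convolution, bias ignored."""
    return h * w * k * k * c_in * c_out


def pconv_flops(h, w, k, c, r):
    """PConv convolves only c_p = r * c channels; the remaining channels pass through."""
    c_p = Fraction(r) * c
    if c_p.denominator != 1:
        raise ValueError("r * c must be a whole number of channels")
    c_p = int(c_p)
    return conv_flops(h, w, k, c_p, c_p)


def pconv_ratio(c, r, h=80, w=80, k=3):
    """FLOPs of PConv relative to a regular conv with c input and c output channels."""
    return Fraction(pconv_flops(h, w, k, c, r), conv_flops(h, w, k, c, c))


if __name__ == "__main__":
    for r in (Fraction(1, 2), Fraction(1, 4), Fraction(1, 8)):
        print(f"r = {r}: PConv costs {pconv_ratio(256, r)} of a regular 3x3 conv")
```

The ratio falls quadratically with r (1/4, 1/16, 1/64), whereas Tab.4 shows mAP@0.5 peaking at r = 1/4. This fits the usual trade-off that convolving too few channels limits what the layer can learn.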
Model | P/% | R/% | mAP@0.5/% | mAP@0.5:0.95/% | Np/M | FLOPs/G | FPS/(frames·s⁻¹)
Faster R-CNN | 75.8 | 70.4 | 73.1 | 57.2 | 41.1 | 126.7 | 46.5
YOLOv5s | 81.0 | 71.2 | 77.6 | 61.5 | 10.1 | 23.2 | 123.7
YOLOv8n | 83.7 | 72.6 | 79.3 | 61.7 | 7.9 | 17.6 | 128.4
YOLOv9t | 84.5 | 73.1 | 80.4 | 62.1 | 9.8 | 20.3 | 97.2
YOLOv10n | 84.7 | 73.5 | 82.3 | 62.8 | 7.7 | 18.4 | 133.4
YOLOv11n | 85.2 | 75.2 | 82.7 | 63.4 | 9.4 | 19.1 | 148.2
Deformable-DETR | 84.2 | 73.8 | 81.7 | 62.3 | 40.6 | 88.2 | 68.6
RT-DETR-r50 | 85.3 | 75.3 | 82.9 | 63.6 | 41.9 | 130.8 | 63.7
RT-DETR-r34 | 84.4 | 74.1 | 82.2 | 62.7 | 31.1 | 74.5 | 82.3
RT-DETR-r18 | 84.8 | 74.9 | 82.4 | 63.2 | 20.8 | 56.9 | 85.5
FES-DETR | 87.4 | 77.2 | 85.6 | 65.3 | 17.8 | 48.4 | 95.7
Tab.5 Comparison of FES-DETR with mainstream object detection algorithms
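The gains quoted in the abstract can be checked directly against Tab.5. The sketch below copies the RT-DETR-r18 and FES-DETR rows, computes the per-metric differences, and converts FPS into average per-frame latency as 1000/FPS ms, ignoring batching and pre/post-processing. The dictionary keys and helper names are hypothetical.

```python
# Rows copied from Tab.5: RT-DETR-r18 (baseline) and FES-DETR.
RT_DETR_R18 = {"P": 84.8, "R": 74.9, "mAP50": 82.4, "mAP50_95": 63.2,
               "params_M": 20.8, "flops_G": 56.9, "fps": 85.5}
FES_DETR = {"P": 87.4, "R": 77.2, "mAP50": 85.6, "mAP50_95": 65.3,
            "params_M": 17.8, "flops_G": 48.4, "fps": 95.7}


def deltas(new, old):
    """Per-metric difference new - old, rounded to the table's one decimal place."""
    return {key: round(new[key] - old[key], 1) for key in old}


def latency_ms(fps):
    """Average time per frame in milliseconds implied by a frames-per-second figure."""
    return 1000.0 / fps


if __name__ == "__main__":
    for key, value in deltas(FES_DETR, RT_DETR_R18).items():
        print(f"{key}: {value:+.1f}")
    print(f"latency: {latency_ms(RT_DETR_R18['fps']):.1f} ms -> "
          f"{latency_ms(FES_DETR['fps']):.1f} ms")
```

On the Tab.5 values, the differences reproduce the abstract's figures: +2.6 P, +2.3 R, +3.2 mAP@0.5, +2.1 mAP@0.5:0.95, -3.0 M parameters and -8.5 GFLOPs. Per-frame latency drops from about 11.7 ms to about 10.4 ms. The same table also shows the trade-off against the lightweight YOLO models. YOLOv11n, for example, runs at 148.2 FPS with 82.7% mAP@0.5, against 95.7 FPS and 85.6% for FES-DETR.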
Fig.11 Comparison of experimental metrics for multiple algorithms
Fig.12 Visual comparison of detection results of multiple algorithms
Fig.13 Visual comparison of heatmaps of multiple algorithms