Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (10): 1955-1965    DOI: 10.3785/j.issn.1008-973X.2023.10.005
Computer Technology and Automation Technology
Dual-branch crowd counting algorithm based on self-attention mechanism
Tian-le YANG, Ling-xia LI, Wei ZHANG*
School of Microelectronics, Tianjin University, Tianjin 300072, China
Abstract:

A dual-branch crowd counting algorithm based on the self-attention mechanism was proposed to address the problems of large variation in head scale and complex background interference in dense crowd counting. The algorithm combined two network frameworks, the convolutional neural network (CNN) and the Transformer: a multi-scale CNN branch and a Transformer branch built on a convolution-enhanced self-attention module were used to capture local and global crowd information, respectively. A dual-branch attention fusion module was designed to enable crowd feature extraction at continuous scales, and a Transformer network with a hybrid attention module was utilized to extract deep features, further distinguishing complex backgrounds and focusing on crowd regions. Experiments were conducted on the ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF and JHU-Crowd++ datasets under position-level full supervision and count-level weak supervision. Results showed that the proposed algorithm outperformed recent studies on all four datasets: the fully supervised version achieved MAE of 55.3, 6.7, 82.9 and 55.7 and MSE of 93.1, 9.8, 145.1 and 248.0 on the above datasets, respectively, enabling accurate counting in high-density, high-occlusion scenes. In the comparison of weakly supervised algorithms in particular, better counting precision was achieved with a low parameter count, reaching 87.9% of the fully supervised counting performance.

Key words: crowd counting    deep learning    self-attention mechanism    dual-branch    weakly supervised learning
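As a reading aid, the MAE and MSE (root of mean squared error) figures quoted in the abstract and the tables below can be computed per test set as follows; this is a minimal illustrative sketch with made-up counts, not the authors' evaluation code:

```python
import math

def mae(pred, gt):
    # Mean absolute error between predicted and ground-truth counts.
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)

def rmse(pred, gt):
    # Root mean squared error; crowd counting papers usually report this as "MSE".
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred))

pred = [102.0, 48.0, 310.0]  # predicted counts on three hypothetical test images
gt = [100.0, 50.0, 300.0]    # ground-truth counts
print(round(mae(pred, gt), 2))   # 4.67
print(round(rmse(pred, gt), 2))  # 6.0
```

Because RMSE squares the per-image errors, it penalizes the occasional large miss more heavily than MAE, which is why the two metrics are always reported together.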
Received: 2023-01-13    Published: 2023-10-18
CLC: TP 391
Supported by: National Key Research and Development Program of China (2020YFC1522405); Provincial Science and Technology Major Special Project (19ZXZNGX00030)
Corresponding author: Wei ZHANG    E-mail: yangtianle@tju.edu.cn; tjuzhangwei@tju.edu.cn
About the author: Tian-le YANG (1999—), male, master's student, engaged in research on digital image processing and pattern recognition. orcid.org/0000-0003-1162-4372. E-mail: yangtianle@tju.edu.cn

Cite this article:


Tian-le YANG, Ling-xia LI, Wei ZHANG. Dual-branch crowd counting algorithm based on self-attention mechanism. Journal of Zhejiang University (Engineering Science), 2023, 57(10): 1955-1965.

Link this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.10.005        https://www.zjujournals.com/eng/CN/Y2023/V57/I10/1955

Fig.1  Overall structure of the dual-branch crowd counting algorithm based on self-attention mechanism
Fig.2  Schematic of downsampling of the 1/4-scale features
Fig.3  Convolution-enhanced self-attention module
Fig.4  Dual-branch attention fusion module
Fig.5  Hybrid attention module
Fig.6  Training curves of the DBCC-Net model
| Algorithm | SHT Part A MAE | SHT Part A MSE | SHT Part B MAE | SHT Part B MSE | UCF-QNRF MAE | UCF-QNRF MSE | UCF_CC_50 MAE | UCF_CC_50 MSE | JHU-Crowd++ MAE | JHU-Crowd++ MSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MCNN[11] | 110.2 | 173.2 | 26.4 | 41.3 | 277 | 426 | 377.6 | 509.1 | 188.9 | 483.4 |
| CSRNet[24] | 68.2 | 115.0 | 10.6 | 16.0 | — | — | 266.1 | 397.5 | 85.9 | 309.2 |
| BL[25] | 62.8 | 101.8 | 7.7 | 12.7 | 88.7 | 154.8 | 229.3 | 308.2 | 75.0 | 299.9 |
| DM-Count[19] | 59.7 | 95.7 | 7.4 | 11.8 | 85.0 | 148.0 | 211.0 | 291.5 | — | — |
| GL[26] | 61.3 | 95.4 | 7.3 | 11.7 | 84.3 | 147.5 | — | — | 59.9 | 259.5 |
| FIDT[27] | 57.0 | 103.4 | 6.9 | 11.8 | 89.0 | 153.5 | 171.4 | 233.1 | 66.6 | 253.6 |
| CLTR[28] | 56.9 | 95.2 | 6.5 | 10.6 | 85.8 | 141.3 | — | — | 59.5 | 240.6 |
| NDConv[29] | 61.4 | 104.18 | 7.8 | 13.8 | 91.2 | 165.6 | 167.2 | 240.6 | — | — |
| DFRNet[30] | 59.6 | 100.9 | 6.9 | 12.1 | 80.2 | 145.5 | — | — | — | — |
| SGANet[31] | 57.6 | 101.1 | 6.6 | 10.2 | 87.6 | 152.5 | 224.6 | 314.6 | — | — |
| RAN[32] | 57.9 | 99.2 | 7.2 | 11.9 | 83.4 | 141.8 | 155.0 | 219.5 | 59.4 | 257.6 |
| CTrans-MISN[33] | 55.8 | 95.9 | 7.3 | 11.4 | 95.2 | 180.1 | — | — | 71.5 | 280.1 |
| DBCC-Net | 55.3 | 93.1 | 6.7 | 9.8 | 82.9 | 145.1 | 147.5 | 205.1 | 55.7 | 248.0 |

Table 1  Results of position-level fully supervised comparison experiments
| Algorithm | SHT Part A MAE | SHT Part A MSE | SHT Part B MAE | SHT Part B MSE | UCF-QNRF MAE | UCF-QNRF MSE | UCF_CC_50 MAE | UCF_CC_50 MSE | JHU-Crowd++ MAE | JHU-Crowd++ MSE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Yang[34] | 104.6 | 145.2 | 12.3 | 21.2 | — | — | — | — | — | — |
| MATT[15] | 80.1 | 129.4 | 11.7 | 17.5 | — | — | 355.0 | 550.2 | — | — |
| TDCrowd[35] | 67.9 | 108.3 | — | — | — | — | — | — | — | — |
| TransCrowd[16] | 66.1 | 105.1 | 9.3 | 16.1 | 97.2 | 168.5 | — | — | 74.9 | 595.6 |
| CCST[36] | 62.8 | 94.1 | 8.3 | 13.4 | 93.7 | 166.9 | 190.7 | 289.0 | 61.4 | 239.3 |
| DBCC-Net | 60.1 | 94.0 | 7.5 | 12.5 | 90.9 | 156.3 | 177.1 | 237.9 | 61.3 | 241.8 |

Table 2  Results of count-level weakly supervised comparison experiments
Fig.7  Position-level fully supervised crowd density maps
Fig.8  Visual comparison of different methods
Fig.9  Count-level weakly supervised crowd attention maps
| Type | Algorithm | Framework | MAE | MSE | Np/10^6 | FLOPs/10^9 |
| --- | --- | --- | --- | --- | --- | --- |
| Fully supervised | CAN[38] | CNN | 62.3 | 100.0 | 18.1 | 64.6 |
| | FIDT[27] | CNN | 57.0 | 103.4 | 66.6 | 80.1 |
| | RAN[32] | CNN | 57.9 | 99.2 | 22.9 | 115.9 |
| | DBCC-Net | CNN+Transformer | 55.3 | 93.1 | 38.0 | 98.5 |
| Weakly supervised | TransCrowd[16] | Transformer | 66.1 | 95.4 | 86.0 | 49.3 |
| | CCTrans[39] | Transformer | 64.4 | 95.4 | 104.0 | 57.6 |
| | CCST[36] | Transformer | 62.8 | 94.1 | 294.7 | 323.6 |
| | DBCC-Net | CNN+Transformer | 60.1 | 94.0 | 38.0 | 98.5 |

Table 3  Experimental results of time and space complexity comparison
| Algorithm | Framework | MAE | MSE | Params/M |
| --- | --- | --- | --- | --- |
| VGG | CNN | 59.7 | 95.7 | 29.0 |
| Swin | Transformer | 55.5 | 92.7 | 104.0 |
| VGG-Swin | CNN+Trans (single-branch) | 58.6 | 97.1 | 33.5 |
| DBCC-Net | CNN+Trans (dual-branch) | 55.3 | 93.1 | 38.0 |

Table 4  Comparison results of network model frameworks
| Algorithm | Dual-branch attention fusion | Convolution-enhanced self-attention | Hybrid attention | MAE | MSE |
| --- | --- | --- | --- | --- | --- |
| Baseline model | — | — | — | 58.6 | 97.1 |
| Proposed model | ✓ | — | — | 57.7 | 96.4 |
| Proposed model | ✓ | ✓ | — | 56.5 | 96.2 |
| Proposed model | ✓ | ✓ | ✓ | 55.3 | 93.1 |

Table 5  Results of DBCC-Net ablation experiments
| Loss function | MAE | MSE |
| --- | --- | --- |
| MSE-Loss | 60.6 | 101.2 |
| Bayesian-Loss | 57.2 | 95.6 |
| DM-Loss | 55.3 | 93.1 |

Table 6  Comparison results of loss functions
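The losses compared above supervise the predicted density map at different granularities. A minimal sketch of the two extremes is given below: a pixel-wise MSE loss (position-level full supervision) and a count-level absolute-error loss (count-level weak supervision). The function names and toy data are illustrative only; Bayesian-Loss and DM-Loss involve additional machinery (point annotations, optimal transport) not shown here.

```python
import numpy as np

def mse_density_loss(pred_density, gt_density):
    # Pixel-wise MSE between predicted and ground-truth density maps
    # (position-level full supervision).
    return float(np.mean((pred_density - gt_density) ** 2))

def count_loss(pred_density, gt_count):
    # Count-level loss: only the integrated head count is supervised
    # (count-level weak supervision).
    return abs(float(pred_density.sum()) - gt_count)

pred = np.full((4, 4), 0.5)  # toy density map, integrates to 8 people
gt_map = np.zeros((4, 4))    # toy ground-truth density map
print(mse_density_loss(pred, gt_map))  # 0.25
print(count_loss(pred, 10))            # 2.0
```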
| Fusion scale | MAE | MSE |
| --- | --- | --- |
| 1/4 scale | 60.8 | 98.0 |
| 1/8 scale | 60.8 | 95.6 |
| 1/16 scale | 60.6 | 96.5 |
| Multi-scale fusion module | 60.1 | 94.0 |

Table 7  Comparison results of the multi-scale fusion module
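Fusing the 1/4-, 1/8- and 1/16-scale features compared above typically means upsampling the coarser maps to a common resolution and concatenating them along the channel axis. The following NumPy sketch works under that assumption (nearest-neighbour upsampling, hypothetical channel sizes); it is not the paper's implementation:

```python
import numpy as np

def upsample(feat, factor):
    # Nearest-neighbour upsampling of a (C, H, W) feature map.
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_scales(f4, f8, f16):
    # Bring the 1/8- and 1/16-scale maps up to 1/4 resolution,
    # then concatenate all three along the channel axis.
    return np.concatenate([f4, upsample(f8, 2), upsample(f16, 4)], axis=0)

rng = np.random.default_rng(0)
f4 = rng.random((16, 32, 32))   # 1/4-scale features
f8 = rng.random((32, 16, 16))   # 1/8-scale features
f16 = rng.random((64, 8, 8))    # 1/16-scale features
print(fuse_scales(f4, f8, f16).shape)  # (112, 32, 32)
```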
| Module | MAE | MSE |
| --- | --- | --- |
| Conventional CBAM module | 60.8 | 95.4 |
| Proposed spatial attention branch only | 61.2 | 100.7 |
| Proposed channel attention branch only | 61.5 | 98.1 |
| Hybrid attention module | 60.1 | 94.0 |

Table 8  Comparison results of the hybrid attention module
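The CBAM-style channel and spatial attention branches compared above can be sketched as follows. This NumPy toy version omits the learnable layers of real CBAM (the shared MLP in the channel branch and the convolution in the spatial branch) and is not the paper's hybrid attention module:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(feat):
    # feat: (C, H, W). Pool over spatial dims, then reweight each channel.
    pooled = feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2))  # (C,)
    return feat * sigmoid(pooled)[:, None, None]

def spatial_attention(feat):
    # Pool over the channel dim, then reweight each spatial location.
    pooled = feat.mean(axis=0) + feat.max(axis=0)            # (H, W)
    return feat * sigmoid(pooled)[None, :, :]

def hybrid_attention(feat):
    # CBAM applies the channel branch first, then the spatial branch.
    return spatial_attention(channel_attention(feat))

rng = np.random.default_rng(0)
x = rng.random((8, 4, 4))        # toy feature map: 8 channels, 4x4 spatial
y = hybrid_attention(x)
print(y.shape)  # (8, 4, 4)
```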
1 MA Y, SHUAI H, CHENG W Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation[J]. IEEE Transactions on Multimedia, 2022, 24: 261- 273
doi: 10.1109/TMM.2021.3050059
2 LI Meng, SUN Yan-ge, GUO Hua-ping, et al Multi-level fusion and attention mechanism based crowd counting algorithm[J]. Journal of Jilin University: Information Science Edition, 2022, 40 (6): 1009- 1016 (in Chinese)
3 LIAN D, CHEN X, LI J, et al Locating and counting heads in crowds with a depth prior[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44 (12): 9056- 9072
doi: 10.1109/TPAMI.2021.3124956
4 LU H, CAO Z, XIAO Y, et al TasselNet: counting maize tassels in the wild via local counts regression network[J]. Plant Methods, 2017, 13 (1): 1- 17
doi: 10.1186/s13007-016-0152-4
5 XIE W, NOBLE J A, ZISSERMAN A Microscopy cell counting and detection with fully convolutional regression networks[J]. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 2018, 6 (3): 283- 292
doi: 10.1080/21681163.2016.1149104
6 LIANG M, HUANG X, CHEN C, et al Counting and classification of highway vehicles by regression analysis[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16: 2878- 2888
doi: 10.1109/TITS.2015.2424917
7 ZENG C, MA H. Robust head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting [C]// 20th International Conference on Pattern Recognition. Istanbul: IEEE, 2010: 2069-2072.
8 LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection [C]// 19th International Conference on Pattern Recognition. Tampa: IEEE, 2008: 1-4.
9 LIU Di, GUO Ji-chang, WANG Yu-dong, et al Multi-scale salient object detection network combining an attention mechanism[J]. Journal of Xidian University: Natural Science, 2022, 49 (4): 118- 126 (in Chinese)
10 LEMPITSKY V, ZISSERMAN A Learning to count objects in images[J]. Advances in Neural Information Processing Systems, 2010, 23: 1324- 1332
11 ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 589-597.
12 WAN Hong-lin, WANG Xiao-min, PENG Zhen-wei, et al Dense crowd counting algorithm based on new multi-scale attention mechanism[J]. Journal of Electronics and Information Technology, 2022, 44 (3): 1129- 1136 (in Chinese)
doi: 10.11999/JEIT210163
13 LIU J, GAO C, MENG D, et al. DecideNet: counting varying density crowds through attention guided detection and density estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5197-5206.
14 XIE J, PANG C, ZHENG Y, et al Multi-scale attention Recalibration Network for crowd counting[J]. Applied Soft Computing, 2022, 117: 108457
doi: 10.1016/j.asoc.2022.108457
15 LEI Y, LIU Y, ZHANG P, et al Towards using count-level weak supervision for crowd counting[J]. Pattern Recognition, 2021, 109: 107616
doi: 10.1016/j.patcog.2020.107616
16 LIANG D, CHEN X, XU W, et al TransCrowd: weakly-supervised crowd counting with transformer[J]. Science China: Information Sciences, 2022, 65 (6): 48- 61
17 LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
18 CHEN X, WANG X, ZHOU J, et al. Activating more pixels in image super-resolution transformer [EB/OL]. [2022-05-09]. https://arxiv.org/abs/2205.04437.pdf.
19 WANG B, LIU H, SAMARAS D, et al. Distribution matching for crowd counting [C]// Proceedings of the Advances in Neural Information Processing Systems. Vancouver: CA, 2020: 1595-1607.
20 CHAO X, SHANG W, ZHANG F Information-guided flame detection based on faster R-CNN[J]. IEEE Access, 2020, 8: 58923- 58932
doi: 10.1109/ACCESS.2020.2982994
21 IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 532-546.
22 IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2547-2554.
23 SINDAGI V A, YASARLA R, PATEL V M JHU-Crowd++: large-scale crowd counting dataset and a benchmark method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, (4): 653- 671
24 LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1091-1100.
25 MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision [C]// IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6141-6150.
26 WAN J, LIU Z, CHAN A. A generalized loss function for crowd counting and localization [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 1974-1983.
27 LIANG D, XU W, ZHU Y, et al. Focal inverse distance transform maps for crowd localization [EB/OL]. [2021-10-20]. https://arxiv.org/pdf/2102.07925.pdf.
28 LIANG D, XU W, BAI X. An end-to-end transformer model for crowd localization [C]// Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 38-54.
29 ZHONG X, YAN Z, QIN J, et al An improved normed-deformable convolution for crowd counting[J]. IEEE Signal Process Letters, 2022, 29: 1794- 1798
doi: 10.1109/LSP.2022.3198371
30 GAO X, XIE J, CHEN Z, et al Dilated convolution based feature refinement network for crowd localization[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19 (6): 1- 16
31 WANG Q, BRECKON T Crowd counting via segmentation guided attention networks and curriculum loss[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 15233- 43
doi: 10.1109/TITS.2021.3138896
32 CHEN Y, YANG J, ZHANG D, et al Region-aware network: model human’s top-down visual perception mechanism for crowd counting[J]. Neural Networks, 2022, 148: 219- 231
doi: 10.1016/j.neunet.2022.01.015
33 ZENG X, HU S, WANG H, et al. Joint contextual transformer and multi-scale information shared network for crowd counting [C]// 5th International Conference on Pattern Recognition and Artificial Intelligence. Chengdu: IEEE, 2022: 412-417.
34 YANG Y, WU Z, SU L, et al. Weakly-supervised crowd counting learns from sorting rather than locations [C]// Proceedings of European Conference on Computer Vision. Newcastle: Springer, 2020: 1-17.
35 PHUC T P. Attention in crowd counting using the transformer and density map to improve counting result [C]// 8th NAFOSTED Conference on Information and Computer Science. Hanoi: IEEE, 2021: 65-70.
36 LI B, ZHANG Y, XU H, et al CCST: crowd counting with swin transformer[J]. The Visual Computer, 2023, 39 (7): 2671- 2682
doi: 10.1007/s00371-022-02485-3
37 RONG L, LI C. Coarse and fine-grained attention network with background-aware loss for crowd density map estimation [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3675-3684.
38 LIU W, SALZMANN M, FUA P. Context-aware crowd counting [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5094-5103.