Journal of ZheJiang University (Engineering Science)  2023, Vol. 57 Issue (10): 1955-1965    DOI: 10.3785/j.issn.1008-973X.2023.10.005
    
Dual-branch crowd counting algorithm based on self-attention mechanism
Tian-le YANG, Ling-xia LI, Wei ZHANG*
School of Microelectronics, Tianjin University, Tianjin 300072, China

Abstract  

A dual-branch crowd counting algorithm based on the self-attention mechanism was proposed to address two problems in crowd counting: large variations in head scale and interference from complex backgrounds. The algorithm combined two network frameworks, a convolutional neural network (CNN) and a Transformer. A multi-scale CNN branch and a Transformer branch built on a convolution-enhanced self-attention module were used to capture local and global crowd information, respectively. A dual-branch attention fusion module was designed to enable crowd feature extraction at continuous scales. A Transformer network with a hybrid attention module was used to extract deep features, which helped distinguish complex backgrounds and focus on crowd regions. Experiments were conducted on the ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF and JHU-Crowd++ datasets under both position-level full supervision and count-level weak supervision. Results showed that the proposed algorithm outperformed recent methods on all four datasets. The MAE/MSE of the fully supervised algorithm on these datasets were 55.3/93.1, 6.7/9.8, 82.9/145.1 and 55.7/248.0, respectively, demonstrating accurate counting in highly dense and heavily occluded scenes. In the weakly supervised comparison in particular, the algorithm achieved better counting precision with fewer parameters, reaching 87.9% of the fully supervised counting accuracy.
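The MAE and MSE figures quoted above (and throughout the tables below) are computed over per-image crowd counts; in the crowd counting literature, "MSE" conventionally denotes the root mean squared error of the counts. A minimal plain-Python sketch of the two metrics, with illustrative sample counts that are not from the paper:

```python
import math

def mae(pred_counts, gt_counts):
    # Mean absolute error over per-image crowd counts
    n = len(pred_counts)
    return sum(abs(p - g) for p, g in zip(pred_counts, gt_counts)) / n

def mse(pred_counts, gt_counts):
    # Root mean squared error; crowd counting papers label this "MSE"
    n = len(pred_counts)
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts)) / n)

# Hypothetical per-image counts (prediction = integral of the density map)
preds = [105, 98, 210]
gts = [100, 100, 200]
print(mae(preds, gts))  # ≈ 5.67
print(mse(preds, gts))  # ≈ 6.56
```

Because MSE squares the errors before averaging, it penalizes large per-image miscounts more heavily than MAE, which is why the two metrics can rank methods differently.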



Key words: crowd counting; deep learning; self-attention mechanism; dual-branch; weakly supervised learning
Received: 13 January 2023      Published: 18 October 2023
CLC:  TP 391  
Fund: National Key Research and Development Program of China (2020YFC1522405); Provincial Science and Technology Major Project and Engineering Program (19ZXZNGX00030)
Corresponding Authors: Wei ZHANG     E-mail: yangtianle@tju.edu.cn;tjuzhangwei@tju.edu.cn
Cite this article:

Tian-le YANG, Ling-xia LI, Wei ZHANG. Dual-branch crowd counting algorithm based on self-attention mechanism. Journal of ZheJiang University (Engineering Science), 2023, 57(10): 1955-1965.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.10.005     OR     https://www.zjujournals.com/eng/Y2023/V57/I10/1955


Fig.1 Overall structure of dual-branch crowd counting algorithm based on self-attention mechanism
Fig.2 Diagram of 1/4-scale feature downsampling
Fig.3 Convolution enhanced self-attention module
Fig.4 Dual-branch attention fusion module
Fig.5 Hybrid attention module
Fig.6 Training curve of DBCC-Net
| Algorithm | SHT Part A (MAE/MSE) | SHT Part B (MAE/MSE) | UCF-QNRF (MAE/MSE) | UCF_CC_50 (MAE/MSE) | JHU-Crowd++ (MAE/MSE) |
| --- | --- | --- | --- | --- | --- |
| MCNN[11] | 110.2/173.2 | 26.4/41.3 | 277/426 | 377.6/509.1 | 188.9/483.4 |
| CSRNet[24] | 68.2/115.0 | 10.6/16.0 | — | 266.1/397.5 | 85.9/309.2 |
| BL[25] | 62.8/101.8 | 7.7/12.7 | 88.7/154.8 | 229.3/308.2 | 75.0/299.9 |
| DM-Count[19] | 59.7/95.7 | 7.4/11.8 | 85.0/148.0 | 211.0/291.5 | — |
| GL[26] | 61.3/95.4 | 7.3/11.7 | 84.3/147.5 | — | 59.9/259.5 |
| FIDT[27] | 57.0/103.4 | 6.9/11.8 | 89.0/153.5 | 171.4/233.1 | 66.6/253.6 |
| CLTR[28] | 56.9/95.2 | 6.5/10.6 | 85.8/141.3 | — | 59.5/240.6 |
| NDConv[29] | 61.4/104.18 | 7.8/13.8 | 91.2/165.6 | 167.2/240.6 | — |
| DFRNet[30] | 59.6/100.9 | 6.9/12.1 | 80.2/145.5 | — | — |
| SGANet[31] | 57.6/101.1 | 6.6/10.2 | 87.6/152.5 | 224.6/314.6 | — |
| RAN[32] | 57.9/99.2 | 7.2/11.9 | 83.4/141.8 | 155.0/219.5 | 59.4/257.6 |
| CTrans-MISN[33] | 55.8/95.9 | 7.3/11.4 | 95.2/180.1 | — | 71.5/280.1 |
| DBCC-Net | 55.3/93.1 | 6.7/9.8 | 82.9/145.1 | 147.5/205.1 | 55.7/248.0 |

Tab.1 Results of position-level full supervision comparison experiment
| Algorithm | SHT Part A (MAE/MSE) | SHT Part B (MAE/MSE) | UCF-QNRF (MAE/MSE) | UCF_CC_50 (MAE/MSE) | JHU-Crowd++ (MAE/MSE) |
| --- | --- | --- | --- | --- | --- |
| Yang[34] | 104.6/145.2 | 12.3/21.2 | — | — | — |
| MATT[15] | 80.1/129.4 | 11.7/17.5 | — | 355.0/550.2 | — |
| TDCrowd[35] | 67.9/108.3 | — | — | — | — |
| TransCrowd[16] | 66.1/105.1 | 9.3/16.1 | 97.2/168.5 | — | 74.9/595.6 |
| CCST[36] | 62.8/94.1 | 8.3/13.4 | 93.7/166.9 | 190.7/289.0 | 61.4/239.3 |
| DBCC-Net | 60.1/94.0 | 7.5/12.5 | 90.9/156.3 | 177.1/237.9 | 61.3/241.8 |

Tab.2 Results of count-level weak supervision comparison experiment
Fig.7 Position-level full supervision crowd density map
Fig.8 Visual comparison of different methods
Fig.9 Count-level weakly supervision crowd density map
| Type | Algorithm | Framework | MAE | MSE | Np/10^6 | GFLOPs |
| --- | --- | --- | --- | --- | --- | --- |
| Fully supervised | CAN[38] | CNN | 62.3 | 100.0 | 18.1 | 64.6 |
| Fully supervised | FIDT[27] | CNN | 57.0 | 103.4 | 66.6 | 80.1 |
| Fully supervised | RAN[32] | CNN | 57.9 | 99.2 | 22.9 | 115.9 |
| Fully supervised | DBCC-Net | CNN+Transformer | 55.3 | 93.1 | 38.0 | 98.5 |
| Weakly supervised | TransCrowd[16] | Transformer | 66.1 | 95.4 | 86.0 | 49.3 |
| Weakly supervised | CCTrans[39] | Transformer | 64.4 | 95.4 | 104.0 | 57.6 |
| Weakly supervised | CCST[36] | Transformer | 62.8 | 94.1 | 294.7 | 323.6 |
| Weakly supervised | DBCC-Net | CNN+Transformer | 60.1 | 94.0 | 38.0 | 98.5 |

Tab.3 Experimental results of time and space complexity comparison
| Algorithm | Framework | MAE | MSE | Parameters/M |
| --- | --- | --- | --- | --- |
| VGG | CNN | 59.7 | 95.7 | 29.0 |
| Swin | Transformer | 55.5 | 92.7 | 104.0 |
| VGG-Swin | CNN+Transformer (single-branch) | 58.6 | 97.1 | 33.5 |
| DBCC-Net | CNN+Transformer (dual-branch) | 55.3 | 93.1 | 38.0 |

Tab.4 Comparison of experimental results across network model frameworks
| Model | Dual-branch attention fusion | Convolution-enhanced self-attention module | Hybrid attention module | MAE | MSE |
| --- | --- | --- | --- | --- | --- |
| Baseline model | | | | 58.6 | 97.1 |
| Proposed model | ✓ | | | 57.7 | 96.4 |
| Proposed model | ✓ | ✓ | | 56.5 | 96.2 |
| Proposed model | ✓ | ✓ | ✓ | 55.3 | 93.1 |

Tab.5 Results of ablation experiments for DBCC-Net
| Loss function | MAE | MSE |
| --- | --- | --- |
| MSE-Loss | 60.6 | 101.2 |
| Bayesian-Loss | 57.2 | 95.6 |
| DM-Loss | 55.3 | 93.1 |

Tab.6 Comparison experiment results of loss functions
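The loss functions compared in Tab.6 train the fully supervised model on pixel-level density maps, whereas the weakly supervised variant sees only the image-level count. The distinction can be illustrated with a toy example in plain Python; the 2×2 density maps below are hypothetical, and the actual Bayesian-Loss [25] and DM-Loss [19] require their full formulations, which are not reproduced here:

```python
def pixelwise_mse_loss(pred_map, gt_map):
    # Full supervision: penalize per-pixel density differences
    n = sum(len(row) for row in pred_map)
    return sum((p - g) ** 2
               for pr, gr in zip(pred_map, gt_map)
               for p, g in zip(pr, gr)) / n

def count_l1_loss(pred_map, gt_count):
    # Weak supervision: only the summed (total) count is constrained
    pred_count = sum(v for row in pred_map for v in row)
    return abs(pred_count - gt_count)

pred = [[0.2, 0.1], [0.4, 0.3]]   # hypothetical predicted density map (sums to 1.0)
gt   = [[0.0, 0.3], [0.4, 0.3]]   # hypothetical ground-truth map, one annotated head

print(pixelwise_mse_loss(pred, gt))  # ≈ 0.02: the maps disagree pixel by pixel
print(count_l1_loss(pred, 1.0))      # ≈ 0.0: the total counts nevertheless agree
```

This is why count-level supervision is much cheaper to annotate but gives the network a weaker training signal, consistent with the gap between Tab.1 and Tab.2.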
| Fusion scale | MAE | MSE |
| --- | --- | --- |
| 1/4 scale | 60.8 | 98.0 |
| 1/8 scale | 60.8 | 95.6 |
| 1/16 scale | 60.6 | 96.5 |
| Multi-scale fusion module | 60.1 | 94.0 |

Tab.7 Comparative experiment results of multiple fusion modules
| Module | MAE | MSE |
| --- | --- | --- |
| Conventional CBAM module | 60.8 | 95.4 |
| Proposed spatial attention branch only | 61.2 | 100.7 |
| Proposed channel attention branch only | 61.5 | 98.1 |
| Hybrid attention module | 60.1 | 94.0 |

Tab.8 Comparative experimental results of hybrid attention modules
[1]   MA Y, SHUAI H, CHENG W Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation[J]. IEEE Transactions on Multimedia, 2022, 24: 261- 273
doi: 10.1109/TMM.2021.3050059
[2]   李萌, 孙艳歌, 郭华平, 等 多层次融合与注意力机制的人群计数算法[J]. 吉林大学学报: 信息科学版, 2022, 40 (6): 1009- 1016
LI Meng, SUN Yan-ge, GUO Hua-ping, et al Multi-level fusion and attention mechanism based crowd counting algorithm[J]. Journal of Jilin University: Information Science Edition, 2022, 40 (6): 1009- 1016
[3]   LIAN D, CHEN X, LI J, et al Locating and counting heads in crowds with a depth prior[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44 (12): 9056- 9072
doi: 10.1109/TPAMI.2021.3124956
[4]   LU H, CAO Z, XIAO Y, et al TasselNet: counting maize tassels in the wild via local counts regression network[J]. Plant Methods, 2017, 13 (1): 1- 17
doi: 10.1186/s13007-016-0152-4
[5]   XIE W, NOBLE J A, ZISSERMAN A Microscopy cell counting and detection with fully convolutional regression networks[J]. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 2018, 6 (3): 283- 292
doi: 10.1080/21681163.2016.1149104
[6]   LIANG M, HUANG X, CHEN C, et al Counting and classification of highway vehicles by regression analysis[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16: 2878- 2888
doi: 10.1109/TITS.2015.2424917
[7]   ZENG C, MA H. Robust Head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting [C]// 20th International Conference on Pattern Recognition. Istanbul: IEEE, 2010: 2069-2072.
[8]   LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection [C]// 19th International Conference on Pattern Recognition. Tampa: IEEE, 2008: 1-4.
[9]   刘迪, 郭继昌, 汪昱东, 等 融合注意力机制的多尺度显著性目标检测网络[J]. 西安电子科技大学学报: 自然科学版, 2022, 49 (4): 118- 126
LIU Di, GUO Ji-chang, WANG Yu-dong, et al Multi-scale salient object detection network combining an attention mechanism[J]. Journal of Xidian University: Natural Science, 2022, 49 (4): 118- 126
[10]   LEMPITSKY V, ZISSERMAN A Learning to count objects in images[J]. Advances in Neural Information Processing Systems, 2010, 23: 1324- 1332
[11]   ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 589-597.
[12]   万洪林, 王晓敏, 彭振伟, 等 基于新型多尺度注意力机制的密集人群计数算法[J]. 电子与信息学报, 2022, 44 (3): 1129- 1136
WAN Hong-lin, WANG Xiao-min, PENG Zhen-wei, et al Dense crowd counting algorithm based on new multi-scale attention mechanism[J]. Journal of Electronics and Information Technology, 2022, 44 (3): 1129- 1136
doi: 10.11999/JEIT210163
[13]   LIU J, GAO C, MENG D, et al. DecideNet: counting varying density crowds through attention guided detection and density estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5197-5206.
[14]   XIE J, PANG C, ZHENG Y, et al Multi-scale attention Recalibration Network for crowd counting[J]. Applied Soft Computing, 2022, 117: 108457
doi: 10.1016/j.asoc.2022.108457
[15]   LEI Y, LIU Y, ZHANG P, et al Towards using count-level weak supervision for crowd counting[J]. Pattern Recognition, 2021, 109: 107616
doi: 10.1016/j.patcog.2020.107616
[16]   LIANG D, CHEN X, XU W, et al TransCrowd: weakly-supervised crowd counting with transformer[J]. Science China: Information Sciences, 2022, 65 (6): 48- 61
[17]   LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.
[18]   CHEN X, WANG X, ZHOU J, et al. Activating more pixels in image super-resolution transformer [EB/OL]. [2022-05-09]. https://arxiv.org/abs/2205.04437.pdf.
[19]   WANG B, LIU H, SAMARAS D, et al. Distribution matching for crowd counting [C]// Proceedings of the Advances in Neural Information Processing Systems. Vancouver: CA, 2020: 1595-1607.
[20]   CHAO X, SHANG W, ZHANG F Information-guided flame detection based on faster R-CNN[J]. IEEE Access, 2020, 8: 58923- 58932
doi: 10.1109/ACCESS.2020.2982994
[21]   IDREES H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 532-546.
[22]   IDREES H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2547-2554.
[23]   SINDAGI V A, YASARLA R, PATEL V M JHU-CROWD++: large-scale crowd counting dataset and a benchmark method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, (4): 653- 671
[24]   LI Y, ZHANG X, CHEN D. CSRNet: dilated convolutional neural networks for understanding the highly congested scenes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1091-1100.
[25]   MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision [C]// IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6141-6150.
[26]   WAN J, LIU Z, CHAN A. A generalized loss function for crowd counting and localization [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 1974-1983.
[27]   LIANG D, XU W, ZHU Y, et al. Focal inverse distance transform maps for crowd localization [EB/OL]. [2021-10-20]. https://arxiv.org/pdf/2102.07925.pdf.
[28]   LIANG D, XU W, BAI X. An end-to-end transformer model for crowd localization [C]// Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 38-54.
[29]   ZHONG X, YAN Z, QIN J, et al An improved normed-deformable convolution for crowd counting[J]. IEEE Signal Process Letters, 2022, 29: 1794- 1798
doi: 10.1109/LSP.2022.3198371
[30]   GAO X, XIE J, CHEN Z, et al Dilated convolution based feature refinement network for crowd localization[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19 (6): 1- 16
[31]   WANG Q, BRECKON T Crowd counting via segmentation guided attention networks and curriculum loss[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 15233- 43
doi: 10.1109/TITS.2021.3138896
[32]   CHEN Y, YANG J, ZHANG D, et al Region-aware network: model human’s top-down visual perception mechanism for crowd counting[J]. Neural Networks, 2022, 148: 219- 231
doi: 10.1016/j.neunet.2022.01.015
[33]   ZENG X, HU S, WANG H, et al. Joint contextual transformer and multi-scale information shared network for crowd counting [C]// 5th International Conference on Pattern Recognition and Artificial Intelligence. Chengdu: IEEE, 2022: 412-417.
[34]   YANG Y, WU Z, SU L, et al. Weakly-supervised crowd counting learns from sorting rather than locations [C]// Proceedings of European Conference on Computer Vision. Newcastle: Springer, 2020: 1-17.
[35]   PHUCT P. Attention in crowd counting using the transformer and density map to improve counting result [C]// 8th Nafosted Conference on Information and Computer Science. Hanoi: IEEE, 2021: 65-70.
[36]   LI B, ZHANG Y, XU H, et al CCST: crowd counting with swin transformer[J]. The Visual Computer, 2023, 39 (7): 2671- 2682
doi: 10.1007/s00371-022-02485-3
[37]   RONG L, LI C. Coarse and fine-grained attention network with background-aware loss for crowd density map estimation [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3675-3684.
[38]   LIU W, SALZMANN M, FUA P. Context-aware crowd counting [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5094-5103.