Dual-branch crowd counting algorithm based on self-attention mechanism

doi:10.3785/j.issn.1008-973X.2023.10.005

Journal of ZheJiang University (Engineering Science)

2023, Vol. 57

Issue (10): 1955-1965 DOI: 10.3785/j.issn.1008-973X.2023.10.005

Dual-branch crowd counting algorithm based on self-attention mechanism

Tian-le YANG(

),Ling-xia LI,Wei ZHANG*(

)

School of Microelectronics, Tianjin University, Tianjin 300072, China

Download:

HTML

PDF(2871KB) HTML
Export: BibTeX | EndNote (RIS)

Abstract

A dual-branch crowd counting algorithm based on self-attention mechanism was proposed to solve the problems of large variation in head scale and complex background interference in crowd counting. The algorithm combined two network frameworks, including convolutional neural network (CNN) and Transformer. The multi-scale CNN branch and Transformer branch based on convolution enhanced self-attention module were used to obtain local and global crowd information respectively. The dual-branch attention fusion module was designed to enable continuous-scale crowd feature extraction. The Transformer network with the hybrid attention module was utilized to extract deep features, which facilitated the distinction of complex backgrounds and focused on the crowd regions. The experiments were conducted on ShanghaiTech Part A, ShanghaiTech Part B, UCF-QNRF, JHU-Crowd++ and other datasets using position-level full supervision and count-level weak supervision. Results showed that the performance of the proposed algorithms was better than that of recent studies. The MAE and MSE of the fully supervised algorithm in the above datasets were 55.3, 6.7, 82.9, 55.7, and 93.1, 9.8, 145.1, 248.0, respectively, which could achieve accurate counting in high density and high occlusion scenes. Good counting precision was achieved with low parameters, and a counting accuracy of 87.9% of the full supervision was attained especially in the comparison of weakly supervised algorithms.

Key words： crowd counting deep learning self-attention mechanism dual-branch weakly supervised learning

Received: 13 January 2023 Published: 18 October 2023

CLC:

TP 391

Fund: 国家重点研发计划资助项目（2020YFC1522405）；省级科技重大专项与工程项目（19ZXZNGX00030）

Corresponding Authors: Wei ZHANG E-mail: yangtianle@tju.edu.cn;tjuzhangwei@tju.edu.cn

	Service
	E-mail this article
	Add to my bookshelf
	Add to citation manager
	E-mail Alert
	RSS
	Articles by authors
	Tian-le YANG
	Ling-xia LI
	Wei ZHANG

Cite this article:

Tian-le YANG,Ling-xia LI,Wei ZHANG. Dual-branch crowd counting algorithm based on self-attention mechanism. Journal of ZheJiang University (Engineering Science), 2023, 57(10): 1955-1965.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2023.10.005 OR https://www.zjujournals.com/eng/Y2023/V57/I10/1955

基于自注意力机制的双分支密集人群计数算法

针对密集人群计数中人头尺度变化大、复杂背景干扰的问题，提出基于自注意力机制的双分支密集人群计数算法. 该算法结合卷积神经网络（CNN）和Transformer 2种网络框架，通过多尺度CNN分支和基于卷积增强自注意力模块的Transformer分支，分别获取人群局部信息和全局信息. 设计双分支注意力融合模块，以具备连续尺度的人群特征提取能力；通过基于混合注意力模块的Transformer网络提取深度特征，进一步区分复杂背景并聚焦人群区域. 采用位置级-全监督方式和计数级-弱监督方式，在ShanghaiTech Part A、ShanghaiTech Part B、UCF-QNRF、JHU-Crowd++等数据集上进行实验. 结果表明，算法在4个数据集上的性能均优于最近研究，全监督算法在上述数据集的平均绝对误差和均方根误差分别为55.3、6.7、82.9、55.7和93.1、9.8、145.1、248.0，可以实现高密集、高遮挡场景下的准确计数. 特别是在弱监督算法对比中，以低参数量实现了更佳的计数精度，并达到全监督87.9%的计数效果.

关键词： 人群计数, 深度学习, 自注意力机制, 双分支, 弱监督学习

Fig.1 Overall structure of dual-branch crowd counting algorithm based on self-attention mechanism

Fig.2 Down sampled diagram of 1/4 scale feature

Fig.3 Convolution enhanced self-attention module

Fig.4 Dual-branch attention fusion module

Fig.5 Hybrid attention module

Fig.6 Training curve of DBCC-Net

Tab.1 Results of position-level full supervision comparison experiment

Tab.2 Results of count-level weakly supervision comparison experiment

Fig.7 Position-level full supervision crowd density map

Fig.8 Visual comparison of different methods

Fig.9 Count-level weakly supervision crowd density map

Tab.3 Experimental results of time and space complexity comparison

Tab.4 Comparison of experimental results with network model framework

Tab.5 Results of ablation experiments for DBCC-Net

Tab.6 Comparison experiment results of loss function

Tab.7 Comparative experiment results of multiple fusion modules

Tab.8 Comparative experimental results of hybrid attention modules


[1]	MA Y, SHUAI H, CHENG W Spatiotemporal dilated convolution with uncertain matching for video-based crowd estimation[J]. IEEE Transactions on Multimedia, 2022, 24: 261- 273 doi: 10.1109/TMM.2021.3050059

[2]	李萌, 孙艳歌, 郭华平, 等多层次融合与注意力机制的人群计数算法[J]. 吉林大学学报: 信息科学版, 2022, 40 (6): 1009- 1016 LI Meng, SUN Yan-ge, GUO Hua-ping, et al Multi-level fusion and attention mechanism based crowd counting algorithm[J]. Journal of Jilin University: Information Science Edition, 2022, 40 (6): 1009- 1016

[3]	LIAN D, CHEN X, LI J, et al Locating and counting heads in crowds with a depth prior[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44 (12): 9056- 9072 doi: 10.1109/TPAMI.2021.3124956

[4]	LU H, CAO Z, XIAO Y, et al TasselNet: counting maize tassels in the wild via local counts regression network[J]. Plant Methods, 2017, 13 (1): 1- 17 doi: 10.1186/s13007-016-0152-4

[5]	XIE W, NOBLE J A, ZISSERMAN A Microscopy cell counting and detection with fully convolutional regression networks[J]. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 2018, 6 (3): 283- 292 doi: 10.1080/21681163.2016.1149104

[6]	LIANG M, HUANG X, CHEN C, et al Counting and classification of highway vehicles by regression analysis[J]. IEEE Transactions on Intelligent Transportation Systems, 2015, 16: 2878- 2888 doi: 10.1109/TITS.2015.2424917

[7]	ZENG C, MA H. Robust Head-shoulder detection by PCA-based multilevel HOG-LBP detector for people counting [C]// 20th International Conference on Pattern Recognition. Istanbul: IEEE, 2010: 2069-2072.

[8]	LI M, ZHANG Z, HUANG K, et al. Estimating the number of people in crowded scenes by MID based foreground segmentation and head-shoulder detection [C]// 19th International Conference on Pattern Recognition. Tampa: IEEE, 2008: 1-4.

[9]	刘迪, 郭继昌, 汪昱东, 等融合注意力机制的多尺度显著性目标检测网络[J]. 西安电子科技大学学报: 自然科学版, 2022, 49 (4): 118- 126 LIU Di, GUO Ji-chang, WANG Yu-dong, et al Multi-scale salient object detection network combining an attention mechanism[J]. Journal of Xidian University: Natural Science, 2022, 49 (4): 118- 126

[10]	LEMPITSKY V, ZISSERMAN A Learning to count objects in images[J]. Advances in Neural Information Processing Systems, 2010, 23: 1324- 1332

[11]	ZHANG Y, ZHOU D, CHEN S, et al. Single-image crowd counting via multi-column convolutional neural network [C]// Proceedings of the lEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 589-597.

[12]	万洪林, 王晓敏, 彭振伟, 等基于新型多尺度注意力机制的密集人群计数算法[J]. 电子与信息学报, 2022, 44 (3): 1129- 1136 WAN Hong-lin, WANG Xiao-min, PENG Zhen-wei, et al Dense crowd counting algorithm based on new multi-scale attention mechanism[J]. Journal of Electronics and Information Technology, 2022, 44 (3): 1129- 1136 doi: 10.11999/JEIT210163

[13]	LIU J, GAO C, MENG D, et al. DecideNet: counting varying density crowds through attention guided detection and density estimation [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5197-5206.

[14]	XIE J, PANG C, ZHENG Y, et al Multi-scale attention Recalibration Network for crowd counting[J]. Applied Soft Computing, 2022, 117: 108457 doi: 10.1016/j.asoc.2022.108457

[15]	LEI Y, LIU Y, ZHANG P, et al Towards using count-level weak supervision for crowd counting[J]. Pattern Recognition, 2021, 109: 107616 doi: 10.1016/j.patcog.2020.107616

[16]	LIANG D, CHEN X, XU W, et al TransCrowd: weakly-supervised crowd counting with transformer[J]. Science China: Information Sciences, 2022, 65 (6): 48- 61

[17]	LIU Z, LIN Y, CAO Y, et al. Swin transformer: hierarchical vision transformer using shifted windows[C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 10012-10022.

[18]	CHEN X, WANG X, ZHOU J, et al. Activating more pixels in image super-resolution transformer [EB/OL]. [2022-05-09]. https://arxiv.org/abs/2205.04437.pdf.

[19]	WANG B, LIU H, SAMARAS D, et al. Distribution matching for crowd counting [C]// Proceedings of the Advances in Neural Information Processing Systems. Vancouver: CA, 2020: 1595-1607.

[20]	CHAO X, SHANG W, ZHANG F Information-guided flame detection based on faster R-CNN[J]. IEEE Access, 2020, 8: 58923- 58932 doi: 10.1109/ACCESS.2020.2982994

[21]	IDRESS H, TAYYAB M, ATHREY K, et al. Composition loss for counting, density map estimation and localization in dense crowds [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 532-546.

[22]	IDRESS H, SALEEMI I, SEIBERT C, et al. Multi-source multi-scale counting in extremely dense crowd images [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Portland: IEEE, 2013: 2547-2554.

[23]	SINDAGI V A, YASARLA R, PATEL V M Jhu-crowd++: large-scale crowd counting dataset and a benchmark method[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020, (4): 653- 671

[24]	LI Y, ZHANG X, CHEN D. Csrnet: dilated convolutional neural networks for understanding the highly congested scenes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt lake city: IEEE, 2018: 1091-1100.

[25]	MA Z, WEI X, HONG X, et al. Bayesian loss for crowd count estimation with point supervision [C]// IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 6141-6150.

[26]	WAN J, LIU Z, CHAN A. A generalized loss function for crowd counting and localization [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 1974-1983.

[27]	LIANG D, XU W, ZHU Y, et al. Focal inverse distance transform maps for crowd localization [EB/OL]. [2021-10-20]. https://arxiv.org/pdf/2102.07925.pdf.

[28]	LIANG D, XU W, BAI X. An end-to-end transformer model for crowd localization [C]// Proceedings of the European Conference on Computer Vision. Cham: Springer, 2022: 38-54.

[29]	ZHONG X, YAN Z, QIN J, et al An improved normed-deformable convolution for crowd counting[J]. IEEE Signal Process Letters, 2022, 29: 1794- 1798 doi: 10.1109/LSP.2022.3198371

[30]	GAO X, XIE J, CHEN Z, et al Dilated convolution based feature refinement network for crowd localization[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19 (6): 1- 16

[31]	WANG Q, BRECKON T Crowd counting via segmentation guided attention networks and curriculum loss[J]. IEEE Transactions on Intelligent Transportation Systems, 2022, 23 (9): 15233- 43 doi: 10.1109/TITS.2021.3138896

[32]	CHEN Y, YANG J, ZHANG D, et al Region-aware network: model human’s top-down visual perception mechanism for crowd counting[J]. Neural Networks, 2022, 148: 219- 231 doi: 10.1016/j.neunet.2022.01.015

[33]	ZENG X, HU S, WANG H, et al. Joint contextual transformer and multi-scale information shared network for crowd counting [C]// 5th International Conference on Pattern Recognition and Artificial Intelligence. Chengdu: IEEE, 2022: 412-417.

[34]	YANG Y, WU Z, SU L, et al. Weakly-supervised crowd counting learns from sorting rather than locations [C]// Proceedings of European Conference on Computer Vision. Newcastle: Springer, 2020: 1-17.

[35]	PHUCT P. Attention in crowd counting using the transformer and density map to improve counting result [C]// 8th Nafosted Conference on Information and Computer Science. Hanoi: IEEE, 2021: 65-70.

[36]	LI B, ZHANG Y, XU H, et al CCST: crowd counting with swin transformer[J]. The Visual Computer, 2023, 39 (7): 2671- 2682 doi: 10.1007/s00371-022-02485-3

[37]	RONG L, LI C. Coarse and fine-grained attention network with background-aware loss for crowd density map estimation [C]// Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. Waikoloa: IEEE, 2021: 3675-3684.

[38]	LIU W, SALZMANN M, FUA P. Context-aware crowd counting [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5094-5103.

[1]	Chao-hao ZHENG,Zhi-wei YIN,Gang-feng ZENG,Yue-ping XU,Peng ZHOU,Li LIU. Post-processing of numerical precipitation forecast based on spatial-temporal deep learning model[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(9): 1756-1765.

[2]	Zhe YANG,Hong-wei GE,Ting LI. Framework of feature fusion and distribution with mixture of experts for parallel recommendation algorithm[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1317-1325.

[3]	Yun-hong LI,Jiao-jiao DUAN,Xue-ping SU,Lei-tao ZHANG,Hui-kang YU,Xing-rui LIU. Calligraphy generation algorithm based on improved generative adversarial network[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1326-1334.

[4]	Wei QUAN,Yong-qing CAI,Chao WANG,Jia SONG,Hong-kai SUN,Lin-xuan LI. VR sickness estimation model based on 3D-ResNet two-stream network[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(7): 1345-1353.

[5]	Xin-lei ZHOU,Hai-ting GU,Jing LIU,Yue-ping XU,Fang GENG,Chong WANG. Daily water supply prediction method based on integrated learning and deep learning[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(6): 1120-1127.

[6]	Pei-feng LIU,Lu QIAN,Xing-wei ZHAO,Bo TAO. Continual learning framework of named entity recognition in aviation assembly domain[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(6): 1186-1194.

[7]	Jia-chi ZHAO,Tian-qi WANG,Li-fang ZENG,Xue-ming SHAO. Rapid prediction of unsteady aerodynamic characteristics of flapping wing based on GRU[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(6): 1251-1256.

[8]	Xiao-lu CAO,Fu-nan LU,Xiang ZHU,Li-bo WENG,Shu-fang LU,Fei GAO. Sketch-based compatible clothing image generation[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(5): 939-947.

[9]	Yu-ting SU,Rong-xuan LU,Wei ZHANG. Vehicle re-identification algorithm based on attention mechanism and adaptive weight[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(4): 712-718.

[10]	Qing-lu MA,Jia-ping LU,Xiao-yao TANG,Xue-feng DUAN. Improved YOLOv5s flame and smoke detection method in road tunnels[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(4): 784-794.

[11]	Yao ZENG,Fa-qin GAO. Surface defect detection algorithm of electronic components based on improved YOLOv5[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(3): 455-465.

[12]	Huan LAN,Jian-bo YU. Steel surface defect detection based on deep learning 3D reconstruction[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(3): 466-476.

[13]	Ju-xiang ZENG,Ping-hui WANG,Yi-dong DING,Lin LAN,Lin-xi CAI,Xiao-hong GUAN. Graph neural network based node embedding enhancement model for node classification[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 219-225.

[14]	Chao LIU,Bing KONG,Guo-wang DU,Li-hua ZHOU,Hong-mei CHEN,Chong-ming BAO. Deep clustering via high-order mutual information maximization and pseudo-label guidance[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 299-309.

[15]	Jian-sha LU,Qin BAO,Hong-tao TANG,Yi-ping SHAO,Wen-bin ZHAO. Optimal tag selection method for device-free human tracking system[J]. Journal of ZheJiang University (Engineering Science), 2023, 57(2): 415-425.

Viewed

Full text

Abstract

Cited

Shared

Discussed