Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (1): 52-60    DOI: 10.3785/j.issn.1008-973X.2026.01.005
    
Multi-modal gait recognition based on SMPL model decomposition and embedding fusion
Yue WU1, Zheng LIANG1, Wei GAO1, Maoda YANG1, Peisen ZHAO1, Hongxia DENG1, Yuanyuan CHANG2,*
1. College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
2. School of Physical Education and Health Engineering, Taiyuan University of Technology, Taiyuan 030024, China

Abstract  

A multimodal gait recognition method based on skinned multi-person linear (SMPL) model decomposition and embedding fusion was proposed to address two limitations that restrict recognition performance in real-world scenarios: insufficient mining of gait information and inadequate cross-modal feature alignment. The SMPL model was decomposed into a shape branch and a pose branch to comprehensively extract static body shape features and dynamic motion characteristics. An adaptive frame-joint attention module was constructed to focus adaptively on key frames and significant joints, thereby enhancing pose feature representation. A modality embedding fusion module was designed to project features of different modalities into a unified semantic space, and a modality consistency loss function was built to optimize cross-modal feature alignment and improve fusion effectiveness. Experimental results on the Gait3D dataset demonstrated that the proposed method achieved a Rank-1 accuracy of 70.4%, outperforming six silhouette-based methods, two skeleton-based methods, and five multimodal approaches that combine silhouettes with skeletons or SMPL models. The method exhibits superior robustness in complex real-world scenarios, validating its effectiveness in modal feature extraction and cross-modal feature alignment.
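The modality embedding fusion and consistency loss described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the feature dimensions, the plain linear projections, and the cosine-based consistency term are all assumptions standing in for the learned modules.

```python
import numpy as np

rng = np.random.default_rng(0)

def project(x, W, b):
    """Linear projection of a modality embedding into the shared semantic space."""
    return x @ W + b

def consistency_loss(a, b):
    """1 minus mean cosine similarity between paired cross-modal embeddings."""
    a_n = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_n = b / np.linalg.norm(b, axis=1, keepdims=True)
    return 1.0 - float(np.mean(np.sum(a_n * b_n, axis=1)))

# Hypothetical dimensions: silhouette 256-D, SMPL 128-D, shared space 64-D
sil = rng.standard_normal((4, 256))
smpl = rng.standard_normal((4, 128))
W_sil, b_sil = rng.standard_normal((256, 64)), np.zeros(64)
W_smpl, b_smpl = rng.standard_normal((128, 64)), np.zeros(64)

z_sil = project(sil, W_sil, b_sil)
z_smpl = project(smpl, W_smpl, b_smpl)
fused = np.concatenate([z_sil, z_smpl], axis=1)  # embedding fusion by concatenation
loss = consistency_loss(z_sil, z_smpl)           # drives the two spaces to align
```

Minimizing the consistency term pulls the projected silhouette and SMPL embeddings of the same sequence together, which is the alignment effect the abstract attributes to the modality consistency loss.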



Key words: gait recognition; SMPL model; adaptive attention; feature alignment; modality fusion
Received: 13 March 2025      Published: 15 December 2025
CLC:  TP 393  
Fund: Central Government Guided Local Science and Technology Development Fund of Shanxi Province (YDZJSX2022A016); Key Research and Development Program of Shanxi Province (2022ZDYF128); Science and Technology Strategic Project of Shanxi Province (202404030401080)
Corresponding Authors: Yuanyuan CHANG     E-mail: 18634898755@163.com;changyuanyuan@tyut.edu.cn
Cite this article:

Yue WU,Zheng LIANG,Wei GAO,Maoda YANG,Peisen ZHAO,Hongxia DENG,Yuanyuan CHANG. Multi-modal gait recognition based on SMPL model decomposition and embedding fusion. Journal of ZheJiang University (Engineering Science), 2026, 60(1): 52-60.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.01.005     OR     https://www.zjujournals.com/eng/Y2026/V60/I1/52


Fig.1 Structure of DFGait network
Fig.2 Structure of silhouette branch
Fig.3 Two-branch structure of SMPL
Fig.4 Implementation details of AFJAtt
Fig.5 Schematic diagram of modality feature alignment
Fig.6 Process of MEFusion
| Module | Module structure | Output dimensions |
| --- | --- | --- |
| Block0 | $ \left[3\times 3, 64\right],\; {\rm{stride}}=1 $ | 30×64×64×44 |
| Block1 | $ \left[\begin{array}{c}3\times 3, 64\\ 3\times 3, 64\end{array}\right],\;{\rm{stride}}=1 $; $ \left[3\times 1\times 1, 64\right], \;{\rm{stride}}=1 $ | 30×128×64×44 |
| Block2 | $ \left[\begin{array}{c}3\times 3, 64\\ 3\times 3, 128\end{array}\right],\;{\rm{stride}}=2 $; $ \left[3\times 1\times 1, 128\right], \;{\rm{stride}}=1 $ | 30×128×32×22 |
| Block3 | $ \left[\begin{array}{c}3\times 3, 128\\ 3\times 3, 256\end{array}\right],\;{\rm{stride}}=2 $; $ \left[3\times 1\times 1, 256\right],\; {\rm{stride}}=1 $ | 30×256×16×11 |
| Block4 | $ \left[\begin{array}{c}3\times 3, 256\\ 3\times 3, 256\end{array}\right],\;{\rm{stride}}=1 $; $ \left[3\times 1\times 1, 256\right],\; {\rm{stride}}=1 $ | 30×256×16×11 |
Tab.1 ResNet-like backbone structure of silhouette branch
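As a sanity check on the spatial dimensions in Tab.1, the sizes follow from the standard 2D convolution output formula. Padding 1 for the 3×3 convolutions is an assumption (the table does not state it), but it is the value that reproduces the listed 64×44 → 32×22 → 16×11 progression.

```python
def conv2d_out(size, kernel=3, stride=1, padding=1):
    """Spatial output size of a square-kernel 2D convolution: (x + 2p - k) // s + 1."""
    f = lambda x: (x + 2 * padding - kernel) // stride + 1
    return tuple(f(x) for x in size)

size = (64, 44)  # input silhouette resolution from Tab.1
sizes = {}
for name, stride in [("Block1", 1), ("Block2", 2), ("Block3", 2), ("Block4", 1)]:
    size = conv2d_out(size, stride=stride)
    sizes[name] = size
# sizes matches Tab.1: Block2 halves to 32x22, Block3 to 16x11, Block4 keeps 16x11
```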
| Module | Module structure | Output dimensions |
| --- | --- | --- |
| Block0 | Batch normalization layer | 30×24×3 |
| Block1 | Basic layer | 30×24×64 |
|  | Bottleneck layer | 30×24×64 |
|  | Bottleneck layer | 30×24×32 |
| Block2 | Bottleneck layer | 30×24×64 |
|  | Bottleneck layer | 30×24×128 |
|  | Bottleneck layer | 30×24×256 |
|  | Bottleneck layer | 30×24×256 |
| Block3 | Max pooling layer | 1×256 |
Tab.2 ResGCN backbone structure of SMPL pose branch
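The graph-convolutional layers underlying the ResGCN backbone follow the propagation rule of Kipf and Welling [26]. The sketch below is a single NumPy layer over the 24 SMPL joints; the chain adjacency and the 3 → 64 feature widths (matching Block0/Block1 of Tab.2) are illustrative stand-ins, not the paper's actual skeleton graph or weights.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph convolution (Kipf & Welling): ReLU(D^{-1/2}(A+I)D^{-1/2} X W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))         # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ X @ W, 0.0)

rng = np.random.default_rng(0)
A = np.zeros((24, 24))
for i in range(23):                                # simple chain as a stand-in adjacency
    A[i, i + 1] = A[i + 1, i] = 1.0
X = rng.standard_normal((24, 3))                   # 24 joints, 3-D coordinates per frame
W = rng.standard_normal((3, 64))                   # 3 -> 64 channels, as in Block1
H = gcn_layer(X, A, W)                             # (24, 64) joint features
```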
| Modality | Method | Source | Rank-1/% | Rank-5/% | mAP/% | mINP/% |
| --- | --- | --- | --- | --- | --- | --- |
| Silhouette | GaitSet[5] | AAAI 2019 | 36.70 | 58.30 | 30.01 | 17.30 |
|  | GaitPart[6] | CVPR 2020 | 28.20 | 47.60 | 21.58 | 12.36 |
|  | GaitGL[8] | ICCV 2021 | 29.70 | 48.50 | 22.29 | 13.26 |
|  | GaitGCI[9] | CVPR 2023 | 50.30 | 68.50 | 39.50 | 24.30 |
|  | GaitBase[10] | CVPR 2023 | 64.20 | 79.50 | 54.51 | 36.36 |
|  | DyGait[11] | ICCV 2023 | 66.30 | 80.80 | 56.40 | 37.30 |
| Skeleton | GaitGraph[13] | ICIP 2021 | 8.30 | 16.60 | 7.14 | 4.80 |
|  | GPGait[14] | ICCV 2023 | 22.50 | — | — | — |
| Silhouette + skeleton/SMPL | MSAFF[17] | IJCB 2023 | 48.10 | 66.60 | 38.45 | 23.49 |
|  | GaitRef[18] | IJCB 2023 | 49.00 | 69.30 | 40.69 | 25.26 |
|  | GaitSTR[19] | T-BIOM 2024 | 65.10 | 81.30 | 55.59 | 36.84 |
|  | SMPLGait[20] | CVPR 2022 | 46.30 | 64.50 | 37.16 | 22.23 |
|  | HybridGait[21] | AAAI 2024 | 53.30 | 72.00 | 43.29 | 26.65 |
|  | DFGait | This work | 70.40 | 85.00 | 61.04 | 41.27 |
Tab.3 Comparison results of different methods on Gait3D dataset
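The Rank-1 metric reported throughout Tab.3 is the fraction of probe sequences whose nearest gallery sample has the same identity. A minimal sketch of that computation on a toy distance matrix (the matrix and IDs below are invented for illustration):

```python
import numpy as np

def rank1_accuracy(dist, gallery_ids, probe_ids):
    """Rank-1: fraction of probes whose nearest gallery sample shares their ID."""
    nearest = dist.argmin(axis=1)                  # index of closest gallery sample
    return float(np.mean(gallery_ids[nearest] == probe_ids))

# Toy example: 3 probes x 4 gallery samples
dist = np.array([[0.1, 0.9, 0.8, 0.7],
                 [0.6, 0.2, 0.9, 0.8],
                 [0.9, 0.8, 0.3, 0.4]])
gallery_ids = np.array([1, 2, 3, 1])
probe_ids = np.array([1, 2, 1])
acc = rank1_accuracy(dist, gallery_ids, probe_ids)  # probes 1 and 2 match -> 2/3
```

Rank-5, mAP, and mINP extend the same idea to the top five matches and to the full ranked list.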
| Method | R-1/% | R-5/% | mAP/% | mINP/% |
| --- | --- | --- | --- | --- |
| GaitGraph | 8.30 | 16.60 | 7.14 | 4.80 |
| GaitGraph + AFJAtt | 11.30 | 22.50 | 9.87 | 6.56 |
| SMPL pose branch | 6.20 | 12.60 | 4.92 | 2.94 |
| SMPL pose branch + AFJAtt | 8.10 | 15.70 | 5.77 | 3.69 |
Tab.4 Ablation experiments for AFJAtt
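The frame-joint reweighting that AFJAtt performs can be sketched as two softmax attention vectors, one over frames and one over joints, applied multiplicatively to a (T, J, C) pose feature map. In this NumPy stand-in the scores are simple channel means; the paper's module learns them, so this only illustrates the shape of the operation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def frame_joint_attention(feat):
    """Reweight a (T, J, C) pose feature map by frame and joint importance.
    Scores here are channel means; a learned module would replace them."""
    a_t = softmax(feat.mean(axis=(1, 2)), axis=0)   # (T,) frame weights, sum to 1
    a_j = softmax(feat.mean(axis=(0, 2)), axis=0)   # (J,) joint weights, sum to 1
    out = feat * a_t[:, None, None] * a_j[None, :, None]
    return out, a_t, a_j

rng = np.random.default_rng(0)
feat = rng.standard_normal((30, 24, 64))            # 30 frames, 24 joints, 64 channels
out, a_t, a_j = frame_joint_attention(feat)         # same shape, key frames/joints amplified
```

The heatmap in Fig.7 visualizes exactly such weights: larger entries of a_t and a_j mark the frames and joints the module considers discriminative.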
Fig.7 Heatmap of AFJAtt weights
| Silhouette branch | SMPL branch | AFJAtt | EFusion | MCLoss | R-1/% | R-5/% | mAP/% | mINP/% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|  | ✓ |  |  |  | 26.40 | 41.90 | 17.44 | 10.23 |
| ✓ |  |  |  |  | 64.90 | 82.20 | 54.96 | 35.70 |
| ✓ | ✓ |  |  |  | 66.10 | 83.20 | 55.64 | 36.44 |
| ✓ | ✓ | ✓ |  |  | 68.90 | 84.20 | 58.94 | 39.11 |
| ✓ | ✓ | ✓ | ✓ |  | 69.50 | 85.10 | 60.61 | 41.22 |
| ✓ | ✓ | ✓ | ✓ | ✓ | 70.40 | 85.00 | 61.04 | 41.27 |
Tab.5 Ablation study on multimodal structure, AFJAtt, and MEFusion
[1]   MAHMOUD M, KASEM M S, KANG H S A comprehensive survey of masked faces: recognition, detection, and unmasking[J]. Applied Sciences, 2024, 14 (19): 8781
doi: 10.3390/app14198781
[2]   JIA Z, HUANG C, WANG Z, et al Finger recovery transformer: toward better incomplete fingerprint identification[J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 8860- 8874
doi: 10.1109/TIFS.2024.3419690
[3]   KUEHLKAMP A, BOYD A, CZAJKA A, et al. Interpretable deep learning-based forensic iris segmentation and recognition [C]// IEEE/CVF Winter Conference on Applications of Computer Vision Workshops. Waikoloa: IEEE, 2022: 359–368.
[4]   赵晓东, 刘作军, 陈玲玲, 等 下肢假肢穿戴者跑动步态识别方法[J]. 浙江大学学报: 工学版, 2018, 52 (10): 1980- 1988
ZHAO Xiaodong, LIU Zuojun, CHEN Lingling, et al Approach of running gait recognition for lower limb amputees[J]. Journal of Zhejiang University: Engineering Science, 2018, 52 (10): 1980- 1988
[5]   CHAO H, WANG K, HE Y, et al GaitSet: cross-view gait recognition through utilizing gait as a deep set[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022, 44 (7): 3467- 3478
[6]   FAN C, PENG Y, CAO C, et al. GaitPart: temporal part-based model for gait recognition [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 14213–14221.
[7]   HUANG Z, XUE D, SHEN X, et al. 3D local convolutional neural networks for gait recognition [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 14900–14909.
[8]   LIN B, ZHANG S, YU X. Gait recognition via effective global-local feature representation and local temporal aggregation [C]// IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 14628–14636.
[9]   DOU H, ZHANG P, SU W, et al. GaitGCI: generative counterfactual intervention for gait recognition [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 5578–5588.
[10]   FAN C, LIANG J, SHEN C, et al. OpenGait: revisiting gait recognition toward better practicality [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 9707–9716.
[11]   WANG M, GUO X, LIN B, et al. DyGait: exploiting dynamic representations for high-performance gait recognition [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 13378–13387.
[12]   LIAO R, YU S, AN W, et al A model-based gait recognition method with body pose and human prior knowledge[J]. Pattern Recognition, 2020, 98: 107069
doi: 10.1016/j.patcog.2019.107069
[13]   TEEPE T, KHAN A, GILG J, et al. Gaitgraph: graph convolutional network for skeleton-based gait recognition [C]// IEEE International Conference on Image Processing. Anchorage: IEEE, 2021: 2314–2318.
[14]   FU Y, MENG S, HOU S, et al. GPGait: generalized pose-based gait recognition [C]// IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 19538–19547.
[15]   ZHANG C, CHEN X P, HAN G Q, et al Spatial transformer network on skeleton-based gait recognition[J]. Expert Systems, 2023, 40 (6): e13244
doi: 10.1111/exsy.13244
[16]   SUN Y, FENG X, MA L, et al. TriGait: aligning and fusing skeleton and silhouette gait data via a tri-branch network [C]// IEEE International Joint Conference on Biometrics. Ljubljana: IEEE, 2023: 1–9.
[17]   ZOU S, XIONG J, FAN C, et al. A multi-stage adaptive feature fusion neural network for multimodal gait recognition [C]// IEEE International Joint Conference on Biometrics. Ljubljana: IEEE, 2023: 1–10.
[18]   ZHU H, ZHENG W, ZHENG Z, et al. GaitRef: gait recognition with refined sequential skeletons [C]// 2023 IEEE International Joint Conference on Biometrics. Ljubljana: IEEE, 2023: 1–10.
[19]   ZHENG W, ZHU H, ZHENG Z, et al GaitSTR: gait recognition with sequential two-stream refinement[J]. IEEE Transactions on Biometrics, Behavior, and Identity Science, 2024, 6 (4): 528- 538
doi: 10.1109/TBIOM.2024.3390626
[20]   ZHENG J, LIU X, LIU W, et al. Gait recognition in the wild with dense 3D representations and a benchmark [C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 20196–20205.
[21]   DONG Y, YU C, HA R, et al. HybridGait: a benchmark for spatial-temporal cloth-changing gait recognition with hybrid explorations [C]// AAAI Conference on Artificial Intelligence. Palo Alto: AAAI, 2024: 1600–1608.
[22]   LOPER M, MAHMOOD N, ROMERO J, et al SMPL: a skinned multi-person linear model[J]. ACM Transactions on Graphics, 2015, 34 (6): 248
[23]   YU S, TAN D, TAN T. A framework for evaluating the effect of view angle, clothing and carrying condition on gait recognition [C]// International Conference on Pattern Recognition. Hong Kong: IEEE, 2006: 441–444.
[24]   TAKEMURA N, MAKIHARA Y, MURAMATSU D, et al Multi-view large population gait dataset and its performance evaluation for cross-view gait recognition[J]. IPSJ Transactions on Computer Vision and Applications, 2018, 10 (1): 4
doi: 10.1186/s41074-018-0039-6
[25]   ZHU Z, GUO X, YANG T, et al. Gait recognition in the wild: a benchmark [C]// 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 14789–14799.
[26]   KIPF T N, WELLING M. Semi-supervised classification with graph convolutional networks [C]// International Conference on Learning Representations. Toulon: [s. n. ], 2017.
[27]   LI J, ZHANG Y, SHAN H, et al. Gaitcotr: improved spatial-temporal representation for gait recognition with a hybrid convolution-transformer framework [C]// 2023 IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island: IEEE, 2023: 1–5.
[28]   SONG Y F, ZHANG Z, SHAN C, et al. Stronger, faster and more explainable: a graph convolutional baseline for skeleton-based action recognition [C]// ACM International Conference on Multimedia. Seattle: ACM, 2020: 1625–1633.