An energy-based object detection and metric learning method was proposed to address the high computational cost and low accuracy in training person and vehicle re-identification (re-ID) algorithms. A contrastive energy-based loss was designed based on the observation that samples of the same identity have low energy in the feature space; the loss takes the form of the difference between a sample's true-target response and its non-target responses. This form raises the target response more precisely while suppressing the non-target responses. As a result, classification accuracy improves, and features of the same identity stay compact while features of different identities are kept apart. Experiments on several person re-ID and vehicle re-ID datasets showed that, compared with the fused Soft-max and Triplet loss, the proposed loss improved training efficiency and object re-ID accuracy.
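The idea of a loss built from the difference between target and non-target responses can be illustrated with a minimal sketch. This is not the authors' formulation, which is not given in this section. The function names, the use of a temperature-scaled log-sum-exp to pool the non-target responses, and the roles assigned to `T`, `alpha` and `beta` are all assumptions made for illustration.

```python
import numpy as np

def pooled_response(z, T):
    """Temperature-scaled log-sum-exp, T * log(sum(exp(z / T))), per row.

    As T approaches 0 this tends to the row maximum. Subtracting the
    maximum first keeps the exponentials numerically stable.
    """
    m = z.max(axis=-1, keepdims=True)
    pooled = m + T * np.log(np.exp((z - m) / T).sum(axis=-1, keepdims=True))
    return pooled.squeeze(-1)

def contrastive_energy_loss(logits, labels, T=1e-6, alpha=0.3, beta=0.3):
    """Hypothetical loss: weighted non-target response minus target response.

    logits: (n, num_classes) classifier responses
    labels: (n,) integer identity labels
    alpha:  assumed weight of the target term
    beta:   assumed weight of the pooled non-target term
    """
    logits = np.asarray(logits, dtype=np.float64)
    labels = np.asarray(labels)
    n = logits.shape[0]
    rows = np.arange(n)

    # Response of the true identity for every sample.
    target = logits[rows, labels]

    # All remaining responses, one row of (num_classes - 1) values per sample.
    mask = np.ones_like(logits, dtype=bool)
    mask[rows, labels] = False
    non_target = logits[mask].reshape(n, -1)

    # Minimizing this raises the target response and suppresses the others.
    return float(np.mean(beta * pooled_response(non_target, T) - alpha * target))

if __name__ == "__main__":
    logits = [[2.0, 0.5, -1.0], [0.0, 3.0, 1.0]]
    print(contrastive_energy_loss(logits, [0, 1], alpha=1.0, beta=1.0))
```

With a very small `T`, the pooled term is effectively the largest non-target response, so in this sketch the loss only penalizes the strongest competing identity.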
Shi-lin ZHANG, Hong-nan GUO, Xuan LIU. Person and vehicle re-identification based on energy model. Journal of ZheJiang University (Engineering Science), 2022, 56(7): 1416-1424.
Fig.1 Overall structure of contrastive energy-based method in network training process
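| T | α | β | Rank1/% | mAP/% |
| --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 94.2 | 84.0 |
| 10⁻³ | 1 | 1 | 94.4 | 84.5 |
| 10⁻⁶ | 1 | 1 | 94.5 | 85.1 |
| 10⁻⁶ | 0.5 | 0.5 | 94.8 | 86.1 |
| 10⁻⁶ | 0.5 | 0.3 | 94.8 | 86.7 |
| 10⁻⁶ | 0.3 | 0.5 | 95.1 | 87.8 |
| 10⁻⁶ | 0.3 | 0.3 | 95.2 | 88.1 |
| 10⁻⁶ | 0.1 | 0.3 | 95.0 | 87.6 |
| 10⁻⁶ | 0.1 | 0.1 | 94.8 | 86.5 |

Tab.1 Influence of parameters on person re-ID performance on Market1501 with relative energy loss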
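Rank1 and mAP, the two metrics reported in the tables, can be computed from a query-to-gallery distance matrix roughly as follows. This is a simplified sketch: the function name and inputs are hypothetical, and the same-camera filtering used by standard re-ID evaluation protocols is deliberately left out.

```python
import numpy as np

def rank1_and_map(dist, query_ids, gallery_ids):
    """Return (Rank1, mAP) in percent.

    dist:        (num_query, num_gallery) distances, smaller means more similar
    query_ids:   (num_query,) identity label of each query
    gallery_ids: (num_gallery,) identity label of each gallery image
    """
    dist = np.asarray(dist, dtype=np.float64)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    rank1_hits, average_precisions = [], []
    for q in range(dist.shape[0]):
        # Gallery sorted from most to least similar for this query.
        order = np.argsort(dist[q], kind="stable")
        matches = gallery_ids[order] == query_ids[q]
        if not matches.any():
            continue  # a query with no true match cannot be scored

        # Rank1: is the top-ranked gallery image the right identity?
        rank1_hits.append(float(matches[0]))

        # Average precision: mean of the precision at each true match.
        hit_positions = np.flatnonzero(matches)
        precisions = np.arange(1, hit_positions.size + 1) / (hit_positions + 1)
        average_precisions.append(precisions.mean())

    return 100.0 * np.mean(rank1_hits), 100.0 * np.mean(average_precisions)

if __name__ == "__main__":
    dist = [[0.1, 0.5, 0.3], [0.9, 0.2, 0.4]]
    print(rank1_and_map(dist, query_ids=[1, 1], gallery_ids=[1, 2, 1]))
```

Rank1 only looks at the first retrieved image, while mAP rewards placing every image of the same identity near the top, which is why the two columns can move by different amounts between rows.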
Fig.2 Feature distribution of randomly selected vehicles
| Network | Loss function | Rank1/% | mAP/% |
| --- | --- | --- | --- |
| OSNet | Soft-max | 93.2 | 83.5 |
| OSNet | AM-Soft-max | 94.1 | 84.5 |
| OSNet | Arc-Soft-max | 94.3 | 84.9 |
| OSNet | Soft-max+Center | 94.8 | 85.1 |
| OSNet | Soft-max+Triplet | 95.3 | 87.5 |
| OSNet | Energy-Loss | 95.7 | 88.1 |
| ResNet50 | Soft-max | 93.6 | 83.8 |
| ResNet50 | AM-Soft-max | 94.4 | 84.7 |
| ResNet50 | Arc-Soft-max | 94.7 | 85.2 |
| ResNet50 | Soft-max+Center | 94.9 | 86.2 |
| ResNet50 | Soft-max+Triplet | 95.4 | 88.1 |
| ResNet50 | Energy-Loss | 95.9 | 88.5 |
| ResNet50-NL | Soft-max | 94.3 | 84.2 |
| ResNet50-NL | AM-Soft-max | 94.5 | 85.3 |
| ResNet50-NL | Arc-Soft-max | 95.1 | 85.9 |
| ResNet50-NL | Soft-max+Center | 95.2 | 86.8 |
| ResNet50-NL | Soft-max+Triplet | 95.6 | 88.2 |
| ResNet50-NL | Energy-Loss | 96.1 | 88.7 |

Tab.2 Performance comparison of different loss functions over different networks
Fig.3 Feature maps produced by networks trained with energy loss and with fused loss
| Method | Market1501 Rank1/% | Market1501 mAP/% | DukeMTMC-ReID Rank1/% | DukeMTMC-ReID mAP/% | MSMT Rank1/% | MSMT mAP/% |
| --- | --- | --- | --- | --- | --- | --- |
| Camstyle[19] | 88.1 | 68.7 | 75.3 | 53.5 | — | — |
| PN-GAN[20] | 89.4 | 72.6 | 73.6 | 53.2 | — | — |
| MGN[21] | 95.7 | 86.9 | 88.7 | 78.4 | — | — |
| Pyramid[22] | 95.7 | 88.2 | 89.0 | 79.0 | — | — |
| ABD-Net[23] | 95.6 | 88.3 | 88.3 | 78.6 | — | — |
| PCB[24] | 93.8 | 81.6 | 83.3 | 69.2 | 68.2 | 40.4 |
| SPReID[25] | 92.5 | 81.3 | 84.4 | 71.0 | — | — |
| MaskReID[26] | 90.0 | 75.3 | 78.8 | 61.9 | — | — |
| SCPNet[27] | 91.2 | 75.2 | 80.3 | 62.6 | — | — |
| HA-CNN[28] | 91.2 | 75.7 | 80.5 | 63.8 | — | — |
| SVDNet[29] | 82.3 | 62.1 | 76.7 | 56.8 | — | — |
| TransReID[30] | 95.2 | 89.5 | 91.1 | 82.1 | 86.20 | 69.4 |
| Energy-Loss | 95.9 | 89.9 | 92.3 | 83.5 | 85.52 | 70.9 |

Tab.3 Rank1 and mAP performance comparison with state-of-the-art methods on three person re-ID datasets
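| Method | VehicleID Small Rank1/% | VehicleID Small mAP/% | VehicleID Medium Rank1/% | VehicleID Medium mAP/% | VehicleID Large Rank1/% | VehicleID Large mAP/% |
| --- | --- | --- | --- | --- | --- | --- |
| CLVR[32] | 62.00 | — | 56.10 | — | 50.60 | — |
| VANet[33] | 88.12 | — | 83.17 | — | 80.45 | — |
| RAM[34] | 75.20 | — | 72.30 | — | 67.70 | — |
| ABLN[35] | 52.63 | — | — | — | — | — |
| VAMI[36] | 63.12 | — | 52.87 | — | 47.34 | — |
| NuFACT[37] | 48.90 | — | 43.64 | — | 38.63 | — |
| AAVER[38] | 74.69 | — | 68.62 | — | 63.54 | — |
| QD-DLF[39] | 72.32 | 76.54 | 70.66 | 74.63 | 64.14 | 68.41 |
| Part-Reg[40] | 78.40 | 61.50 | 75.00 | — | 74.20 | — |
| GSTE[41] | 75.90 | 75.40 | 74.80 | 74.30 | 74.00 | 72.40 |
| Energy-Loss | 89.75 | 85.82 | 84.58 | 81.35 | 81.15 | 77.68 |

Tab.4 Rank1 and mAP performance comparison with state-of-the-art methods on VehicleID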
Fig.4 Visualization of person re-identification under energy model
[1] LIU W, WEN Y, YU Z, et al. Sphere face: deep hypersphere embedding for face recognition [C]// Proceedings of the Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 6738-6746.
[2] DENG J K, GUO J, XUE N N, et al. Arcface: additive angular margin loss for deep face recognition [C]// Proceedings of the Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4685-4694.
[3] WANG F, CHENG L, LIU W. Additive margin Soft-max for face verification [J]. IEEE Signal Processing Letters, 2018, 25(7): 926-930. doi: 10.1109/LSP.2018.2822810
[4] JIQUAN N, CHEN Z H, PANG W, et al. Learning deep energy models [C]// Proceedings of the 28th International Conference on Machine Learning. Washington: Omnipress, 2011: 1105-1112.
[5] TAESUP K, YOSHUA B. Deep directed generative models with energy-based probability estimation [C]// Proceedings of the European Conference of Computer Vision. Amsterdam: Springer, 2016: 123-130.
[6] YANN L, SUMIT C, RAIA H. A tutorial on energy-based learning [M]// Predicting structured data. Boston: MIT Press, 2006.
[7] RITHESH K, ANIRUDH G, AARON C, et al. Maximum entropy generators for energy based models [C]// Proceedings of the International Conference on Computer Vision. Seoul: IEEE, 2019: 1701-1711.
[8] LIU W T, WANG X Y, OWENS J. Energy-based out-of-distribution detection [C]// Proceedings of the Neural Information Processing System. Canada: IEEE, 2020: 112-123.
[9] ZHOU K Y, YANG Y X, CAVALLARO A. Omni-scale feature learning for person re-identification [C]// Proceedings of the International Conference on Computer Vision. Seoul: IEEE, 2019: 3701-3711.
[10] ZHENG L, SHEN L Y, TIAN L. Scalable person re-identification: a benchmark [C]// Proceedings of the Computer Vision and Pattern Recognition. Boston: IEEE, 2015: 1116-1124.
[11] RISTANI E, SOLERA F, ZOU R, et al. Performance measures and a data set for multi-target, multi-camera tracking [C]// Proceedings of the European Conference of Computer Vision. Amsterdam: Springer, 2016: 17-35.
[12] WEI L H, ZHANG S L, GAO W, et al. Person transfer gan to bridge domain gap for person re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 79-88.
[13] LIU X C, LIU W, MEI T. A deep learning-based approach to progressive vehicle re-identification for urban surveillance [C]// Proceedings of the European Conference of Computer Vision. Amsterdam: Springer, 2016: 123-130.
[14] LIU H Y, TIAN Y H, WANG Y W. Deep relative distance learning: tell the difference between similar vehicles [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Nevada: IEEE, 2016: 2167-2175.
[15] HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Nevada: IEEE, 2016: 116-124.
[16] WANG X, GIRSHICK R, GUPTA A, et al. Non-local neural networks [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 7794-7803.
[17] WEN Y, ZHANG K, LI Z. A discriminative feature learning approach for deep face recognition [C]// Proceedings of the European Conference of Computer Vision. Amsterdam: Springer, 2016: 23-30.
[18] ALEXANDER H, LUCAS B, BASTIAN L. In defense of the triplet loss for person re-identification [C]// Proceedings of the International Conference on Computer Vision. Seoul: Springer, 2018: 1132-1139.
[19] ZHONG Z, ZHENG L, ZHENG Z D, et al. Camstyle: a novel data augmentation method for person re-identification [J]. IEEE Transactions on Image Processing, 2019, 28(3): 1176-1190. doi: 10.1109/TIP.2018.2874313
[20] QIAN X L, FU Y W, XIANG T, et al. Pose-normalized image generation for person re-identification [C]// Proceedings of the European Conference of Computer Vision. Munich: Springer, 2018: 1123-1132.
[21] WANG G S, YUAN Y F, CHEN X, et al. Learning discriminative features with multiple granularities for person re-identification [C]// Proceedings of the ACM Multimedia Conference on Multimedia Conference. Seoul: ACM, 2018: 1123-1132.
[22] ZHENG F, DENG C, SUN X, et al. Pyramidal person re-identification via multi-loss dynamic training [C]// Proceedings of the Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 134-143.
[23] CHEN T S, XU M X, HUI X L, et al. Learning semantic-specific graph representation for multi-label image recognition [C]// Proceedings of the International Conference on Computer Vision. Seoul: Springer, 2019: 2132-2139.
[24] SUN Y F, ZHENG L, YANG Y. Beyond part models: person retrieval with refined part pooling and a strong convolutional baseline [C]// Proceedings of the European Conference of Computer Vision. Munich: Springer, 2018: 25-32.
[25] KAKAYEH M M, BASRARN E. Human semantic parsing for person re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 99-107.
[26] LEI Q, JING H, LEI W, et al. Maskreid: a mask based deep ranking neural network for person re-identification [C]// Proceedings of the International Conference of Multimedia Exposition. Shanghai: IEEE, 2019: 1138-1145.
[27] FAN X, LUO H, ZHANG X, et al. SCPNet: spatial-channel parallelism network for joint holistic and partial person re-identification [C]// Proceedings of the Asian Conference of Computer Vision. Kyoto: IEEE, 2019: 2351-2359.
[28] LI W, ZHU X T, GONG S G. Harmonious attention network for person re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 1324-1332.
[29] SUN Y F, ZHENG L, DENG W J, et al. SVDNet for pedestrian retrieval [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 99-107.
[30] HE S, LUO H, WANG P C, et al. TransReID: transformer-based object re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 151-159.
[31] ZHONG Z, ZHENG L, CAO D L, et al. Re-ranking person re-identification with k-reciprocal encoding [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 345-352.
[32] LIU X C, LIU W, MEI T, et al. A deep learning-based approach to progressive vehicle re-identification for urban surveillance [C]// Proceedings of the European Conference of Computer Vision. Amsterdam: Springer, 2016: 123-130.
[33] CHU R H, SUN Y F, LI Y D, et al. Vehicle re-identification with viewpoint aware metric learning [C]// Proceedings of the International Conference on Computer Vision. Seoul: Springer, 2019: 1132-1139.
[34] LIU X B, ZHANG S L, HUANG Q M, et al. Ram: a region-aware deep model for vehicle re-identification [C]// Proceedings of the International Conference of Multimedia Exposition. San Diego: IEEE, 2018: 138-145.
[35] ZHOU Y, SHAO L. Vehicle re-identification by adversarial bi-directional LSTM network [C]// Proceedings of the IEEE Winter Conference on Applications of Computer Vision. Salt Lake City: IEEE, 2018: 1123-1132.
[36] ZHOU Y, SHAO L. Viewpoint-aware attentive multi-view inference for vehicle re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Utah: IEEE, 2018: 324-332.
[37] LIU X C, LIU W, MEI T, et al. PROVID: progressive and multimodal vehicle reidentification for large-scale urban surveillance [J]. IEEE Transactions on Multimedia, 2018, 20(3): 645-658. doi: 10.1109/TMM.2017.2751966
[38] KHORRAMSHAHI P, KUMAR A, PERI N, et al. A dual-path model with adaptive attention for vehicle re-identification [C]// Proceedings of the International Conference on Computer Vision. Seoul: Springer, 2019: 132-139.
[39] ZHU J Q, ZENG H Q, HUANG J C, et al. Vehicle re-identification using quadruple directional deep learning features [J]. IEEE Transactions on Intelligent Transportation Systems, 2019, 21(1): 1-11.
[40] HE B, LI J, ZHAO Y, et al. Part-regularized near-duplicate vehicle re-identification [C]// Proceedings of the Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 154-163.