Journal of Zhejiang University (Engineering Science)  2022, Vol. 56 Issue (11): 2204-2214    DOI: 10.3785/j.issn.1008-973X.2022.11.011
Computer Technology
Zero-shot image classification method based on deep supervised alignment
Su-jia ZENG, Shan-min PANG*, Wen-yu HAO
School of Software Engineering, Xi'an Jiaotong University, Xi'an 710049, China
Abstract:

A zero-shot image classification method based on a deep supervised alignment network (DSAN) was proposed to address two problems in generalized zero-shot image classification: the poor class discrimination of attribute vectors and the bias toward classifying images into seen classes. Global supervised tags for class semantics were constructed and used jointly with expert-annotated attribute vectors to enhance the discrimination between class semantics. To align the manifold structures of the visual and the semantic space, a visual-feature and a semantic-feature classification network were designed to learn the class distributions of the two spaces, and the two distributions were aligned to eliminate their discrepancy. The principle of generative adversarial networks was further utilized to remove the intrinsic difference between the two kinds of features. Visual features and class semantic features were merged by element-wise addition, and a relation network was used to learn the nonlinear similarity between them. Experimental results showed that the harmonic mean classification accuracy of DSAN over seen and unseen classes exceeded that of the baseline model by 4.3%, 19.5% and 21.9% on the CUB, AWA1 and AWA2 datasets, respectively, and exceeded that of the CRnet method by 1.4% and 2.2% on the SUN and APY datasets, respectively. These results demonstrate the effectiveness of the proposed method.
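The scoring pipeline the abstract describes (embed both modalities into a shared space, merge by element-wise addition, score with a relation network) can be made concrete with a minimal PyTorch sketch. All layer sizes and module names below are illustrative assumptions rather than the paper's published configuration; only the element-wise merge and the relation-network similarity follow the text above.

```python
# Minimal sketch of the relation-scoring path described in the abstract.
# Sizes are assumptions: FEAT_DIM follows the 2048-d ResNet features used
# in the evaluation protocol of Ref. [21]; EMBED_DIM and the 400-unit
# hidden layer are illustrative guesses.
import torch
import torch.nn as nn

FEAT_DIM = 2048   # visual feature dimension (assumed, per Ref. [21])
ATTR_DIM = 85     # e.g. the AWA1/AWA2 attribute vectors (Table 1)
EMBED_DIM = 1024  # assumed common embedding size

class VisualEmbed(nn.Module):
    """Maps image features into the shared embedding space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(FEAT_DIM, EMBED_DIM), nn.ReLU())
    def forward(self, v):
        return self.net(v)

class SemanticEmbed(nn.Module):
    """Maps class semantics (attribute vectors plus global supervised
    tags) into the shared embedding space."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(ATTR_DIM, EMBED_DIM), nn.ReLU(),
            nn.Linear(EMBED_DIM, EMBED_DIM), nn.ReLU())
    def forward(self, a):
        return self.net(a)

class RelationNet(nn.Module):
    """Scores a merged visual-semantic pair with a learned nonlinear
    similarity in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(EMBED_DIM, 400), nn.ReLU(),
            nn.Linear(400, 1), nn.Sigmoid())
    def forward(self, merged):
        return self.net(merged)

def relation_scores(relation, vis, sem):
    """Score every image against every class. Merging is element-wise
    addition, as stated in the abstract.
    vis: (B, EMBED_DIM) embedded images; sem: (C, EMBED_DIM) embedded classes."""
    B, C = vis.size(0), sem.size(0)
    merged = vis.unsqueeze(1) + sem.unsqueeze(0)  # broadcast to (B, C, EMBED_DIM)
    return relation(merged.view(B * C, -1)).view(B, C)

visual, semantic, relation = VisualEmbed(), SemanticEmbed(), RelationNet()
images = torch.randn(8, FEAT_DIM)    # a batch of 8 image feature vectors
classes = torch.randn(40, ATTR_DIM)  # semantics of 40 seen classes
scores = relation_scores(relation, visual(images), semantic(classes))
print(scores.shape)                  # torch.Size([8, 40])
```

At test time the predicted class is simply the argmax over each score row; training would combine these scores with the classification and alignment losses the abstract describes.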

Key words: zero-shot learning    attribute vector    relation network    cross-modal    generative adversarial network
Received: 2022-02-15    Published: 2022-12-02
CLC:  TP 181  
Supported by: the National Natural Science Foundation of China (61972312) and the General Industrial Project of the Key Research and Development Program of Shaanxi Province (2020GY-002)
Corresponding author: Shan-min PANG     E-mail: zsujia19@stu.xjtu.edu.cn; pangsm@xjtu.edu.cn
About the first author: Su-jia ZENG (1996—), female, master's student, engaged in zero-shot learning research. orcid.org/0000-0002-1230-6897. E-mail: zsujia19@stu.xjtu.edu.cn
Cite this article:

Su-jia ZENG, Shan-min PANG, Wen-yu HAO. Zero-shot image classification method based on deep supervised alignment. Journal of Zhejiang University (Engineering Science), 2022, 56(11): 2204-2214.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2022.11.011        https://www.zjujournals.com/eng/CN/Y2022/V56/I11/2204

Fig. 1  Framework of the DSAN method
Dataset  Images  Attributes  Seen classes  Unseen classes
CUB      11 788  312         150           50
AWA1     30 475  85          40            10
AWA2     37 322  85          40            10
SUN      14 340  102         645           72
APY      15 339  64          20            12
Table 1  Dataset splits for zero-shot image classification
Method          A/%
                CUB    AWA1   AWA2   SUN    APY
SAE[25]         33.3   53.0   54.1   40.3   8.3
CDL[7]          54.5   69.9   ?      63.6   43.0
GAZSL[15]       55.8   68.2   70.2   61.3   41.1
DCN[16]         56.2   65.2   ?      61.8   43.6
f-CLSWGAN[18]   57.3   68.2   ?      60.8   ?
FD-fGAN[19]     58.3   72.6   ?      61.5   ?
Rnet[6]         55.6   68.2   64.2   49.3*  39.8*
SARN[8]         53.8   68.0   64.2   ?      ?
TCN[10]         59.5   70.3   71.2   61.5   38.9
CRnet[11]       56.6*  69.1*  63.0*  61.4*  39.1*
DSAN            57.4   71.8   72.3   62.4   41.5
Table 2  Comparison of conventional zero-shot classification accuracy A on different datasets 1)
1) Note: results marked with "*" were reproduced from the source code released by the original authors; "?" indicates the result was not tested; the remaining results are taken from the published evaluation in Ref. [21].
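For reference, the accuracy A in Table 2 follows the evaluation protocol of Ref. [21]: top-1 accuracy averaged per class rather than per image, so that sparsely populated classes are not drowned out by large ones,

$$ A = \frac{1}{|\mathcal{C}|} \sum_{c \in \mathcal{C}} \frac{\#\,\text{correctly classified images in class } c}{\#\,\text{images in class } c}. $$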
Method          CUB             AWA1            AWA2            SUN             APY
                U    S    H     U    S    H     U    S    H     U    S    H     U    S    H
SAE[25] 7.8 54.0 13.6 1.8 77.1 3.5 1.1 82.2 2.2 8.8 18.0 11.8 0.4 80.9 0.9
DEM[12] 19.6 57.9 29.2 32.8 84.7 47.3 30.5 86.4 45.1 20.5 34.3 25.6 75.1 11.1 19.4
CDL[7] 23.5 55.2 32.9 28.1 73.5 40.6 ? ? ? 21.5 34.7 26.5 19.8 48.6 28.1
GAZSL[15] 31.7 61.3 41.8 29.6 84.2 43.8 35.4 86.9 50.3 22.1 39.3 28.3 14.2 78.6 24.0
DCN[16] 28.4 60.7 38.7 25.5 84.2 39.1 ? ? ? 25.5 37.0 30.2 14.2 75.0 23.9
SE-GZSL[17] 41.5 53.3 46.7 56.3 67.8 61.5 58.3 68.1 62.8 40.9 30.5 34.9 ? ? ?
f-CLSWGAN[18] 43.7 57.7 49.7 57.9 61.4 59.6 ? ? ? 42.6 36.6 39.4 ? ? ?
FD-fGAN[19] 47.0 57.1 51.6 54.2 76.2 63.3 ? ? ? 42.7 38.1 40.3 ? ? ?
Rnet[6] 38.1 61.1 47.0 31.4 91.3 46.7 30.0 93.4 45.3 14.0* 23.3* 17.5* 9.8* 62.5* 17.0*
SARN[8] 37.4 64.6 47.1 33.1 90.8 48.5 35.9 92.9 51.8 ? ? ? ? ? ?
TCN[10] 52.6 52.0 52.3 49.4 76.5 60.0 61.2 65.8 63.4 31.2 37.3 34.0 24.1 64.0 35.1
CRnet[11] 45.5 56.8 50.5 58.1 74.7 65.4 52.6 78.8 63.1 34.1 36.5 35.3 32.4 68.4 44.0
CPDN[9] 46.6 58.9 52.0 49.1 82.7 61.6 44.6 85.4 58.6 ? ? ? ? ? ?
DRN[9] 46.9 58.8 52.2 50.1 81.4 62.1 44.9 85.3 58.8 ? ? ? ? ? ?
DSAN 46.9 56.6 51.3 58.1 77.1 66.2 58.6 78.8 67.2 33.2 41.1 36.7 34.1 71.6 46.2
Table 3  Comparison of generalized zero-shot classification performance on different datasets (U: unseen-class accuracy; S: seen-class accuracy; H: harmonic mean; all in %)
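The harmonic mean H in Tables 3 and 4 is the standard generalized zero-shot summary of the seen-class accuracy S and unseen-class accuracy U,

$$ H = \frac{2\,U\,S}{U + S}, $$

which rewards methods that balance the two. For example, DSAN's AWA2 entry follows from U = 58.6 and S = 78.8: H = 2 × 58.6 × 78.8 / (58.6 + 78.8) ≈ 67.2.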
Fig. 2  Comparison of classification performance with different merging methods
Fig. 3  Curves of the H value during training with different merging methods
Method                                CUB             AWA1            AWA2            SUN             APY
                                      U    S    H     U    S    H     U    S    H     U    S    H     U    S    H
Element-wise addition                 44.7 55.9 49.7  56.1 77.7 65.2  54.1 80.4 64.7  31.3 43.3 36.3  31.1 67.4 42.5
Semantic global tags                  45.8 57.5 51.0  58.1 79.6 67.2  54.9 83.8 66.3  32.6 42.0 36.7  34.3 68.0 45.6
Distribution discrepancy constraint   45.1 61.4 52.0  61.5 76.4 68.2  55.4 80.9 65.8  32.2 43.1 36.9  33.4 66.2 44.4
Adversarial constraint                46.0 57.8 51.2  60.4 75.3 67.0  56.0 82.4 66.7  33.1 43.0 37.4  33.9 68.9 45.4
Table 4  Ablation results for different components
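Table 4 ablates two alignment terms whose exact form this page does not spell out. The sketch below shows one plausible reading under loudly labeled assumptions: the "distribution discrepancy constraint" as a symmetric KL divergence between the class distributions predicted by the two classification heads, and the "adversarial constraint" as a GAN-style confusion loss against a domain discriminator, in the spirit of Refs. [13]-[14]. Neither choice is confirmed by the abstract; both are stand-ins for the paper's actual losses.

```python
# Hypothetical implementations of the two alignment terms ablated in
# Table 4. The symmetric-KL and discriminator-confusion formulations are
# assumptions, not the authors' published losses.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distribution_alignment_loss(vis_logits, sem_logits):
    """Symmetric KL between the class distributions predicted from visual
    and from semantic features (assumed form of the discrepancy constraint)."""
    log_p = F.log_softmax(vis_logits, dim=1)
    log_q = F.log_softmax(sem_logits, dim=1)
    kl_a = F.kl_div(log_p, log_q.exp(), reduction="batchmean")
    kl_b = F.kl_div(log_q, log_p.exp(), reduction="batchmean")
    return 0.5 * (kl_a + kl_b)

class DomainDiscriminator(nn.Module):
    """Predicts whether an embedding came from the visual or the semantic
    branch; the embedders are trained to fool it, GAN-style (Ref. [13])."""
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                 nn.Linear(256, 1))
    def forward(self, z):
        return self.net(z)

def adversarial_confusion_loss(disc, vis_emb, sem_emb):
    """Embedder-side loss: push the discriminator toward maximum
    confusion (probability 0.5) on both kinds of embeddings."""
    logits = disc(torch.cat([vis_emb, sem_emb], dim=0))
    target = torch.full_like(logits, 0.5)
    return F.binary_cross_entropy_with_logits(logits, target)
```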
1 LAMPERT C H, NICKISCH H, HARMELING S. Learning to detect unseen object classes by between-class attribute transfer [C]// 2009 IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 951-958.
2 WANG W, ZHENG V W, YU H, et al. A survey of zero-shot learning: settings, methods, and applications [J]. ACM Transactions on Intelligent Systems and Technology, 2019, 10(2): 1-37.
3 JI Zhong, WANG Hao-ran, YU Yun-long, et al. A decadal survey of zero-shot image classification [J]. Scientia Sinica (Informationis), 2019, 49(10): 1299-1320. doi: 10.1360/N112018-00312
4 LIU Jing-yi, SHI Cai-juan, TU Dong-jing, et al. A survey of zero-shot image classification [J]. Journal of Frontiers of Computer Science and Technology, 2021, 15: 812-824. doi: 10.3778/j.issn.1673-9418.2010092
5 CHAO W L, CHANGPINYO S, GONG B, et al. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild [C]// Proceedings of the 14th European Conference on Computer Vision. Amsterdam: Springer, 2016: 52-68.
6 SUNG F, YANG Y, ZHANG L, et al. Learning to compare: relation network for few-shot learning [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1199-1208.
7 JIANG H, WANG R, SHAN S, et al. Learning class prototypes via structure alignment for zero-shot recognition [C]// European Conference on Computer Vision. Munich: Springer, 2018: 121-138.
8 HUI B, ZHU P, HU Q, et al. Self-attention relation network for few-shot learning [C]// 2019 IEEE International Conference on Multimedia and Expo Workshops. Shanghai: IEEE, 2019: 198-203.
9 HUANG S, LIN J, HUANGFU L. Class-prototype discriminative network for generalized zero-shot learning [J]. IEEE Signal Processing Letters, 2020, 27: 301-305. doi: 10.1109/LSP.2020.2968213
10 JIANG H, WANG R, SHAN S, et al. Transferable contrastive network for generalized zero-shot learning [C]// 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2019: 9764-9773.
11 ZHANG F, SHI G. Co-representation network for generalized zero-shot learning [C]// Proceedings of the 36th International Conference on Machine Learning. California: JMLR.org, 2019: 7434-7443.
12 ZHANG L, XIANG T, GONG S. Learning a deep embedding model for zero-shot learning [C]// 2017 IEEE Conference on Computer Vision and Pattern Recognition. Hawaii: IEEE, 2017: 3010-3019.
13 GOODFELLOW I J, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets [C]// 2014 Advances in Neural Information Processing Systems. Montreal: MIT Press, 2014: 2672-2680.
14 GANIN Y, LEMPITSKY V. Unsupervised domain adaptation by backpropagation [C]// Proceedings of the 32nd International Conference on Machine Learning. Lille: JMLR.org, 2015: 1180-1189.
15 ZHU Y, ELHOSEINY M, LIU B, et al. A generative adversarial approach for zero-shot learning from noisy texts [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 1004-1013.
16 LIU S, LONG M, WANG J, et al. Generalized zero-shot learning with deep calibration network [C]// Proceedings of the 32nd International Conference on Neural Information Processing Systems. Montreal: Curran Associates Inc., 2018: 2009-2019.
17 VERMA V K, ARORA G, MISHRA A, et al. Generalized zero-shot learning via synthesized examples [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 4281-4289.
18 XIAN Y, LORENZ T, SCHIELE B, et al. Feature generating networks for zero-shot learning [C]// 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 5542-5551.
19 ZHANG Yue. Research on zero-shot image classification based on generative adversarial network [D]. Hohhot: Inner Mongolia University, 2020: 16-38.
20 WAH C, BRANSON S, PERONA P, et al. Multiclass recognition and part localization with humans in the loop [C]// Proceedings of the 2011 International Conference on Computer Vision. Barcelona: IEEE, 2011: 2524-2531.
21 XIAN Y, LAMPERT C H, SCHIELE B, et al. Zero-shot learning: a comprehensive evaluation of the good, the bad and the ugly [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(9): 2251-2265.
22 PATTERSON G, XU C, SU H, et al. The SUN attribute database: beyond categories for deeper scene understanding [J]. International Journal of Computer Vision, 2014, 108(1-2): 59-81. doi: 10.1007/s11263-013-0695-z
23 FARHADI A, ENDRES I, HOIEM D, et al. Describing objects by their attributes [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Miami: IEEE, 2009: 1778-1785.
24 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.