Journal of Zhejiang University (Engineering Science)  2026, Vol. 60, Issue (1): 81-89    DOI: 10.3785/j.issn.1008-973X.2026.01.008
Computer Technology
Targeted adversarial attack method based on dual guidance
Yue SUN, Xinglan ZHANG*
School of Computer Science, Beijing University of Technology, Beijing 100124, China
Abstract:

A generative adversarial attack method based on dual guidance of target class impressions and regularized adversarial examples was proposed to enhance the transferability of targeted adversarial samples. The adversarial perturbations of shallow features were generated by leveraging the skip-connection mechanism of the UNet model to improve the attack effectiveness of the adversarial samples. To improve the targeted attack success rate, the generator was guided to generate adversarial perturbations containing the features of target classes using the impression images and labels of target classes as input. The Dropout technique was employed on the generated adversarial perturbations in the training phase to reduce the dependence of the generator on surrogate models, thereby improving the generalization performance of the adversarial samples. Experimental results demonstrated that the adversarial samples generated by the proposed method exhibited significant targeted transferability on the MNIST, CIFAR10, and SVHN datasets when attacking classification models such as ResNet18 and DenseNet. The average black-box targeted attack success rate was improved by more than 1.6% compared with that of the benchmark attack method MIM, demonstrating that the adversarial samples generated by the proposed method could evaluate the robustness of the deep models more effectively.

Key words: deep learning    adversarial attack    adversarial example    black-box attack    targeted attack
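The regularization step described in the abstract can be made concrete. The following is a minimal NumPy sketch, not the authors' implementation: the function name, the dropout probability, and the L-infinity budget `eps` are assumptions made for illustration. It shows Dropout applied to a generated perturbation during training, followed by projection into the perturbation budget and the valid pixel range:

```python
import numpy as np

def perturb_with_dropout(x, delta, eps=8 / 255, drop_p=0.3, training=True, seed=0):
    """Apply Dropout to a generated perturbation, then clip to the budget.

    Randomly zeroing elements of delta during training discourages the
    generator from overfitting to surrogate-specific perturbation patterns.
    """
    if training:
        rng = np.random.default_rng(seed)
        mask = rng.random(delta.shape) >= drop_p
        delta = delta * mask / (1.0 - drop_p)  # inverted-dropout rescaling
    delta = np.clip(delta, -eps, eps)          # enforce the L_inf budget
    return np.clip(x + delta, 0.0, 1.0)        # keep pixels in [0, 1]
```

At inference time `training=False` leaves the perturbation intact; the dropout mask is only applied while training the generator.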
Received: 2025-02-20  Published: 2025-12-15
CLC number: TP 391
Foundation item: National Natural Science Foundation of China (62202017).
Corresponding author: Xinglan ZHANG  E-mail: sunyues2022@emails.bjut.edu.cn; zhangxinglan@bjut.edu.cn
About the author: Yue SUN (born 2000), female, master's student, engaged in research on deep learning security. orcid.org/0009-0001-5046-4087. E-mail: sunyues2022@emails.bjut.edu.cn
Cite this article:

Yue SUN, Xinglan ZHANG. Targeted adversarial attack method based on dual guidance. Journal of Zhejiang University (Engineering Science), 2026, 60(1): 81-89.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.01.008        https://www.zjujournals.com/eng/CN/Y2026/V60/I1/81

Fig. 1  Examples of class impression images generated with the VGG16 or LeNet model
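Class impression images like those in Fig. 1 are synthesized without any training data by starting from noise and maximizing the surrogate model's confidence in the target class (the data-free idea of reference 27). The following is a toy sketch under the simplifying assumption of a linear surrogate; the function name and parameters are illustrative, not the authors' code:

```python
import numpy as np

def class_impression(W, b, target, steps=100, lr=0.1, seed=0):
    """Synthesize a class impression for a toy linear surrogate (logits = W @ x + b).

    Starting from uniform noise, ascend the target class's logit so the input
    comes to embody the features the surrogate associates with that class.
    For a linear model, the gradient of the target logit w.r.t. x is W[target].
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=W.shape[1])
    for _ in range(steps):
        x = np.clip(x + lr * W[target], 0.0, 1.0)  # gradient-ascent step
    return x
```

With a deep surrogate the gradient would instead be obtained by backpropagation, but the loop structure is the same.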
Fig. 2  Structure of the adversarial example generator
Fig. 3  Overall framework of the proposed method
Classification model | MNIST | CIFAR10 | CIFAR10+AT | SVHN
ResNet18   | 99.57 | 86.67 | 84.11 | 94.22
VGG16      | -     | 87.87 | 79.51 | 93.71
DenseNet   | -     | 90.09 | 83.94 | 94.93
WideResNet | -     | 91.89 | 88.25 | 95.23
Inv3       | -     | 90.30 | 84.50 | 94.43
LeNet      | 99.20 | -     | -     | -
AlexNet    | 99.28 | -     | -     | -
Table 1  Classification accuracy (ACC/%) of different classification models on the original datasets
Fig. 4  Targeted attack success rates of various attack methods on the MNIST dataset
Attack method | VGG16 | ResNet18 | Inv3 | DenseNet | WideResNet | Avg
MIM       | 100.00 | 83.45 | 93.64 | 93.97 | 94.41 | 91.37
Auto-PGD  | 100.00 | 73.68 | 86.68 | 87.33 | 87.74 | 83.86
DIM       |  99.64 | 78.22 | 83.72 | 84.27 | 84.95 | 82.79
TIM       |  83.66 | 29.71 | 26.82 | 26.20 | 29.08 | 27.95
SIM       |  99.53 | 73.47 | 81.58 | 81.74 | 81.25 | 79.51
VMI-FGSM  |  99.92 | 82.64 | 90.30 | 91.32 | 90.72 | 88.75
DO-M-DI2  |  95.09 | 55.12 | 67.46 | 68.25 | 72.12 | 65.74
DeCowA    |  98.93 | 68.76 | 71.40 | 74.86 | 77.08 | 73.03
IDAA      |  99.04 | 86.77 | 90.75 | 89.78 | 91.74 | 89.76
LOGIT     |  99.75 | 74.49 | 78.11 | 80.28 | 85.25 | 79.53
POTRIP    |  93.22 | 51.32 | 53.58 | 54.17 | 55.64 | 53.68
AdvGAN    |  97.08 | 39.68 | 61.08 | 70.28 | 65.34 | 59.10
Proposed  |  98.31 | 93.52 | 96.01 | 97.09 | 97.66 | 96.07
Table 2  Targeted attack success rates (tASR/%) of different attack methods on the CIFAR10 dataset
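For reference, the average column reported for each attack in Table 2 is consistent with the mean tASR over the four black-box models only, excluding the white-box surrogate VGG16. A small sketch of that bookkeeping (the function name is illustrative):

```python
def avg_blackbox_tasr(tasr, surrogate="VGG16"):
    """Mean targeted attack success rate over the black-box models,
    excluding the white-box surrogate used to craft the examples."""
    vals = [v for model, v in tasr.items() if model != surrogate]
    return sum(vals) / len(vals)
```

Plugging in the MIM row of Table 2 (100.00, 83.45, 93.64, 93.97, 94.41) reproduces the reported average of 91.37 after rounding to two decimals.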
Attack method | VGG16 | ResNet18 | Inv3 | DenseNet | WideResNet | Avg
MIM       |  99.91 | 97.34 | 94.78 | 95.75 | 96.28 | 96.04
Auto-PGD  | 100.00 | 91.12 | 83.72 | 86.93 | 87.54 | 87.33
DIM       |  99.81 | 91.74 | 86.14 | 88.62 | 89.06 | 88.89
TIM       |  99.18 | 83.64 | 76.82 | 79.74 | 80.32 | 80.13
SIM       |  99.91 | 94.99 | 92.06 | 93.33 | 93.36 | 93.44
VMI-FGSM  |  99.98 | 95.51 | 93.32 | 94.17 | 94.45 | 94.36
DO-M-DI2  |  99.86 | 92.82 | 87.34 | 89.78 | 90.74 | 90.17
DeCowA    |  99.16 | 89.77 | 83.86 | 86.13 | 86.68 | 86.61
IDAA      |  87.97 | 83.52 | 80.78 | 80.39 | 81.05 | 81.44
LOGIT     |  99.97 | 96.80 | 93.08 | 94.60 | 95.49 | 94.99
POTRIP    |  99.69 | 92.19 | 88.56 | 88.47 | 89.52 | 89.69
AdvGAN    |  97.59 | 97.00 | 94.75 | 95.30 | 95.91 | 95.74
Proposed  |  97.97 | 97.79 | 97.07 | 97.78 | 97.69 | 97.58
Table 3  Targeted attack success rates (tASR/%) of different attack methods on the SVHN dataset
Attack method | VGG16 | ResNet18 | Inv3 | DenseNet | WideResNet
MIM       | 16.93 | 19.66 | 31.03 | 22.91 | 22.06
Auto-PGD  | 17.48 | 28.07 | 57.59 | 44.11 | 39.26
DIM       | 35.17 | 57.97 | 69.99 | 64.16 | 67.42
TIM       | 35.43 | 32.07 | 24.39 | 30.13 | 33.63
SIM       | 25.09 | 40.36 | 61.15 | 52.05 | 52.96
VMI-FGSM  | 25.83 | 36.55 | 58.91 | 47.09 | 46.63
DO-M-DI2  | 11.02 | 26.69 | 37.04 | 30.87 | 41.12
DeCowA    | 15.41 | 24.96 | 28.45 | 25.51 | 30.16
IDAA      | 38.77 | 48.95 | 66.79 | 59.91 | 49.07
LOGIT     | 39.55 | 50.10 | 59.29 | 51.38 | 55.14
POTRIP    | 29.08 | 37.28 | 48.22 | 42.89 | 40.67
AdvGAN    |  4.09 | 31.48 | 35.95 | 35.13 | 41.44
Proposed  | 28.42 | 72.73 | 83.99 | 75.54 | 86.51
Table 4  Targeted attack success rates (tASR/%) of various attack methods against adversarially trained models on the CIFAR10 dataset
Fig. 5  Targeted attack success rates of various attack methods against defense models on the CIFAR10 dataset
Dataset | Attack method | ResNet18 | Inv3 | DenseNet | WideResNet | Avg
CIFAR10 | AdvGAN   | 39.68 | 61.08 | 70.28 | 65.34 | 59.10
CIFAR10 | AdvGAN+U | 93.00 | 95.80 | 96.19 | 97.03 | 95.51
CIFAR10 | Proposed | 93.52 | 96.01 | 97.09 | 97.66 | 96.07
SVHN    | AdvGAN   | 97.00 | 94.75 | 95.30 | 95.91 | 95.74
SVHN    | AdvGAN+U | 97.11 | 96.18 | 97.04 | 97.15 | 96.87
SVHN    | Proposed | 97.79 | 97.07 | 97.78 | 97.69 | 97.58
Table 5  Targeted attack success rates (tASR/%) of different generative attack methods on the CIFAR10 and SVHN datasets
Fig. 6  Targeted attack success rates of the proposed method under different dropout probabilities
Fig. 7  Targeted attack success rates of the proposed method against adversarially trained models under different dropout probabilities
1 HOU Xiaohu, JIA Xiaofen, ZHAO Baiting. Lightweight recognition algorithm for OCT images of fundus lesions [J]. Journal of Zhejiang University: Engineering Science, 2023, 57(12): 2448-2455.
2 ZHAO Y, LV W, XU S, et al. DETRs beat YOLOs on real-time object detection [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 16965–16974.
3 HU Y, YANG J, CHEN L, et al. Planning-oriented autonomous driving [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Vancouver: IEEE, 2023: 17853–17862.
4 ZHENG Cheng, CHEN Xueling. Aspect semantic enhanced fusion network for aspect-based sentiment analysis [J/OL]. Journal of Chinese Computer Systems, 2024: 1–12. (2024-09-24). https://kns.cnki.net/kcms/detail/detail.aspx?filename=XXWX2024092000Q&dbname=CJFD&dbcode=CJFQ.
5 QIN Zhen, ZHUANG Tianming, ZHU Guosong, et al. Survey of security attack and defense strategies for artificial intelligence model [J]. Journal of Computer Research and Development, 2024, 61(10): 2627-2648. doi: 10.7544/issn1000-1239.202440449
6 SZEGEDY C, ZAREMBA W, SUTSKEVER I, et al. Intriguing properties of neural networks [C]// International Conference on Learning Representations. Banff: International Machine Learning Society, 2014: 1-10.
7 BAI Y, WANG Y, ZENG Y, et al. Query efficient black-box adversarial attack on deep neural networks [J]. Pattern Recognition, 2023, 133: 109037. doi: 10.1016/j.patcog.2022.109037
8 REN Y, ZHU H, SUI X, et al. Crafting transferable adversarial examples via contaminating the salient feature variance [J]. Information Sciences, 2023, 644: 119273. doi: 10.1016/j.ins.2023.119273
9 DONG Y, LIAO F, PANG T, et al. Boosting adversarial attacks with momentum [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9185–9193.
10 CROCE F, HEIN M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks [C]// Proceedings of the International Conference on Machine Learning. [S.l.]: JMLR.org, 2020: 2206–2216.
11 WANG X, HE K. Enhancing the transferability of adversarial attacks through variance tuning [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 1924–1933.
12 PENG A, LIN Z, ZENG H, et al. Boosting transferability of adversarial example via an enhanced Euler’s method [C]// IEEE International Conference on Acoustics, Speech and Signal Processing. Rhodes Island: IEEE, 2023: 1–5.
13 WANG J, CHEN Z, JIANG K, et al. Boosting the transferability of adversarial attacks with global momentum initialization [J]. Expert Systems with Applications, 2024, 255: 124757. doi: 10.1016/j.eswa.2024.124757
14 DONG Y, PANG T, SU H, et al. Evading defenses to transferable adversarial examples by translation-invariant attacks [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 4307–4316.
15 WU L, ZHAO L, PU B, et al. Boosting the transferability of adversarial examples via adaptive attention and gradient purification methods [C]// Proceedings of the International Joint Conference on Neural Networks. Yokohama: IEEE, 2024: 1–7.
16 ZHU Z, CHEN H, WANG X, et al. GE-AdvGAN: improving the transferability of adversarial samples by gradient editing-based adversarial generative model [C]// Proceedings of the 2024 SIAM International Conference on Data Mining. Houston: SIAM, 2024: 706–714.
17 QIAN Y, HE S, ZHAO C, et al. LEA2: a lightweight ensemble adversarial attack via non-overlapping vulnerable frequency regions [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 4487–4498.
18 LI M, DENG C, LI T, et al. Towards transferable targeted attack [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 638−646.
19 CHEN B, YIN J, CHEN S, et al. An adaptive model ensemble adversarial attack for boosting adversarial transferability [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Paris: IEEE, 2023: 4466–4475.
20 LIN J, SONG C, HE K, et al. Nesterov accelerated gradient and scale invariance for improving transferability of adversarial examples [C]// International Conference on Learning Representations. [S.l.]: International Machine Learning Society, 2020: 1–12.
21 XIE C, ZHANG Z, ZHOU Y, et al. Improving transferability of adversarial examples with input diversity [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 2725–2734.
22 QIAN Y, CHEN K, WANG B, et al. Enhancing transferability of adversarial examples through mixed-frequency inputs [J]. IEEE Transactions on Information Forensics and Security, 2024, 19: 7633-7645. doi: 10.1109/TIFS.2024.3430508
23 ZHAO Z, LIU Z, LARSON M. On success and simplicity: a second look at transferable targeted attacks [C]// Proceedings of the 35th International Conference on Neural Information Processing Systems. [S.l.]: NeurIPS Foundation, 2021: 6115–6128.
24 WU H, OU G, WU W, et al. Improving transferable targeted adversarial attacks with model self-enhancement [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2024: 24615–24624.
25 YAN Jingkun. Research of adversarial attack method in face recognition [D]. Wuhan: Huazhong University of Science and Technology, 2020.
26 XIAO C, LI B, ZHU J, et al. Generating adversarial examples with adversarial networks [C]// Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: AAAI Press, 2018: 3905–3911.
27 MOPURI K R, UPPALA P K, BABU R V. Ask, acquire, and attack: data-free UAP generation using class impressions [C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 20–35.
28 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [C]// International Conference on Learning Representations. San Diego: ICLR, 2015: 1–14.
29 KRIZHEVSKY A, HINTON G. Learning multiple layers of features from tiny images [R]. Toronto: University of Toronto, 2009.
30 NETZER Y, WANG T, COATES A, et al. Reading digits in natural images with unsupervised feature learning [C]// Proceedings of the NIPS Workshop on Deep Learning and Unsupervised Feature Learning. Granada: NeurIPS Foundation, 2011: 1–9.
31 LECUN Y, CORTES C, BURGES C J C. The MNIST database of handwritten digits [J]. Neural Computation, 1998, 10(5): 1191-1232.
32 HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770–778.
33 HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2261–2269.
34 ZAGORUYKO S, KOMODAKIS N. Wide residual networks [C]// Proceedings of the British Machine Vision Conference. York: BMVA Press, 2016: 87.1–87.12.
35 SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818–2826.
36 KRIZHEVSKY A, SUTSKEVER I, HINTON G E. ImageNet classification with deep convolutional neural networks [C]// Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe: NeurIPS Foundation, 2012: 1097–1105.
37 WANG X, HE K. Enhancing the transferability of adversarial attacks through variance tuning [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 1924–1933.
38 LIN Q, LUO C, NIU Z, et al. Boosting adversarial transferability across model genus by deformation-constrained warping [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI Press, 2024: 3459–3467.