Please wait a minute...
浙江大学学报(工学版)  2024, Vol. 58 Issue (8): 1728-1737    DOI: 10.3785/j.issn.1008-973X.2024.08.019
计算机技术、控制工程     
融合注意力机制和结构线提取的图像卡通化
李灿林1(),王新玥1,马利庄2,邵志文2,3,张文娇1
1. 郑州轻工业大学 计算机与通信工程学院,河南 郑州 450000
2. 上海交通大学 计算机科学与工程系,上海 200240
3. 中国矿业大学 计算机科学与技术学院,江苏 徐州 221116
Image cartoonization incorporating attention mechanism and structural line extraction
Canlin LI1(),Xinyue WANG1,Lizhuang MA2,Zhiwen SHAO2,3,Wenjiao ZHANG1
1. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
3. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
 全文: PDF(5408 KB)   HTML
摘要:

为了解决图像卡通化没有突出表达图像中的重要特征信息及边缘处理不足的问题,提出融合注意力机制和结构线提取的图像卡通化方法. 构建融合注意力机制的生成器网络,通过空间和通道融合特征间的联系,从不同的特征中提取更加重要和丰富的图像信息. 为了更好地实现对卡通纹理的学习,设计与全局并行的线提取区域处理模块(LERM),以便对卡通纹理的边缘区域进行对抗性训练. 该方法不仅在重要区域和细节方面生成了高感知质量的卡通化图像,而且避免了内容和颜色的损失. 大量的实验结果表明,利用该方法取得了更好的卡通风格化效果,验证了该方法的有效性,.

关键词: 生成式对抗网络图像卡通化注意力机制结构线提取边缘检测    
Abstract:

An image cartoonization method that incorporated attention mechanism and structural line extraction was proposed in order to address the problem that image cartoonization does not highlight important feature information in the image and insufficient edge processing. The generator network with fused attention mechanism was constructed, which extracted more important and richer image information from different features by fusing the connections between features in space and channels. A line extraction region processing module (LERM) in parallel with the global one was designed to perform adversarial training on the edge regions of cartoon textures in order to better learn cartoon textures. This method not only generates cartoonish images with high perceptual quality in terms of important areas and details, but also avoids the loss of content and color. The extensive experimental results showed that the proposed method achieved better cartoonization, which validated the effectiveness of the method.

Key words: generative adversarial network    image cartoonization    attention mechanism    structural line extraction    edge detection
收稿日期: 2023-07-01 出版日期: 2024-07-23
CLC:  TP 391  
基金资助: 国家自然科学基金资助项目(61972157,62106268);河南省科技攻关项目(242102211003);上海市科技创新行动计划人工智能科技支撑项目(21511101200);江苏省“双创博士”人才资助项目(JSSCBS20211220).
作者简介: 李灿林(1976— ),男,副教授,从事图像处理和计算机视觉的研究. orcid.org/0000-0003-0307-3910. E-mail:lcl_zju@aliyun.com
服务  
把本文推荐给朋友
加入引用管理器
E-mail Alert
作者相关文章  
李灿林
王新玥
马利庄
邵志文
张文娇

引用本文:

李灿林,王新玥,马利庄,邵志文,张文娇. 融合注意力机制和结构线提取的图像卡通化[J]. 浙江大学学报(工学版), 2024, 58(8): 1728-1737.

Canlin LI,Xinyue WANG,Lizhuang MA,Zhiwen SHAO,Wenjiao ZHANG. Image cartoonization incorporating attention mechanism and structural line extraction. Journal of ZheJiang University (Engineering Science), 2024, 58(8): 1728-1737.

链接本文:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2024.08.019        https://www.zjujournals.com/eng/CN/Y2024/V58/I8/1728

图 1  提出方法的总体架构
图 2  通道空间注意力残差块
图 3  线提取区域处理模块
图 4  线条提取的比较
图 5  3种不同风格的转换
图 6  图像风格化的对比
图 7  同一种风格的对比图
风格FID
CartoonGANWhite-boxSDP-GANAnimeGANv2MS-CartoonGAN本文方法
Shinkai140.5126.8122.7121.3146.3120.2
Hayao166.6154.6125.6144.4140.8123.0
Hosoda163.2155.3136.6141.5160.1133.7
表 1  生成图像和目标图像的FID值
图 8  损失函数消融实验的对比图
图 9  组件消融实验的对比图
序号模型FID
A局部分支177.3
B全局分支135.8
C无颜色重建损失135.0
D无注意力模块134.5
ECanny134.0
F整个模型133.7
表 2  消融实验的FID值
1 梅洪, 陈昭炯 基于Mean Shift和FDoG的图像卡通化渲染[J]. 计算机工程与应用, 2016, 52 (10): 213- 217
MEI Hong, CHEN Zhaojiong Cartoonish rendering of images based on Mean Shift and FDoG[J]. Computer Engineering and Applications, 2016, 52 (10): 213- 217
doi: 10.3778/j.issn.1002-8331.1407-0015
2 刘侠 基于OpencCV中Mean Shift的图像卡通化处理[J]. 信息与电脑: 理论版, 2020, 32 (20): 54- 57
LIU Xia Image cartoon processing based on Mean Shift in OpencCV[J]. Information and Computer: Theory Edition, 2020, 32 (20): 54- 57
3 CANNY J A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8 (6): 679- 698
doi: 10.1109/TPAMI.1986.4767851
4 WU Z, ZHU Z, DU J, et al. CCPL: contrastive coherence preserving loss for versatile style transfer [C]// 17th European Conference on Computer Vision . Cham: Springer, 2022: 189-206.
5 ZHANG Y, TANG F, DONG W, et al. Domain enhanced arbitrary image style transfer via contrastive learning [C]// ACM SIGGRAPH 2022 Conference Proceedings . Vancouver: ACM, 2022: 1-8.
6 ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]// IEEE International Conference on Computer Vision . Venice: IEEE, 2017: 2223-2232.
7 CHEN Y, LAI Y K, LIU Y J. Cartoongan: generative adversarial networks for photo cartoonization [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Salt Lake City: IEEE, 2018: 9465-9474.
8 PĘŚKO M, SVYSTUN A, ANDRUSZKIEWICZ P, et al Comixify: transform video into comics[J]. Fundamenta Informaticae, 2019, 168 (2-4): 311- 333
doi: 10.3233/FI-2019-1834
9 WANG X, YU J. Learning to cartoonize using white-box cartoon representations [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Seattle: IEEE, 2020: 8090-8099.
10 LI R, WU C H, LIU S, et al SDP-GAN: saliency detail preservation generative adversarial networks for high perceptual quality style transfer[J]. IEEE Transactions on Image Processing, 2020, 30: 374- 385
11 CHEN J, LIU G, CHEN X. Animegan: a novel lightweight GAN for photo animation [C]// International Symposium on Intelligence Computation and Applications . Singapore: Springer, 2020: 242-256.
12 GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 2414-2423.
13 SHU Y, YI R, XIA M, et al GAN-based multi-style photo cartoonization[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 28 (10): 3376- 3390
14 DONG Y, TAN W, TAO D, et al cartoonLossGAN: learning surface and coloring of images for cartoonization[J]. IEEE Transactions on Image Processing, 2021, 31: 485- 498
15 GAO X, ZHANG Y, TIAN Y. Learning to incorporate texture saliency adaptive attention to image cartoonization [EB/OL]. (2022-08-02)[2023-06-01]. https://arxiv.org/abs/2208.01587.
16 KANG H, LEE S, CHUI C K Flow-based image abstraction[J]. IEEE Transactions on Visualization and Computer Graphics, 2008, 15 (1): 62- 76
17 WINNEMOELLER H, KYPRIANIDIS J E, OLSEN S C XDoG: an extended difference-of-Gaussians compendium including advanced image stylization[J]. Computers and Graphics, 2012, 36 (6): 740- 753
doi: 10.1016/j.cag.2012.03.004
18 SÝKORA D, BURIÁNEK J, ŽÁRA J Segmentation of black and white cartoons[J]. Image and Vision Computing, 2005, 23 (9): 767- 782
doi: 10.1016/j.imavis.2005.05.010
19 SÝKORA D, BURIÁNEK J, ŽÁRA J. Sketching cartoons by example [C]// Proceedings of Eurographics Workshop on Sketch Based Interfaces and Modeling . Schoten: Eurographics Association, 2005: 27-33.
20 ZHANG S H, CHEN T, ZHANG Y F, et al Vectorizing cartoon animations[J]. IEEE Transactions on Visualization and Computer Graphics, 2009, 15 (4): 618- 629
doi: 10.1109/TVCG.2009.9
21 LIU X, MAO X, YANG X, et al Stereoscopizing cel animations[J]. ACM Transactions on Graphics, 2013, 32 (6): 1- 10
22 LI C, LIU X, WONG T T Deep extraction of manga structural lines[J]. ACM Transactions on Graphics, 2017, 36 (4): 1- 12
23 WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the European Conference on Computer Vision . Munich: Elsevier, 2018: 3-19.
24 RUSSAKOVSKY O, DENG J, SU H, et al Imagenet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115 (3): 211- 252
doi: 10.1007/s11263-015-0816-y
25 SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition [EB/OL]. (2014-09-04)[2023-06-01]. https://arxiv.org/abs/1409.1556.
26 MAO X, LI Q, XIE H, et al. Least squares generative adversarial networks [C]// Proceedings of the IEEE International Conference on Computer Vision . Venice: IEEE, 2017: 2794-2802.
27 HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium [C]// Advances in Neural Information Processing Systems . Long Beach: MIT Press, 2017: 30.
28 SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition . Las Vegas: IEEE, 2016: 2818-2826.
[1] 吴书晗,王丹,陈远方,贾子钰,张越棋,许萌. 融合注意力的滤波器组双视图图卷积运动想象脑电分类[J]. 浙江大学学报(工学版), 2024, 58(7): 1326-1335.
[2] 马现伟,范朝辉,聂为之,李东,朱逸群. 对失效传感器具备鲁棒性的故障诊断方法[J]. 浙江大学学报(工学版), 2024, 58(7): 1488-1497.
[3] 杨军,张琛. 基于边界点估计与稀疏卷积神经网络的三维点云语义分割[J]. 浙江大学学报(工学版), 2024, 58(6): 1121-1132.
[4] 李运堂,李恒杰,张坤,王斌锐,关山越,陈源. 基于新型编码解码网络的复杂输电线识别[J]. 浙江大学学报(工学版), 2024, 58(6): 1133-1141.
[5] 邢志伟,朱书杰,李彪. 基于改进图卷积神经网络的航空行李特征感知[J]. 浙江大学学报(工学版), 2024, 58(5): 941-950.
[6] 刘毅,陈一丹,高琳,洪姣. 基于多尺度特征融合的轻量化道路提取模型[J]. 浙江大学学报(工学版), 2024, 58(5): 951-959.
[7] 魏翠婷,赵唯坚,孙博超,刘芸怡. 基于改进Mask R-CNN与双目视觉的智能配筋检测[J]. 浙江大学学报(工学版), 2024, 58(5): 1009-1019.
[8] 宦海,盛宇,顾晨曦. 基于遥感图像道路提取的全局指导多特征融合网络[J]. 浙江大学学报(工学版), 2024, 58(4): 696-707.
[9] 宋明俊,严文,邓益昭,张俊然,涂海燕. 轻量化机器人抓取位姿实时检测算法[J]. 浙江大学学报(工学版), 2024, 58(3): 599-610.
[10] 姚鑫骅,于涛,封森文,马梓健,栾丛丛,沈洪垚. 基于图神经网络的零件机加工特征识别方法[J]. 浙江大学学报(工学版), 2024, 58(2): 349-359.
[11] 秦思怡,盖绍彦,达飞鹏. 混合采样下多级特征聚合的视频目标检测算法[J]. 浙江大学学报(工学版), 2024, 58(1): 10-19.
[12] 李海烽,张雪英,段淑斐,贾海蓉,Huizhi Liang . 融合生成对抗网络与时间卷积网络的普通话情感识别[J]. 浙江大学学报(工学版), 2023, 57(9): 1865-1875.
[13] 赵小强,王泽,宋昭漾,蒋红梅. 基于动态注意力网络的图像超分辨率重建[J]. 浙江大学学报(工学版), 2023, 57(8): 1487-1494.
[14] 王慧欣,童向荣. 融合知识图谱的推荐系统研究进展[J]. 浙江大学学报(工学版), 2023, 57(8): 1527-1540.
[15] 宋秀兰,董兆航,单杭冠,陆炜杰. 基于时空融合的多头注意力车辆轨迹预测[J]. 浙江大学学报(工学版), 2023, 57(8): 1636-1643.