Journal of ZheJiang University (Engineering Science)  2024, Vol. 58 Issue (8): 1728-1737    DOI: 10.3785/j.issn.1008-973X.2024.08.019
    
Image cartoonization incorporating attention mechanism and structural line extraction
Canlin LI1, Xinyue WANG1, Lizhuang MA2, Zhiwen SHAO2,3, Wenjiao ZHANG1
1. School of Computer and Communication Engineering, Zhengzhou University of Light Industry, Zhengzhou 450000, China
2. Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China
3. School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China

Abstract  

An image cartoonization method incorporating an attention mechanism and structural line extraction was proposed to address two problems in image cartoonization: important feature information in the image is not highlighted, and edges are insufficiently processed. A generator network fusing an attention mechanism was constructed, which extracted more important and richer image information from different features by fusing the relationships between features in the spatial and channel dimensions. A line extraction region processing module (LERM), parallel to the global branch, was designed to perform adversarial training on the edge regions of cartoon textures in order to better learn cartoon textures. The method not only generated cartoonized images of high perceptual quality in important areas and details, but also avoided the loss of content and color. Extensive experimental results showed that the proposed method achieved better cartoonization, which validated its effectiveness.
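The attention module fused into the generator combines channel and spatial attention in the style of CBAM [23]. A minimal NumPy sketch of that two-step gating follows; the dense weights `w1`/`w2` stand in for CBAM's shared MLP, and the scalars `k_avg`/`k_max` stand in for its learned 7×7 convolution, so all names and shapes here are illustrative rather than the paper's implementation:

```python
import numpy as np

def _sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_attention(x, w1, w2):
    """Gate each channel of x (C, H, W) using pooled descriptors.

    w1 (C//r, C) and w2 (C, C//r) stand in for CBAM's shared MLP.
    """
    avg = x.mean(axis=(1, 2))            # (C,) average-pooled descriptor
    mx = x.max(axis=(1, 2))              # (C,) max-pooled descriptor
    gate = _sigmoid(w2 @ np.maximum(w1 @ avg, 0.0)
                    + w2 @ np.maximum(w1 @ mx, 0.0))
    return x * gate[:, None, None]

def spatial_attention(x, k_avg=1.0, k_max=1.0):
    """Gate each spatial position; scalars replace the learned 7x7 conv."""
    avg = x.mean(axis=0)                 # (H, W) cross-channel average
    mx = x.max(axis=0)                   # (H, W) cross-channel max
    gate = _sigmoid(k_avg * avg + k_max * mx)
    return x * gate[None, :, :]

def cbam_block(x, w1, w2):
    """Apply channel attention first, then spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2))
```

Because both gates are sigmoid outputs in (0, 1), the block rescales features without changing their sign or shape, which is why it can be dropped into a residual block (Fig.2) without altering the surrounding architecture.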



Key words: generative adversarial network; image cartoonization; attention mechanism; structural line extraction; edge detection
Received: 01 July 2023      Published: 23 July 2024
CLC:  TP 391  
Fund: National Natural Science Foundation of China (61972157, 62106268); Key Science and Technology Research Project of Henan Province (242102211003); Artificial Intelligence Technology Support Project of the Shanghai Science and Technology Innovation Action Plan (21511101200); Jiangsu Province "Shuangchuang Doctor" Talent Funding Program (JSSCBS20211220).
Cite this article:

Canlin LI,Xinyue WANG,Lizhuang MA,Zhiwen SHAO,Wenjiao ZHANG. Image cartoonization incorporating attention mechanism and structural line extraction. Journal of ZheJiang University (Engineering Science), 2024, 58(8): 1728-1737.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2024.08.019     OR     https://www.zjujournals.com/eng/Y2024/V58/I8/1728


Image cartoonization incorporating attention mechanism and structural line extraction (Chinese abstract)

To address the problems that image cartoonization fails to highlight the important feature information in an image and handles edges insufficiently, an image cartoonization method incorporating an attention mechanism and structural line extraction was proposed. A generator network fusing an attention mechanism was constructed, which extracted more important and richer image information from different features by fusing the relationships between features in the spatial and channel dimensions. To better learn cartoon textures, a line extraction region processing module (LERM), parallel to the global branch, was designed to perform adversarial training on the edge regions of cartoon textures. The method not only generated cartoonized images of high perceptual quality in important areas and details, but also avoided the loss of content and color. Extensive experimental results showed that the method achieved better cartoon stylization, validating its effectiveness.


Key words: generative adversarial network; image cartoonization; attention mechanism; structural line extraction; edge detection
Fig.1 Overall architecture of proposed method
Fig.2 CBAM_Resblock
Fig.3 Line extraction region processing module
Fig.4 Comparison of line extraction
Fig.5 Three different style transformations
Fig.6 Comparison of image stylization
Fig.7 Comparison chart of same style
Style     CartoonGAN  White-box  SDP-GAN  AnimeGANv2  MS-CartoonGAN  Proposed
Shinkai   140.5       126.8      122.7    121.3       146.3          120.2
Hayao     166.6       154.6      125.6    144.4       140.8          123.0
Hosoda    163.2       155.3      136.6    141.5       160.1          133.7
Tab.1 FID values for generated image and target image
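Tab.1 reports the Fréchet inception distance (FID [27]) between Inception-feature statistics of generated and target images; lower is better. Given the feature means and covariances, the distance itself is a closed form. A NumPy-only sketch, assuming the Inception statistics have already been extracted, and using a symmetric eigendecomposition for the matrix square root:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    w = np.clip(w, 0.0, None)            # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def fid(mu1, sigma1, mu2, sigma2):
    """Frechet distance between Gaussians N(mu1, S1) and N(mu2, S2):
    ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^{1/2}).
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    # Tr((S1^{1/2} S2 S1^{1/2})^{1/2}) equals Tr((S1 S2)^{1/2}),
    # and the inner product is symmetric PSD, so eigh applies.
    covmean = _sqrtm_psd(s1_half @ sigma2 @ s1_half)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Identical statistics give a distance of zero, and shifting one mean by a unit vector (with equal covariances) gives exactly 1, which makes the implementation easy to sanity-check.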
Fig.8 Comparison chart of loss function ablation experiment
Fig.9 Comparison chart of component ablation experiment
No.  Model                              FID
A    Local branch only                  177.3
B    Global branch only                 135.8
C    Without color reconstruction loss  135.0
D    Without attention module           134.5
E    Canny                              134.0
F    Full model                         133.7
Tab.2 FID value of ablation experiment
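Ablation E replaces the structural line extractor [22] with classical Canny edges [3]. The gradient-magnitude step that such classical detectors share can be sketched in NumPy (Sobel filtering with a fixed threshold; this is an illustrative stand-in only, omitting Canny's Gaussian smoothing, non-maximum suppression, and hysteresis):

```python
import numpy as np

def sobel_edges(img, thresh=0.3):
    """Binary edge map of a 2-D grayscale array in [0, 1] via Sobel gradients."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    pad = np.pad(img, 1, mode="edge")    # replicate borders so output size matches
    h, w = img.shape
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]  # 3x3 window centered at (i, j)
            gx[i, j] = (win * kx).sum()
            gy[i, j] = (win * ky).sum()
    mag = np.hypot(gx, gy)               # gradient magnitude
    return (mag / (mag.max() + 1e-8)) > thresh
```

On a vertical step edge, the detector fires only on the two columns flanking the discontinuity, which illustrates why a thresholded gradient yields thin, hard edges rather than the smoother structural lines the learned extractor produces.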
[1]   MEI Hong, CHEN Zhaojiong. Cartoonish rendering of images based on Mean Shift and FDoG[J]. Computer Engineering and Applications, 2016, 52(10): 213-217. doi: 10.3778/j.issn.1002-8331.1407-0015
[2]   LIU Xia. Image cartoon processing based on Mean Shift in OpenCV[J]. Information and Computer: Theory Edition, 2020, 32(20): 54-57.
[3]   CANNY J. A computational approach to edge detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1986, PAMI-8(6): 679-698. doi: 10.1109/TPAMI.1986.4767851
[4]   WU Z, ZHU Z, DU J, et al. CCPL: contrastive coherence preserving loss for versatile style transfer[C]// 17th European Conference on Computer Vision. Cham: Springer, 2022: 189-206.
[5]   ZHANG Y, TANG F, DONG W, et al. Domain enhanced arbitrary image style transfer via contrastive learning[C]// ACM SIGGRAPH 2022 Conference Proceedings. Vancouver: ACM, 2022: 1-8.
[6]   ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks[C]// IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2223-2232.
[7]   CHEN Y, LAI Y K, LIU Y J. CartoonGAN: generative adversarial networks for photo cartoonization[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Salt Lake City: IEEE, 2018: 9465-9474.
[8]   PĘŚKO M, SVYSTUN A, ANDRUSZKIEWICZ P, et al. Comixify: transform video into comics[J]. Fundamenta Informaticae, 2019, 168(2-4): 311-333. doi: 10.3233/FI-2019-1834
[9]   WANG X, YU J. Learning to cartoonize using white-box cartoon representations[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 8090-8099.
[10]   LI R, WU C H, LIU S, et al. SDP-GAN: saliency detail preservation generative adversarial networks for high perceptual quality style transfer[J]. IEEE Transactions on Image Processing, 2020, 30: 374-385.
[11]   CHEN J, LIU G, CHEN X. AnimeGAN: a novel lightweight GAN for photo animation[C]// International Symposium on Intelligence Computation and Applications. Singapore: Springer, 2020: 242-256.
[12]   GATYS L A, ECKER A S, BETHGE M. Image style transfer using convolutional neural networks[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2414-2423.
[13]   SHU Y, YI R, XIA M, et al. GAN-based multi-style photo cartoonization[J]. IEEE Transactions on Visualization and Computer Graphics, 2021, 28(10): 3376-3390.
[14]   DONG Y, TAN W, TAO D, et al. CartoonLossGAN: learning surface and coloring of images for cartoonization[J]. IEEE Transactions on Image Processing, 2021, 31: 485-498.
[15]   GAO X, ZHANG Y, TIAN Y. Learning to incorporate texture saliency adaptive attention to image cartoonization[EB/OL]. (2022-08-02)[2023-06-01]. https://arxiv.org/abs/2208.01587.
[16]   KANG H, LEE S, CHUI C K. Flow-based image abstraction[J]. IEEE Transactions on Visualization and Computer Graphics, 2008, 15(1): 62-76.
[17]   WINNEMOELLER H, KYPRIANIDIS J E, OLSEN S C. XDoG: an extended difference-of-Gaussians compendium including advanced image stylization[J]. Computers and Graphics, 2012, 36(6): 740-753. doi: 10.1016/j.cag.2012.03.004
[18]   SÝKORA D, BURIÁNEK J, ŽÁRA J. Segmentation of black and white cartoons[J]. Image and Vision Computing, 2005, 23(9): 767-782. doi: 10.1016/j.imavis.2005.05.010
[19]   SÝKORA D, BURIÁNEK J, ŽÁRA J. Sketching cartoons by example[C]// Proceedings of Eurographics Workshop on Sketch Based Interfaces and Modeling. Schoten: Eurographics Association, 2005: 27-33.
[20]   ZHANG S H, CHEN T, ZHANG Y F, et al. Vectorizing cartoon animations[J]. IEEE Transactions on Visualization and Computer Graphics, 2009, 15(4): 618-629. doi: 10.1109/TVCG.2009.9
[21]   LIU X, MAO X, YANG X, et al. Stereoscopizing cel animations[J]. ACM Transactions on Graphics, 2013, 32(6): 1-10.
[22]   LI C, LIU X, WONG T T. Deep extraction of manga structural lines[J]. ACM Transactions on Graphics, 2017, 36(4): 1-12.
[23]   WOO S, PARK J, LEE J Y, et al. CBAM: convolutional block attention module[C]// Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 3-19.
[24]   RUSSAKOVSKY O, DENG J, SU H, et al. ImageNet large scale visual recognition challenge[J]. International Journal of Computer Vision, 2015, 115(3): 211-252. doi: 10.1007/s11263-015-0816-y
[25]   SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[EB/OL]. (2014-09-04)[2023-06-01]. https://arxiv.org/abs/1409.1556.
[26]   MAO X, LI Q, XIE H, et al. Least squares generative adversarial networks[C]// Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2794-2802.
[27]   HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[C]// Advances in Neural Information Processing Systems. Long Beach: MIT Press, 2017: 30.
[28]   SZEGEDY C, VANHOUCKE V, IOFFE S, et al. Rethinking the inception architecture for computer vision[C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 2818-2826.