Journal of Zhejiang University (Engineering Science)  2023, Vol. 57 Issue (7): 1326-1334    DOI: 10.3785/j.issn.1008-973X.2023.07.007
Automation Technology
Calligraphy generation algorithm based on improved generative adversarial network
Yun-hong LI, Jiao-jiao DUAN, Xue-ping SU, Lei-tao ZHANG, Hui-kang YU, Xing-rui LIU
School of Electronics and Information, Xi’an Polytechnic University, Xi’an 710048, China
Full text: PDF (2087 KB)   HTML
Abstract:

An improved zi2zi generative adversarial network algorithm for calligraphy character generation was proposed to address the problems of missing strokes, disordered glyph structure, blurred images and poor quality in fonts generated by generative adversarial networks. A residual block with a convolution kernel of 1 was introduced into the encoder to improve the generator's ability to extract detailed features of calligraphic fonts, and a context-aware attention structure was added to extract their stylistic features. Spectral normalization was applied in the discriminator to enhance the model's stability and avoid the mode collapse caused by unstable training. The minimum absolute error (L1 norm) was used to constrain the edge features of the generated fonts, making the font outlines clearer, and calligraphy characters in two styles were finally generated. Test results on the target-style datasets of Yan Zhenqing regular script and Zhao Mengfu running script showed that the proposed algorithm outperformed the comparison algorithms in both subjective and objective evaluations. Compared with zi2zi, the peak signal-to-noise ratio increased by 1.58 and 1.76 dB, the structural similarity improved by 5.66% and 6.91%, and the perceptual similarity decreased by 4.21% and 6.20%, respectively.

Key words: calligraphy generation    deep learning    generative adversarial network    context-aware attention    edge loss
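The residual block with a 1×1 convolution kernel mentioned in the abstract amounts to a per-pixel channel-mixing layer with an identity shortcut. The following is a minimal numpy sketch of that idea, not the paper's implementation: the two-layer layout, the ReLU activation and the weight shapes are assumptions (the paper's residual units use 512 filters inside an instance-normalized encoder).

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: a linear map over channels applied at
    every spatial position. x: (C_in, H, W) feature map, w: (C_out, C_in)."""
    return np.einsum('oc,chw->ohw', w, x)

def residual_block_1x1(x, w1, w2):
    """Residual unit built from 1x1 convolutions: the identity shortcut
    preserves stroke detail while the pointwise convs refine channel
    features. ReLU is used here as a stand-in activation."""
    h = np.maximum(conv1x1(x, w1), 0.0)  # 1x1 conv + ReLU
    return x + conv1x1(h, w2)            # identity shortcut
```

Because of the shortcut, zero-initialized weights leave the input unchanged, which is one reason residual units are easy to train.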
Received: 2022-08-02    Published: 2023-07-17
CLC:  TP 391  
Funding: National Natural Science Foundation of China (61902301); Key Project of Natural Science Basic Research of the Shaanxi Provincial Department of Science and Technology (2022JZ-35)
About the author: LI Yun-hong (1974—), female, professor, master's supervisor, engaged in research on infrared thermal imaging temperature measurement, image processing, and signal and information processing. orcid.org/0000-0001-8080-1040. E-mail: hitliyunhong@163.com

Cite this article:

Yun-hong LI, Jiao-jiao DUAN, Xue-ping SU, Lei-tao ZHANG, Hui-kang YU, Xing-rui LIU. Calligraphy generation algorithm based on improved generative adversarial network. Journal of Zhejiang University (Engineering Science), 2023, 57(7): 1326-1334.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2023.07.007        https://www.zjujournals.com/eng/CN/Y2023/V57/I7/1326

Fig. 1  zi2zi network structure
Fig. 2  Overall framework of the improved network
Layer type  Parameters
Convolutional layer  4×4 Conv, 64 filters, stride 2
Convolutional unit layer  4×4 Conv-IN-LRelu, 128 filters, stride 2
Convolutional unit layer  4×4 Conv-IN-LRelu, 256 filters, stride 2
Convolutional unit layer  4×4 Conv-IN-LRelu, 512 filters, stride 2
Convolutional unit layer  4×4 Conv-IN-LRelu, 512 filters, stride 2
Convolutional unit layer  4×4 Conv-IN-LRelu, 512 filters, stride 2
Residual unit layer  ResNet block, 1×1 Conv, 512 filters, stride 1
Convolutional unit layer  4×4 Conv-IN-LRelu, 512 filters, stride 2
Residual unit layer  ResNet block, 1×1 Conv, 512 filters, stride 1
Convolutional layer  4×4 Conv-LRelu, 512 filters, stride 2
Table 1  Network parameters of the content encoder
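Each 4×4, stride-2 convolution in the encoder halves the spatial resolution, while the two residual units keep it unchanged. The arithmetic can be sketched as follows, assuming a 256×256 input glyph and a padding of 1 (the padding that makes a 4×4/stride-2 layer halve its input exactly; the paper does not state these two values, so they are assumptions):

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

size = 256  # assumed input glyph resolution
sizes = [size]
for _ in range(8):  # eight stride-2 layers in Table 1 (residual units keep size)
    size = conv_out(size)
    sizes.append(size)
print(sizes)  # → [256, 128, 64, 32, 16, 8, 4, 2, 1]
```

After the eight downsampling layers the 256×256 glyph is compressed to a 1×1×512 content code, which is why the residual units matter for retaining stroke detail along the way.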
Fig. 3  Residual unit
Fig. 4  Context-aware attention module
Fig. 5  Samples from the two self-built datasets
Fig. 6  Ablation experiments with different modules (target font: Yan Zhenqing regular script)
Fig. 7  Ablation experiments with different modules (target font: Zhao Mengfu running script)
Network model  SSIM  PSNR/dB  LPIPS
zi2zi  0.6975  9.3097  0.2383
zi2zi + residual  0.7115  9.8567  0.2452
zi2zi + residual + style encoding  0.7394  10.8484  0.1842
zi2zi + residual + style encoding + edge loss  0.7425  10.8228  0.1821
Table 2  Evaluation metrics of the ablation experiments (target font: Yan Zhenqing regular script)
Network model  SSIM  PSNR/dB  LPIPS
zi2zi  0.6917  8.5425  0.2947
zi2zi + residual  0.6918  8.4810  0.2845
zi2zi + residual + style encoding  0.7607  10.8883  0.2177
zi2zi + residual + style encoding + edge loss  0.7665  10.9052  0.2152
Table 3  Evaluation metrics of the ablation experiments (target font: Zhao Mengfu running script)
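SSIM, PSNR and LPIPS in the tables above are standard image-quality metrics; PSNR is simple enough to sketch directly. A minimal numpy version, assuming 8-bit images with a peak value of 255 (the usual convention, not stated in the paper):

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-shape images."""
    mse = np.mean((np.asarray(a, np.float64) - np.asarray(b, np.float64)) ** 2)
    return float('inf') if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of one gray level (MSE = 1) gives 10*log10(255^2) ≈ 48.13 dB.
```

Higher SSIM and PSNR are better; LPIPS is a learned perceptual distance, so the lower values in the last column of Tables 2-5 indicate better results.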
Fig. 8  Generation results of different algorithms (target font: Yan Zhenqing regular script)
Fig. 9  Generation results of different algorithms (target font: Zhao Mengfu running script)
Algorithm  SSIM  PSNR/dB  LPIPS
CycleGAN  0.5492  7.2531  0.3300
DenseNet CycleGAN  0.4958  7.2698  0.3472
zi2zi  0.6728  9.0505  0.2289
EMD  0.5567  8.7037  0.3229
CalliGAN  0.6254  9.2908  0.2385
LFFont  0.6016  9.1077  0.2311
MXFont  0.6151  9.3029  0.2497
Proposed algorithm  0.7249  10.6253  0.1868
Table 4  Evaluation metrics of different algorithms (target font: Yan Zhenqing regular script)
Algorithm  SSIM  PSNR/dB  LPIPS
CycleGAN  0.5617  7.1628  0.3712
DenseNet CycleGAN  0.5317  6.9037  0.3608
zi2zi  0.6348  7.7615  0.3223
EMD  0.5614  8.4337  0.3595
CalliGAN  0.5737  7.8590  0.3271
LFFont  0.5941  8.1400  0.3025
MXFont  0.6078  7.8666  0.3087
Proposed algorithm  0.7039  9.5221  0.2603
Table 5  Evaluation metrics of different algorithms (target font: Zhao Mengfu running script)
Fig. 10  Results with KaiTi as the source font
Fig. 11  Results with SimSun as the source font
Source font  Target font  SSIM  PSNR/dB  LPIPS
KaiTi  Regular script  0.7044  10.1207  0.1965
KaiTi  Running script  0.6914  9.2937  0.2503
SimSun  Regular script  0.7153  10.4740  0.2042
SimSun  Running script  0.6971  9.5003  0.2495
Table 6  Evaluation metrics for target fonts generated from different source fonts
Fig. 12  Loss curves during training
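Among the losses tracked during training is the edge constraint from the abstract: an L1 (minimum absolute error) penalty between edge maps of the generated and target glyphs. The paper's exact edge extractor is not reproduced here, so this sketch assumes a simple Laplacian filter as a stand-in:

```python
import numpy as np

# 3x3 Laplacian kernel, an assumed stand-in for the paper's edge extractor.
LAPLACIAN = np.array([[0.0,  1.0, 0.0],
                      [1.0, -4.0, 1.0],
                      [0.0,  1.0, 0.0]])

def edges(img):
    """Edge map via 'valid' 2-D correlation with the Laplacian kernel."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.sum(img[i:i + 3, j:j + 3] * LAPLACIAN)
    return out

def edge_l1_loss(generated, target):
    """L1 distance between edge maps, constraining stroke contours."""
    return np.mean(np.abs(edges(generated) - edges(target)))
```

Because the penalty acts on edge maps rather than raw pixels, it specifically rewards sharp, well-placed stroke contours, which matches the reported effect of clearer font outlines.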
1 CHENG Ruo-ran, ZHAO Xiao-li, ZHOU Hao-jun, et al. Review of Chinese font style conversion based on deep learning [J]. Journal of Zhejiang University: Engineering Science, 2022, 56(3): 510-519.
2 ZENG Jin-shan, CHEN Qi, WANG Ming-wen. Self-supervised font generation of Chinese characters based on Tian Zige transformation [J]. Scientia Sinica Informationis, 2022, 52(1): 145-159.
3 HUANG Zi-jun, CHEN Qi, LUO Wen-bing. Chinese character generation method based on deep learning [J]. Computer Engineering and Applications, 2021, 57(17): 29-36. doi: 10.3778/j.issn.1002-8331.2103-0297
4 TIAN Y. Rewrite [CP/OL]. [2022-08-02]. https://github.com/kaonashi-tyc/Rewrite.
5 ISOLA P, ZHU J Y, ZHOU T, et al. Image-to-image translation with conditional adversarial networks [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE, 2017: 5967-5976.
6 TIAN Y. Zi2zi [CP/OL]. [2022-08-02]. https://github.com/kaonashi-tyc/zi2zi.
7 JIANG Y, LIAN Z, TANG Y, et al. DCFont: an end-to-end deep Chinese font generation system [C]// SIGGRAPH Asia 2017 Technical Briefs. New York: ACM, 2017: 1-4.
8 CHANG B, ZHANG Q, PAN S, et al. Generating handwritten Chinese characters using CycleGAN [C]// 2018 IEEE Winter Conference on Applications of Computer Vision. Lake Tahoe: IEEE, 2018: 199-207.
9 ZHU J Y, PARK T, ISOLA P, et al. Unpaired image-to-image translation using cycle-consistent adversarial networks [C]// 2017 IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 2242-2251.
10 REN C, LYU S, ZHAN H, et al. SAFont: automatic font synthesis using self-attention mechanisms [J]. Australian Journal of Intelligent Information Processing Systems, 2019, 16(2): 19-25.
11 WU S J, YANG C Y, HSU J Y. CalliGAN: style and structure-aware Chinese calligraphy character generator [EB/OL]. (2020-05-26) [2022-08-02]. https://arxiv.org/abs/2005.12500.
12 PARK S, CHUN S, CHA J, et al. Few-shot font generation with localized style representations and factorization [C]// Proceedings of the AAAI Conference on Artificial Intelligence. Vancouver: AAAI, 2021, 35(3): 2393-2402.
13 XIE Y, CHEN X, SUN L, et al. DGFont: deformable generative networks for unsupervised font generation [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville: IEEE, 2021: 5130-5140.
14 PARK S, CHUN S, CHA J, et al. Multiple heads are better than one: few-shot font generation with multiple localized experts [C]// Proceedings of the IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 13900-13909.
15 KONG Y, LUO C, MA W, et al. Look closer to supervise better: one-shot font generation via component-based discriminator [C]// Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. New Orleans: IEEE, 2022: 13482-13491.
16 ODENA A, OLAH C, SHLENS J. Conditional image synthesis with auxiliary classifier GANs [C]// Proceedings of the 34th International Conference on Machine Learning. Sydney: ACM, 2017, 70: 2642–2651.
17 TAIGMAN Y, POLYAK A, WOLF L. Unsupervised cross-domain image generation [EB/OL]. (2016-10-07) [2022-08-02]. https://arxiv.org/abs/1611.02200.
18 WANG Z, BOVIK A C, SHEIKH H R, et al. Image quality assessment: from error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
doi: 10.1109/TIP.2003.819861
19 ZHANG R, ISOLA P, EFROS A A, et al. The unreasonable effectiveness of deep features as a perceptual metric [C]// Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Los Alamitos: IEEE, 2018: 586-595.