Journal of ZheJiang University (Engineering Science)  2019, Vol. 53 Issue (7): 1363-1373    DOI: 10.3785/j.issn.1008-973X.2019.07.016
Automatic Technology, Computer Technology     
Design of activation function in CNN for image classification
Hong-xia WANG, Jia-qi ZHOU, Cheng-hao GU, Hong LIN*
School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China

Abstract  

To improve image classification, a new combinatorial activation function, relu-softsign, was proposed. It targets two problems. First, the derivative of relu, the activation function most commonly used in convolutional neural networks, is identically zero on the negative half of the x-axis, which makes neurons prone to “necrosis” (dying) during training. Second, the existing combinatorial activation function relu-softplus converges only with a small learning rate, which slows convergence. The role of the activation function during training was analyzed, and the key points to consider when designing an activation function were given. Following these points, relu and softsign were combined piecewise, with relu on the positive half-axis and softsign on the negative half-axis, so that the derivative on the negative half-axis is no longer identically zero. The combination was then compared with single activation functions and with the relu-softplus combinatorial activation function on the MNIST, PI100, CIFAR-100 and Caltech256 datasets. The experimental results show that relu-softsign improves classification accuracy and simply and effectively mitigates the irreversible “necrosis” of neurons. It also speeds up model convergence, and its convergence advantage is larger on complex datasets.
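The piecewise construction described above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the authors' code: the branch boundary is placed at x = 0 (both pieces equal 0 there and both have slope 1, so the choice is immaterial), and `train_bias` is a hypothetical one-neuron toy, not the paper's experimental setup. It only shows how a neuron whose pre-activation sits on the negative half-axis stops learning under relu but keeps receiving gradient under relu-softsign.

```python
def relu(x: float) -> float:
    """relu(x) = max(0, x)."""
    return max(0.0, x)


def relu_grad(x: float) -> float:
    """Derivative of relu: 1 for x > 0, 0 otherwise (0 at x = 0 by convention)."""
    return 1.0 if x > 0 else 0.0


def relu_softsign(x: float) -> float:
    """relu on x >= 0, softsign x / (1 + |x|) on x < 0 (boundary at 0 is an assumption)."""
    return x if x >= 0 else x / (1.0 + abs(x))


def relu_softsign_grad(x: float) -> float:
    """1 for x >= 0, 1 / (1 + |x|)^2 for x < 0: never zero on the negative half-axis."""
    return 1.0 if x >= 0 else 1.0 / (1.0 + abs(x)) ** 2


def train_bias(f, f_grad, b=-3.0, lr=0.5, steps=100):
    """Hypothetical toy: one neuron whose pre-activation is just its bias b.

    Runs gradient descent on L = 0.5 * (f(b) - 1)^2 using
    dL/db = (f(b) - 1) * f'(b), and returns the final bias.
    """
    for _ in range(steps):
        b -= lr * (f(b) - 1.0) * f_grad(b)
    return b


if __name__ == "__main__":
    for x in (-10.0, -1.0, 0.0, 2.0):
        print(f"x={x:6.1f}  relu'={relu_grad(x):.4f}  relu-softsign'={relu_softsign_grad(x):.4f}")
    # relu: the gradient is 0 at b = -3, so the bias never moves (a "dead" neuron).
    print("relu final bias:         ", train_bias(relu, relu_grad))
    # relu-softsign: a small but nonzero gradient pulls b back to the positive side.
    print("relu-softsign final bias:", train_bias(relu_softsign, relu_softsign_grad))
```

In the toy, the relu neuron keeps its bias at exactly -3.0 because every update is multiplied by a zero derivative, whereas the relu-softsign neuron drifts upward and then settles near the target on the linear branch.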



Key words: image classification, convolutional neural network, activation function, relu, neuron necrosis, combinatorial activation function
Received: 31 October 2018      Published: 25 June 2019
CLC:  TP 391  
Corresponding Authors: Hong LIN     E-mail: 99575522@qq.com;linhong@whut.edu.cn
Cite this article:

Hong-xia WANG,Jia-qi ZHOU,Cheng-hao GU,Hong LIN. Design of activation function in CNN for image classification. Journal of ZheJiang University (Engineering Science), 2019, 53(7): 1363-1373.

URL:

http://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2019.07.016     OR     http://www.zjujournals.com/eng/Y2019/V53/I7/1363


Fig.1 Curves of the sigmoid and tanh functions
Fig.2 Curves of the softsign and tanh functions
Fig.3 Curve of the relu function
Fig.4 Curve of the relu-softplus combinatorial activation function
Fig.5 Curves of the softplus and softsign functions
Fig.6 Curves of the derivatives of softplus and softsign
Fig.7 Curve of the relu-softsign function
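For reference, the standard definitions behind Figs. 1-7, and the two derivatives compared in Fig. 6, are written out below. The relu-softsign line, written f(x), follows the piecewise description in the abstract (relu on the positive half-axis, softsign on the negative half-axis). Assigning x = 0 to the relu branch is our assumption. The relu-softplus curve of Fig. 4 is left out because its exact form is not given on this page.

```latex
\begin{aligned}
\operatorname{sigmoid}(x) &= \frac{1}{1+e^{-x}}, &
\tanh(x) &= \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}, \\
\operatorname{relu}(x) &= \max(0,x), &
\operatorname{softplus}(x) &= \ln\left(1+e^{x}\right), \\
\operatorname{softsign}(x) &= \frac{x}{1+|x|}, &
f(x) &= \begin{cases} x, & x \ge 0 \\[2pt] \dfrac{x}{1+|x|}, & x < 0 \end{cases} \\[6pt]
\operatorname{softplus}'(x) &= \frac{1}{1+e^{-x}}, &
\operatorname{softsign}'(x) &= \frac{1}{(1+|x|)^{2}}
\end{aligned}
```

As x goes to minus infinity, softplus' decays exponentially while softsign' decays only like 1/x^2. For example, softplus'(-10) = 1/(1+e^{10}), about 4.5×10^-5, whereas softsign'(-10) = 1/121, about 8.3×10^-3. Both pieces of f also have slope 1 at the origin, so f is continuously differentiable there.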
Activation function | loss_t | ACC_t | loss_v | ACC_v
relu | 0.0416 | 0.9877 | 0.0393 | 0.9890
softsign | 0.0479 | 0.9854 | 0.0410 | 0.9867
relu-softplus | 0.0394 | 0.9880 | 0.0348 | 0.9887
relu-softsign | 0.0371 | 0.9882 | 0.0348 | 0.9891
Tab.1 Results on MNIST
Fig.8 Training loss of different activation functions on MNIST
Activation function | loss_t | ACC_t | loss_v | ACC_v
relu | 0.0478 | 0.9879 | 0.4868 | 0.8798
softsign | 0.0523 | 0.9850 | 0.6262 | 0.8631
relu-softplus | 0.0412 | 0.9889 | 0.4506 | 0.9021
relu-softsign | 0.0374 | 0.9904 | 0.4326 | 0.9068
Tab.2 Results on PI100
Fig.9 Training loss of different activation functions on PI100
Activation function | loss_t | ACC_t | loss_v | ACC_v
relu | 0.7489 | 0.7702 | 1.7067 | 0.5952
softsign | 1.0413 | 0.6990 | 1.6074 | 0.5850
relu-softplus | 0.8538 | 0.7494 | 1.5962 | 0.6031
relu-softsign | 0.7431 | 0.7782 | 1.5823 | 0.6143
Tab.3 Results on CIFAR-100
Fig.10 Training loss of different activation functions on CIFAR-100
Activation function | loss_t | ACC_t | loss_v | ACC_v
relu | 0.8649 | 0.7600 | 3.5283 | 0.4200
softsign | 0.9335 | 0.7501 | 3.6206 | 0.3976
relu-softplus | 0.8405 | 0.7704 | 3.4534 | 0.4147
relu-softsign | 0.7569 | 0.7894 | 3.4994 | 0.4205
Tab.4 Results on Caltech256
Fig.11 Training loss of different activation functions on Caltech256
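As a reading aid for Tables 1-4, the snippet below copies the ACC_v column and computes, for each dataset, how far relu-softsign is ahead of the best of the other three functions. It is plain arithmetic on the reported numbers. The dictionary layout and the `margin_over_best_other` helper are illustrative names of our own.

```python
# ACC_v values copied from Tables 1-4.
ACC_V = {
    "MNIST":      {"relu": 0.9890, "softsign": 0.9867, "relu-softplus": 0.9887, "relu-softsign": 0.9891},
    "PI100":      {"relu": 0.8798, "softsign": 0.8631, "relu-softplus": 0.9021, "relu-softsign": 0.9068},
    "CIFAR-100":  {"relu": 0.5952, "softsign": 0.5850, "relu-softplus": 0.6031, "relu-softsign": 0.6143},
    "Caltech256": {"relu": 0.4200, "softsign": 0.3976, "relu-softplus": 0.4147, "relu-softsign": 0.4205},
}


def margin_over_best_other(table, ours="relu-softsign"):
    """Per dataset: ACC_v of `ours` minus the best other function, in percentage points."""
    result = {}
    for dataset, row in table.items():
        best_other = max(acc for name, acc in row.items() if name != ours)
        result[dataset] = round(100 * (row[ours] - best_other), 2)
    return result


if __name__ == "__main__":
    for dataset, margin in margin_over_best_other(ACC_V).items():
        print(f"{dataset:<11} {margin:+.2f} pp")
```

On these numbers the margin is about 1.1 percentage points on CIFAR-100 and about 0.5 on PI100, but under 0.1 on MNIST and Caltech256.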
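Learning rate | MNIST: relu-softplus | MNIST: relu-softsign | PI100: relu-softplus | PI100: relu-softsign
0.0001 | converged, accuracy 98% | converged, accuracy 98% | converged, accuracy 99% | converged, accuracy 99%
0.001 | converged, accuracy 98% | converged, accuracy 98% | not converging from epoch 10 | converged, accuracy 99%
0.01 | not converging from epoch 2 | not converging from epoch 2 | not converging from epoch 4 | converged, accuracy 85%
0.1 | not converging from epoch 1 | not converging from epoch 1 | not converging from epoch 2 | not converging from epoch 3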
Tab.5 Comparison of convergence conditions under different learning rates on MNIST and PI100
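Learning rate | CIFAR-100: relu-softplus | CIFAR-100: relu-softsign | Caltech256: relu-softplus | Caltech256: relu-softsign
0.0001 | converged, accuracy 75% | converged, accuracy 78% | converged, accuracy 77% | converged, accuracy 79%
0.001 | not converging from epoch 40 | converged, accuracy 76% | not converging from epoch 10 | converged, accuracy 79%
0.01 | not converging from epoch 4 | converged, accuracy 72% | not converging from epoch 2 | converged, accuracy 75%
0.1 | not converging from epoch 4 | not converging from epoch 3 | not converging from epoch 2 | not converging from epoch 20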
Tab.6 Comparison of convergence conditions under different learning rates on CIFAR-100 and Caltech256
Fig.12 Training loss of relu-softsign and relu-softplus on Caltech256
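Tables 5 and 6 can be condensed into one number per function and dataset: the largest tested learning rate at which training still converged. The encoding below is a simplification of our own. It keeps only the reported accuracy for converged runs, records `None` for runs that stopped converging, and drops the epoch at which that happened. The names `RESULTS` and `largest_converged_lr` are illustrative.

```python
# Reported accuracy (%) for converged runs; None where training stopped converging.
RESULTS = {
    "MNIST": {
        "relu-softplus": {0.0001: 98, 0.001: 98, 0.01: None, 0.1: None},
        "relu-softsign": {0.0001: 98, 0.001: 98, 0.01: None, 0.1: None},
    },
    "PI100": {
        "relu-softplus": {0.0001: 99, 0.001: None, 0.01: None, 0.1: None},
        "relu-softsign": {0.0001: 99, 0.001: 99, 0.01: 85, 0.1: None},
    },
    "CIFAR-100": {
        "relu-softplus": {0.0001: 75, 0.001: None, 0.01: None, 0.1: None},
        "relu-softsign": {0.0001: 78, 0.001: 76, 0.01: 72, 0.1: None},
    },
    "Caltech256": {
        "relu-softplus": {0.0001: 77, 0.001: None, 0.01: None, 0.1: None},
        "relu-softsign": {0.0001: 79, 0.001: 79, 0.01: 75, 0.1: None},
    },
}


def largest_converged_lr(runs):
    """Largest learning rate whose run converged, or None if no run did."""
    converged = [lr for lr, acc in runs.items() if acc is not None]
    return max(converged) if converged else None


if __name__ == "__main__":
    for dataset, by_function in RESULTS.items():
        summary = {fn: largest_converged_lr(runs) for fn, runs in by_function.items()}
        print(dataset, summary)
```

Read this way, relu-softsign still converges at a learning rate of 0.01 on PI100, CIFAR-100 and Caltech256, where relu-softplus converges only at 0.0001. On MNIST both functions stop converging above 0.001.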