Pervasive Computing and Computer Human Interaction
Natural scene text detection based on multi-level MSER
TANG You-bao, BU Wei, WU Xiang-qian
1. School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China;
2. Department of New Media Technologies and Arts, Harbin Institute of Technology, Harbin 150001, China
A novel scene text detection method based on multi-level maximally stable extremal regions (MSER) was proposed. The method consisted of two main stages: candidate region extraction and text region detection. In the candidate region extraction stage, a multi-level MSER extraction technique was developed that considers multiple color spaces, multiple scale transformations of the original image, and multiple MSER detection thresholds. All regions extracted from the input image were used as candidate character regions for text region detection. In the text detection stage, hand-designed low-level features and CNN-based features were first extracted for each candidate character region, and a random forest regressor trained on the training datasets was then used to select the character regions. The character regions were then merged to form candidate word regions, from which features were extracted and classified, following a process similar to that used for candidate character regions, to obtain the final text detection results. The proposed method was evaluated on two standard benchmark datasets, ICDAR2011 and ICDAR2013, and achieved an F-measure of 0.79 on each, which demonstrates the effectiveness of the proposed natural scene text detection method.
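The candidate extraction stage described above can be pictured as running one MSER detector over every combination of color channel, image scale, and detection threshold, then pooling the results. The following is only a minimal sketch of that idea, not the authors' implementation: the channel names, scale factors, threshold values, the `multilevel_candidates` helper, and the removal of exact duplicate boxes are all assumptions, since the abstract does not give them. The detector itself is passed in as a function, and `toy_extractor` is a hypothetical stand-in for a real MSER detector.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple

# Illustrative values only; the abstract does not list the actual settings.
CHANNELS = ("gray", "R", "G", "B")  # hypothetical color channels
SCALES = (0.5, 1.0, 2.0)            # hypothetical image scale factors
DELTAS = (2, 5, 8)                  # hypothetical MSER stability thresholds

# A region is a bounding box (x, y, width, height) in original-image coordinates.
Box = Tuple[int, int, int, int]
Extractor = Callable[[str, float, int], Iterable[Box]]


def multilevel_candidates(extract: Extractor) -> List[Box]:
    """Pool candidate regions over every (channel, scale, delta) combination.

    Exact duplicate boxes are dropped here; that is an assumption of this
    sketch, not something stated in the abstract.
    """
    pooled = set()
    for channel, scale, delta in product(CHANNELS, SCALES, DELTAS):
        pooled.update(extract(channel, scale, delta))
    return sorted(pooled)


def toy_extractor(channel: str, scale: float, delta: int) -> List[Box]:
    """Hypothetical stand-in for a real MSER detector.

    Returns one box at every level and a second box only at the lowest
    threshold, mimicking a faint character that a single setting would miss.
    """
    boxes = [(10, 10, 20, 30)]
    if delta <= 2:
        boxes.append((50, 12, 18, 28))
    return boxes


candidates = multilevel_candidates(toy_extractor)
n_levels = len(CHANNELS) * len(SCALES) * len(DELTAS)
```

With these illustrative settings the detector runs at 36 levels, and the second box enters the pool only because a low threshold is among them. In practice `toy_extractor` would be replaced by a call to an actual MSER implementation, with boxes mapped back to the original image scale before pooling.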
TANG You-bao, BU Wei, WU Xiang-qian. Natural scene text detection based on multi-level MSER. Journal of Zhejiang University (Engineering Science), 2016, 50(6): 1134-1140.