Journal of Zhejiang University-SCIENCE A (Applied Physics & Engineering)  2005, Vol. 6 Issue (11): 1297-1305    DOI: 10.1631/jzus.2005.A1297
Digital Library Technology     
Optical Character Recognition for printed Tamil text using Unicode
SEETHALAKSHMI R., SREERANJANI T.R., BALACHANDAR T., Abnikant Singh, Markandey Singh, Ritwaj Ratan, Sarvesh Kumar
Shanmugha Arts Science Technology and Research Academy, Thirumalaisamudram, Thanjavur, Tamil Nadu, India

Abstract  Optical Character Recognition (OCR) here refers to the process of converting printed Tamil text documents into machine-readable Unicode Tamil text. Printed documents such as books, papers, and magazines are scanned using standard scanners, which produce an image of each document. In the preprocessing phase the image is checked for skew; if it is skewed, it is corrected by a simple rotation in the appropriate direction. The image is then passed through a noise elimination phase and binarized. The preprocessed image is segmented by an algorithm that decomposes the scanned text into paragraphs using a special space detection technique, the paragraphs into lines using vertical histograms, the lines into words using horizontal histograms, and the words into character image glyphs using horizontal histograms. Each image glyph comprises 32×32 pixels. The segmentation phase thus produces a database of character image glyphs. All the image glyphs are then considered for recognition using Unicode mapping. Each glyph is passed through routines that extract its features. The features considered for classification are the character height, the character width, the number of horizontal lines (long and short), the number of vertical lines (long and short), the horizontally oriented curves, the vertically oriented curves, the number of circles, the number of slope lines, the image centroid, and special dots. The glyphs are then ready for classification based on these features. The extracted features are passed to a Support Vector Machine (SVM), where the characters are classified by a supervised learning algorithm. The resulting classes are mapped onto Unicode for recognition, and the text is then reconstructed using Unicode fonts.

Key words: OCR, Unicode, Features, Support Vector Machine (SVM), Artificial Neural Networks
Received: 05 August 2005     
CLC:  TP391  
Cite this article:

SEETHALAKSHMI R., SREERANJANI T.R., BALACHANDAR T., Abnikant Singh, Markandey Singh, Ritwaj Ratan, Sarvesh Kumar. Optical Character Recognition for printed Tamil text using Unicode. Journal of Zhejiang University-SCIENCE A (Applied Physics & Engineering), 2005, 6(11): 1297-1305.

URL:

http://www.zjujournals.com/xueshu/zjus-a/10.1631/jzus.2005.A1297     OR     http://www.zjujournals.com/xueshu/zjus-a/Y2005/V6/I11/1297
