# I. Introduction

The subject of character recognition has received considerable attention in recent years due to advances in automation. Automatic character recognition improves the interaction between man and machine in many applications such as office automation, cheque verification, mail sorting, and a large variety of banking, business and data entry applications.

We are concerned here with the recognition of characters in the Bangla language. Bangla is the mother language of Bangladesh, and approximately 10% of the world's population speaks it. Researchers working on Indian, Chinese and other languages are trying to develop complete character recognition systems, though the achievements in this fascinating field are not yet enough to reach the ultimate goal. The progress of such research for the Bangla language is still at an initial level: in our country, research in this field has so far achieved limited success compared to other languages.

This research is a modest effort toward that goal, an initial step in converting Bangla text to computer-readable form, that is, the development of a complete Bangla character recognition system. Individual Bangla characters have been recognized using various techniques such as geometric shape analysis, black runs and concavity measurement.

# II. Implementation of Character Recognition System

The character recognition system can be divided into two parts: segmentation of the text document into characters, and recognition of the characters. The whole process is shown in Figure 1.

# a) Source

The input images are acquired from documents containing text, either by using a scanner as an input device or by using Adobe Photoshop or Paint. Acquired images are stored on the hard disk in JPG format. Each image is then passed on for preprocessing.

# b) Pre-Processing

The scanned image is converted into a binary image. First, the RGB image is converted into a grayscale image and then into a binary image, i.e. an image with pixel values 0 (white) and 1 (black).
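The conversion described above can be sketched as follows. This is a minimal illustration in plain Python rather than the MATLAB used in this work; the function names `rgb_to_gray` and `gray_to_binary` are hypothetical, the luminance weights are the standard ones for RGB to grayscale conversion, and the threshold of 128 is an assumption. Ink pixels are mapped to 1 (black) and background pixels to 0 (white), following the convention stated above.

```python
def rgb_to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to rows of gray levels (0-255)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def gray_to_binary(gray_image, threshold=128):
    """Map dark pixels (ink) to 1 and light pixels (background) to 0.

    The threshold value is an assumption made for this sketch.
    """
    return [[1 if value < threshold else 0 for value in row] for row in gray_image]


WHITE, BLACK = (255, 255, 255), (0, 0, 0)

# A tiny 2x3 "scanned" image: white background with a few black ink pixels.
sample = [
    [WHITE, BLACK, WHITE],
    [WHITE, BLACK, BLACK],
]

binary = gray_to_binary(rgb_to_gray(sample))
print(binary)  # [[0, 1, 0], [0, 1, 1]]
```

Working with nested lists keeps the sketch self-contained; a real implementation would normally operate on image arrays.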
After conversion, the unnecessary background pixels (0s) are removed from the original image.

# c) RGB to Grayscale and Grayscale to Binary Conversion

In practice most images are color (RGB) images, but working with a three-dimensional array is complex, so the RGB image needs to be converted into a grayscale image. The RGB to grayscale conversion is performed by the MATLAB command `I = rgb2gray(f)`. For ease of analysis, the grayscale image is then converted into a binary image by using the MATLAB command `BW = im2bw(I)`.

# III. Text Segmentation

Text segmentation is a process in which the text is partitioned into its elementary entities, i.e. characters [10]. The overall performance of the character recognition process depends on how accurately the text is segmented into characters. In the segmentation phase, the document is first segmented into text lines, the text lines are segmented into words, and then the words are segmented into characters.

# a) Line Segmentation

Text line segmentation is performed by scanning the input image horizontally. The number of black pixels in each row is counted to separate the lines. The position between two consecutive lines where the number of black pixels in a row is zero denotes a boundary between the lines [13]. The output image is shown in Figure 2.

# b) Word Segmentation

In English text there is a minimum gap between two consecutive characters and between two consecutive words, and the minimum gap between two consecutive words is greater than that between two consecutive characters. Although most characters in a Bangla text line are connected to each other by the matra line, the same holds wherever a gap exists between them. For word segmentation from the text line, a vertical scan is performed. If n consecutive scans find no black pixel, the position is taken as a marker between two words. The value of n is the minimum gap between two consecutive words, which is determined experimentally.
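The horizontal and vertical scans described above can be sketched as follows. This is an illustrative sketch in plain Python, not the MATLAB implementation used in this work; the function names `split_runs`, `segment_lines` and `segment_words` are hypothetical, and the binary image is assumed to be a list of rows in which 1 denotes a black pixel.

```python
def split_runs(has_ink, min_gap=1):
    """Return (start, end) pairs of ink runs separated by >= min_gap blanks.

    `end` is exclusive. Blank stretches shorter than min_gap stay inside a run.
    """
    runs, start, last_ink = [], None, None
    for i, ink in enumerate(has_ink):
        if not ink:
            continue
        if start is None:
            start = i                      # first ink position of a new run
        elif i - last_ink - 1 >= min_gap:
            runs.append((start, last_ink + 1))
            start = i                      # gap was wide enough: open a new run
        last_ink = i
    if start is not None:
        runs.append((start, last_ink + 1))
    return runs


def segment_lines(binary):
    """Horizontal scan: rows without any black pixel separate text lines."""
    row_has_ink = [any(row) for row in binary]
    return [binary[top:bottom] for top, bottom in split_runs(row_has_ink)]


def segment_words(line, n):
    """Vertical scan: n or more consecutive empty columns separate words."""
    col_has_ink = [any(row[c] for row in line) for c in range(len(line[0]))]
    return [
        [row[left:right] for row in line]
        for left, right in split_runs(col_has_ink, n)
    ]


page = [
    [1, 1, 0, 1, 0, 0, 0, 1, 1],
    [1, 0, 0, 1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],   # empty row: boundary between two lines
    [0, 1, 1, 0, 0, 0, 0, 0, 0],
]

lines = segment_lines(page)
words = segment_words(lines[0], n=3)
print(len(lines))                  # 2
print([len(w[0]) for w in words])  # [4, 2]
```

In this sketch the value of n is passed in by the caller, mirroring the experimentally chosen gap; calling the same vertical scan with n = 1 splits at every empty column.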
The output of word segmentation is shown in Figure 3.

# c) Character Segmentation

For character segmentation from the word, a vertical scan is performed. The starting boundary of a character is the first column in which a black pixel is found. After finding the starting boundary of a character, the scan continues until a column without any black pixel is found, which is the ending boundary of the character being processed [14]. Figure 4 shows a single segmented character and its corresponding binary format.

# d) Knowledge Base

The knowledge base is designed based on the feature matrices of the various characters. To build the knowledge base, the RGB character image is first converted into a grayscale image and then into a binary image. After the binary image is obtained, the unnecessary pixels around the character boundary are eliminated.

# e) Feature Extraction

Feature extraction is the process of extracting the essential information content from the image segment. It plays an important role in the whole recognition process [10].

# f) Scaling

The segmented characters are scaled according to the height and width of the database image. If a segmented character is larger than the database character, the system scales it down to the size of the database character; otherwise it scales it up. If C is the segmented character, the scaled image S is obtained by the MATLAB command `S = imresize(C, [height, width])`, where height and width are the dimensions of the database character.

Character recognition performance depends on the scaling. If the segmented character is much larger or much smaller than the database image, the recognition performance is reduced.

The character recognition procedure is described in the following algorithm:

BEGIN

1. Calculate total pixel = height × width.
2. Take the XOR between the first database character and the scaled character S.
3. Calculate the number of correct pixels, correct pixel (a 0 in the XOR result is a correct pixel).
4. Calculate the error (%) from total pixel and correct pixel.

END

In this way, the error (%) calculation is repeated for all database characters. If a database character exactly or approximately matches the segmented character, the error (%) is at its minimum. So, based on the minimum error, the system outputs the corresponding character.

# IV. Result and Performance Analysis

The system is divided into two main phases: segmentation and character recognition. The overall performance of the system therefore depends directly on the performance of these two phases. The accuracy of the system is measured as the success rate for the recognition of characters. It is measured using Eq. (1):

$$\text{Success Rate (\%)} = \frac{\text{Total No. of Success}}{\text{Total No. of Characters}} \times 100 \qquad (1)$$

The segmentation performance of this system is shown in Table 1.

Table 1: Segmentation performance

| No. of Lines in a Text Document | Line Segmentation Accuracy (%) | Word Segmentation Accuracy (%) | Character Segmentation Accuracy (%) |
|---|---|---|---|
| 4 | 100 | 97 | 89.05 |
| 5 | 100 | 97.5 | 92.50 |
| 6 | 100 | 94 | 90.71 |
| 7 | 100 | 96.67 | 92.69 |
| 8 | 100 | 94 | 90.32 |

# b) Segmented Character Recognition Performance

For character recognition, this system uses the XOR operation, which is a very simple matching technique. The character recognition performance of this system is shown in Table 2 for Shoroborno and Table 3 for numerical characters.

Table 2: Recognition performance for Shoroborno

| No. of Test Sample | Total No. of Characters | Total No. of Success | Success Rate (%) |
|---|---|---|---|
| 1 | 120 | 90 | 75 |
| 2 | 150 | 116 | 77.33333 |
| 3 | 125 | 94 | 75.2 |
| 4 | 130 | 102 | 78.46154 |
| 5 | 170 | 134 | 78.82353 |
| Average Success Rate (%) | | | 76.96368 |

Table 3: Recognition performance for numerical characters

| No. of Test Sample | Total No. of Characters | Total No. of Success | Success Rate (%) |
|---|---|---|---|
| 1 | 50 | 42 | 84 |
| 2 | 70 | 53 | 75.71429 |
| 3 | 65 | 56 | 86.15385 |
| 4 | 40 | 33 | 82.5 |
| 5 | 50 | 44 | 88 |
| Average Success Rate (%) | | | 83.27363 |

# V. Discussion and Conclusion

The aim of this system is to recognize Bangla characters. The system can recognize these characters with some limitations, which are discussed in the following section.

# a) Limitation

The performance of this system depends on both segmentation and recognition. If the characters of the text are very close to or overlap each other, the system fails to segment them. In practice, Bangla characters appear in many different font sizes, and it is not possible to store all font sizes in the database. The characters therefore have to be scaled, which distorts the character shape. This can cause recognition errors, although the system does not always fail.

# b) Further Scope

Due to the limitations described in the previous section, the system is not suitable for on-line applications. Overlapping characters can be segmented by using the flood fill and boundary fill algorithms; doing so is a target of further work.

# c) Conclusion

In this paper an off-line Bangla character recognition system is developed by using automatic feature extraction and the XOR operation. The efficiency of this system is not high: the average recognition success rate is 76.96% for Shoroborno and 83.27% for numerical characters. In future, MLP and SVM classifiers can be used for character recognition.

![Fig 1.](image-2.png "")

![Figure 1 : Block diagram of character recognition system](image-3.png "Figure 1 :")

![Fig 2.](image-4.png "")

![Figure 2 : Line Segmentation (a) Bangla input text image, (b) Image of first segmented line and (c) Text image without first line](image-5.png "Figure 2 :")

![Figure 3 : Word Segmentation (a) Bangla Text Line, (b) Image of first segmented word and (c) Image without first word](image-6.png "Figure 3 :")

![Figure 4 : (a) Binary Form of a Segmented Character](image-7.png "Figure 4 :")

![Figure 5 :](image-8.png "Figure 5 :")

![Figure 6 : Character recognition (a) Database image of size 16×16, (b) Scaled image of size 16×16, (c) Image after XOR between (a) and (b)](image-9.png "Figure 6 :")

© 2013 Global Journals Inc.
(US)

# References

* Md Rahman, Md Iqbal Zafar. Bangla Sorting Algorithm: A Linguistic Approach. Proceedings of International Conference on Computer and Information Technology, Dhaka, 18-20 December 1998.
* Fahimm Minhaz, Tanvir Zibran, Shammi Arif, Abdus Rajiullah Md. Computer Representation of Bangla Character and Sorting of Bangla Words. Proceedings of 5th ICCIT, Dhaka, Bangladesh, December 2002.
* Md Chowdhury. An Approach to Implement Signature Recognition System Using Neural Network and Genetic Algorithm. RUET, Rajshahi, Bangladesh.
* Robert M. Haralick, Linda G. Shapiro. Computer and Robot Vision, Vol. 1. Addison-Wesley, 1992.
* M. Sonka, V. Hlavac, R. Boyle. Image Processing, Analysis and Machine Vision. PWS Publishing, 1998.
* Image Processing Toolbox User's Guide: For Use with MATLAB, Version 2. The MathWorks, May 1997.
* Linda G. Shapiro, George C. Stockman. Computer Vision. Prentice-Hall, New Jersey, 2001.
* Ahmed Shah Mashiyat, Ahmed Shah Mehadi, Kamrul Hasan Talukder. Bangla Off-line Handwritten Character Recognition Using Superimposed Matrices. 7th International Conference on Computer and Information Technology (ICCIT 2004), BRAC University, Dhaka, Bangladesh, 26-28 December 2004.
* Rafael C. Gonzalez, Richard E. Woods, Steven L. Eddins. Digital Image Processing Using MATLAB. Pearson Education, Inc.
* Jalal Uddin Mahmud, Mohammed Feroz Raihan, Chowdhury Mofizur Rahman. A Complete OCR System for Continuous Bangla Characters. Proceedings of the Conference on Convergent Technologies for the Asia Pacific, 2003.
* S. M. Haque, Shahida Arbi, Tabassum Tamanna, Sadia Mahsina Itu. Automatic Detection and Translation of Bengali Text on Road Sign for Visually Impaired.
* Abu Sayeed Md, A. A. M. Sohail, M. A. Haque, Mottalib. Rotation Independent Image Object Recognition Using Automatic Feature Extraction and Artificial Neural Networks. ICCIT-2004, Dhaka, December 2004.