Segmentation of Touching Conjunct Consonants in Telugu using Minimum Area Bounding Boxes
J. Bharathi1, P. Chandrasekhar Reddy2
1J. Bharathi, Department of Electronics and Communication Engineering, Deccan College of Engineering and Technology, Hyderabad, India.
2Dr. P. Chandrasekhar Reddy, Department of Electronics and Communication Engineering, JNTU College of Engineering, Hyderabad, India.
Manuscript received on June 03, 2013. | Revised Manuscript received on June 29, 2013. | Manuscript published on July 05, 2013. | PP: 260-264 | Volume-3 Issue-3, July 2013. | Retrieval Number: C1705073313/2013©BEIESP
© The Authors. Published By: Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Abstract: This paper addresses the problem of segmenting touching characters that are written or printed in the bottom zone. In the segmentation of machine-printed Telugu document images, conjunct consonants are especially prone to touching because of the shapes of the characters. Segmenting them properly is important for the accuracy of Telugu OCR; otherwise the reconstruction and mapping to an editable electronic document is incomplete and often needs a lot of tedious manual intervention. The proposed method is based on the script-level characteristic that the secondary forms of consonants are written in a smaller size, so their bounding boxes are smaller than that of the primary character. The structural feature of sharp peaks in both the left and right side profiles at the touching location of the combined character is used to determine the correct segmentation location. The algorithm is tested on a dataset created from a large set of documents, and a success rate of 96.39% is achieved.
Keywords: Minimum area bounding box, segmentation, side profile peaks, touching conjunct consonants.
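The abstract names two cues: sharp peaks in both side profiles at the touching location, and the smaller bounding box of the secondary consonant. The sketch below illustrates them on a toy binary glyph. It is not the authors' implementation. The function names, the `margin` parameter, the rule of scoring each row by the smaller of its two profile values, and the use of an axis-aligned tight box are all assumptions made for illustration.

```python
import numpy as np


def side_profiles(glyph):
    """Per-row background run length measured from the left and right edges.

    `glyph` is a 2-D array in which nonzero pixels are ink (assumed convention).
    Rows without ink get the full width on both sides.
    """
    glyph = np.asarray(glyph, dtype=bool)
    width = glyph.shape[1]
    has_ink = glyph.any(axis=1)
    # argmax over a boolean row returns the index of the first True value.
    left = np.where(has_ink, glyph.argmax(axis=1), width)
    right = np.where(has_ink, glyph[:, ::-1].argmax(axis=1), width)
    return left, right


def candidate_cut_row(glyph, margin=1):
    """Row where both profiles indent most (hypothetical selection rule).

    Taking the minimum of the two profiles keeps the score high only where
    the left and the right profile peak on the same row.
    """
    left, right = side_profiles(glyph)
    height = len(left)
    if height <= 2 * margin:
        raise ValueError("glyph is too short for the requested margin")
    score = np.minimum(left, right)
    return margin + int(np.argmax(score[margin:height - margin]))


def bounding_box_area(glyph):
    """Area of the tight axis-aligned box around the ink (0 if no ink)."""
    glyph = np.asarray(glyph, dtype=bool)
    rows = np.flatnonzero(glyph.any(axis=1))
    cols = np.flatnonzero(glyph.any(axis=0))
    if rows.size == 0:
        return 0
    return int((rows[-1] - rows[0] + 1) * (cols[-1] - cols[0] + 1))


# Toy example: a wide primary character, a one-pixel neck, and a narrower
# secondary consonant below it.
toy = np.zeros((7, 7), dtype=np.uint8)
toy[0:3, 0:7] = 1   # primary character
toy[3, 3] = 1       # touching location
toy[4:7, 1:6] = 1   # secondary (conjunct) consonant

cut = candidate_cut_row(toy)
primary, secondary = toy[:cut], toy[cut + 1:]
print(cut, bounding_box_area(primary), bounding_box_area(secondary))
```

On the toy glyph the script prints `3 21 15`: the neck row is chosen as the cut, and the lower component has the smaller bounding box, in line with the script-level characteristic described in the abstract. Real scanned glyphs would need noise handling and a more careful peak test than a single maximum.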