python, tesseract, text-segmentation

Using Tesseract OCR for Character Segmentation Only


I want to do text segmentation on a printed document. I have already segmented the document down to the character level, but my approach fails on touching characters. I want to use Tesseract OCR only for the segmentation step, not for recognition. I know Tesseract can do this, but I don't know how to access that functionality without digging into Tesseract's internal code. Can anyone give me some advice? If possible, I need this in Python.


Solution

  • If you can call the TessBaseAPIGetComponentImages API method, you can retrieve the segmentation at various PageIteratorLevel levels (symbol/character, word, line, etc.) without performing actual OCR on the image.
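
Since the question asks for Python, here is a minimal sketch of the same idea. It assumes the third-party tesserocr and Pillow packages are installed (neither is mentioned in the answer above); tesserocr exposes the call as PyTessBaseAPI.GetComponentImages and the levels as RIL. The function names segment and box_to_corners, and the level_name parameter, are hypothetical helpers introduced only for illustration.

```python
from typing import Dict, List, Tuple


def box_to_corners(box: Dict[str, int]) -> Tuple[int, int, int, int]:
    """Convert a box dict with keys x, y, w, h to (left, top, right, bottom)."""
    return (box["x"], box["y"], box["x"] + box["w"], box["y"] + box["h"])


def segment(image_path: str, level_name: str = "SYMBOL") -> List[Tuple[int, int, int, int]]:
    """Return bounding boxes at the requested level without requesting recognized text.

    level_name is one of "BLOCK", "PARA", "TEXTLINE", "WORD", "SYMBOL".
    """
    # Assumption: tesserocr and Pillow are installed. They are imported here,
    # not at module level, so box_to_corners stays usable without them.
    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL

    level = getattr(RIL, level_name)
    with PyTessBaseAPI() as api:
        api.SetImage(Image.open(image_path))
        # Each component is a tuple: (image, box, block_id, paragraph_id).
        # The second argument (True) restricts the result to text components.
        components = api.GetComponentImages(level, True)
        return [box_to_corners(box) for _, box, _, _ in components]
```

Calling segment("page.png", "WORD") would return one (left, top, right, bottom) tuple per detected word, where "page.png" is a placeholder path. Whether the symbol-level boxes separate touching characters well enough is something to check on your own documents.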