Tags: ocr, tesseract, python-tesseract, opencv, python

How to recognize deformed text under a bigger object using pytesseract and opencv-python in Python?


I am using pytesseract to recognize text as follows:

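import pytesseract
from pytesseract import Output

# img is the page image loaded earlier (for example with cv2.imread)
td = pytesseract.image_to_data(img, output_type=Output.DICT)
tn_boxes = len(td['level'])
for o in range(tn_boxes):
    text = td['text'][o]
    print(text)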

I am building an index of examples using simple logic: detect the keyword 'Example no.', find its end-point keyword 'Sol.', put the piece of the image from 'Example no.' to 'Sol.' into the index, then find the next example, and so on.
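A minimal sketch of that indexing logic, assuming `td` is the dictionary returned by `pytesseract.image_to_data(..., output_type=Output.DICT)` and that 'Example' and its number come back as two separate word tokens. The function name `find_example_spans` is hypothetical and used only for illustration.

```python
import re


def find_example_spans(td):
    """Return (start_top, end_top) pixel rows for each 'Example N.' ... 'Sol.' pair."""
    # Keep only non-empty tokens, paired with the top edge of their bounding box.
    words = [(t.strip(), top) for t, top in zip(td["text"], td["top"]) if t.strip()]
    spans, start = [], None
    for i, (word, top) in enumerate(words):
        nxt = words[i + 1][0] if i + 1 < len(words) else ""
        # 'Example' followed by a number token such as '1.' opens a span.
        if word == "Example" and re.match(r"\d+\.?$", nxt):
            start = top
        # The next 'Sol.' closes the span that is currently open.
        elif word.startswith("Sol.") and start is not None:
            spans.append((start, top))
            start = None
    return spans
```

Each returned pair could then be used to crop the page, for example `img[start:end, :]` when `img` is a NumPy array.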
But when I try the following image, which has a horizontal line above the text [image], the output is: SET THEORY ae . . 5 (6) Let A = {x: x isa negative odd integer} = {-1,-3,-5,-7,...etc
Note that it does not recognize the first line, Sol. (a) Let A={x:x is a natural number..etc.
When I try the following image, which does not have the horizontal line [image], it works fine.

Is there any way to configure pytesseract to recognize text that has a line above it?

Edited:

Sometimes, when an image or some larger text is placed above the text, pytesseract fails to detect the text below that bigger object.

Is there any solution for this kind of problem? Perhaps there is a way to configure the minimum detection size, or to detect text of all possible sizes, even under bigger objects?

For example, it shows the output: usually denoted by o(G). ors a a {= 7 Wave =e () oe that the set of ae | group usual ition of integers.
Note that it does not detect the keyword Example 1. in the following image: [image]

But when I try the following image, it shows the output: usually denoted by o(G). Example 1. (2) Prove that th . group under usual addition of integers, Now it detects the keyword Example 1. [image]


Solution

  • Read, for example, "Image processing to improve Tesseract OCR accuracy", and read the docs.