I am using pytesseract to recognize text as follows:
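    import pytesseract
    from pytesseract import Output

    # img is the page image, loaded beforehand (e.g. with OpenCV or PIL)
    td = pytesseract.image_to_data(img, output_type=Output.DICT)
    tn_boxes = len(td['level'])
    for o in range(tn_boxes):
        text = td['text'][o]  # recognized word for box o (may be empty)
        print(text)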
I am building an index of examples with a simple piece of logic: detect the keyword 'Example no.', find its end-point keyword 'Sol.', and put the piece of the image between 'Example no.' and 'Sol.' into the index. Then find the next example, and so on.
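The logic described above could be sketched roughly as follows. This is only an illustration, not the actual indexing code: `find_example_regions` is a hypothetical helper name, it assumes Tesseract returns 'Example' and 'Sol.' as separate word tokens, and it uses only the `text` and `top` keys of the dictionary returned by `image_to_data`.

```python
def find_example_regions(td, start_word="Example", end_word="Sol."):
    """Return a list of (top, bottom) pixel rows, one per Example..Sol. span.

    td is the dict returned by pytesseract.image_to_data(..., output_type=Output.DICT).
    Hypothetical helper: the keyword matching is deliberately simplistic.
    """
    regions = []
    start_top = None  # top coordinate of the currently open 'Example', if any
    for text, top in zip(td["text"], td["top"]):
        word = text.strip()
        if not word:
            continue  # skip the empty entries emitted for blocks and lines
        if start_top is None and word.startswith(start_word):
            start_top = top
        elif start_top is not None and word.startswith(end_word):
            regions.append((start_top, top))
            start_top = None
    return regions
```

Each returned pair could then be used to cut the matching strip out of the page, for instance `img[top:bottom, :]` if `img` is a NumPy/OpenCV array. If OCR misses either keyword, as in the outputs shown in this question, the corresponding span is simply not found.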
But when I try the following image,
the output is:
SET THEORY ae . . 5 (6) Let A = {x: x isa negative odd integer} = {-1,-3,-5,-7,
...etc
Notice that it does not recognize the first line: Sol. (a) Let A={x:x is a natural number
..etc.
When I try it with the following image, which has no horizontal line,
it works fine.
Similarly, when an image or some text in a larger font sits above the text, pytesseract sometimes fails to detect the text below that bigger object.
For example, for the following image
the output is: usually denoted by o(G). ors a a {= 7 Wave =e () oe that the set of ae | group usual ition of integers.
Notice that it does not detect the keyword Example 1.
But when I try the following image,
the output is: usually denoted by o(G). Example 1. (2) Prove that th . group under usual addition of integers,
Now it detects the keyword Example 1.
Read, for example, "Image processing to improve Tesseract OCR accuracy", and read the Tesseract docs.
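As one illustration of such preprocessing, here is a dependency-free sketch that blanks out pixel rows that are almost entirely dark, which is roughly what a horizontal rule looks like. It assumes the horizontal line is what disturbs recognition (suggested by the comparison in the question, not verified), and `remove_horizontal_lines` with its thresholds is a hypothetical helper. For real pages an image library such as OpenCV would be the practical choice.

```python
def remove_horizontal_lines(rows, dark_below=128, min_dark_fraction=0.8, white=255):
    """Return a copy of a grayscale image with near-solid dark rows whitened.

    rows is a list of pixel rows, each a list of 0-255 grayscale values.
    The thresholds are illustrative assumptions and need tuning per document.
    """
    cleaned = []
    for row in rows:
        dark = sum(1 for px in row if px < dark_below)
        if row and dark / len(row) >= min_dark_fraction:
            # Mostly dark across the full width: treat as a rule and erase it
            cleaned.append([white] * len(row))
        else:
            cleaned.append(list(row))
    return cleaned
```

Rows containing ordinary text are left alone because characters rarely fill 80% of the page width. The cleaned image would then be passed to `image_to_data` as before.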