python-3.x ocr tesseract python-tesseract

PyTesseract image_to_data function isn't recognizing my image

I'm using pytesseract to return the coordinates of the objects in an image.

By using this piece of code:

import cv2
import pytesseract
from pytesseract import Output

img = cv2.imread('wine.jpg')

d = pytesseract.image_to_data(img, output_type=Output.DICT)
print(d)

# One entry per detected element (page, block, paragraph, line, word)
n_boxes = len(d['level'])
for i in range(n_boxes):
    (x, y, w, h) = (d['left'][i], d['top'][i], d['width'][i], d['height'][i])
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imshow('img', img)
cv2.waitKey(0)

I get this output:

{'level': [1, 2, 3, 4, 5, 5, 2, 3, 4, 5, 4, 5, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3], 'par_num': [0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 2, 2, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1, 2, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1], 'left': [0, 485, 485, 485, 485, 612, 537, 537, 555, 555, 537, 537, 454, 454, 454, 454], 'top': [0, 323, 323, 323, 323, 324, 400, 400, 400, 400, 426, 426, 0, 0, 0, 0], 'width': [1200, 229, 229, 229, 115, 102, 123, 123, 89, 89, 123, 123, 296, 296, 296, 296], 'height': [900, 29, 29, 29, 28, 28, 40, 40, 15, 15, 14, 14, 892, 892, 892, 892], 'conf': ['-1', '-1', '-1', '-1', 58, 96, '-1', '-1', '-1', 95, '-1', 95, '-1', '-1', '-1', 95], 'text': ['', '', '', '', "JACOB'S", 'CREEK', '', '', '', 'SHIRAZ', '', 'CABERNET', '', '', '', '']}

[image used]

However, when I use this image:

I get this output:

{'level': [1, 2, 3, 4, 5], 'page_num': [1, 1, 1, 1, 1], 'block_num': [0, 1, 1, 1, 1], 'par_num': [0, 0, 1, 1, 1], 'line_num': [0, 0, 0, 1, 1], 'word_num': [0, 0, 0, 0, 1], 'left': [0, 0, 0, 0, 0], 'top': [0, 162, 162, 162, 162], 'width': [1200, 0, 0, 0, 0], 'height': [900, 276, 276, 276, 276], 'conf': ['-1', '-1', '-1', '-1', 95], 'text': ['', '', '', '', '']}

Any idea why some images work and some don't?

Solution

This is mainly caused by differences in image quality and contrast. The OCR engine detects text much more easily in clean, high-contrast images. You can add a few pre-processing steps, such as thresholding, blurring, histogram equalization, and many other techniques. Which one works best depends on your images, so I can't give you code that is guaranteed to work; it takes trial and error to find the best technique for your use case.

UPDATE: here is some code that might help you:

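import cv2

def preprocessing_typing_detection(inputImage):
    # Convert the BGR image to single-channel grayscale
    inputImage = cv2.cvtColor(inputImage, cv2.COLOR_BGR2GRAY)
    # Apply a Laplacian filter to emphasize edges such as character strokes
    inputImage = cv2.Laplacian(inputImage, cv2.CV_8U)
    return inputImage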
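Since finding the right pre-processing is trial and error, it helps to have a quick way to judge each attempt. The sketch below pulls the recognized words out of the dictionary returned by `image_to_data`. The helper name `extract_words` and the default cutoff of 60 are assumptions for illustration. It relies on level 5 marking word-level entries, which matches the output shown in the question, and it converts `conf` to a number because that output mixes strings such as '-1' with integers.

```python
def extract_words(data, min_conf=60):
    """Return a list of (text, left, top, width, height, conf) for recognized words.

    `data` is the dict produced by image_to_data(..., output_type=Output.DICT).
    `min_conf` is an illustrative cutoff, not a recommended value.
    """
    words = []
    for i in range(len(data['level'])):
        text = data['text'][i].strip()
        conf = float(data['conf'][i])  # conf may be the string '-1' or a number
        # Keep only word-level entries (level 5) with non-empty text
        if data['level'][i] == 5 and text and conf >= min_conf:
            words.append((text, data['left'][i], data['top'][i],
                          data['width'][i], data['height'][i], conf))
    return words
```

For the second image in the question this returns an empty list, because the only word-level entry has empty text. Running it on the output for each pre-processed version of the image gives a simple count to compare attempts.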