python ocr tesseract python-tesseract wand

How do I change the contrast of a picture using Wand?

I am running Tesseract OCR on the picture below:

My code to process the picture is:

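import io

import pytesseract
from bs4 import BeautifulSoup
from PIL import Image
from wand.image import Image as wi

def gerarHocr(imageBlob):
    # Decode the PNG blob into a PIL image that pytesseract accepts
    image = Image.open(io.BytesIO(imageBlob))
    # Run Tesseract (Portuguese, single uniform block of text) and get hOCR markup
    markup = pytesseract.image_to_pdf_or_hocr(image, lang='por', extension='hocr', config='--psm 6')
    soup = BeautifulSoup(markup, features='html.parser')

    # Each recognized word is a <span class="ocrx_word">
    spans = soup.find_all('span', {'class': 'ocrx_word'})

    listHoras = []
    ...
    return listHoras

# HOCR
# `image` is the wand Image of the full page, loaded earlier (not shown)
with image[450:6200, 840:3550] as cropped:
    imgPage = wi(image=cropped)
    imageBlob = imgPage.make_blob('png')
    horas = gerarHocr(imageBlob)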

However, the OCR sometimes gets confused and mixes up 8 and 3, returning, for example, 07:44/14:183 instead of 07:44/14:13.

I think removing the grey lines with Wand would improve the OCR confidence. How can I do that?

Thank you,

Solution

If the system uses ImageMagick 6, you can call Image.threshold(), but you may need to remove the transparency first by flattening the image onto a white background.

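from wand.color import Color
from wand.image import Image

with Image(filename='PWILE.png') as img:
    # Flatten any transparency onto a white background
    img.background_color = Color('white')
    img.alpha_channel = 'remove'
    # Pixels brighter than 50% become white, the rest black
    img.threshold(threshold=0.5)
    img.save(filename='output_threshold.png')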
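As a rough illustration of why a fixed threshold drops the grey lines, the sketch below applies the same rule to a small grid of grayscale intensities in plain Python. It assumes a single grayscale channel normalized to 0.0 to 1.0 and made-up pixel values; ImageMagick applies the rule per channel on its own quantum scale, so treat this as the idea rather than Wand's implementation. The helper name `threshold_pixels` is hypothetical.

```python
def threshold_pixels(pixels, threshold=0.5):
    """Binarize a 2-D grid of grayscale intensities in [0.0, 1.0].

    Hypothetical helper illustrating the idea behind Image.threshold():
    values above the threshold become white (1.0), all others black (0.0).
    """
    return [[1.0 if value > threshold else 0.0 for value in row] for row in pixels]


# Made-up 3x4 patch: white paper (1.0), a light grey line (0.75),
# and dark digit strokes (0.1).
patch = [
    [1.0, 1.0, 0.1, 1.0],
    [0.75, 0.75, 0.1, 0.75],  # grey ruling line crossing a digit stroke
    [1.0, 0.1, 0.1, 1.0],
]

print(threshold_pixels(patch))
# [[1.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
```

The grey row turns white except where the dark stroke crosses it, so Tesseract sees only the digits.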

If you're using ImageMagick 7 (version 7.0.8-41 or later), Image.auto_threshold() will work and chooses the threshold for you.

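from wand.image import Image

with Image(filename='support/PWILE.png') as img:
    # Let ImageMagick pick the threshold using Otsu's method
    img.auto_threshold(method='otsu')
    img.save(filename='output_threshold.png')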
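auto_threshold(method='otsu') derives the cutoff from the image's histogram instead of using a fixed 0.5. The sketch below is a textbook Otsu's method in plain Python, not ImageMagick's code, run on a hypothetical set of 8-bit gray levels (dark digits at 20, grey lines at 170, paper at 240). It picks the level that maximizes the between-class variance; the function name `otsu_threshold` and the sample values are assumptions for illustration.

```python
def otsu_threshold(values, levels=256):
    """Return the Otsu threshold for a flat list of integer gray levels.

    Textbook version: choose the level t that maximizes the between-class
    variance when pixels are split into {v <= t} and {v > t}.
    """
    histogram = [0] * levels
    for v in values:
        histogram[v] += 1

    total = len(values)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_var = 0, -1.0
    weight_bg = 0  # pixel count at or below t
    sum_bg = 0     # sum of gray levels at or below t
    for t in range(levels):
        weight_bg += histogram[t]
        sum_bg += t * histogram[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:  # strict: ties keep the lowest level
            best_var, best_t = between, t
    return best_t


# Hypothetical gray levels: digits (20), grey ruling lines (170), paper (240)
pixels = [20] * 10 + [170] * 10 + [240] * 60
t = otsu_threshold(pixels)
print(t)  # 20
```

With that level, pixels at 20 stay black while the grey lines at 170 fall on the white side with the paper. Real scans have noisier histograms, so the level ImageMagick picks may differ from this toy case.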