python ocr tesseract python-tesseract wand

How do I change the contrast of a picture using Wand?


I have the picture below used in Tesseract OCR:

[image not included]

My code to process the picture is:

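import io

import pytesseract
from bs4 import BeautifulSoup
from PIL import Image                # Pillow image, used inside gerarHocr
from wand.image import Image as wi   # Wand image (alias assumed from the usage below)

def gerarHocr(imageBlob):
    # Decode the PNG blob with Pillow and ask Tesseract for hOCR markup
    image = Image.open(io.BytesIO(imageBlob))
    markup = pytesseract.image_to_pdf_or_hocr(image, lang='por', extension='hocr', config='--psm 6')
    soup = BeautifulSoup(markup, features='html.parser')

    # Each recognised word is a <span class="ocrx_word">
    spans = soup.find_all('span', {'class': 'ocrx_word'})

    listHoras = []
    ...
    return listHoras

# HOCR
# `image` is the Wand image of the page, loaded earlier (not shown)
with image[450:6200, 840:3550] as cropped:
    imgPage = wi(image=cropped)
    imageBlob = imgPage.make_blob('png')
    horas = gerarHocr(imageBlob)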

However, the OCR sometimes gets confused and reads a single 3 as both an 8 and a 3, returning, for example, 07:44/14:183 instead of 07:44/14:13.

I think removing the grey lines with Wand would improve the OCR's confidence. How do I do that?

Thank you,


Solution

  • If the system is using ImageMagick-6, you can call Image.threshold(), but you might need to remove the transparency first.

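    from wand.image import Image

    with Image(filename='PWILE.png') as img:
        # Drop the alpha channel, with white set as the background colour
        img.background_color = 'WHITE'
        img.alpha_channel = False
        # Pixels brighter than 50% become white, the rest become black
        img.threshold(threshold=0.5)
        img.save(filename='output_threshold.png')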
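
To see why a 0.5 threshold can wipe out light grey rules while keeping dark text, here is a minimal pure-Python sketch of the idea. It is not Wand's or ImageMagick's implementation; the function name `threshold_pixels` and the sample values are made up for illustration, with intensities normalised from 0.0 (black) to 1.0 (white).

```python
def threshold_pixels(rows, threshold=0.5):
    """Map each normalised intensity to 1.0 (white) if it is above
    the threshold, otherwise to 0.0 (black)."""
    return [[1.0 if value > threshold else 0.0 for value in row] for row in rows]


# Hypothetical 3x4 patch: dark text (0.1), a light grey rule (0.8), white paper (1.0).
patch = [
    [1.0, 0.1, 0.1, 1.0],
    [0.8, 0.8, 0.8, 0.8],
    [1.0, 0.1, 1.0, 1.0],
]

cleaned = threshold_pixels(patch)
for row in cleaned:
    print(row)
```

In this toy patch the grey row turns white and the dark pixels stay black. If the grey lines in the real scan are darker than mid-grey, a lower threshold value would probably be needed so that they still end up on the white side.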
    


    If you're using ImageMagick-7 (anything above version 7.0.8-41), then Image.auto_threshold() will work.

    with Image(filename='support/PWILE.png') as img:
        img.auto_threshold(method='otsu')
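
For intuition about what the 'otsu' method chooses, below is a rough pure-Python sketch of Otsu's technique on a list of 8-bit grey values. It assumes nothing about ImageMagick's internals, which may differ in detail; `otsu_threshold` and the sample pixel counts are hypothetical.

```python
def otsu_threshold(values, levels=256):
    """Return the grey level t that maximises the between-class variance
    when pixels <= t form one class and pixels > t form the other."""
    histogram = [0] * levels
    for v in values:
        histogram[v] += 1

    total = len(values)
    total_sum = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(levels):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        if weight_low == 0:
            continue  # no pixels in the dark class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break  # every pixel is already in the dark class
        mean_low = sum_low / weight_low
        mean_high = (total_sum - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


# Made-up sample: dark text (20), grey rules (200), white paper (255).
pixels = [20] * 30 + [200] * 20 + [255] * 50

t = otsu_threshold(pixels)
binarised = [255 if p > t else 0 for p in pixels]
```

With this sample the chosen level falls between the dark text and the grey rules, so the rules are pushed to white along with the paper. On a real scan the split depends on the histogram, so it is worth checking the output image before running OCR on it.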