Tags: python, ocr, tesseract, python-tesseract

How to use tessedit_write_images with pytesseract?


I'm using pytesseract 0.3.10 with tesseract 5.3.0. I want to take a look at how tesseract processed my images. I tried setting tessedit_write_images to true via:

import pytesseract as pt
pt.image_to_string(crop_img, lang='eng+deu+fra+spa', config="--psm 6 -c tessedit_write_images=1")

But this is not working. The tessinput.tif file is nowhere to be found. (The --psm 6 part is working.)

I also tried to use tessedit_write_images=True or tessedit_write_images=T. Using pt.run_and_get_output() is also not working.

Is there a way to set the variable tessedit_write_images to true from outside my Python script?


Solution

  • Create a text file named "config" containing:

    tessedit_write_images true
    

    Then run on the command line: tesseract Text.png out.txt config

    This produces a text file and a .tiff file. If you rename config to config.txt, the same call also works from Python via subprocess:

    import subprocess
    
    process = subprocess.run(["tesseract", "Text.png", "out.txt", "config.txt"], shell=False, stdout=subprocess.PIPE)
    

    PS: I used tesseract v5.1.0.20220510 with leptonica-1.78.0.
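
The steps above can also be scripted end to end. The following is only a sketch, not a tested recipe for every setup: the helper names write_config, build_command and run_ocr are made up for illustration, the --psm 6 option is taken from the question, and it assumes that tesseract is on PATH and writes its debug image into the current working directory (which is why cwd is set explicitly).

```python
import subprocess
from pathlib import Path


def write_config(directory, variables, name="config.txt"):
    """Write one 'name value' pair per line, as in the config file above."""
    path = Path(directory) / name
    text = "".join(f"{key} {value}\n" for key, value in variables.items())
    path.write_text(text, encoding="utf-8")
    return path


def build_command(image, output_base, config_path, psm=6):
    """Assemble: tesseract IMAGE OUTPUTBASE [options] [configfile]."""
    return ["tesseract", str(image), str(output_base),
            "--psm", str(psm), str(config_path)]


def run_ocr(image, output_base, config_path, workdir="."):
    """Run tesseract with workdir as current directory (hypothetical helper).

    Assumption: the debug image ends up in workdir. Tesseract appends
    '.txt' to output_base itself, so pass 'out' rather than 'out.txt'.
    """
    cmd = build_command(Path(image).resolve(), output_base,
                        Path(config_path).resolve())
    return subprocess.run(cmd, cwd=workdir, capture_output=True, text=True)


# Example usage (needs tesseract installed and a Text.png in this folder):
# cfg = write_config(".", {"tessedit_write_images": "true"})
# result = run_ocr("Text.png", "out", cfg)
# print(result.returncode, result.stderr)
```

Passing the arguments as a list keeps shell quoting out of the picture, and checking result.returncode and result.stderr afterwards is a quick way to see whether tesseract accepted the config file.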