How to get Tesseract confidence levels in Python or from the command line?


How can I get the confidence levels after running OCR on an image with Tesseract 3.05 on Windows? I am calling tesseract from Python using subprocess:

retcode = subprocess.call(["tesseract", "-l", "eng", "myImage.png", "txt", "-psm", "6"])


Solution

  • The tesserocr wrapper gives you this: https://pypi.python.org/pypi/tesserocr/2.0.0. There are many Python wrappers for Tesseract, but this one comes closest to covering the whole C++ API, including the confidence calls.

    Example:

    from PIL import Image
    from tesserocr import PyTessBaseAPI, RIL
    
    image = Image.open('/usr/src/tesseract/testing/phototest.tif')
    with PyTessBaseAPI() as api:
        api.SetImage(image)
        boxes = api.GetComponentImages(RIL.TEXTLINE, True)
        print('Found {} textline image components.'.format(len(boxes)))
        for i, (im, box, _, _) in enumerate(boxes):
            # im is a PIL image object
            # box is a dict with x, y, w and h keys
            api.SetRectangle(box['x'], box['y'], box['w'], box['h'])
            ocrResult = api.GetUTF8Text()
            # Mean confidence (0-100) for the text recognized in this rectangle
            conf = api.MeanTextConf()
            print(u"Box[{0}]: x={x}, y={y}, w={w}, h={h}, "
                  u"confidence: {1}, text: {2}".format(i, conf, ocrResult, **box))
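
If you would rather keep calling the tesseract executable through subprocess, as in the question, one possible route is the TSV output, which has a per-word conf column. The sketch below is not part of the original answer. It assumes your Tesseract 3.05 install ships the `tsv` config file and is on PATH, and that the TSV header includes `conf` and `text` columns. The helper names (run_tesseract_tsv, parse_word_confidences) and the "out" output base are made up for illustration.

```python
import csv
import io
import subprocess


def run_tesseract_tsv(image_path, output_base):
    """Run the tesseract CLI with the `tsv` config and return the .tsv path."""
    # Assumption: tesseract 3.05 is on PATH and provides the `tsv` config.
    subprocess.check_call(
        ["tesseract", image_path, output_base, "-l", "eng", "-psm", "6", "tsv"]
    )
    return output_base + ".tsv"


def parse_word_confidences(tsv_text):
    """Return a list of (word, confidence) pairs from Tesseract TSV text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    words = []
    for row in reader:
        text = (row.get("text") or "").strip()
        conf = float(row["conf"])
        # Rows for pages, blocks, paragraphs and lines carry conf == -1
        # and no text, so only rows with real words are kept.
        if text and conf >= 0:
            words.append((text, conf))
    return words


def word_confidences(image_path, output_base="out"):
    """Hypothetical convenience wrapper: run the CLI, then parse its output."""
    tsv_path = run_tesseract_tsv(image_path, output_base)
    with io.open(tsv_path, encoding="utf-8") as handle:
        return parse_word_confidences(handle.read())
```

Calling word_confidences("myImage.png") should then give one (word, confidence) pair per recognized word. From the command window, the same tesseract command with `tsv` at the end writes the file directly, and using `hocr` instead gives the confidences as x_wconf values in the HTML output.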