Get font size in Python with Tesseract and pyocr


Is it possible to get font size from an image using pyocr or Tesseract? Below is my code.

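import io
import pyocr
import pyocr.builders
from PIL import Image

# req_image (raw image bytes) and lang come from the surrounding code
tools = pyocr.get_available_tools()
tool = tools[0]
txt = tool.image_to_string(
    Image.open(io.BytesIO(req_image)),
    lang=lang,
    builder=pyocr.builders.TextBuilder()
)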

Here I get the text from the image using the image_to_string function. My question is whether I can also get the font size (as a number) of that text.


Solution

  • Using tesserocr, call Recognize on your image, then get a ResultIterator from GetIterator and call its WordFontAttributes method, which returns the font attributes (including the point size) of the word at the iterator's current position. Read the method's documentation for more info.

    import io
    import tesserocr
    from PIL import Image
    
    with tesserocr.PyTessBaseAPI() as api:
        image = Image.open(io.BytesIO(req_image))
        api.SetImage(image)
        api.Recognize()  # must run before the iterator can return results
        iterator = api.GetIterator()
        print(iterator.WordFontAttributes())
    

    Example output:

    {'bold': False,
     'font_id': 283,
     'font_name': u'Times_New_Roman',
     'italic': False,
     'monospace': False,
     'pointsize': 9,
     'serif': True,
     'smallcaps': False,
     'underlined': False}
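
The snippet above only reports the first word. To get a size for the whole image, one option is to walk every word with tesserocr's iterate_level helper and take the most common pointsize. This is a minimal sketch, not a definitive implementation: collect_word_fonts and dominant_pointsize are hypothetical helper names, there is no handling for blank images, and the code assumes WordFontAttributes may return None or a pointsize of 0 when no font data is available. Font attributes reportedly come from Tesseract's legacy recognizer, so check the result against your Tesseract version and engine mode.

```python
from collections import Counter


def collect_word_fonts(image):
    """Return (word, attrs) pairs for every word found in a PIL image.

    tesserocr is imported here so that dominant_pointsize() below
    can be used (and tested) without tesserocr installed.
    """
    from tesserocr import PyTessBaseAPI, RIL, iterate_level

    results = []
    with PyTessBaseAPI() as api:
        api.SetImage(image)
        api.Recognize()
        iterator = api.GetIterator()
        # iterate_level advances the same iterator one word at a time
        for item in iterate_level(iterator, RIL.WORD):
            word = item.GetUTF8Text(RIL.WORD)
            attrs = item.WordFontAttributes()  # assumed to be None if unavailable
            results.append((word, attrs))
    return results


def dominant_pointsize(word_fonts):
    """Most common non-zero pointsize among (word, attrs) pairs, or None."""
    sizes = [
        attrs["pointsize"]
        for _, attrs in word_fonts
        if attrs and attrs.get("pointsize")
    ]
    if not sizes:
        return None
    return Counter(sizes).most_common(1)[0][0]
```

Typical use would be dominant_pointsize(collect_word_fonts(Image.open(io.BytesIO(req_image)))). Taking the most common value rather than the mean keeps a few large headings from skewing the result.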