image-processing ocr tesseract

What are the numbers in a Tesseract box file?


I cannot for the life of me find any documentation on how Tesseract box files work or what the coordinates represent.

For instance, I'm getting:

T 2768 165 2789 191 0

The first token is obviously the character. I know that Tesseract uses a bottom-left origin, so 2768 should be the bottom, and the 4th token (2789) seems to be the top. I don't understand what the 3rd (165), 5th (191), and 6th (0) tokens are: 165 and 191 don't look right as left/right coordinates, and I have no idea what the 0 refers to.

Can anyone help me? Are these pixel coordinates, or do I have to factor in the DPI of the image?

Thanks!


Solution

  • According to the documentation, the format of each line is

    <symbol> <left> <bottom> <right> <top> <page>
    

    Where:

    • <symbol> is the character, e.g. a or b.
    • <left> <bottom> <right> <top> are the pixel coordinates of the rectangle that encloses the character on the page, so no DPI conversion is needed. Note that the coordinate system used by Tesseract has (0,0) in the bottom-left corner of the image!
    • <page> is only relevant if you’re using multi-page TIFF files. In all other cases just put 0 in here.
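
The field layout above can be sketched as a small parser. This is only an illustration, not part of Tesseract: `BoxEntry` and `parse_box_line` are hypothetical names, and the sketch assumes the six fields are separated by single spaces, as in the example line from the question.

```python
from typing import NamedTuple


class BoxEntry(NamedTuple):
    """One line of a box file (hypothetical helper type)."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> BoxEntry:
    # Split from the right so the five numeric fields are peeled off
    # first and whatever remains on the left is treated as the symbol.
    symbol, *numbers = line.rstrip("\r\n").rsplit(" ", 5)
    left, bottom, right, top, page = (int(n) for n in numbers)
    return BoxEntry(symbol, left, bottom, right, top, page)


entry = parse_box_line("T 2768 165 2789 191 0")
print(entry)
# Width and height of the box in pixels.
print(entry.right - entry.left, entry.top - entry.bottom)
```

Splitting from the right is a design choice made here so that the symbol field is taken as whatever precedes the five numbers; a line with fewer than six fields raises a `ValueError`.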

    So in your particular case

    T 2768 165 2789 191 0
    

    would be

    • character: T
    • left: 2768
    • bottom: 165
    • right: 2789
    • top: 191
    • page: 0
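
Because the origin is at the bottom-left, these values have to be flipped vertically before they can be used with tools that measure from the top-left corner (for example, a crop box given as left, upper, right, lower). A minimal sketch, assuming the image height in pixels is known; `box_to_top_left` is a hypothetical helper and the height of 300 is a made-up value for illustration, not taken from the question:

```python
def box_to_top_left(left, bottom, right, top, image_height):
    """Convert a bottom-left-origin box to (left, upper, right, lower)
    measured from the top-left corner of the image."""
    upper = image_height - top      # the box's top edge, counted from the image top
    lower = image_height - bottom   # the box's bottom edge, counted from the image top
    return left, upper, right, lower


IMAGE_HEIGHT = 300  # hypothetical value; use the real height of your image

print(box_to_top_left(2768, 165, 2789, 191, IMAGE_HEIGHT))
# (2768, 109, 2789, 135)
```

The horizontal values are unchanged; only the vertical ones depend on the image height.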