I cannot for the life of me find any documentation about how Tesseract box files work, or what the coordinates represent.
For instance, I'm getting:
T 2768 165 2789 191 0
The first token is obviously the character. I know that Tesseract uses a bottom-left origin, so 2768 should be the bottom. The 4th token (2789) seems to be the top. I don't get what the 3rd (165), 5th (191), and 6th (0) tokens are. 165 and 191 are incorrect as left/right coordinates, and I have no idea what 0 refers to.
Can anyone help me? Are these pixel coordinates, or do I have to factor in the DPI of the image?
Thanks!
According to the documentation, the format for each line is
<symbol> <left> <bottom> <right> <top> <page>
Where:
<symbol> is the character, e.g. a or b.
<left> <bottom> <right> <top> are the coordinates of the rectangle that fits the character on the page. Note that the coordinate system used by Tesseract has (0,0) in the bottom-left corner of the image!
<page> is only relevant if you're using multi-page TIFF files. In all other cases just put 0 in here.
So in your particular case
T 2768 165 2789 191 0
would be
<symbol> = T
<left> = 2768
<bottom> = 165
<right> = 2789
<top> = 191
<page> = 0
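
The mapping above can be sketched in Python. This is only an illustration, not part of Tesseract: the names `Box`, `parse_box_line`, and `to_top_left` are made up for this example, and the image height of 3000 is a hypothetical value. It treats the numbers as pixel coordinates in the image and shows how the bottom-left-origin values could be flipped for tools that put (0,0) in the top-left corner.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """One line of a box file: <symbol> <left> <bottom> <right> <top> <page>."""
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Split from the right so the last five fields are always the numbers,
    # whatever the symbol field contains.
    symbol, left, bottom, right, top, page = line.strip().rsplit(" ", 5)
    return Box(symbol, int(left), int(bottom), int(right), int(top), int(page))


def to_top_left(box: Box, image_height: int) -> Tuple[int, int, int, int]:
    # Flip the y axis: box files measure y upward from the bottom edge,
    # while many image libraries measure y downward from the top edge.
    # Returns (left, upper, right, lower) in top-left-origin coordinates.
    return (box.left, image_height - box.top, box.right, image_height - box.bottom)


box = parse_box_line("T 2768 165 2789 191 0")
print(box.left, box.bottom, box.right, box.top)  # 2768 165 2789 191

# image_height=3000 is a hypothetical value; use your image's real height.
print(to_top_left(box, image_height=3000))  # (2768, 2809, 2789, 2835)
```

With these numbers the character is 21 units wide (2789 - 2768) and 26 units tall (191 - 165), which is plausible for a capital T near the bottom-right of a large scan.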