Tags: google-cloud-platform, ocr, google-cloud-vision

Character bounding box: order of vertices


The Google Vision API documentation states that the vertices of a detected character are always in the same order:

// The bounding box for the symbol.
// The vertices are in the order of top-left, top-right, bottom-right,
// bottom-left. When a rotation of the bounding box is detected the rotation
// is represented as around the top-left corner as defined when the text is
// read in the 'natural' orientation.
// For example:
//   * when the text is horizontal it might look like:
//      0----1
//      |    |
//      3----2
//   * when it's rotated 180 degrees around the top-left corner it becomes:
//      2----3
//      |    |
//      1----0
//   and the vertice order will still be (0, 1, 2, 3).

However, sometimes I see a different order of vertices. Here is an example of two characters from the same image, which have the same orientation:

[x:778 y:316  x:793 y:316  x:793 y:323  x:778 y:323 ]
0----1
|    |
3----2

and

[x:857 y:295  x:857 y:287  x:874 y:287  x:874 y:295 ]
1----2
|    |
0----3

Why is the order of vertices not the same, and not as described in the documentation?


Solution

  • It seems to be a bug in the Vision API. The workaround is to detect the image orientation and then reorder the vertices into the documented order.

    Unfortunately, the Vision API does not report the image orientation in its output, so I had to write code to detect it.

    Horizontal vs. vertical orientation can be detected by comparing character height and width: a character is usually taller than it is wide.

    The next step is to detect the direction of the text. For example, with a vertical image orientation, the text may run top-to-bottom or bottom-to-top.

    Most characters in the output appear in the natural reading order, so the text direction can be detected from simple statistics on the coordinates. For example:

      line 1 has Y coordinate 1000
      line 2 has Y coordinate 900
      line 3 has Y coordinate 950
      line 4 has Y coordinate 800

    Since the Y coordinates mostly decrease from line to line, the image is rotated upside down.
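The orientation check and vertex reordering described above can be sketched like this. It is a minimal illustration that assumes vertices are plain (x, y) tuples as printed in the question, not the real google-cloud-vision client types; `looks_rotated` and `reorder_vertices` are hypothetical helper names:

```python
def looks_rotated(vertices):
    """Heuristic: characters are usually taller than they are wide, so a
    bounding box that is wider than it is tall suggests rotated text."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    return (max(xs) - min(xs)) > (max(ys) - min(ys))

def reorder_vertices(vertices):
    """Reorder 4 vertices into the documented order: top-left, top-right,
    bottom-right, bottom-left. Handles axis-aligned boxes only, which is
    all the examples in the question need; a tilted box would require
    angle-aware handling."""
    by_y = sorted(vertices, key=lambda v: v[1])    # top pair first
    top = sorted(by_y[:2], key=lambda v: v[0])     # left to right
    bottom = sorted(by_y[2:], key=lambda v: v[0])  # left to right
    return [top[0], top[1], bottom[1], bottom[0]]

# The second character from the question comes back in documented order:
print(reorder_vertices([(857, 295), (857, 287), (874, 287), (874, 295)]))
# -> [(857, 287), (874, 287), (874, 295), (857, 295)]
```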
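The statistics idea can be sketched as follows. `is_upside_down` is a hypothetical helper that takes the Y coordinate of each detected line, in the API's output order, and takes a majority vote over the line-to-line transitions:

```python
def is_upside_down(line_y_coords):
    """In a normally oriented image, successive lines have increasing Y;
    if Y mostly decreases from line to line, the image is rotated 180
    degrees."""
    pairs = list(zip(line_y_coords, line_y_coords[1:]))
    decreasing = sum(1 for a, b in pairs if b < a)
    return decreasing > len(pairs) / 2

# The example above: Y mostly decreases, so the image is upside down.
print(is_upside_down([1000, 900, 950, 800]))  # -> True
```

A majority vote is used rather than checking every pair because OCR output is noisy: an occasional out-of-order line (like line 3 in the example) should not flip the verdict.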