Search code examples
azureazure-cognitive-servicesface-api

What unit is (X,Y) coordinate in Microsoft Azure Text recognition bounding box response?


What unit is (X,Y) coordinate in Microsoft Azure Text recognition bounding box response?

Ex.:

{
  "status": "Succeeded",
  "succeeded": true,
  "failed": false,
  "finished": true,
  "recognitionResult": {
    "lines": [
      {
        "boundingBox": [
          67,
          204,
          668,
          210,
          667,
          272,
          66,
          267
        ],
        "text": "Our greatest glory is not",
        ...

The json response shows the four coordinates of the bounding boxes in a clockwise disposition. However, I haven't found the unit. I assume that it is pixels, but it's not written anywhere...

The API is available here:

https://westus.dev.cognitive.microsoft.com/docs/services/5adf991815e1060e6355ad44/operations/587f2cf1154055056008f201


Solution

  • They are pixels as you assumed, try to track the coordinates provided by the API on your image as pixels and you will find yourself drawing a rectangle around each object detected by the API.

    Just think of your image as a coordinate plane, and pixels are its unit, which is in fact the reality.