
Clarifai - Returning regions for custom trained models


The documentation shows that only concepts are returned for custom trained models:

{
  "status": {
    "code": 10000,
    "description": "Ok"
  },
  "outputs": [
    {
      ...,
      "created_at": "2016-11-22T16:59:23Z",
      "model": {
        ...
        "model_version": {
          ...
        }
      },
      "input": {
        "id": "e1cf385843b94c6791bbd9f2654db5c0",
        "data": {
          "image": {
            "url": "https://s3.amazonaws.com/clarifai-api/img/prod/b749af061d564b829fb816215f6dc832/e11c81745d6d42a78ef712236023df1c.jpeg"
          }
        }
      },
      "data": {
        "concepts": [
          {
            ...
          },
          ...
        ]
      }
    }
  ]
}

Whereas pre-trained models such as Demographics and Face Detection return regions with the x/y location in the image.
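
For context, this is roughly how I'm reading those regions back today, going straight at the v2 predict endpoint (the API key, model ID and image URL below are only placeholders):

import requests

# Placeholders -- substitute your own credentials and the detection model's ID.
API_KEY = "YOUR_API_KEY"
FACE_MODEL_ID = "FACE_DETECTION_MODEL_ID"

resp = requests.post(
    "https://api.clarifai.com/v2/models/%s/outputs" % FACE_MODEL_ID,
    headers={"Authorization": "Key %s" % API_KEY},
    json={"inputs": [{"data": {"image": {"url": "https://example.com/photo.jpg"}}}]},
)

# Detection models return data.regions, each with a bounding_box expressed
# as fractions of the image's height and width.
for region in resp.json()["outputs"][0]["data"].get("regions", []):
    box = region["region_info"]["bounding_box"]
    print(box["top_row"], box["left_col"], box["bottom_row"], box["right_col"])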

If I want to detect WHERE in the image the concept is predicted for my custom models, is my only option to split the image into a grid and submit each tile as bytes? This seems counter-productive, as it would incur additional lookups.


Solution

  • In the Clarifai platform: Demographics, Face Detection and Apparel Detection are all object detection models. General, Travel, Food, etc. are classification models. Classification and object detection are two different (although similar-seeming) computer vision tasks.

    For example, if you're looking to classify an image as 'sad', it doesn't make sense to have a bounding box (i.e. an area outlining the 'sadness'). Classification takes the entire image into account.

    Object detection looks at individual parts of the image and tries to determine whether the object is there (kind of like you were suggesting with your workaround): where is the 'knife', or whatever you're looking for, as a discrete object?

    Confusingly, the two can overlap conceptually, such as with a concept of 'face'. A picture could carry that classification, but there could also be a specific 'face' object detected at a specific place. Classifications are not limited to abstract concepts (although it is helpful to think of them that way when weighing the differences between these two approaches).

    Right now all custom models are classification models, not object detection models. I think there is work being done on this at the enterprise level of the platform, but I don't believe anything is currently available. The pre-trained models you are using sound like they happen to be object detection models - so you get some bonus information with them!

    BTW: if I understand it correctly, your proposed workaround should work: split the image up into small tiles and ask for a classification on each of them. You're right that it would be inefficient, but I'm not sure of a better option at the moment. A rough sketch of that approach is below.
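
    To make that concrete, here is a minimal sketch of the tiling workaround against the v2 predict endpoint. The API key, custom model ID and grid size are placeholders, and the Authorization header should be whatever you use to authenticate (an API key is shown here):

    import base64
    import io

    import requests
    from PIL import Image

    API_KEY = "YOUR_API_KEY"              # placeholder
    CUSTOM_MODEL_ID = "my-custom-model"   # placeholder: your custom model's ID
    GRID = 3                              # split into a 3x3 grid of tiles

    def classify_tiles(path):
        image = Image.open(path)
        tile_w, tile_h = image.width // GRID, image.height // GRID
        results = []
        for row in range(GRID):
            for col in range(GRID):
                # Crop one tile and re-encode it as JPEG bytes.
                tile = image.crop((col * tile_w, row * tile_h,
                                   (col + 1) * tile_w, (row + 1) * tile_h))
                buf = io.BytesIO()
                tile.convert("RGB").save(buf, format="JPEG")
                payload = {"inputs": [{"data": {"image": {
                    "base64": base64.b64encode(buf.getvalue()).decode("ascii")}}}]}
                # One predict call per tile -- this is the extra-lookup cost
                # discussed above.
                resp = requests.post(
                    "https://api.clarifai.com/v2/models/%s/outputs" % CUSTOM_MODEL_ID,
                    headers={"Authorization": "Key %s" % API_KEY},
                    json=payload,
                )
                concepts = resp.json()["outputs"][0]["data"].get("concepts", [])
                results.append(((row, col), concepts))
        return results

    Since "inputs" is an array in the predict request, you could also batch several tiles into a single call to cut down on the number of round trips.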