Tags: python, image-recognition, google-cloud-vision, image-classification

Using Google Vision API to Predict Score of User-Specified Labels


Suppose I have an image of a rainbow and would like to use Google Vision API to predict the score for a set of user-specified labels, for example:

0    Rainbow    0.965621
1    Sky        0.887454
2    Artwork    0.813930
3    Giraffe    0.015654
4    Coffee     0.012483

The following Google Vision API code:

def detect_labels(path):
    """Detects labels in the file."""
    from google.cloud import vision
    import io
    client = vision.ImageAnnotatorClient()

    with io.open(path, 'rb') as image_file:
        content = image_file.read()

    image = vision.Image(content=content)

    response = client.label_detection(image=image)
    labels = response.label_annotations
    print('Labels:')

    for label in labels:
        print(label.description, label.score, label.mid)

Returns the following labels and score values:

  description       score          mid
0     Rainbow    0.965621    /m/0b48hv
1  Vertebrate    0.924276     /m/09686
2       White    0.921867     /m/083jv
3     Cartoon    0.918200     /m/0215n
4     Product    0.908071    /m/01jwgf
5       Green    0.907698     /m/038hg
6    Organism    0.875143     /m/05nnm
7     Textile    0.873498     /m/0dnr7
8   Rectangle    0.853343     /m/0j62f
9        Font    0.841818   /m/03gq5hm

Since only the 'top 10' labels are returned, I do not have the score for labels such as 'Coffee' and 'Giraffe'.

  1. Is it possible to return more than 10 labels? Or is this a limitation of Google Vision API?

  2. Instead of returning the top 10 labels, can I use Google Vision API to predict the likelihood of a user-specified label? For example, predict that the likelihood of 'Coffee' is 0.012483?

  3. Is it possible to access all label descriptions and mid values? According to EntityAnnotation, mid is an 'opaque' entity ID, however it states that some values are apparently available in Google Knowledge Graph Search API. Does 'opaque' in this context mean that Google does not share their full list of labels?

I understand I can train my own AutoML Vision model and define my own labels, however that seems unnecessary since I'm happy to use Google's existing label categorisation. I just want more access to the label data. Is there a way I can simply request the score of a chosen mid for a given image?

Note: I am happy to explore an alternative API if the data I need is not accessible via Google API.


Solution

  • To answer your questions:

    1. Yes, it is possible to return more than 10 labels. Just adjust max_results in the request.
    2. Yes, you can cross-check the user-specified labels against the response from the API. Note, however, that the API only returns scores for labels it actually detected, so a label absent from the response has no score.
    3. No. The labels used by Google live in a repository that is continuously growing, likely numbering in the millions, and the full list is not published.
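
    The cross-check in point 2 can be sketched as follows. This is a minimal illustration, not part of the Vision API itself: a stand-in namedtuple replaces the real EntityAnnotation objects, and the helper scores_for_labels is a hypothetical name.

```python
from collections import namedtuple

# Stand-in for the EntityAnnotation objects returned by the API
# (only the two fields used here: description and score).
Label = namedtuple("Label", ["description", "score"])

def scores_for_labels(label_annotations, user_labels):
    """Look up the score of each user-specified label in the API response.

    Labels the API did not return get None, since the Vision API only
    reports scores for labels it actually detected.
    """
    by_name = {l.description.lower(): l.score for l in label_annotations}
    return {name: by_name.get(name.lower()) for name in user_labels}

# Example with mocked annotations:
annotations = [Label("Rainbow", 0.965621), Label("Sky", 0.887454)]
print(scores_for_labels(annotations, ["Rainbow", "Coffee"]))
# {'Rainbow': 0.965621, 'Coffee': None}
```

    In real use you would pass image_response.label_annotations straight into the helper; raising max_results first increases the chance that a sought label appears in the response at all.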

    I took a different approach with the Vision API: I used batch_annotate_images() with a request that specifies the type of detection to perform. This way I can easily control the features used to process the image.

    def detect_labels(path):
        """Detects labels in the file."""
        from google.cloud import vision
        import io
        client = vision.ImageAnnotatorClient()
    
        with io.open(path, 'rb') as image_file:
            content = image_file.read()
    
        image = vision.Image(content=content)
        features = [{"type_": vision.Feature.Type.LABEL_DETECTION, "max_results": 11}]
        requests = [{"image": image, "features": features}]
    
        response = client.batch_annotate_images(requests=requests)
    
        for image_response in response.responses:
            for label in image_response.label_annotations:
                print(u"description : {}".format(label.description))
                print(u"score : {}".format(label.score))
                print(u"mid : {}\n".format(label.mid))
    

    Used this image for testing and changed the value of max_results to 3 and 11.

    Changed max_results to 3: (screenshot of output listing 3 labels)

    Changed max_results to 11: (screenshot of output listing 11 labels)
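
    As a side note on question 3: while the full label list is not published, individual mid values can be looked up through the Knowledge Graph Search API. A hedged sketch, assuming you have an API key (YOUR_API_KEY is a placeholder, and kg_lookup_url is a hypothetical helper name):

```python
import urllib.parse

def kg_lookup_url(mid, api_key):
    """Build a Knowledge Graph Search API request URL for a given mid.

    Performing a GET on this URL (e.g. with urllib.request or requests)
    returns JSON metadata about the entity, when Google exposes it.
    """
    params = urllib.parse.urlencode({"ids": mid, "key": api_key, "limit": 1})
    return "https://kgsearch.googleapis.com/v1/entities:search?" + params

print(kg_lookup_url("/m/0b48hv", "YOUR_API_KEY"))
```

    This only resolves mids one at a time; it does not enumerate the repository, which is consistent with 'opaque' meaning the full list is not shared.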