Suppose I have an image of a rainbow and would like to use the Google Vision API to predict the score for a set of user-specified labels, for example:
0 Rainbow 0.965621
1 Sky 0.887454
2 Artwork 0.813930
3 Giraffe 0.015654
4 Coffee 0.012483
The following Google Vision API code:
def detect_labels(path):
    """Detects labels in the file."""
    from google.cloud import vision
    import io

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    response = client.label_detection(image=image)
    labels = response.label_annotations
    print('Labels:')
    for label in labels:
        print(label.description, label.score, label.mid)
Returns the following labels and score values:
description score mid
0 Rainbow 0.965621 /m/0b48hv
1 Vertebrate 0.924276 /m/09686
2 White 0.921867 /m/083jv
3 Cartoon 0.918200 /m/0215n
4 Product 0.908071 /m/01jwgf
5 Green 0.907698 /m/038hg
6 Organism 0.875143 /m/05nnm
7 Textile 0.873498 /m/0dnr7
8 Rectangle 0.853343 /m/0j62f
9 Font 0.841818 /m/03gq5hm
Since only the 'top 10' labels are returned, I do not have scores for labels such as 'Coffee' and 'Giraffe'.
Is it possible to return more than 10 labels? Or is this a limitation of Google Vision API?
Instead of returning the top 10 labels, can I use the Google Vision API to predict the likelihood of a user-specified label? For example, can it predict that the likelihood of 'Coffee' is 0.012483?
Is it possible to access all label descriptions and mid values? According to EntityAnnotation, mid is an 'opaque' entity ID, though the documentation states that some values are available in the Google Knowledge Graph Search API. Does 'opaque' in this context mean that Google does not share its full list of labels?
I understand I can train my own AutoML Vision model and define my own labels, but that seems unnecessary since I'm happy to use Google's existing label categorisation; I just want more access to the label data. Is there a way I can simply request the score of a chosen mid for a given image?
Note: I am happy to explore an alternative API if the data I need is not accessible via Google API.
To answer your questions: yes, you can set max_results in the request to return more than 10 labels.

I took a different approach using batch_annotate_images(), passing a request that defines the type of detection to be performed. With this approach I can easily control the features used to process the image.
def detect_labels(path):
    """Detects labels in the file."""
    from google.cloud import vision
    import io

    client = vision.ImageAnnotatorClient()
    with io.open(path, 'rb') as image_file:
        content = image_file.read()
    image = vision.Image(content=content)
    features = [{"type_": vision.Feature.Type.LABEL_DETECTION, "max_results": 11}]
    requests = [{"image": image, "features": features}]
    response = client.batch_annotate_images(requests=requests)
    for image_response in response.responses:
        for label in image_response.label_annotations:
            print(u"description : {}".format(label.description))
            print(u"score : {}".format(label.score))
            print(u"mid : {}\n".format(label.mid))
I used this image for testing and changed the value of max_results between 3 and 11.
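Note that the API only returns labels it actually detected, so there is no way to query the score of an arbitrary mid directly; scores for labels the model did not surface are simply absent from the response. A practical workaround is to raise max_results and then look up your chosen mids in the returned annotations, treating anything missing as below threshold. Here is a minimal post-processing sketch; the scores_for_mids helper, the Label namedtuple, and the sample mids are my own illustration, not part of the Vision API:

```python
from collections import namedtuple


def scores_for_mids(label_annotations, wanted_mids, default=0.0):
    """Map each requested mid to its detected score.

    label_annotations: iterable of objects with .mid and .score attributes,
    e.g. image_response.label_annotations from the Vision API.
    Mids the API did not return fall back to `default`, since the API
    provides no score for labels it did not detect.
    """
    by_mid = {label.mid: label.score for label in label_annotations}
    return {mid: by_mid.get(mid, default) for mid in wanted_mids}


# Stand-in for API results, mirroring the fields used above.
# The mids below are placeholders for illustration only.
Label = namedtuple("Label", ["description", "score", "mid"])

detected = [
    Label("Rainbow", 0.965621, "/m/0b48hv"),
    Label("Sky", 0.887454, "/m/xxxxxx"),
]

print(scores_for_mids(detected, ["/m/0b48hv", "/m/yyyyyy"]))
# {'/m/0b48hv': 0.965621, '/m/yyyyyy': 0.0}
```

This keeps the lookup purely client-side: you make one annotate call with a generous max_results, then read off only the mids you care about.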