
Difference between Computer Vision API and Custom Vision API


I'm fairly new to Microsoft's Cognitive Services. What is the difference between the MS Computer Vision API and the MS Custom Vision API?


Solution

  • They both deal with computer vision on images, but hopefully, I can help make them more distinguishable here. :)

    Computer Vision

    The Computer Vision API is where Microsoft has built their own image models that can give you a few things:

    • Image classification - This is where the API will give you a number of tags that classify the image. It should also give you a confidence score of how strongly the model predicts the image to be of that tag.
    • Content Moderation - The API can return adult and racy flags (isAdultContent and isRacyContent in the REST response) that tell you whether the image meets those criteria. Each flag comes with an accompanying confidence score.
    • OCR - The API can read text within images and return it to you. It also works with handwritten text, not just printed text such as signs.
    • Facial Recognition - This API will recognize the faces of celebrities or other well-known people within images.
    • Landmark Recognition - This will recognize landmarks within images.
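
    To make the tags and moderation flags above concrete, here is a minimal sketch of building an Analyze Image request and reading a response. It assumes the v3.2 REST route and response field names (tags, adult, isAdultContent, isRacyContent), so check them against the API version you actually use. The function names, the endpoint, and the sample response are made up for illustration, and nothing is sent over the network.

```python
import json
import urllib.request


def build_analyze_request(endpoint, key, image_url, features=("Tags", "Adult")):
    """Build (but do not send) an Analyze Image request.

    Assumption: the v3.2 route and the Ocp-Apim-Subscription-Key header.
    """
    url = (
        f"{endpoint.rstrip('/')}/vision/v3.2/analyze"
        f"?visualFeatures={','.join(features)}"
    )
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def summarize_analysis(result, min_confidence=0.5):
    """Keep confident tags and pull out the adult/racy flags."""
    tags = [
        (tag["name"], tag["confidence"])
        for tag in result.get("tags", [])
        if tag["confidence"] >= min_confidence
    ]
    adult = result.get("adult", {})
    return {
        "tags": tags,
        "is_adult": adult.get("isAdultContent", False),
        "is_racy": adult.get("isRacyContent", False),
    }


# Hand-written sample shaped like an Analyze Image response (not real output).
sample_response = {
    "tags": [
        {"name": "grass", "confidence": 0.99},
        {"name": "outdoor", "confidence": 0.30},
    ],
    "adult": {
        "isAdultContent": False,
        "isRacyContent": False,
        "adultScore": 0.01,
        "racyScore": 0.02,
    },
}

summary = summarize_analysis(sample_response)
```

    To call the service for real, you would pass the request object to urllib.request.urlopen and json-decode the body before handing it to summarize_analysis.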

    Custom Vision

    The Custom Vision service is a little different: you train a model on your own images, starting from a prebuilt model that Microsoft provides. For one thing, it can only do image classification and object detection. Object detection tells you not only which tags apply to an image, but also where in the image each tagged object is. Currently, this part of the service is in preview, but I've seen good results with it so far.

    Another difference is that the Custom Vision service lets you upload your own images. For image classification, you upload your images and give each one one or more tags. When you then run an image through the model, it returns the tag(s) it thinks apply, along with a confidence score for each tag. For object detection the process is the same, except that you also mark where in each image the object you want to detect is and give that region a tag.
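
    As a rough illustration of what you get back, here is a small sketch that reads a list of predictions. It assumes the prediction response shape I've seen from the service: each prediction has a tagName, a probability, and (for object detection) a boundingBox with left, top, width, and height normalized to the 0-1 range. The helper names and sample data are hypothetical.

```python
def top_tags(predictions, threshold=0.5):
    """Return (tagName, probability) pairs at or above threshold, best first."""
    kept = [
        (p["tagName"], p["probability"])
        for p in predictions
        if p["probability"] >= threshold
    ]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


def box_to_pixels(bounding_box, image_width, image_height):
    """Convert a normalized box to pixel (left, top, right, bottom)."""
    left = round(bounding_box["left"] * image_width)
    top = round(bounding_box["top"] * image_height)
    right = round((bounding_box["left"] + bounding_box["width"]) * image_width)
    bottom = round((bounding_box["top"] + bounding_box["height"]) * image_height)
    return left, top, right, bottom


# Hand-written sample predictions (not real service output).
sample_predictions = [
    {
        "tagName": "dog",
        "probability": 0.12,
        "boundingBox": {"left": 0.0, "top": 0.0, "width": 0.25, "height": 0.25},
    },
    {
        "tagName": "cat",
        "probability": 0.92,
        "boundingBox": {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25},
    },
]

confident = top_tags(sample_predictions)
cat_box = box_to_pixels(sample_predictions[1]["boundingBox"], 800, 600)
```

    A classification model returns the same list without the boundingBox entries, so top_tags works for both project types.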

    Each time you upload and tag new images, the model needs to be retrained. From there you can evaluate how well the model performs, give it test images, or use the REST URLs or SDKs to interact with it.
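
    For the REST route, a request to a trained model might be built like the sketch below. The URL pattern (v3.0, classify or detect, a published iteration name), the Prediction-Key header, and the "Url" body field are assumptions based on the prediction URLs the portal shows; copy the exact URL and key from your own project rather than trusting this shape. The function name and all argument values are placeholders, and the request is only built, not sent.

```python
import json
import urllib.request


def build_prediction_request(endpoint, project_id, published_name,
                             prediction_key, image_url, kind="classify"):
    """Build (but do not send) a Custom Vision prediction request.

    kind is "classify" for image classification or "detect" for
    object detection. The route below is an assumed v3.0 pattern.
    """
    if kind not in ("classify", "detect"):
        raise ValueError("kind must be 'classify' or 'detect'")
    url = (
        f"{endpoint.rstrip('/')}/customvision/v3.0/Prediction/"
        f"{project_id}/{kind}/iterations/{published_name}/url"
    )
    body = json.dumps({"Url": image_url}).encode("utf-8")
    headers = {
        "Prediction-Key": prediction_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values only; substitute the ones from your own project.
request = build_prediction_request(
    endpoint="https://example.cognitiveservices.azure.com",
    project_id="00000000-0000-0000-0000-000000000000",
    published_name="Iteration1",
    prediction_key="YOUR_PREDICTION_KEY",
    image_url="https://example.com/test-image.jpg",
    kind="detect",
)
```

    Sending it with urllib.request.urlopen(request) and json-decoding the body should give you the list of predictions for that image.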

    To summarize, the biggest difference between the two is that the Custom Vision service can only do image classification and object detection, but it does so with a model trained on your own images. The Computer Vision API can do a bit more, but you don't have any control over how its models are trained.

    Hope that helps! If you have any questions, just let me know.