I am using the Microsoft Computer Vision API for OCR processing, and I noticed that my calls are being charged as S3 transactions instead of S2 on my bill.
I'm using the .NET SDK and the API I am using is this one. https://learn.microsoft.com/en-us/dotnet/api/microsoft.azure.cognitiveservices.vision.computervision.computervisionclientextensions.readasync?view=azure-dotnet
I have also confirmed that the actual REST API the SDK calls is the following: POST /vision/v3.2/read/analyze https://centraluseuap.dev.cognitive.microsoft.com/docs/services/computer-vision-v3-2/operations/5d986960601faab4bf452005
According to documentation, that should be the OCR Read API, am I correct? https://learn.microsoft.com/en-us/azure/cognitive-services/computer-vision/vision-api-how-to-topics/call-read-api
I am puzzled as to why my calls are getting charged as S3 instead of S2. This is important for me because S3 is 50% more expensive than S2. Using the Pricing Calculator, 1000 S2 transactions is $1, whereas 1000 S3 transactions is $1.5. https://azure.microsoft.com/en-us/pricing/calculator/?service=cognitive-services
What's the difference between OCR and "Describe and Recognize Text" anyway? OCR (Optical Character Recognition) by definition must recognize text. I am calling the Read API without any of the optional parameters, so I did not ask for "Describe". I would therefore expect the call to be billed as an S2 feature rather than an S3 feature.
I already posted this question at Microsoft Q&A but I thought SO might get more traffic hence help me get an answer faster. https://learn.microsoft.com/en-us/answers/questions/689767/computer-vision-api-charged-as-s3-transaction-inst.html
To understand this, you need a bit of history about these services. The Computer Vision API (and all the "calling" SDKs that use it, whether C#/.NET, Java, Python, etc.) has changed frequently, and it is sometimes hard to tell which SDK calls which version of the API.
Regarding optical character reading operations, there have been several operations:
The earliest version (see definition here) contained:
- OCR operation, a synchronous operation to recognize printed text
- Recognize Handwritten Text operation, an asynchronous operation for handwritten text (with the "Get Handwritten Text Operation Result" operation to collect the result once completed)

In the next version (see definition here), OCR was still there, but "Recognize Handwritten Text" was changed. So there were:
- OCR operation, a synchronous operation to recognize printed text
- Recognize Text operation, asynchronous (+ Get Recognize Text Operation Result to collect the result), accepting both printed and handwritten text (see the mode input parameter)
- Batch Read File operation, asynchronous (+ "Get Read Operation Result" to collect the result), which also processed PDF files, whereas the other operations only accepted images. It was intended "for text-heavy documents"

Computer Vision 2.1 was similar in terms of operations.
For the version after that, see definition here.
Main changes: Recognize Text and Batch Read File were "unified" into a Read operation, with model improvements. For example, there is no longer any need to specify handwritten or printed (see link):
"The Read API is optimized for text-heavy images and multi-page, mixed language, and mixed type (print – seven languages and handwritten – English only) documents"
So there were:
- OCR operation, a synchronous operation to recognize printed text
- Read operation, asynchronous (+ Get Read Result to collect the result), accepting both printed and handwritten text, with image and PDF inputs

The same applies to Computer Vision v3.1-preview.1, v3.1-preview.2, v3.1, v3.2-preview.1, v3.2-preview.2 and v3.2-preview.3.
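The asynchronous pattern of the Read operation (submit, then call Get Read Result until the work is done) can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the SDK's implementation: poll_read_result and extract_lines are hypothetical helper names, get_json stands for whatever function performs the authenticated GET on the operation URL, and the status values and the analyzeResult/readResults layout reflect my understanding of the v3.2 REST response.

```python
import time
from typing import Any, Callable, Dict, List

# Statuses after which the service will not change the result any more
# (assumed from the v3.2 Get Read Result response format).
TERMINAL_STATUSES = {"succeeded", "failed"}


def poll_read_result(
    get_json: Callable[[str], Dict[str, Any]],
    operation_url: str,
    interval_s: float = 1.0,
    max_attempts: int = 30,
) -> Dict[str, Any]:
    """Call Get Read Result repeatedly until the operation finishes.

    get_json is a hypothetical callable that performs the authenticated
    GET on operation_url and returns the decoded JSON body.
    """
    for _ in range(max_attempts):
        result = get_json(operation_url)
        if result.get("status") in TERMINAL_STATUSES:
            return result
        time.sleep(interval_s)
    raise TimeoutError(f"Read operation not finished after {max_attempts} attempts")


def extract_lines(result: Dict[str, Any]) -> List[str]:
    """Flatten the recognized text lines of every page into one list."""
    lines: List[str] = []
    for page in result.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            lines.append(line["text"])
    return lines
```

The operation URL is the one the service returns when the Read call is accepted. The synchronous OCR operation needs no such loop, since its result comes back in the response to the initial call.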
All recent versions of the SDKs that implement a Read method call this 3.x operation. See, for example, the changelog of the .NET SDK here: v7.0.x of the SDK "supports v3.2 Cognitive Services Computer Vision API endpoints."
It is normal that you are billed S3 for Read. But the calculator is misleading, as the "Recognize Text" term should be changed to "Read".
If you really want to use the OCR operation, use the RecognizePrintedTextAsync method of the SDK, which is the one that calls it.
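To make the distinction concrete at the REST level, here is a minimal Python sketch that builds the request for either operation. The Read path is the one quoted in the question. The /vision/v3.2/ocr path for the OCR operation, the build_request helper and the endpoint and key values are assumptions for illustration and should be checked against your own resource.

```python
import json
from urllib.request import Request

API_VERSION = "v3.2"

# REST path per operation. "read" is what ReadAsync calls (see the question);
# "ocr" is assumed to be what RecognizePrintedTextAsync calls.
OPERATION_PATHS = {
    "ocr": "/ocr",
    "read": "/read/analyze",
}


def build_request(endpoint: str, key: str, operation: str, image_url: str) -> Request:
    """Build (but do not send) the POST request for the chosen operation."""
    path = OPERATION_PATHS[operation]  # KeyError for an unknown operation
    url = f"{endpoint.rstrip('/')}/vision/{API_VERSION}{path}"
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        "Content-Type": "application/json",
    }
    return Request(url, data=body, headers=headers, method="POST")
```

Passing the returned object to urllib.request.urlopen would send it. Checking which of the two paths your code actually calls is a quick way to tell which operation, and therefore which billing line, to expect.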
OCR is an old model, used only for printed text. The Read operation uses the latest model. I can also confirm (based on a few tests that I made) that the performance of OCR is lower than that of the Read operation. If you want to test quickly, you can use your key on this website: it is an open-source portal created by another Microsoft MVP, to which I also contributed. You will be able to see the results of both the OCR and Read operations. It currently uses version 6.0.0 of the Computer Vision SDK (see source).