I'm trying to use the Azure Computer Vision API to OCR a PDF of a scanned document, i.e. a PDF that contains only page images.
When I send the PDF, the API returns "InvalidImageFormat": "Input data is not a valid image". The same call works perfectly with a PNG.
Is there any way to use the API on an image PDF, or is there an Azure API I could use alongside it to go PDF > PNG > Text?
Edit
Since I answered, additional services have become available. I have not personally tried all of them, but they may suit this purpose:
https://learn.microsoft.com/en-us/azure/search/cognitive-search-concept-intro
And, once it goes GA, Amazon Textract: https://aws.amazon.com/textract/
Original Answer
Unfortunately, Azure has no PDF integration for its Computer Vision API. To use Azure Computer Vision, you need to convert the PDF to an image (JPG, PNG, BMP or GIF) yourself.
Google does now offer PDF integration, and I have seen really good results from it in my testing so far.
This is done through the asyncBatchAnnotateFiles method of the Vision client (I have been using the Node.js variant of the API).
It can handle files of up to 2,000 pages; results are split into 20-page segments and written to Google Cloud Storage.
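Because Computer Vision rejects anything that is not one of those image formats, a cheap first step is to check what you are about to send. The sketch below is a hypothetical helper (detectFormat and isSupportedImage are illustrative names, not part of any Azure SDK) that looks at a file's leading "magic bytes" so a PDF can be routed through a PDF-to-image conversion step instead of producing the "InvalidImageFormat" error:

```typescript
// Hypothetical helper, not part of any Azure SDK: identify a file by its
// leading "magic bytes" so PDFs can be converted before being sent to OCR.
type DetectedFormat = "pdf" | "png" | "jpg" | "gif" | "bmp" | "unknown";

// True if `bytes` begins with every byte in `signature`.
function startsWith(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((b, i) => bytes[i] === b);
}

function detectFormat(bytes: Uint8Array): DetectedFormat {
  if (startsWith(bytes, [0x25, 0x50, 0x44, 0x46])) return "pdf"; // "%PDF"
  if (startsWith(bytes, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return "png";
  if (startsWith(bytes, [0xff, 0xd8, 0xff])) return "jpg"; // JPEG SOI marker
  if (startsWith(bytes, [0x47, 0x49, 0x46, 0x38])) return "gif"; // "GIF8"
  if (startsWith(bytes, [0x42, 0x4d])) return "bmp"; // "BM"
  return "unknown";
}

// The image formats listed in this answer (JPG, PNG, BMP, GIF) can be sent
// as-is; anything else (including a scanned PDF) needs converting first.
function isSupportedImage(bytes: Uint8Array): boolean {
  const fmt = detectFormat(bytes);
  return fmt === "png" || fmt === "jpg" || fmt === "gif" || fmt === "bmp";
}
```

In Node, fs.readFileSync returns a Buffer, which is a Uint8Array subclass, so its result can be passed straight to detectFormat; for the scanned document in the question it would report "pdf", signalling that the pages must be rasterised (e.g. to PNG) before calling the API.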
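A minimal sketch of the request for that method, modelled on the request shape in Google's Node.js samples. The helper names buildPdfOcrRequest and outputFileCount and the gs:// URIs are hypothetical placeholders; the request is built as a plain object so it can be inspected without credentials, and the actual client call is only shown in a comment. DOCUMENT_TEXT_DETECTION is the feature type Google's PDF samples use for dense text such as scans:

```typescript
// Shape of one file request for asyncBatchAnnotateFiles as passed to the
// Node.js @google-cloud/vision client (camelCase field names).
interface PdfOcrRequest {
  inputConfig: { mimeType: string; gcsSource: { uri: string } };
  features: { type: string }[];
  outputConfig: { gcsDestination: { uri: string }; batchSize: number };
}

// Hypothetical helper: build the request for OCR of a PDF already in GCS.
function buildPdfOcrRequest(
  gcsPdfUri: string,       // placeholder, e.g. "gs://my-bucket/scan.pdf"
  gcsOutputPrefix: string, // placeholder, e.g. "gs://my-bucket/ocr-output/"
  batchSize = 20,          // pages per output JSON file (the 20-page segments)
): { requests: PdfOcrRequest[] } {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError("batchSize must be a positive integer");
  }
  return {
    requests: [
      {
        inputConfig: { mimeType: "application/pdf", gcsSource: { uri: gcsPdfUri } },
        features: [{ type: "DOCUMENT_TEXT_DETECTION" }],
        outputConfig: { gcsDestination: { uri: gcsOutputPrefix }, batchSize },
      },
    ],
  };
}

// Hypothetical helper: number of JSON result files written for a document.
function outputFileCount(pageCount: number, batchSize = 20): number {
  return Math.ceil(pageCount / batchSize);
}

// With credentials and the real client, the call looks roughly like:
//   import { ImageAnnotatorClient } from "@google-cloud/vision";
//   const client = new ImageAnnotatorClient();
//   const [operation] = await client.asyncBatchAnnotateFiles(
//     buildPdfOcrRequest("gs://my-bucket/scan.pdf", "gs://my-bucket/ocr-output/"));
//   const [filesResponse] = await operation.promise();
//   // The OCR results are JSON files under the gcsDestination prefix.
```

For a document at the 2,000-page limit, outputFileCount(2000) gives 100 result files of 20 pages each, which you would then read back from Cloud Storage.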