azure-cognitive-search azure-blob-storage

Content extraction issue in Azure Search with blob storage containing image files


My requirement is to search through the content inside images, including images embedded in PDFs.

I have chosen Blob storage to hold all the files. It contains file types such as PDF, XML, text, PNG and JPEG.

I need to be able to search the text inside images (even when the image is inside a PDF). However, the Microsoft documentation says that Azure Search does not support extracting content from image files in Blob storage.

I came across the "AzureSearch_SkipContent" option, which skips content extraction so that at least the metadata of image (unsupported) files can be indexed and searched.
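For reference, a minimal sketch of the two metadata-only options, written as the JSON bodies you would send to the Azure Search REST API (built here as Python dicts). It assumes the blob indexer's `AzureSearch_SkipContent` blob metadata property (per blob) and the `dataToExtract` indexer configuration setting (indexer-wide); the indexer, data source and index names are hypothetical placeholders, and the exact settings should be checked against the current indexer documentation.

```python
import json

# Per-blob opt-out: setting this metadata property on an individual blob asks
# the blob indexer to index only that blob's metadata and skip its content.
SKIP_CONTENT_METADATA = {"AzureSearch_SkipContent": "true"}


def blob_indexer_definition(name, data_source, target_index, metadata_only=False):
    """Build a blob indexer definition (REST API JSON body) as a dict.

    All names passed in are hypothetical placeholders.
    """
    configuration = {
        # "allMetadata" applies metadata-only indexing to every blob;
        # "contentAndMetadata" also extracts text where the format is supported.
        "dataToExtract": "allMetadata" if metadata_only else "contentAndMetadata",
        # Keep indexing when a file type cannot be parsed instead of failing.
        "failOnUnsupportedContentType": False,
    }
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {"configuration": configuration},
    }


if __name__ == "__main__":
    body = blob_indexer_definition(
        "images-indexer", "images-blob-ds", "images-index", metadata_only=True
    )
    print(json.dumps(body, indent=2))
```

Either option only makes metadata (file name, size, content type and so on) searchable; it does not read any text out of the images themselves.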

My question is: is searching through the content of image files impossible only with Blob storage, or is it impossible with all of the following storage options?

• Azure SQL Database
• SQL Server relational data on an Azure VM
• Azure Cosmos DB
• Azure Blob storage
• Azure Table storage

Thanks in advance.


Solution

  • UPDATE May 21, 2018

    This functionality is now available to all customers as part of the Cognitive Search feature of Azure Search.

    Original response:

    Azure Search is starting a private preview of OCR support for image files in Azure Blob storage, as well as for images inside PDFs and scanned PDFs. If you'd like to participate, please reach out. I'll add contact info as a comment below.
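    A minimal sketch of how the Cognitive Search setup can look, written as REST API JSON bodies built as Python dicts. It assumes the indexer's `imageAction` setting to extract images (standalone files and images inside PDFs), the built-in OCR skill to read their text, and the built-in Merge skill to fold that text back into the document content. The skillset, indexer and data source names and the `merged_content` index field are hypothetical placeholders; verify the skill and parameter names against the current skillset documentation before relying on them.

```python
import json


def ocr_skillset(name):
    """Skillset that OCRs extracted images and merges the text into the content."""
    ocr = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        # Run once per image the indexer extracted from the document.
        "context": "/document/normalized_images/*",
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    merge = {
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        "insertPreTag": " ",
        "insertPostTag": " ",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            # OCR text of each image, inserted at the image's original offset.
            {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
            {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    return {"name": name, "skills": [ocr, merge]}


def image_indexer(name, data_source, target_index, skillset):
    """Blob indexer that extracts images so the skillset can OCR them."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "skillsetName": skillset,
        "parameters": {
            "configuration": {
                "dataToExtract": "contentAndMetadata",
                # Produce normalized images from image files and embedded images.
                "imageAction": "generateNormalizedImages",
            }
        },
        # Hypothetical index field that receives the text plus OCR output.
        "outputFieldMappings": [
            {"sourceFieldName": "/document/merged_content", "targetFieldName": "merged_content"}
        ],
    }


if __name__ == "__main__":
    print(json.dumps(ocr_skillset("ocr-skillset"), indent=2))
    print(json.dumps(image_indexer("ocr-indexer", "blob-ds", "docs-index", "ocr-skillset"), indent=2))
```

    Merging at the recorded offsets keeps the OCR text next to the surrounding PDF text, so a single full-text field covers both.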