
Azure Search: skip large blobs but still index metadata


We have a set of blobs with all sorts of content.

We need to index the metadata and the content, but we are happy to just skip the content for unsupported file types and very large files. For example, we have:

File One.docx - supported type - Indexes metadata and content (good)

File Two.dat - unsupported type - Indexes metadata skip content (good)

File Three.txt - supported type - fails due to the size of the blob (bad)

Our search config is based on the docs; we just added failOnUnsupportedContentType to the configuration and set it to false.
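For reference, here is a minimal sketch of an indexer definition with that setting. It assumes the Azure Search REST API shape, where blob indexing options live under `parameters.configuration`. The indexer, data source, and index names are hypothetical placeholders. It only builds the JSON body you would send when creating or updating the indexer; it does not call the service.

```python
import json

# Hypothetical names: replace with your own indexer, data source, and index.
indexer_definition = {
    "name": "blob-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-index",
    "parameters": {
        "configuration": {
            # Don't fail on blobs whose content type can't be extracted;
            # as described above, their metadata is still indexed.
            "failOnUnsupportedContentType": False,
        }
    },
}

# Serialize to the JSON request body (Python False becomes JSON false).
print(json.dumps(indexer_definition, indent=2))
```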

We would like to index the metadata for File Three.txt but skip the large content, something like failOnOversizedContent which we would set to false.

Right now we get an error saying that the blob is too large.


Solution

  • UPDATE Jan 3, 2018

    I realized that my original suggestion to use AzureSearch_SkipContent blob metadata does not resolve the issue, since the blob still needs to be downloaded to process the content-type metadata.

    To make this scenario work gracefully, we are adding indexStorageMetadataOnlyForOversizedDocuments indexer configuration setting. It takes a bool value and is false by default, so set it to true in the indexer configuration to enable it. This is fresh off the presses and will be deployed in production worldwide by January 19.
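A minimal sketch of turning the setting on in an existing indexer definition, assuming (as above) that blob options sit under `parameters.configuration` in the REST API body. The helper name and the sample definition are hypothetical. The helper returns a copy and keeps any settings already present, such as failOnUnsupportedContentType.

```python
import copy
import json


def enable_metadata_only_for_oversized(indexer: dict) -> dict:
    """Return a copy of an indexer definition with
    indexStorageMetadataOnlyForOversizedDocuments set to True.

    Hypothetical helper; existing parameters and configuration are preserved.
    """
    updated = copy.deepcopy(indexer)
    parameters = updated.setdefault("parameters", {})
    configuration = parameters.setdefault("configuration", {})
    # False by default; True indexes only storage metadata for blobs
    # that are too large to extract content from, instead of failing.
    configuration["indexStorageMetadataOnlyForOversizedDocuments"] = True
    return updated


# Hypothetical existing definition, e.g. one previously fetched from the service.
existing = {
    "name": "blob-indexer",
    "dataSourceName": "my-blob-datasource",
    "targetIndexName": "my-index",
    "parameters": {"configuration": {"failOnUnsupportedContentType": False}},
}

updated = enable_metadata_only_for_oversized(existing)
print(json.dumps(updated["parameters"], indent=2))
```

Copying instead of mutating in place means the original definition stays available if you want to compare or roll back before sending the update.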

    Original response

    You can add AzureSearch_SkipContent: true metadata to the large blobs, as described in Controlling which parts of the blob are indexed. I realize it may be inconvenient, but that's something that can unblock you.
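A minimal sketch of preparing that metadata, keeping in mind the update above: this workaround does not avoid the oversized-blob failure, because the blob is still downloaded. The size cutoff and helper name are hypothetical. Blob metadata values are strings, so the flag is "true" rather than a boolean. If you apply the result with the azure-storage-blob SDK's `BlobClient.set_blob_metadata`, note that it replaces the blob's whole metadata set, which is why existing keys are merged in first.

```python
# Hypothetical cutoff; choose a value below your service's blob size limit.
SIZE_THRESHOLD_BYTES = 16 * 1024 * 1024


def metadata_with_skip_content(
    existing: dict[str, str], size_bytes: int, threshold: int = SIZE_THRESHOLD_BYTES
) -> dict[str, str]:
    """Return blob metadata with AzureSearch_SkipContent added for large blobs.

    Existing keys are kept, because setting blob metadata replaces all of it.
    """
    merged = dict(existing)
    if size_bytes > threshold:
        # Metadata values must be strings.
        merged["AzureSearch_SkipContent"] = "true"
    return merged


print(metadata_with_skip_content({"owner": "finance"}, 50 * 1024 * 1024))
# → {'owner': 'finance', 'AzureSearch_SkipContent': 'true'}
```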

    We would like to index the metadata for File Three.txt but skip the large content, something like failOnOversizedContent which we would set to false.

    This looks like a useful feature request. Please add a suggestion on our UserVoice site and we'll consider it, especially if we see other customers asking for this.