We have a set of blobs with all sorts of content.
We need to index both the metadata and the content, but we are happy to skip the content for unsupported file types and very large files. For example, we have:
File One.docx
- supported type - Indexes metadata and content (good)
File Two.dat
- unsupported type - indexes metadata, skips content (good)
File Three.txt
- supported type, fails due to the size of the blob. (bad)
Our search config is based on the docs; we just added failOnUnsupportedContentType to the configuration and set it to false.
We would like to index the metadata for File Three.txt but skip the large content, with something like failOnOversizedContent, which we would set to false.
Right now we get an error saying that the blob is too large.
UPDATE Jan 3, 2018
I realized that my original suggestion to use AzureSearch_SkipContent blob metadata does not resolve the issue, since the blob still needs to be downloaded to process content type metadata.
To make this scenario work gracefully, we are adding the indexStorageMetadataOnlyForOversizedDocuments indexer configuration setting. It takes a bool value and is false by default, so set it to true in the indexer configuration to enable it. This is fresh off the presses and will be deployed in production worldwide by January 19.
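As a rough sketch of what the resulting indexer definition might look like, the snippet below builds the JSON body in Python. The two configuration settings are the ones discussed in this thread. The helper name build_indexer_definition and the indexer, data source, and index names are placeholders made up for illustration, and the surrounding layout (name, dataSourceName, targetIndexName, parameters.configuration) is assumed to follow the usual indexer definition shape, so check it against the docs for your API version.

```python
import json


def build_indexer_definition(name, data_source, target_index):
    """Return an indexer definition body as a plain dict (hypothetical helper)."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": target_index,
        "parameters": {
            "configuration": {
                # Index metadata only for unsupported content types
                # instead of failing the indexer run.
                "failOnUnsupportedContentType": False,
                # Index storage metadata only for blobs that are too large
                # to process, instead of reporting an error.
                "indexStorageMetadataOnlyForOversizedDocuments": True,
            }
        },
    }


if __name__ == "__main__":
    # Placeholder names; substitute your own indexer, data source, and index.
    body = build_indexer_definition("blob-indexer", "blob-datasource", "blob-index")
    print(json.dumps(body, indent=2))
```

The printed JSON is what you would send when creating or updating the indexer. With both settings in place, File Two.dat and File Three.txt from the question should each end up indexed with metadata only.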
Original response
You can add AzureSearch_SkipContent: true metadata to the large blobs, as described in Controlling which parts of the blob are indexed. I realize it may be inconvenient, but that's something that can unblock you.
We would like to index the metadata for File Three.txt but skip the large content, something like failOnOversizedContent which we would set to false.
This looks like a useful feature request - please add a suggestion at our UserVoice site and we'll consider this, especially if we see other customers asking for this.