
Truncated text when adding data to Chat playground in Azure


I am adding data to my assistant in Azure AI Studio manually by dragging and dropping a file. The file is 3.3 MB and 265 pages long. When the index is created, I get this warning:

Your data was connected with the following warnings

Truncated extracted text to '65536' characters. (1 item(s) impacted)

This makes me think that the whole PDF is not available in the index. No matter what chunk size I select, I get the same warning, and there doesn't seem to be any other setting that affects this output. How can this be fixed?


Solution

  • In Azure AI Search, there is a limit on the amount of text that an indexer can extract from each document, and that limit depends on the search service SKU. Based on your warning, it looks like you are using a Basic Azure AI Search instance. Moving to an S1 service would let the indexer extract 4 million characters per document instead of only 64 thousand, which is usually sufficient for most customer documents.

    Reference: Indexer limits. See the "Blob indexer: maximum characters of content extracted from a blob" limit.
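
To get a rough feel for how much of a document is lost on each tier, here is a minimal Python sketch. It is only an illustration: the `EXTRACTION_LIMITS` table, the `truncated_chars` helper, and the 2,000-characters-per-page estimate are all assumptions made for this example. The Basic figure is taken from the warning text above and the S1 figure from the answer; check the Indexer limits page for the current values.

```python
# Per-document extraction limits in characters (assumed for illustration).
# 65,536 comes from the warning message; 4,000,000 is the S1 figure quoted
# in the answer. Verify both against the Indexer limits documentation.
EXTRACTION_LIMITS = {
    "basic": 65_536,
    "standard_s1": 4_000_000,
}


def truncated_chars(extracted_length: int, tier: str) -> int:
    """Return how many characters would be dropped for one document."""
    return max(0, extracted_length - EXTRACTION_LIMITS[tier])


# Hypothetical estimate: 265 pages at roughly 2,000 characters per page.
estimated_length = 265 * 2_000

for tier in EXTRACTION_LIMITS:
    lost = truncated_chars(estimated_length, tier)
    print(f"{tier}: {lost} characters truncated")
```

Under those assumed numbers, most of the PDF would be cut off on Basic, while the whole document would fit within the S1 limit.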