Tags: c#, azure, asp.net-core, memorystream, azure-form-recognizer

Azure Form Recognizer only analyzes the first file in a stream


I am testing AI document analysis and am currently trying to let users upload files to a web app, which in turn sends them to Azure Form Recognizer and processes the results.

However, I am not able to do this in a single request.

This is how the files are represented:

[BindProperty] public List<IFormFile> Upload { get; set; }

I can iterate over these and get the expected results, but that makes the operation take quite long. I would like to send all of the files in one request (as shown below), but it only ever analyzes the first one. I am using Azure.AI.FormRecognizer.DocumentAnalysis, so the client and the StartAnalyzeDocument method come from there.

        using (var stream = new MemoryStream())
        {
            foreach (IFormFile formFile in Upload)
            {
                formFile.CopyTo(stream);
            }
            stream.Seek(0, SeekOrigin.Begin);
            AnalyzeDocumentOperation operation = client.StartAnalyzeDocument(modelId, stream);
            operation.WaitForCompletion();
            Console.WriteLine("This many documents were analysed: " + operation.Value.Documents.Count);
            result = operation.Value;
        };

"result" is what I process later on. I am quite stumped by this, as I expected the appended stream to just work. If anyone has a solution or could point me in the right direction, it would be much appreciated.


Solution

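  • Form Recognizer does not yet support processing multiple documents in a single analyze operation for prebuilt-invoice and custom models. Furthermore, most file formats cannot simply be appended to one another to concatenate their content: copying several uploads into one MemoryStream produces the bytes of the first file followed by trailing data, not a valid multi-document file, which would explain why only the first file is analyzed.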

    One way to speed up the analysis of multiple files in a batch is to call the analyze operation in parallel. Here is a sketch.

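    // Requires: using System.Linq;
    // Select (not ForAll) is used because ForAll returns void and cannot
    // produce a sequence of results.
    var results = Upload.AsParallel().Select(formFile =>
    {
        // Each file gets its own stream and its own analyze operation.
        using (var stream = formFile.OpenReadStream())
        {
            var operation = client.StartAnalyzeDocument(modelId, stream);
            operation.WaitForCompletion(); // blocks this worker thread until the service finishes
            return operation.Value;
        }
    }).ToArray();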
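
In an ASP.NET Core handler, blocking worker threads with WaitForCompletion may not be ideal, so an asynchronous variant of the same idea could look like the sketch below. It is not from the Form Recognizer SDK: `BoundedParallel`, `RunAsync`, and `maxConcurrency` are hypothetical names introduced for illustration, and the helper uses only the .NET base class library. It starts one task per item, lets at most `maxConcurrency` of them run at once, and returns the results in input order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper (not part of any Azure SDK): runs an async operation
// over many items with a cap on how many run at the same time.
public static class BoundedParallel
{
    public static async Task<TResult[]> RunAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        Func<TItem, Task<TResult>> analyzeAsync,
        int maxConcurrency)
    {
        // The semaphore limits the number of calls in flight.
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await gate.WaitAsync();
            try
            {
                return await analyzeAsync(item);
            }
            finally
            {
                gate.Release();
            }
        }).ToList(); // materialize so each task is created exactly once

        // Task.WhenAll preserves the order of the input tasks in its result array.
        return await Task.WhenAll(tasks);
    }
}
```

To use it with the code above, the delegate passed as `analyzeAsync` would open `formFile.OpenReadStream()`, start the analysis, await its completion, and return `operation.Value`. Assuming the same SDK version as in the question, the asynchronous counterparts should be `StartAnalyzeDocumentAsync` and `WaitForCompletionAsync`, but check the method names against the package version you have installed. A small `maxConcurrency` (for example 4) is a cautious starting point, since the service may throttle too many simultaneous requests.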