azure-form-recognizer

v2 to v3 Transition for Form Recognizer


Because FR v3.0 is still in preview, I went to the v2.1 Quickstarts, "Analyze using a Prebuilt model", and navigated to the Form Recognizer Sample Tool. Using Form Type = "Invoice", I tested many sizes and kinds of text, including handwriting, and I'm very happy with the results, especially the structure of the returned JSON file:

...
"analyzeResult":
  {
    ...
    readResults:[...],
    pageResults:[...]
    ...
  }

For a large/complex image/doc, I use pageResults.tables[0].cells with rowIndex and columnIndex to easily piece together each row's text and restore the whole doc. For a small/simple image/doc, or when pageResults.tables.length == 0, I use readResults.lines to achieve the same OCR outcome. One size fits all, perfect!
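
For example, here is roughly how I piece the rows back together (a minimal sketch; the property names follow the v2.1 JSON structure shown above):

// Rebuild the document text from a v2.1 analyzeResult: use pageResults tables
// when present, otherwise fall back to readResults.lines.
function extractText(analyzeResult) {
  const tables = (analyzeResult.pageResults[0] || {}).tables || [];
  if (tables.length > 0) {
    // Group cells by rowIndex, then order each row by columnIndex.
    const rows = new Map();
    for (const cell of tables[0].cells) {
      if (!rows.has(cell.rowIndex)) rows.set(cell.rowIndex, []);
      rows.get(cell.rowIndex).push(cell);
    }
    return [...rows.keys()]
      .sort((a, b) => a - b)
      .map((r) => rows.get(r)
        .sort((a, b) => a.columnIndex - b.columnIndex)
        .map((c) => c.text)
        .join(" "))
      .join("\n");
  }
  // No tables detected: plain OCR lines instead.
  return analyzeResult.readResults[0].lines.map((l) => l.text).join("\n");
}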

Next was my own hands-on test with the same images, using the JavaScript Samples. Because I've been using Invoice only, I picked recognizeInvoice.js, a great sample that is easy and simple to follow. Even though it's v3 and missing readResults and pageResults, I'm still able to use invoice.pages[0].tables[0].cells to achieve the same result for a large/complex image/doc. For a small/simple image I found 2 issues (illustrated in the sketch after this list):

  1. invoice.pages[0].tables.length = 0, so no text values from tables.
  2. The only text value is "NRT LLC." from invoice.fields.VendorName.value; all other printed text and handwriting returned by v2.1 are gone!
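
To illustrate, this is roughly what the JavaScript sample gives me for the small/simple image (a sketch; invoice is the first recognized form returned by the poller):

  const tables = invoice.pages[0].tables;          // issue 1: length is 0, no table text
  const vendor = invoice.fields.VendorName.value;  // issue 2: "NRT LLC.", the only text returned
  console.log(tables.length, vendor);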

I believe there must be reasons on the MS side for the above changes, but for us it means v3 is not backward compatible. More importantly, we wouldn't be able to know whether an image fits a model and/or will return anything before submitting it; even if we provide a list of model choices, users may get frustrated by the extra manual work. At the moment all we can do is switch back to Google. So,

  • where is the v2.x sample code, and when will MS discontinue v2.x?
  • how do we transition from v2.x to v3?

Below is my navigation route. Thank you, and I really appreciate the great work!

[screenshot: navigation route through the Form Recognizer documentation]


Solution

  • It is a bit confusing, but the versions of the @azure/ai-form-recognizer package on NPM are one major version ahead of the Form Recognizer API versions. The preview API version "2021-09-30-preview" (REST API "v3") can be used with Form Recognizer SDK version 4.0.0-beta.2, while REST API version v2.1 (GA) is used with SDK version 3.2.0. The README for @azure/ai-form-recognizer 3.2.0 explains this:

    Note: This package targets Azure Form Recognizer service API version 2.x.

    I'm guessing, based on what you've said, that you are using the latest stable version 3.2.0 of the SDK. When extracting data using a prebuilt or custom model in this version, tables are attached to pages, and pages are attached to forms, so you can access a table by looking through the forms:

    const { FormRecognizerClient, AzureKeyCredential } = require("@azure/ai-form-recognizer");
    // endpoint and apiKey come from your Form Recognizer resource.
    const client = new FormRecognizerClient(endpoint, new AzureKeyCredential(apiKey));

    const poller = await client.beginRecognizeInvoices(inputs);
    const invoices = await poller.pollUntilDone();

    // Tables are nested under the pages of each recognized form.
    const table = invoices[0].pages[0].tables[0];

    If a table appears on a page that isn't associated with any form (i.e., no form appears on that page), it can't be accessed this way. That capability is present in the new beta SDK for the new preview API, but with the current SDK, to get all pages regardless of whether they contain a form, you could consider using the beginRecognizeContent method, as in the sketch below.
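
    For example, a minimal sketch using the 3.2.0 SDK (assuming the same client and inputs as above); each returned page exposes its tables and OCR lines, much like v2.1's pageResults and readResults:

    const contentPoller = await client.beginRecognizeContent(inputs);
    const pages = await contentPoller.pollUntilDone();

    for (const page of pages) {
      // Every page is returned, even pages that contain no recognized form.
      for (const table of page.tables) {
        for (const cell of table.cells) {
          console.log(`row ${cell.rowIndex}, col ${cell.columnIndex}: ${cell.text}`);
        }
      }
      // OCR lines per page, comparable to v2.1 readResults.lines.
      for (const line of page.lines) {
        console.log(line.text);
      }
    }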