azure, azure-form-recognizer

How to train custom model for different document layouts with the same set of labels?


I'm trying to understand the best way to train a custom model for invoices in languages that the prebuilt invoice model does not support, French for example.

As usual, we will have many different invoice layouts from different vendors, but we will extract the same set of labels from all of them (invoice number, amount, date, vendor name, etc.).

Should I create a model per vendor and compose them? If I do, do I need to train a model for every vendor, or will the composed model also work for invoices from vendors it was not trained on, as long as they use the same wording as the trained invoices?


Solution

  • I got an answer from Microsoft on the Microsoft Q&A site, quoted below:
    "For invoices (I believe he meant English invoices) you should use the pre-built Invoice model, no training required - https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/concept-invoices.
    If you need to train a model and not use the pre-built one, then yes, train a model per vendor/provider and compose them. Start with the top providers so that you get more coverage."

    Find more information on the MS QA Question.
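
    The advice to "start with the top providers" can be sketched in code. The first function below is a hypothetical helper (not part of any Azure SDK) that ranks vendors by invoice volume and picks them until a target share of invoices is covered; the vendor names and counts in the usage note are made up. The second function assumes the azure-ai-formrecognizer Python package (3.2 or later) is installed and that one custom model per vendor has already been trained; the endpoint, key, and model IDs are placeholders. Treat it as a rough sketch under those assumptions, not a definitive implementation.

```python
from typing import Dict, List


def vendors_to_train_first(invoice_counts: Dict[str, int],
                           target_coverage: float) -> List[str]:
    """Hypothetical helper: pick the highest-volume vendors until their
    combined share of all invoices reaches target_coverage (0.0 to 1.0)."""
    total = sum(invoice_counts.values())
    if total == 0:
        return []
    selected: List[str] = []
    covered = 0
    # Highest volume first; ties are broken by vendor name for determinism.
    ranked = sorted(invoice_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for vendor, count in ranked:
        if covered / total >= target_coverage:
            break
        selected.append(vendor)
        covered += count
    return selected


def compose_vendor_models(endpoint: str, key: str,
                          vendor_model_ids: List[str]) -> str:
    """Compose already-trained per-vendor models into one model and
    return the composed model's ID. Assumes azure-ai-formrecognizer >= 3.2."""
    # Imported here so the planning helper above works without the SDK.
    from azure.ai.formrecognizer import DocumentModelAdministrationClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentModelAdministrationClient(endpoint, AzureKeyCredential(key))
    poller = client.begin_compose_document_model(
        vendor_model_ids,
        description="Composed per-vendor invoice models",  # placeholder text
    )
    return poller.result().model_id
```

    For example, with made-up counts {"Acme": 50, "Globex": 30, "Initech": 15, "Umbrella": 5} and a target of 0.8, the helper returns ["Acme", "Globex"], so those two vendors would be trained and composed first.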