
Why can't Form Recognizer SDK v3 find any OCR documents to train?


I am trying to build a Form Recognizer custom model using the v3 preview, using the sample code:

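using Azure;
using Azure.AI.FormRecognizer.DocumentAnalysis;

// sasToken holds the SAS URL of the Blob container with the training files.
Uri trainingFileUri = new Uri(sasToken);
var client = new DocumentModelAdministrationClient(
    new Uri(endpoint),
    new AzureKeyCredential(apiKey));

// Start the build and wait for the long-running operation to finish.
BuildModelOperation operation = await client.StartBuildModelAsync(trainingFileUri);

Response<DocumentModel> operationResponse = await operation.WaitForCompletionAsync();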

The SAS token is for a Blob container containing 20 PDF files. When I run the code, I get this error:

Status: 200 (OK) ErrorCode: InvalidRequest

Additional Information: AdditionInformation: InvalidRequest: Invalid request.

Details: ModelBuildError: Could not build the model: Can't find any OCR files for training.

Raw:

{ "code": "InvalidRequest", "message": "Invalid request.", "details": [ { "code": "ModelBuildError", "message": "Could not build the model: Can\u0027t find any OCR files for training." } ] }

The SAS token has read, write, list, and other permissions, so I don't know why the client cannot find any documents to train on. Any ideas?


Solution

  • The preview API you linked to does not support training without labels. You will need a labeled dataset to train a model.

    Did you use the Form Recognizer Studio to label your files?

    Training a model requires your storage container to hold three types of files:

    1. A single file, fields.json
    2. For each file in your training dataset, two additional files that are created during the labeling process:
      • {FileName}.labels.json
      • {FileName}.ocr.json

    The error message indicates that you may not have labeled your documents.
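
To confirm which label files are absent, a check along the following lines may help. This is only a sketch: `FindMissingLabelFiles` is a hypothetical helper, not part of the Form Recognizer SDK, and it assumes that `{FileName}` includes the document's extension (for example `invoice1.pdf.ocr.json`) and that every blob not ending in `.json` is a training document.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not an SDK call): given the blob names found in the
// training container, return the label files that appear to be missing.
List<string> FindMissingLabelFiles(IEnumerable<string> blobNames)
{
    var names = new HashSet<string>(blobNames, StringComparer.OrdinalIgnoreCase);
    var missing = new List<string>();

    // The container needs exactly one fields.json.
    if (!names.Contains("fields.json"))
        missing.Add("fields.json");

    // Assumption: anything that is not a .json file is a training document.
    var documents = names
        .Where(n => !n.EndsWith(".json", StringComparison.OrdinalIgnoreCase))
        .OrderBy(n => n, StringComparer.Ordinal);

    foreach (var doc in documents)
    {
        // Each document needs a labels file and an OCR file next to it.
        foreach (var suffix in new[] { ".labels.json", ".ocr.json" })
        {
            if (!names.Contains(doc + suffix))
                missing.Add(doc + suffix);
        }
    }

    return missing;
}

// Example listing: one labeled document and one unlabeled document.
var blobs = new[]
{
    "fields.json",
    "invoice1.pdf", "invoice1.pdf.labels.json", "invoice1.pdf.ocr.json",
    "invoice2.pdf",
};

foreach (var name in FindMissingLabelFiles(blobs))
    Console.WriteLine($"Missing: {name}");
// Prints:
// Missing: invoice2.pdf.labels.json
// Missing: invoice2.pdf.ocr.json
```

In practice the blob names would come from listing the container, for instance with `BlobContainerClient.GetBlobs` from the Azure.Storage.Blobs package. A container holding only the 20 PDF files would report every `.labels.json` and `.ocr.json` file, plus `fields.json`, as missing, which is consistent with the "Can't find any OCR files for training" error.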