Tags: azure, azure-form-recognizer

How to target a local path with TrainCustomModelAsync in Form Recognizer


Can someone explain how TrainCustomModelAsync can use a local path on Windows as the source of the training files? The documentation says:

The request must include a source parameter that is either an externally accessible Azure storage blob container Uri (preferably a Shared Access Signature Uri) or valid path to a data folder in a locally mounted drive. When local paths are specified, they must follow the Linux/Unix path format and be an absolute path rooted to the input mount configuration setting value e.g., if '' configuration setting value is '/input' then a valid source path would be '/input/contosodataset'. All data to be trained is expected to be under the source folder or sub folders under it. Models are trained using documents that are of the following content type - 'application/pdf', 'image/jpeg', 'image/png', 'image/tiff'. Other type of content is ignored.
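The path rule quoted above can be expressed as a small check. This is a minimal C# sketch, assuming the input mount value is /input as in the documentation's example. IsValidContainerSource and GuessContentType are hypothetical helpers, not part of any Form Recognizer SDK, and inferring the content type from the file extension is an assumption, since the service itself decides by content type.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: checks a training "source" value against the quoted rules:
// Unix-style separators, absolute, and rooted at the input mount value.
static bool IsValidContainerSource(string source, string inputMount)
{
    if (string.IsNullOrEmpty(source) || source.Contains('\\'))
        return false;                       // backslashes mean a Windows-style path
    if (!source.StartsWith('/'))
        return false;                       // must be an absolute path
    string root = inputMount.TrimEnd('/');
    // Accept the mount folder itself or any folder beneath it.
    return source.TrimEnd('/') == root
        || source.StartsWith(root + "/", StringComparison.Ordinal);
}

// Hypothetical helper: guesses the content type from the file extension and
// returns null for files the documentation says are ignored.
static string? GuessContentType(string fileName)
{
    var types = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
    {
        [".pdf"] = "application/pdf",
        [".jpg"] = "image/jpeg",
        [".jpeg"] = "image/jpeg",
        [".png"] = "image/png",
        [".tif"] = "image/tiff",
        [".tiff"] = "image/tiff",
    };
    return types.TryGetValue(Path.GetExtension(fileName), out var type) ? type : null;
}

Console.WriteLine(IsValidContainerSource("/input/contosodataset", "/input")); // True
Console.WriteLine(IsValidContainerSource("C:/train/", "/input"));             // False
Console.WriteLine(GuessContentType("form1.pdf") ?? "ignored");                // application/pdf
Console.WriteLine(GuessContentType("notes.txt") ?? "ignored");                // ignored
```

Note that a path such as C:/train/ fails the check simply because it is not rooted at the mount, regardless of which slash style it uses.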

  1. What is the valid path format if, for example, my training files are in C:\input\?
  2. What is the input mount configuration setting value?
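The path translation that question 1 asks about can be sketched as follows. This sketch assumes the service runs as a container and that the host folder C:\input\ is bind-mounted into the container at /input, the value the input mount setting would then hold. Both the mount target and the helper name ToContainerPath are illustrative, not taken from the documentation.

```csharp
using System;

// Hypothetical helper: translates a Windows host path into the path the
// container sees, assuming hostRoot is bind-mounted at containerRoot.
static string ToContainerPath(string hostPath, string hostRoot, string containerRoot)
{
    // Normalize separators and compare case-insensitively, because Windows
    // paths are not case-sensitive.
    string host = hostPath.Replace('\\', '/').TrimEnd('/');
    string root = hostRoot.Replace('\\', '/').TrimEnd('/');
    if (!host.Equals(root, StringComparison.OrdinalIgnoreCase) &&
        !host.StartsWith(root + "/", StringComparison.OrdinalIgnoreCase))
        throw new ArgumentException($"{hostPath} is not under the mounted folder {hostRoot}");

    string relative = host.Substring(root.Length);   // "" or "/sub/folder"
    return containerRoot.TrimEnd('/') + relative;
}

Console.WriteLine(ToContainerPath(@"C:\input\", @"C:\input\", "/input"));        // /input
Console.WriteLine(ToContainerPath(@"C:\input\contoso", @"C:\input\", "/input")); // /input/contoso
```

Under this assumption, training files in C:\input\contoso would be passed as the source /input/contoso.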

Here is my code. It runs successfully if I set the "source" property to a blob storage container URL:

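    using System;
    using System.Net.Http;
    using System.Text;
    using Newtonsoft.Json;

    var client = new HttpClient();
    var uri = "https://MYRESOURCENAME.cognitiveservices.azure.com/formrecognizer/v2.0-preview/custom/models/";
    // Request headers: the resource key authenticates the call
    client.DefaultRequestHeaders.Add("Ocp-Apim-Subscription-Key", ENDPOINT_KEY);

    var body = new
    {
        // Works with a blob container URL; a local Windows path like this
        // one is what fails.
        source = new Uri("C:\\train\\").AbsolutePath,
        sourceFilter = new
        {
            prefix = "",               // no file-name prefix filter
            includeSubFolders = false  // only files directly in the source folder
        },
        useLabelFile = true            // train with the accompanying label files
    };

    var stringContent = new StringContent(JsonConvert.SerializeObject(body), Encoding.UTF8, "application/json");
    var response = await client.PostAsync(uri, stringContent);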

Solution

  • The local path option only applies when you run the Form Recognizer service as a container in your own Docker/Kubernetes environment. The hosted Form Recognizer service can only read training data from an Azure Blob Container URL.

    Note, however, that the Form Recognizer container is currently available only for the older v1.0-preview API, so a local-path source is not an option for the v2.0-preview request in the question. You can read more about the v1.0-preview container at https://learn.microsoft.com/en-us/azure/cognitive-services/form-recognizer/form-recognizer-container-howto
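Because the hosted service reads training data only from a blob container, the request in the question should work once "source" is a container SAS URL. This is a minimal C# sketch that swaps Newtonsoft.Json for System.Text.Json so it needs no external package. MYACCOUNT, MYCONTAINER and SAS_TOKEN are placeholders, BuildTrainRequestJson and StartTrainingAsync are hypothetical helpers, and the 201 Created / Location header behavior is an assumption to verify against the API version you target.

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading.Tasks;

// Hypothetical helper: builds the same request body as the question,
// but with "source" set to a blob container SAS URL.
static string BuildTrainRequestJson(string containerSasUrl, bool useLabelFile)
{
    var body = new
    {
        source = containerSasUrl,
        sourceFilter = new { prefix = "", includeSubFolders = false },
        useLabelFile
    };
    return JsonSerializer.Serialize(body);
}

// Hypothetical helper: posts the request to the hosted service. Assumption:
// a successful v2.0-preview call answers 201 Created and puts the new model's
// URL in the Location header; verify this against your API version.
static async Task<Uri?> StartTrainingAsync(HttpClient client, string endpoint, string key, string containerSasUrl)
{
    using var request = new HttpRequestMessage(HttpMethod.Post,
        $"{endpoint.TrimEnd('/')}/formrecognizer/v2.0-preview/custom/models");
    request.Headers.Add("Ocp-Apim-Subscription-Key", key);
    request.Content = new StringContent(
        BuildTrainRequestJson(containerSasUrl, useLabelFile: true), Encoding.UTF8, "application/json");

    using var response = await client.SendAsync(request);
    response.EnsureSuccessStatusCode();   // throws on 4xx/5xx status codes
    return response.Headers.Location;
}

// Prints the request body only; no network call is made here.
Console.WriteLine(BuildTrainRequestJson("https://MYACCOUNT.blob.core.windows.net/MYCONTAINER?SAS_TOKEN", true));
```

Keeping body construction separate from the HTTP call makes it easy to inspect exactly what is sent before pointing the request at a real container.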