Search code examples
azure, azure-machine-learning-service

Metric Document is too large in Azure ML Service


I am trying to log three metrics (loss, validation loss, and mAP) at every epoch, in training runs of 100 and 50 epochs. At the end of the experiment I get this error: Run failed: RunHistory finalization failed: ServiceException: Code: 400 Message: (ValidationError) Metric Document is too large

I am using this code to log the metrics:

run.log_list("loss", history.history["loss"])
run.log_list("val_loss", history.history["val_loss"])
run.log_list("val_mean_average_precision", history.history["val_mean_average_precision"])

I don't understand why logging only three metrics exceeds the limits of Azure ML Service.


Solution

  • You can split each run history list write into smaller blocks logged under the same metric name, like this:

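    N = len(history.history["loss"]) // 2  # split point; any index works
    run.log_list("loss", history.history["loss"][:N])   # first block
    run.log_list("loss", history.history["loss"][N:])   # remaining values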
    

    Internally, the run history service concatenates blocks logged under the same metric name into one contiguous list.
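The same idea can be generalized to any number of blocks. Below is a minimal sketch, assuming (as the answer states) that the service concatenates blocks with the same metric name. The helper name `log_list_in_chunks` and the default `chunk_size` of 50 are hypothetical choices for illustration, not a documented service limit; lower the value if the error persists.

```python
from typing import Any, Sequence


def log_list_in_chunks(run: Any, name: str, values: Sequence[float],
                       chunk_size: int = 50) -> int:
    """Log `values` under one metric name using several smaller log_list calls.

    `run` is expected to be an azureml.core.Run; any object with a
    log_list(name, value) method works. `chunk_size` is an assumed,
    hypothetical block size, not an official limit.
    Returns the number of log_list calls made.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    calls = 0
    for start in range(0, len(values), chunk_size):
        # Each slice becomes its own, smaller write under the same metric name.
        run.log_list(name, list(values[start:start + chunk_size]))
        calls += 1
    return calls
```

Applied to the three metrics from the question, usage would look like this (with `history` being the Keras `History` object returned by `fit`):

    from azureml.core import Run

    run = Run.get_context()
    for metric in ("loss", "val_loss", "val_mean_average_precision"):
        log_list_in_chunks(run, metric, history.history[metric])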