python, deep-learning, cntk

eval and test_minibatch in cntk


We are using TrainResNet_CIFAR10.py as an example to learn CNTK. We wrote two functions, eval_metric and calc_error, shown below:

def eval_metric(trainer, reader_test, test_epoch_size, label_var, input_map):
    # Evaluation parameters
    minibatch_size = 16

    # Process minibatches and accumulate the evaluation metric
    metric_numer = 0
    metric_denom = 0
    sample_count = 0

    while sample_count < test_epoch_size:
        current_minibatch = min(minibatch_size, test_epoch_size - sample_count)
        # Fetch the next test minibatch.
        data = reader_test.next_minibatch(current_minibatch, input_map=input_map)
        # test_minibatch returns an average, so weight it by the minibatch size
        metric_numer += trainer.test_minibatch(data) * current_minibatch
        metric_denom += current_minibatch
        # Keep track of the number of samples processed so far.
        sample_count += data[label_var].num_samples

    return metric_numer / metric_denom

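import numpy as np
from PIL import Image

def calc_error(trainer, fileList, mean_value, test_size):
    # Note: a size mismatch is reported as an error rate of 0, not as a failure.
    if len(fileList) != test_size:
        return 0

    n = 0  # images processed
    m = 0  # images misclassified
    while n < test_size:
        c = evaluate(trainer, fileList[n].filename, mean_value)
        if c != fileList[n].classID:
            m += 1
        n += 1

    # float() avoids integer division under Python 2
    return float(m) / test_size

def evaluate(trainer, img_name, mean_value):
    # Load as RGB, subtract the mean, then reorder the channels to BGR
    rgb_image = np.asarray(Image.open(img_name), dtype=np.float32) - mean_value
    bgr_image = rgb_image[..., [2, 1, 0]]
    # HWC -> CHW layout
    pic = np.ascontiguousarray(np.rollaxis(bgr_image, 2))
    # `trainer` must be the model returned by load_model here (see question 3)
    probs = trainer.eval({trainer.arguments[0]: [pic]})
    predictions = np.squeeze(probs)
    top_class = np.argmax(predictions)
    return top_class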

We assumed that test_minibatch(data) returns the percentage of incorrect results, so the two methods should give similar results. Our questions are:

  1. What does trainer.test_minibatch(data) return?
  2. For the CIFAR-10 test images, the two methods agree to within 10%, but for our own sample images, which are 64x64x3 with 4 classes, the differences are more than 100%. What could cause such a large difference?
  3. If we pass the trainer directly to calc_error, it raises an error during eval. We have to save the model and load it with load_model before calling calc_error. Why?

Solution

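  • trainer.test_minibatch returns the average per-sample value of the evaluation criterion that was passed to the Trainer (for example, the classification error), as a fraction rather than a percentage.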

    The Trainer also exposes the properties trainer.previous_minibatch_loss_average, trainer.previous_minibatch_sample_count, and trainer.previous_minibatch_evaluation_average, which describe the most recent minibatch passed to train_minibatch.
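
Because test_minibatch reports an average per minibatch, combining several minibatches means weighting each average by the number of samples it covers, which is what eval_metric does. The sketch below shows that arithmetic in isolation; the function name and the numbers are made up for illustration and nothing here calls CNTK.

```python
def combine_minibatch_averages(batch_averages, batch_sizes):
    """Combine per-minibatch averages into one per-sample average."""
    if len(batch_averages) != len(batch_sizes):
        raise ValueError("need one sample count per minibatch average")
    total = sum(batch_sizes)
    if total == 0:
        raise ValueError("no samples were evaluated")
    # Weight every average by the number of samples it was computed over.
    weighted_sum = sum(avg * n for avg, n in zip(batch_averages, batch_sizes))
    return weighted_sum / float(total)


# Hypothetical values: one full minibatch of 16 and a final one of 4.
averages = [0.25, 0.5]
sizes = [16, 4]

overall = combine_minibatch_averages(averages, sizes)
# An unweighted mean over-counts the small final minibatch.
naive = sum(averages) / float(len(averages))
```

With a short final minibatch the weighted and unweighted results differ, so the sample count used as the weight should be the number of samples actually evaluated.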

    The differences most likely come from pre-processing. Is mean_value the same as the one used when you trained the network? Are the channels in RGB order or in BGR order?
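
To make the two pre-processing questions concrete, here is a minimal NumPy sketch of the manual path: optional RGB to BGR reordering, conversion from H x W x C to C x H x W, and mean subtraction. The helper name to_network_input and the bgr flag are hypothetical, and mean_value is assumed to be a scalar or an array that broadcasts against the C x H x W result in the same channel order.

```python
import numpy as np


def to_network_input(rgb_hwc, mean_value, bgr=True):
    """Turn an H x W x 3 RGB image into a mean-subtracted C x H x W float32 array."""
    # Convert to float first so that subtracting the mean cannot wrap around uint8.
    image = np.asarray(rgb_hwc, dtype=np.float32)
    if bgr:
        image = image[..., ::-1]  # reverse the channel axis: RGB -> BGR
    chw = np.ascontiguousarray(np.transpose(image, (2, 0, 1)))  # HWC -> CHW
    return chw - np.asarray(mean_value, dtype=np.float32)


# A tiny synthetic image with R=10, G=20, B=30 at every pixel.
rgb = np.zeros((2, 2, 3), dtype=np.uint8)
rgb[..., 0] = 10
rgb[..., 1] = 20
rgb[..., 2] = 30

as_bgr = to_network_input(rgb, mean_value=5.0, bgr=True)
as_rgb = to_network_input(rgb, mean_value=5.0, bgr=False)
# The first channel holds blue in one case and red in the other.
print(as_bgr[0, 0, 0], as_rgb[0, 0, 0])
```

The same pixels produce different tensors depending on the channel order, so the manual path has to match whatever the reader did during training.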

    Have you considered reducing your evaluation set to a single image and verifying that you get exactly the same output from the reader as from loading the image manually?
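
One way to run that single-image check is to compare the tensor the reader produces with the manually prepared one and look for the usual suspects. The sketch below assumes both tensors are already available as C x H x W NumPy arrays; how the reader's tensor is extracted is left open, and the helper name diagnose_mismatch is hypothetical.

```python
import numpy as np


def diagnose_mismatch(reader_chw, manual_chw, atol=1e-4):
    """Describe how two C x H x W input tensors for the same image differ."""
    reader_chw = np.asarray(reader_chw, dtype=np.float32)
    manual_chw = np.asarray(manual_chw, dtype=np.float32)
    if reader_chw.shape != manual_chw.shape:
        return "shape mismatch: %s vs %s" % (reader_chw.shape, manual_chw.shape)
    if np.allclose(reader_chw, manual_chw, atol=atol):
        return "match"
    # Reversing the channel axis turns RGB into BGR and vice versa.
    if np.allclose(reader_chw, manual_chw[::-1], atol=atol):
        return "channel order differs (RGB vs BGR)"
    offset = reader_chw - manual_chw
    # The same offset at every element points at a different scalar mean.
    if np.allclose(offset, offset.mean(), atol=atol):
        return "constant offset of %.3f (different mean_value?)" % offset.mean()
    return "max abs difference %.3f" % np.abs(offset).max()


# Synthetic stand-in for the tensor produced by the reader.
reader = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

same = diagnose_mismatch(reader, reader.copy())
swapped = diagnose_mismatch(reader, reader[::-1])
shifted = diagnose_mismatch(reader, reader + 2.0)
print(same)
print(swapped)
print(shifted)
```

If the inputs match but the predictions still differ, the difference lies after pre-processing rather than in it.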