tensorflow object-detection object-detection-api

Does one step in the Object Detection API mean processing one picture or one bounding box?

In the pipeline.config file of the TensorFlow Object Detection API, there is a parameter NUM_STEPS (num_steps in the config).

Does one step mean processing one whole picture, or one bounding box?

In the config file, we have:

model {
  faster_rcnn {
    # (...)
  }
}

train_config: {
  batch_size: 1
  optimizer {
    # (...)
  }
  gradient_clipping_by_norm: 10.0
  # (...)
  num_steps: 200000  # <-- HERE IT IS
  # (...)
}

For example, suppose we have a training TFRecord with 2 pictures, 10 bboxes each. If NUM_STEPS is set to 10, does this mean that I would process the first 10 bboxes, or each photo 5 times?

The full config file can be found here:

https://github.com/tensorflow/models/blob/32dadfc2def4f05faeedacce98e4c4099be4c433/research/object_detection/samples/configs/faster_rcnn_inception_v2_coco.config#L113

Solution

One 'step' corresponds to processing one batch.
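Since a step consumes one batch of images, the number of bounding boxes never enters the step count. The plain-Python arithmetic below makes this concrete; the helper names are hypothetical and not part of the Object Detection API.

```python
def images_processed(num_steps: int, batch_size: int) -> int:
    """Total images fed to the model: each step consumes one batch of images."""
    return num_steps * batch_size


def epochs_covered(num_steps: int, batch_size: int, num_images: int) -> float:
    """Number of full passes over the training set that num_steps amounts to."""
    return images_processed(num_steps, batch_size) / num_images


# Values from the question: 2 images, batch_size 1, num_steps 10.
# Note that the 10 boxes per image play no role in either result.
print(images_processed(10, 1))    # 10
print(epochs_covered(10, 1, 2))   # 5.0
```

With the sample config's `num_steps: 200000` and `batch_size: 1`, the same arithmetic gives 200,000 images, however many boxes each one contains.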

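The input of Faster R-CNN is a full image, and your batch size is 1, so each step uses exactly one image. In your case, the first step processes all ten boxes of the first image, and the second step all ten boxes of the second one. The number of boxes per image does not change what a step is, so with NUM_STEPS set to 10 and two pictures, each photo is processed about 5 times (exactly 5 if the records are read in order without shuffling).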
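The step-by-step behaviour can be illustrated with a toy simulation. This is not the API's real input pipeline: it assumes the records are repeated indefinitely and read in order (the real pipeline may shuffle), and the dataset and function names are hypothetical.

```python
from itertools import cycle, islice

# Hypothetical toy dataset mirroring the question: 2 images, 10 boxes each.
dataset = [
    {"image": "img_0", "boxes": [f"box_{i}" for i in range(10)]},
    {"image": "img_1", "boxes": [f"box_{i}" for i in range(10)]},
]


def run_steps(examples, num_steps, batch_size):
    """Yield (step, batch) pairs; each batch holds whole images with all their boxes.

    Assumes in-order reading with infinite repetition (no shuffling).
    """
    stream = cycle(examples)  # loop over the records forever, like a repeated dataset
    for step in range(1, num_steps + 1):
        yield step, list(islice(stream, batch_size))  # take the next batch_size images


def count_images(examples, num_steps, batch_size):
    """Count how many times each image is used over num_steps steps."""
    counts = {}
    for _, batch in run_steps(examples, num_steps, batch_size):
        for ex in batch:
            counts[ex["image"]] = counts.get(ex["image"], 0) + 1
    return counts


for step, batch in run_steps(dataset, num_steps=10, batch_size=1):
    for ex in batch:
        print(f"step {step}: {ex['image']} with {len(ex['boxes'])} boxes")

print(count_images(dataset, num_steps=10, batch_size=1))  # {'img_0': 5, 'img_1': 5}
```

Every step sees one whole image with all ten of its boxes, and over 10 steps each image is used 5 times.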