python tensorflow machine-learning object-detection-api

Is one step equal to one epoch if I train my model on one (large) image?


I am implementing an active learning pipeline with the TensorFlow Object Detection API. I am starting with one image from the xView dataset (about 3000x4000 px in size).

Now I am training my faster_rcnn network with a batch size of 1. If there is only one image to train on and the batch size is 1, is every step (printed in the console) equal to one epoch?

Let's say that after 20 active learning cycles there are 20 images in the training dataset and I train for 19 steps. The last image is never trained on, right?

If the number of images increases but the number of steps per active learning cycle stays the same, will the network never train on the images added later, or will training resume where it stopped (at image 19, for example)?


Solution

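  • You seem to understand epoch correctly: it's a training pass in which you train once on each image in the data set. If your batch size is equal to the data set size, then yes, you have one iteration per epoch. In general, the number of steps per epoch is the data set size divided by the batch size, rounded up, so with one image and a batch size of 1, every step is one epoch.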

    If you train for 19 steps (batch size = 1) and there are 20 images, then one of them will be left out of that near-epoch. The left-out image is not necessarily the "last" one, though, since "last" depends on how your images are ordered. That in turn depends on your data ingestion software, which you didn't specify.

    Most of these input packages employ a "shuffle" operation, a function that will randomly order the data set at the beginning of each epoch. I have also worked with one ingestion package that did as you suggest, picking up each pass (pseudo-epoch training group) from where the previous one left off. It also had an option to re-shuffle or not, each time the data set was exhausted.

    For a definitive answer, you'll have to check your framework's documentation and the configuration choices made for your particular model. If there is no such documentation, you're stuck doing what I've had to do a few times: take ten minutes of primal scream time :-) , and then read the code.
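
To make the arithmetic above concrete, here is a minimal sketch in plain Python. It does not use the TensorFlow Object Detection API: `steps_per_epoch` and `images_seen` are hypothetical helpers, and the feeder they model (walk through the data set, optionally shuffle at the start of each epoch, repeat when exhausted) is an assumption about how an input pipeline might behave, not a description of yours.

```python
import math
import random


def steps_per_epoch(num_images, batch_size):
    """Number of training steps needed to see every image once."""
    return math.ceil(num_images / batch_size)


def images_seen(num_images, num_steps, batch_size=1, shuffle=True, seed=0):
    """Return the set of image indices visited in `num_steps` steps.

    Hypothetical feeder: it walks through the data set in index order
    (or in a freshly shuffled order each epoch) and starts a new epoch
    whenever the data set is exhausted.
    """
    rng = random.Random(seed)
    seen = set()
    order = []
    for _ in range(num_steps * batch_size):
        if not order:  # start of a new epoch
            order = list(range(num_images))
            if shuffle:
                rng.shuffle(order)
        seen.add(order.pop(0))
    return seen


# One image, batch size 1: each step is a full epoch.
print(steps_per_epoch(1, 1))  # 1

# 20 images, 19 steps: exactly one image is skipped in this pass.
all_images = set(range(20))
print(sorted(all_images - images_seen(20, 19, shuffle=False)))  # [19]
# With shuffling, the skipped image is whichever one landed last in the
# random order, so it is not necessarily index 19.
print(sorted(all_images - images_seen(20, 19, shuffle=True)))
```

Whether a later training run resumes at the skipped image or starts over with a new order is exactly the part this sketch cannot answer; that is determined by your input reader's shuffle and repeat settings.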