I want to display the batch size when running a tf.distribute strategy. I do this by creating a custom Keras layer, like so:
import tensorflow as tf

class DebugLayer(tf.keras.layers.Layer):
    def __init__(self):
        super().__init__()

    def build(self, input_shape):
        pass

    def call(self, inputs):
        # Print the runtime shape; the first dimension is the batch size
        # seen by this replica.
        print_op = tf.print("******Shape is:", tf.shape(inputs), name='shapey')
        with tf.control_dependencies([print_op]):
            return tf.identity(inputs)
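For completeness, here is roughly how I drop the layer into a model (the Dense layers here are just placeholders):

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(10,)),
    DebugLayer(),  # passes inputs through unchanged while printing their shape
    tf.keras.layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')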
If I run with one worker, it prints 128 for the batch size, which is what I set in my tf.data pipeline with .batch(128).
If I run with two workers, each worker also prints 128. How many examples are actually being run on each worker? How many examples are being run simultaneously?
In my Model.fit() call, I specify steps_per_epoch and have a .repeat() in my data pipeline. If my training set consists of 1024 samples, I have 2 workers, and my .batch() is set to 128, what should steps_per_epoch be set to for one epoch?
When using tf.data, there is a .batch() method that is typically applied to the data. Let's say that value is 128. That is the total number of examples run per batch regardless of the number of workers. If you run with 2 workers, each worker runs 64 examples simultaneously, so together they still process 128 per step.
For the 3-worker case, I'm not sure of the exact per-worker count, since 128/3 is not an integer.
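A quick way to sanity-check the per-worker count (a sketch; I'm assuming MultiWorkerMirroredStrategy, where each worker contributes one replica):

# Sketch: per-replica share of the global batch under a distribution strategy.
# Assumes MultiWorkerMirroredStrategy with one replica per worker, so
# num_replicas_in_sync equals the number of workers.
import tensorflow as tf

strategy = tf.distribute.MultiWorkerMirroredStrategy()
global_batch_size = 128
per_replica = global_batch_size // strategy.num_replicas_in_sync
print("examples per worker per step:", per_replica)  # 64 with 2 workers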
For setting steps_per_epoch, divide the total number of samples by the batch size you set in .batch(). For the example in the question, that is 1024/128 = 8.
This is somewhat inconvenient because you need to know the number of training examples, and if it changes you must adjust steps_per_epoch accordingly. Also, if the sample count is not an integer multiple of the batch size, you need to decide whether to round, floor, or ceil the steps_per_epoch value.
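If you would rather not hardcode these numbers, you can compute steps_per_epoch instead. A sketch, where dataset and model are placeholders for your own pipeline and model, and the ceiling choice is mine so the final partial batch still counts as a step:

import math
import tensorflow as tf

global_batch_size = 128

# Option 1: from a known sample count. Ceiling keeps the final partial
# batch as its own step; flooring would drop it instead.
num_samples = 1024
steps_per_epoch = math.ceil(num_samples / global_batch_size)  # 8 here

# Option 2: let tf.data count the batches. Do this before .repeat(),
# which makes the cardinality infinite; some pipelines also report
# UNKNOWN_CARDINALITY, in which case fall back to Option 1.
batched = dataset.batch(global_batch_size)
steps_per_epoch = int(tf.data.experimental.cardinality(batched).numpy())
model.fit(batched.repeat(), epochs=10, steps_per_epoch=steps_per_epoch)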