Using the code below, I would like to ask a few questions about what exactly is happening underneath.
dataset =
dataset = dataset.map(..., num_parallel_calls=4)
dataset = dataset.repeat()
dataset = dataset.shuffle(1024)
dataset = dataset.batch(16)
iterator = dataset.make_one_shot_iterator()
1. dataset.map(..., num_parallel_calls=4)
- How many records are we loading here? As many as will fit in memory, or some fixed number?
2. dataset = dataset.repeat()
- What exactly do we repeat? The piece of data currently loaded in point 1? If so, does that mean we will no longer load the rest?
3. How exactly does shuffle work?
4. Can we use repeat, shuffle and batch before map, and work on file paths instead of the files themselves?
and shuffle together here.
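For question 3, one way to picture a fixed-size shuffle buffer is the pure-Python sketch below. It mirrors the behaviour the tf.data documentation describes for `shuffle(buffer_size)`: fill a buffer, then repeatedly emit a random buffered element and replace it with the next input element. The names `buffered_shuffle` and `records`, the seed, and the buffer size of 4 are hypothetical and chosen only for illustration; this is not TensorFlow's actual implementation.

```python
import random
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: Optional[int] = None) -> Iterator[T]:
    """Yield items in a locally shuffled order using a fixed-size buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            # Fill phase: nothing is emitted until the buffer is full.
            buffer.append(item)
            continue
        # Steady state: emit a random buffered element and store the
        # newly read element in the slot it occupied.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    # Input exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    yield from buffer


# Hypothetical stand-in for a stream of parsed records.
records = list(range(10))
shuffled = list(buffered_shuffle(records, buffer_size=4, seed=0))
print(shuffled)
```

In this sketch only `buffer_size` elements are held in memory at once, so an element can jump ahead by at most `buffer_size - 1` positions. A buffer smaller than the dataset therefore gives only a local shuffle, and a buffer of size 1 leaves the order unchanged.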