
Can ideal num_workers for a large dataset in PyTorch be 0?


I am currently testing different values of num_workers for the DataLoader in PyTorch, and num_workers=0 seems to give the shortest running time.

I also tried out https://github.com/developer0hye/Num-Workers-Search, which performs an automated num_workers search based on the dataset, batch_size, and some other parameters; it also reports 0 as the ideal num_workers.
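For reference, a minimal timing sketch of this kind of comparison might look like the following. It uses a synthetic in-memory dataset with hypothetical sizes (the real data, dataset class, and batch size are not shown in the question), so the absolute numbers will differ from any real workload:

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic in-memory dataset (assumed stand-in for the real data).
data = torch.randn(2048, 32)
labels = torch.randint(0, 10, (2048,))
dataset = TensorDataset(data, labels)


def time_epoch(num_workers: int, batch_size: int = 64) -> float:
    """Time one full pass over the dataset with the given num_workers."""
    loader = DataLoader(dataset, batch_size=batch_size, num_workers=num_workers)
    start = time.perf_counter()
    for _batch in loader:
        pass  # iterate only; we are measuring loading time, not training
    return time.perf_counter() - start


if __name__ == "__main__":
    # Guard is needed because num_workers > 0 spawns worker processes,
    # which re-import this module on spawn-based platforms.
    for n in (0, 2, 4):
        print(f"num_workers={n}: {time_epoch(n):.3f}s")
```

With a dataset this small and already in memory, num_workers=0 typically wins, since the workers' process startup and inter-process transfer cost exceeds the (near-zero) cost of fetching each batch.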

The CPU is a server-grade AMD Epyc with 128 cores (256 threads), running on Ubuntu 20.04.

Can the CPU's processing power explain why the ideal num_workers is 0? It seems counter-intuitive, especially given the large number of available threads.


Solution

  • Yes, this can definitely be the case. If the bottleneck is, for example, a hard drive from which the data is read, then even multiple workers cannot read the data any faster; the overhead of spawning and coordinating extra processes then just decreases performance. Conversely, the source you load data from may be so fast that it is not a bottleneck at all, in which case additional workers again only add overhead.