I wonder if there is a way to distribute the DataLoader/Dataset across many CPUs, even when using a single GPU.
Specifically, I would like to have a Dataset class whose __getitem__ function is distributed across many different CPUs (using MPI maybe? Any other way is also fine).
EDIT: My title was erroneously edited. I am not trying to distribute the model itself; I only want to distribute the data loading/parsing for the model.
EDIT 2: Some interesting discussion in this direction is available here.
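For the single-machine part of the question, PyTorch's torch.utils.data.DataLoader already runs __getitem__ in separate worker processes when num_workers > 0, so batches are prepared in parallel with training. The underlying mechanism can be sketched with the standard library alone: a process pool maps a dataset's __getitem__ over a list of indices. SquaresDataset, parallel_getitem and the underscore helpers are hypothetical names for this illustration, not part of PyTorch.

```python
import multiprocessing as mp


class SquaresDataset:
    """Hypothetical toy dataset; __getitem__ stands in for expensive parsing."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        if not 0 <= idx < self.n:
            raise IndexError(idx)
        # CPU-bound "parsing" work: sum of squares 0..idx.
        return idx, sum(i * i for i in range(idx + 1))


_dataset = None  # per-process copy of the dataset, set by _init_worker


def _init_worker(dataset):
    global _dataset
    _dataset = dataset


def _load_item(idx):
    # Runs inside a worker process.
    return _dataset[idx]


def parallel_getitem(dataset, indices, processes=4):
    """Call dataset[idx] for every idx in a pool of worker processes.

    Results come back in the same order as `indices`.
    """
    # Use "fork" where the OS supports it, so workers inherit the dataset and
    # these helpers instead of re-importing the main script; otherwise fall
    # back to the platform's default start method.
    method = "fork" if "fork" in mp.get_all_start_methods() else None
    ctx = mp.get_context(method)
    with ctx.Pool(processes, initializer=_init_worker, initargs=(dataset,)) as pool:
        return pool.map(_load_item, indices)


if __name__ == "__main__":
    print(parallel_getitem(SquaresDataset(5), range(5), processes=2))
    # [(0, 0), (1, 1), (2, 5), (3, 14), (4, 30)]
```

Each worker holds its own copy of the dataset, much like DataLoader workers do; this only scales across the cores of one machine, which is why spreading workers over several nodes needs something like the remote approach discussed in the answer.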
"Fetching data from remote server in pytorch dataloader" is more or less a duplicate of your question, so I can suggest the same answer.
I've written RPCDataloader to distribute dataloader workers on remote servers. It does not use MPI (yet) because the bandwidth of plain TCP sockets (over InfiniBand) was sufficient in my case, and I can get the node configuration from SLURM.
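The general idea behind this design (dataset workers on other machines, samples shipped back to the trainer over plain TCP sockets) can be sketched with the standard library's multiprocessing.connection module. This is not RPCDataloader's code: the "len"/"getitem"/"close" request protocol, AUTHKEY, SquaresDataset, serve and RemoteDataset are all hypothetical, and a thread on localhost stands in for a remote worker node.

```python
import threading
from multiprocessing.connection import Client, Listener

AUTHKEY = b"change-me"  # hypothetical shared secret between trainer and workers


class SquaresDataset:
    """Hypothetical dataset that lives on the worker node."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        return idx, idx * idx


def serve(listener, dataset):
    """Worker side: answer requests from one client until it sends "close"."""
    with listener.accept() as conn:
        while True:
            op, arg = conn.recv()
            if op == "close":
                break
            try:
                result = len(dataset) if op == "len" else dataset[arg]
                conn.send(("ok", result))
            except Exception as exc:  # send the error back to the trainer
                conn.send(("err", exc))


class RemoteDataset:
    """Trainer side: forwards __len__/__getitem__ to a worker over TCP."""

    def __init__(self, address):
        self._conn = Client(address, authkey=AUTHKEY)

    def _call(self, op, arg=None):
        self._conn.send((op, arg))  # pickled and sent over the socket
        status, payload = self._conn.recv()
        if status == "err":
            raise payload
        return payload

    def __len__(self):
        return self._call("len")

    def __getitem__(self, idx):
        return self._call("getitem", idx)

    def close(self):
        self._conn.send(("close", None))
        self._conn.close()


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; listener.address reports it.
    listener = Listener(("127.0.0.1", 0), authkey=AUTHKEY)
    # A thread stands in for the worker process on a remote node.
    worker = threading.Thread(target=serve, args=(listener, SquaresDataset()))
    worker.start()
    remote = RemoteDataset(listener.address)
    print(len(remote), remote[3])  # 10 (3, 9)
    remote.close()
    worker.join()
    listener.close()
```

Each lookup here is a blocking round trip to a single worker; a practical loader would likely spread indices across several workers and keep multiple requests in flight to hide network latency.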
It takes 3 steps to use:
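First, start a worker server on each remote node:
python -m rpcdataloader.launch --host=0.0.0.0 --port=xxxx
Then create the dataset with RPCDataset:
dataset = rpcdataloader.RPCDataset(
    root=args.data_path + "/train",
    # ...
)
Finally, create the dataloader with RPCDataloader:
dataloader = rpcdataloader.RPCDataloader(
    # ...
)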