Running a PyTorch dataloader/Dataset on multiple distributed CPUs

I wonder if there is a way to distribute the dataloader/Dataset across many CPUs, even when using a single GPU. Specifically, I would like to have a Dataset class whose __getitem__ function is distributed across many different CPUs (using MPI, maybe? Any other way is also fine).


EDIT: My title was erroneously edited. I am not trying to distribute the model itself; I only want to distribute the data loading/parsing for the model.

EDIT 2: Some interesting discussion in this direction is available here


  • Fetching data from remote server in pytorch dataloader is more or less a duplicate of your question, so I can suggest the same answer.

    I've written RPCDataloader to distribute dataloader workers on remote servers. It's not using MPI (yet) because the bandwidth of plain TCP sockets (over InfiniBand) was sufficient in my case, and I can get the node configuration from SLURM.
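To make the general idea concrete, here is a minimal standard-library sketch of a trainer-side placeholder dataset that forwards `__getitem__` calls to workers over TCP. It is not RPCDataloader's implementation or API: `SquaresDataset`, `RemoteDataset`, `serve`, and `demo` are hypothetical names, the "workers" are local threads standing in for remote nodes, and requests are handled one at a time (no batching or prefetching).

```python
import threading
from multiprocessing.connection import Client, Listener


class SquaresDataset:
    """Hypothetical stand-in for an expensive dataset: item i is i * i."""

    def __len__(self):
        return 8

    def __getitem__(self, index):
        return index * index


def serve(listener, dataset):
    """Worker loop: answer index requests until the client sends None."""
    with listener.accept() as conn:
        while True:
            index = conn.recv()
            if index is None:  # shutdown signal from the trainer
                break
            conn.send(dataset[index])


class RemoteDataset:
    """Trainer-side placeholder: holds no data, only worker connections."""

    def __init__(self, addresses, length):
        self._conns = [Client(address) for address in addresses]
        self._length = length

    def __len__(self):
        return self._length

    def __getitem__(self, index):
        # Pick a worker round-robin by index, then do one request/response.
        conn = self._conns[index % len(self._conns)]
        conn.send(index)
        return conn.recv()

    def close(self):
        for conn in self._conns:
            conn.send(None)
            conn.close()


def demo():
    # Port 0 lets the OS pick a free port; threads play the role of nodes.
    listeners = [Listener(("127.0.0.1", 0)) for _ in range(2)]
    threads = [
        threading.Thread(target=serve, args=(listener, SquaresDataset()))
        for listener in listeners
    ]
    for thread in threads:
        thread.start()

    remote = RemoteDataset([l.address for l in listeners], length=8)
    items = [remote[i] for i in range(len(remote))]

    remote.close()
    for thread in threads:
        thread.join()
    for listener in listeners:
        listener.close()
    return items


if __name__ == "__main__":
    print(demo())
```

In a real setup the worker loop would run as a separate process on each data node, and the trainer would issue many requests concurrently so that loading overlaps with GPU work.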

    It takes 3 steps to use:

    1. Start workers on the data node: python -m rpcdataloader.launch --host= --port=xxxx
    2. Create the dataset in the trainer(s). This instantiates the actual datasets on the workers and a placeholder object in the trainer(s):
    dataset = rpcdataloader.RPCDataset(
        workers=['node01:6543', 'node02:5432'],
        root=args.data_path + "/train",
    3. Create the dataloader:
    dataloader = rpcdataloader.RPCDataloader(