python pytorch pytorch-dataloader

RuntimeError: DataLoader worker (pid(s) 15876, 2756) exited unexpectedly


I am running some existing examples from the PyTorch tutorial website. I am working on a CPU-only machine, with no GPU.

When I run the program, the error below is shown. Is it because I'm working on the CPU, or is it a setup issue? How can I solve it?

    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 15876, 2756) exited unexpectedly

import torch
import torch.nn.functional as F
import torch.nn as nn
import torch.optim as optim

import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

from torch.utils.tensorboard import SummaryWriter
from torch.utils.data import DataLoader
from torchvision import datasets

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5,), (0.5,))]
)
# Store separate training and validation splits in data
training_set = datasets.FashionMNIST(
    root='data',
    train=True,
    download=True,
    transform=transform
)
validation_set = datasets.FashionMNIST(
    root='data',
    train=False,
    download=True,
    transform=transform
)
training_loader = DataLoader(training_set, batch_size=4, shuffle=True, num_workers=2)
validation_loader = DataLoader(validation_set, batch_size=4, shuffle=False, num_workers=2)
classes = ('T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
    'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle Boot')


def matplotlib_imshow(img, one_channel=False):
    if one_channel:
        img = img.mean(dim=0)
    img = img / 2 + 0.5  # unnormalize
    npimg = img.numpy()
    if one_channel:
        plt.imshow(npimg, cmap="Greys")
    else:
        plt.imshow(np.transpose(npimg, (1, 2, 0)))


dataiter = iter(training_loader)
images, labels = next(dataiter)  # the iterator's .next() method was removed in newer PyTorch releases

img_grid = torchvision.utils.make_grid(images)
matplotlib_imshow(img_grid, one_channel=True)

Solution

  • First figure out why the DataLoader worker crashed. A common reason is running out of memory. On Linux, you can check this by running dmesg -T after your script crashes and seeing whether the kernel killed any Python process.
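
The dmesg check can be scripted. Below is a minimal sketch, not a definitive tool: it pulls the worker PIDs out of the RuntimeError message and looks for them among the processes the kernel reports as killed. The helper names (worker_pids, killed_processes, read_dmesg) are hypothetical. The regular expression assumes kernel lines that contain "Killed process <pid> (<name>)"; the exact wording varies between kernel versions, so treat the pattern as an assumption. It only applies on Linux, and dmesg may require elevated privileges.

```python
import re
import subprocess

# Assumed shape of an OOM-killer line: "... Killed process 15876 (python) ..."
KILLED_RE = re.compile(r"killed process (\d+) \(([^)]*)\)", re.IGNORECASE)


def worker_pids(error_message):
    """Extract the PIDs listed in a 'DataLoader worker (pid(s) ...)' message."""
    match = re.search(r"pid\(s\) ([\d, ]+)\)", error_message)
    return [int(p) for p in match.group(1).split(",")] if match else []


def killed_processes(dmesg_text):
    """Map PID -> process name for every 'Killed process' line in dmesg output."""
    return {int(m.group(1)): m.group(2) for m in KILLED_RE.finditer(dmesg_text)}


def read_dmesg():
    """Run `dmesg -T` and return its output, or '' if it cannot be run."""
    try:
        result = subprocess.run(["dmesg", "-T"], capture_output=True, text=True, check=False)
    except OSError:
        return ""
    return result.stdout


if __name__ == "__main__":
    error = "DataLoader worker (pid(s) 15876, 2756) exited unexpectedly"
    killed = killed_processes(read_dmesg())
    for pid in worker_pids(error):
        if pid in killed:
            print(f"pid {pid} ({killed[pid]}) was killed by the kernel, likely out of memory")
        else:
            print(f"pid {pid} not found in dmesg; the crash may have another cause")
```

If no worker PID shows up in the kernel log, another way to see the underlying error is to set num_workers=0 temporarily, so that data loading runs in the main process and any exception is raised there directly.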