Premise
I am fairly new to PyTorch, and more often than not I get a segfault when training my neural network on a small custom dataset (10 images spanning 90 classes).
The output below comes from the print statements shown beneath it, run twice: once with an MNIST subset at idx 0 and once with my custom dataset at idx 0. Both datasets were built from a CSV file with the exact same format (img_name, class) plus an image directory; the MNIST subset contains 30 images and my custom dataset contains 10:
example, label = dataset[0]
print(dataset[0])
print(example.shape)
print(label)
The first tensor is an MNIST 28×28 PNG converted to a tensor using:
image = torchvision.io.read_image(img_path).type(torch.FloatTensor)
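As a sanity check, the uint8-to-float conversion itself can be exercised on a synthetic tensor (a minimal sketch; the random tensor below merely stands in for what read_image returns, a uint8 tensor of shape [C, H, W] with values 0-255):

```python
import torch

# Stand-in for the output of torchvision.io.read_image(): uint8, [C, H, W]
raw = torch.randint(0, 256, (1, 28, 28), dtype=torch.uint8)

img = raw.type(torch.FloatTensor)  # same values, now float32
print(img.dtype)   # torch.float32
print(img.shape)   # torch.Size([1, 28, 28])
```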
This gave me a known-good dataset to compare against; it uses the same custom dataset class as my own data.
The neural net class is identical to the one for my custom data, except that it has 10 outputs instead of 90.
The custom images come in varied sizes and are all resized to 28×28 by the transforms.Compose() listed below. In this 10-image subset there are images of dimensions 800×170, 96×66, 64×34, 208×66, etc.
The second tensor output is from a PNG that was originally 800×170.
The transforms applied to both datasets are exactly the same:
tf = transforms.Compose([
    transforms.Resize(size=(28, 28)),
    transforms.Normalize(mean=[-0.5/0.5], std=[1/0.5])
])
No target transform is performed.
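For reference, with mean=[-0.5/0.5] = [-1] and std=[1/0.5] = [2], Normalize computes (x - mean) / std = (x + 1) / 2. Applied to raw 0-255 pixel values, that maps 0 to 0.5 and 255 to 128, which matches the background 0.5s and the ~127 peaks in the first tensor below (a quick arithmetic check in plain Python):

```python
mean = -0.5 / 0.5   # -1.0
std = 1 / 0.5       # 2.0

def normalize(x):
    # what transforms.Normalize does per pixel: (x - mean) / std
    return (x - mean) / std

print(normalize(0))    # 0.5   -> the background value in the tensor dump
print(normalize(255))  # 128.0 -> the maximum a raw uint8 pixel can reach
```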
(tensor([[[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 19.5000,
119.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 93.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 106.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 106.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 106.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.5000,
127.5000, 107.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 63.5000,
127.0000, 106.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 59.0000,
127.0000, 58.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 66.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 106.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 33.0000,
128.0000, 107.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 32.5000,
127.0000, 88.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 59.5000,
127.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.0000,
127.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.0000,
127.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.5000,
128.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.0000,
127.0000, 54.0000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.0000,
127.0000, 60.0000, 8.0000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 85.0000,
127.0000, 127.5000, 84.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 28.0000,
118.5000, 65.5000, 14.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000],
[ 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000,
0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000, 0.5000]]]), 1)
torch.Size([1, 28, 28])
1
Train Epoch: 1 [0/25 (0%)] Loss: -1.234500
Test set: Average loss: -1.6776, Accuracy: 1/5 (20%)
(tensor([[[68.1301, 67.3571, 68.4286, 67.9375, 69.5536, 69.2143, 69.0026,
69.2283, 70.4464, 70.2857, 68.8839, 68.6071, 71.3214, 70.5102,
71.0753, 71.9107, 71.5179, 71.5625, 73.6071, 71.9464, 73.2513,
72.5804, 73.5000, 74.1429, 72.7768, 72.9107, 73.1786, 74.9069],
[68.2028, 70.0714, 68.4821, 69.3661, 70.8750, 69.6607, 70.6569,
70.2551, 70.9464, 70.3393, 70.3929, 71.3571, 71.1250, 72.1901,
70.6850, 71.9464, 72.1071, 72.8304, 72.3036, 72.3214, 73.4528,
73.4898, 72.4286, 73.0179, 73.1071, 73.5179, 73.0357, 74.0280],
[71.3457, 70.4643, 70.4464, 70.7857, 70.6071, 71.9821, 71.6786,
72.7564, 72.4107, 72.2321, 72.8571, 72.7321, 70.0357, 72.2640,
73.8214, 72.8750, 73.0000, 73.0089, 74.8393, 74.1964, 74.9872,
73.4248, 72.0179, 74.5357, 74.9018, 74.9821, 75.0357, 72.9286],
[70.1429, 70.3750, 69.8750, 70.6250, 69.8750, 72.8750, 71.4107,
71.5089, 73.3750, 73.2500, 74.4375, 73.8750, 73.0000, 74.4375,
72.2768, 72.7500, 72.6250, 72.6250, 73.1250, 73.2500, 72.3571,
73.0625, 72.5000, 74.8750, 73.6875, 74.2500, 75.2500, 73.7411],
[53.1428, 56.1607, 57.4286, 58.3393, 60.6607, 59.3393, 62.2589,
62.8380, 64.1250, 66.6429, 66.9821, 67.8750, 74.7679, 70.5192,
68.7411, 69.3036, 66.0001, 67.9733, 67.4822, 68.3393, 68.3534,
69.5740, 69.4465, 70.9465, 69.0983, 72.2679, 70.4286, 70.1493],
[61.2143, 63.0000, 69.0357, 65.3393, 62.3214, 59.8036, 56.2730,
54.5829, 52.8393, 52.8929, 50.8304, 52.9107, 66.4643, 69.6875,
71.1849, 72.2678, 73.9821, 74.4643, 73.0357, 74.1250, 75.6492,
76.2360, 75.7679, 75.6071, 75.2857, 74.9286, 74.8929, 75.1850],
[54.9439, 62.5357, 69.7143, 72.0000, 71.2500, 74.1607, 75.9987,
79.6416, 79.5179, 81.4822, 77.3214, 75.2143, 49.6071, 59.7513,
71.4350, 74.4822, 73.5000, 73.8214, 72.2322, 73.7143, 73.9822,
74.5893, 74.7322, 74.8572, 76.2947, 71.5714, 73.4822, 74.8533],
[63.4298, 61.0357, 61.6072, 59.6697, 57.8036, 59.2322, 56.5982,
57.2079, 55.3393, 56.3572, 56.5804, 58.7322, 79.7499, 73.1900,
65.2423, 75.5357, 74.5356, 75.6250, 72.5893, 74.7321, 74.6135,
75.8852, 75.6964, 75.7678, 76.4286, 74.2500, 74.7857, 76.1671],
[63.7870, 60.3750, 67.5179, 67.5446, 66.7857, 66.2857, 66.4515,
68.5089, 68.5714, 67.0714, 68.5982, 66.7678, 57.3929, 67.2806,
68.9503, 72.9286, 74.0893, 73.4911, 74.2143, 73.3393, 72.4873,
73.3916, 71.7500, 75.4821, 73.8393, 74.8750, 74.6429, 75.0906],
[72.9260, 69.0178, 67.9643, 69.2321, 67.5178, 67.3750, 66.3814,
64.8890, 63.8572, 64.9464, 66.9821, 66.3928, 63.0000, 64.7449,
74.8800, 63.5178, 72.2143, 73.2321, 74.9286, 74.5893, 71.6938,
74.8635, 73.9107, 75.5536, 75.8036, 76.2857, 76.3750, 75.2564],
[72.1160, 69.5000, 72.0000, 69.4375, 71.2500, 70.5000, 72.3392,
73.5982, 71.5000, 72.3750, 68.8750, 67.1249, 65.3750, 60.2856,
61.6427, 65.3749, 67.4999, 65.0624, 70.4999, 69.4999, 65.3124,
71.9107, 69.7499, 72.8750, 72.5625, 72.7500, 74.8750, 73.7053],
[64.3763, 64.8571, 70.4642, 66.7857, 64.3214, 65.3928, 67.4859,
68.7385, 67.8750, 67.8750, 71.0267, 72.8749, 67.5356, 59.4106,
58.7625, 70.2319, 62.5534, 65.7141, 68.1249, 69.0713, 65.2013,
72.8392, 67.1427, 71.7500, 72.8482, 72.6071, 74.4285, 74.0051],
[69.7219, 71.8214, 67.4464, 68.6518, 66.0178, 66.1071, 65.5089,
65.6964, 65.6964, 61.0714, 61.4375, 61.8214, 67.8214, 61.8762,
57.3354, 66.8749, 63.8571, 60.3302, 62.9999, 67.8214, 68.9043,
71.6365, 67.5357, 75.6250, 74.6518, 73.6071, 74.5178, 75.3877],
[72.2857, 66.2857, 63.1964, 69.2232, 68.8214, 70.2857, 68.7895,
70.2436, 70.1250, 66.8750, 69.9643, 66.0893, 52.8393, 60.3201,
52.9273, 66.8571, 58.0535, 57.3035, 63.2321, 60.1785, 59.6058,
69.9936, 69.4286, 73.4821, 72.7143, 72.8750, 72.7500, 74.0791],
[65.7334, 56.6430, 60.7143, 67.8035, 66.5178, 65.8214, 67.6760,
67.3061, 65.6964, 64.5893, 53.1430, 68.4820, 52.7676, 48.1604,
48.1311, 65.3034, 51.9640, 61.8213, 59.6605, 57.3927, 54.6974,
75.5752, 73.1250, 74.3928, 74.0446, 72.2142, 72.2857, 77.7806],
[55.4095, 60.0893, 69.7142, 66.0892, 66.8750, 65.6607, 67.1926,
66.3712, 63.0000, 56.9465, 41.6073, 48.6609, 61.8035, 39.7281,
44.9195, 61.5892, 47.5891, 62.7678, 56.9641, 55.9820, 58.1236,
70.0548, 70.3750, 69.8392, 68.1517, 72.0535, 76.5893, 65.4489],
[60.6237, 66.5714, 67.8571, 65.7232, 66.2500, 67.6250, 66.9311,
67.3303, 64.8214, 48.9644, 45.9019, 49.4108, 51.6608, 43.9259,
47.5012, 38.9642, 37.5356, 66.0000, 65.5178, 49.3392, 57.3571,
67.8252, 69.7678, 70.2143, 51.7410, 76.1607, 69.7143, 54.4056],
[61.9643, 67.2500, 66.5000, 65.6875, 66.2500, 65.0000, 65.0625,
65.5268, 63.7500, 49.8750, 50.4375, 53.1250, 38.7500, 25.3750,
43.4286, 31.1250, 35.3750, 59.7500, 63.3750, 39.5000, 51.8125,
58.6249, 69.5000, 70.1250, 48.0000, 75.8750, 48.7500, 61.4018],
[67.8915, 65.7500, 66.3035, 66.5982, 66.0357, 64.9464, 65.4643,
65.8074, 63.4643, 56.2325, 48.3306, 54.9467, 22.0715, 23.6990,
29.0955, 27.3211, 29.4997, 57.8660, 68.2321, 36.9819, 50.7715,
52.6707, 69.7143, 71.3392, 55.5534, 45.7855, 62.9463, 64.1556],
[63.8431, 66.0893, 65.3571, 65.6161, 65.0893, 64.6964, 64.3444,
65.1225, 62.9107, 57.4287, 57.3216, 54.9287, 26.4465, 30.5689,
23.2499, 23.5534, 25.1605, 55.1071, 69.4643, 41.9642, 52.6619,
59.8954, 72.0893, 79.7322, 47.2856, 64.5000, 52.9463, 81.6888],
[64.2589, 69.9643, 71.5000, 75.2857, 77.6786, 78.6429, 76.2513,
71.0089, 67.5536, 60.8929, 57.2501, 48.1072, 22.4821, 44.3316,
17.5369, 24.3928, 22.8214, 45.4821, 67.8036, 35.4821, 43.7028,
52.7806, 81.8929, 56.7321, 60.5357, 44.2321, 82.6964, 72.7500],
[63.6748, 61.8929, 58.0001, 41.7859, 47.3037, 35.2502, 40.0525,
63.9669, 76.1962, 74.6603, 67.2228, 43.3748, 19.9821, 37.0776,
15.6544, 30.9823, 22.0182, 51.0984, 65.8215, 32.5717, 49.4747,
39.5946, 49.5359, 55.7859, 40.7681, 81.7857, 76.0357, 73.2832],
[60.0192, 53.6429, 43.5359, 44.8037, 39.9287, 48.8037, 48.3241,
35.5882, 22.6071, 20.7142, 33.8838, 45.3570, 25.0714, 32.6657,
26.8559, 22.9644, 27.7324, 69.4375, 62.5001, 33.9823, 48.6047,
33.4811, 38.3930, 58.5358, 74.2857, 73.2679, 68.8572, 71.0817],
[63.2500, 63.3393, 43.1608, 50.3751, 68.6786, 69.6429, 63.9324,
65.5510, 59.6249, 54.3035, 40.5267, 20.6071, 32.1785, 31.9834,
30.0791, 20.3036, 34.1073, 71.0000, 56.2322, 48.2501, 42.9695,
37.1225, 53.7322, 68.3750, 76.2232, 72.4822, 70.6072, 72.9324],
[63.1071, 64.1250, 65.7500, 41.7500, 26.2500, 25.6250, 25.1071,
24.1339, 18.8750, 23.5000, 35.5625, 44.5000, 31.1250, 37.3393,
28.3125, 23.6250, 39.3750, 67.1875, 60.7500, 53.2500, 41.6250,
39.1339, 61.2500, 81.0000, 71.3125, 70.8750, 71.5000, 72.1339],
[67.4796, 68.1429, 68.9821, 76.4286, 75.0893, 74.6250, 73.8419,
72.7398, 58.4108, 44.3572, 33.2322, 19.8036, 32.6965, 29.7296,
28.5957, 19.8750, 42.7499, 69.9196, 66.3214, 51.9285, 43.6848,
44.9017, 64.2857, 73.2857, 71.7321, 71.4286, 73.9286, 73.5893],
[67.7080, 67.9465, 68.0358, 69.1786, 69.1071, 69.7857, 69.0650,
70.3635, 60.1247, 52.3744, 52.1690, 44.3031, 30.2678, 29.7014,
20.1314, 25.4645, 45.8042, 74.2947, 63.4110, 56.0183, 49.2722,
50.1485, 73.1251, 74.6608, 74.3036, 73.8572, 72.2322, 74.1570],
[67.5868, 68.5179, 68.1786, 66.9018, 67.3215, 67.9822, 67.2628,
65.4694, 49.2318, 43.7318, 39.5888, 47.7318, 29.2499, 28.3277,
15.6326, 30.8215, 34.2502, 64.6428, 63.3572, 63.0001, 50.1688,
51.6037, 77.5000, 75.8215, 73.7501, 74.9286, 74.3572, 74.6097]]]), 20)
torch.Size([1, 28, 28])
20
Train Epoch: 1 [0/8 (0%)] Loss: -1.982941
Test set: Average loss: 0.0000, Accuracy: 0/2 (0%)
Error information
This output is from a run that completed with no segfault; the segfault occurs roughly 4 times out of 5. When it does occur, it never happens while processing the MNIST subset, only while accessing the custom dataset, whether at dataset[0] or any other index. If I rerun the simple print statements enough times on any index, I can eventually get output without a crash. Here is an occasion when it crashed more gracefully (it printed the tensor info and size/class, but crashed on train):
torch.Size([1, 28, 28])
65
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
989 try:
--> 990 data = self._data_queue.get(timeout=timeout)
991 return (True, data)
9 frames
/usr/lib/python3.7/queue.py in get(self, block, timeout)
178 raise Empty
--> 179 self.not_empty.wait(remaining)
180 item = self._get()
/usr/lib/python3.7/threading.py in wait(self, timeout)
299 if timeout > 0:
--> 300 gotit = waiter.acquire(True, timeout)
301 else:
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/signal_handling.py in handler(signum, frame)
65 # Python can still get and update the process status successfully.
---> 66 _error_if_any_worker_fails()
67 if previous_handler is not None:
RuntimeError: DataLoader worker (pid 1132) is killed by signal: Segmentation fault.
The above exception was the direct cause of the following exception:
RuntimeError Traceback (most recent call last)
<ipython-input-9-02c9a53ca811> in <module>()
68
69 if __name__ == '__main__':
---> 70 main()
<ipython-input-9-02c9a53ca811> in main()
60
61 for epoch in range(1, args.epochs + 1):
---> 62 train(args, model, device, train_loader, optimizerAdadelta, epoch)
63 test(model, device, test_loader)
64 scheduler.step()
<ipython-input-6-93be0b7e297c> in train(args, model, device, train_loader, optimizer, epoch)
2 def train(args, model, device, train_loader, optimizer, epoch):
3 model.train()
----> 4 for batch_idx, (data, target) in enumerate(train_loader):
5 data, target = data.to(device), target.to(device)
6 optimizer.zero_grad()
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
519 if self._sampler_iter is None:
520 self._reset()
--> 521 data = self._next_data()
522 self._num_yielded += 1
523 if self._dataset_kind == _DatasetKind.Iterable and \
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
1184
1185 assert not self._shutdown and self._tasks_outstanding > 0
-> 1186 idx, data = self._get_data()
1187 self._tasks_outstanding -= 1
1188 if self._dataset_kind == _DatasetKind.Iterable:
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _get_data(self)
1140 elif self._pin_memory:
1141 while self._pin_memory_thread.is_alive():
-> 1142 success, data = self._try_get_data()
1143 if success:
1144 return data
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _try_get_data(self, timeout)
1001 if len(failed_workers) > 0:
1002 pids_str = ', '.join(str(w.pid) for w in failed_workers)
-> 1003 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
1004 if isinstance(e, queue.Empty):
1005 return (False, None)
RuntimeError: DataLoader worker (pid(s) 1132) exited unexpectedly
Generally speaking, however, this issue appears to 'crash for an unknown reason', and here is what my logs look like when that occurs:
What I think is going on/what I have tried
I think something is wrong with the tensor data and how the image is being read. I am only working with at most 40 images at a time, so there is no reason disk or RAM resources on Google Colab should be exhausted. I might be normalizing the data improperly; I have tried different values, but nothing has fixed it yet. Perhaps the images are corrupt?
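One way to test the corrupt-image theory without touching torchvision's decoder is to open every file with PIL first and flag anything that is not plain 8-bit data (a sketch; `sample_dir` is a hypothetical path based on the one printed later in this post):

```python
import os
from PIL import Image

def find_suspicious_images(paths):
    """Return (path, mode, size) for images that are not plain 8-bit formats."""
    flagged = []
    for path in paths:
        with Image.open(path) as im:
            # "I" / "I;16" indicate 16-bit data, which some decoders choke on
            if im.mode not in ("L", "P", "RGB", "RGBA"):
                flagged.append((path, im.mode, im.size))
    return flagged

sample_dir = "/content/gdrive/My Drive/Colab Notebooks/all_images/sample_10"  # hypothetical
if os.path.isdir(sample_dir):
    paths = [os.path.join(sample_dir, f) for f in os.listdir(sample_dir)]
    print(find_suspicious_images(paths))
```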
I don't have a solid grasp of what could be going on; otherwise, I would have already solved it. I think I have provided enough material for the problem to be apparent to someone with expertise in the area. I put a lot of time into this post, and I hope someone can help me get to the bottom of it.
If there are any other obvious issues with my code, my use of the network, or the custom dataset, please let me know, as this is my first time working with PyTorch.
Thank you!
Additional information that may or may not be relevant:
Custom dataset class:
# ------------ Custom Dataset Class ------------
import os

import pandas as pd
import torch
import torchvision
from torch.utils.data import Dataset

class PhytoplanktonImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform, target_transform):
        self.img_labels = pd.read_csv(annotations_file)  # image names and labels loaded from csv
        self.img_dir = img_dir                           # directory containing all the images
        self.transform = transform                       # transforms to apply to images
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)  # number of rows in the csv file

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = torchvision.io.read_image(path=img_path)
        image = image.type(torch.FloatTensor)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        return image, label
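If torchvision.io.read_image keeps segfaulting on these files, one workaround is to decode with PIL instead and only then convert to a float tensor (a sketch of a possible drop-in replacement for the read inside __getitem__, not the original code):

```python
import numpy as np
import torch
from PIL import Image

def load_image_as_tensor(img_path):
    """Decode with PIL, force 8-bit grayscale, return a float32 [1, H, W] tensor."""
    with Image.open(img_path) as im:
        # convert("L") collapses paletted/16-bit images to 8-bit grayscale;
        # note that values above 255 are clipped, so rescale first if needed
        im = im.convert("L")
        arr = np.asarray(im, dtype=np.float32)   # [H, W], values 0-255
    return torch.from_numpy(arr).unsqueeze(0)    # [1, H, W], like read_image's output
```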
NN class (the only thing changed for MNIST is that the last nn.Linear() has 10 outputs):
import torch.nn as nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 90),
            nn.ReLU()
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits
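A quick shape smoke test can rule out dimension mismatches between the flattened input and the first linear layer (a sketch that mirrors the 90-class stack above as a plain nn.Sequential):

```python
import torch
import torch.nn as nn

# Mirror of the network above (90-class version) for a standalone shape check
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 90), nn.ReLU(),
)
dummy = torch.zeros(1, 1, 28, 28)  # one grayscale 28x28 image
print(model(dummy).shape)  # torch.Size([1, 90])
```

One side observation: the final nn.ReLU() clamps the outputs at zero, which is unusual before a cross-entropy/NLL-style loss; the negative loss values in the logs above would be consistent with feeding raw (non-log) outputs to such a loss, though the training/test code is not shown here.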
Args used:
args = parser.parse_args(['--batch-size', '64', '--test-batch-size', '64',
'--epochs', '1', '--lr', '0.01', '--gamma', '0.7', '--seed','4',
'--log-interval', '10'])
Edit: I was able to get the following graceful exit on one of the runs (this traceback occurred partway into the __getitem__ call):
<ipython-input-3-ae5ff8635158> in __getitem__(self, idx)
13 img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0]) # image path
14 print(img_path)
---> 15 image = torchvision.io.read_image(path=img_path) # Reading image to 1 dimensional GRAY Tensor uint between 0-255
16 image = image.type(torch.FloatTensor) # Now a FloatTensor (not a ByteTensor)
17 label = self.img_labels.iloc[idx,1] # getting label from csv
/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in read_image(path, mode)
258 """
259 data = read_file(path)
--> 260 return decode_image(data, mode)
/usr/local/lib/python3.7/dist-packages/torchvision/io/image.py in decode_image(input, mode)
237 output (Tensor[image_channels, image_height, image_width])
238 """
--> 239 output = torch.ops.image.decode_image(input, mode.value)
240 return output
241
RuntimeError: Internal error.
Here is the image path printed just before the decoding fails: /content/gdrive/My Drive/Colab Notebooks/all_images/sample_10/D20190926T145532_IFCB122_00013.png
Information about this image:
Color model: Gray
Depth: 16
Pixel height: 50
Pixel width: 80
Image DPI: 72 pixels per inch
File size: 3,557 bytes
I suggest taking a look at the num_workers param in your DataLoader. If num_workers is too high, it may be causing this error, so try lowering it to zero, or until you no longer get the error.
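That suggestion can be sketched as follows (a minimal sketch with a hypothetical stand-in dataset; with num_workers=0 all loading happens in the main process, so a crash in __getitem__ surfaces as an ordinary Python traceback instead of a dead worker):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical tiny dataset standing in for the custom one
ds = TensorDataset(torch.zeros(8, 1, 28, 28), torch.zeros(8, dtype=torch.long))

# num_workers=0 disables worker subprocesses entirely
loader = DataLoader(ds, batch_size=4, shuffle=True, num_workers=0)
for images, labels in loader:
    print(images.shape)  # torch.Size([4, 1, 28, 28])
```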
Sarthak