I am a beginner PyTorch user, and I am trying to use DataLoader.
I am trying to use it in my network, but loading takes a very long time. I debugged the network to see whether the problem was in the network itself, and it turns out it has something to do with my dataset class. Here is the code:
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd

class DiabetesDataset(Dataset):
    def __init__(self, csv):
        self.xy = pd.read_csv(csv)

    def __len__(self):
        return len(self.xy)

    def __getitem__(self, index):
        self.x_data = torch.Tensor(xy.iloc[:, 0:-1].values)
        self.y_data = torch.Tensor(xy.iloc[:, [-1]].values)
        return self.x_data[index], self.y_data[index]

dataset = DiabetesDataset("trial.csv")
train_loader = DataLoader(dataset=dataset,
                          batch_size=1,
                          shuffle=True,
                          num_workers=2)
for a in train_loader:
    print(a)
To verify that the dataloader causes the delay, I created a dummy CSV file with 2 columns, one of 1s and one of 2s, with 10 samples in each column. Then I looped over the train_loader object. It has been running for more than 1 hour and has still not finished, even though the sample size is small and the batch size is set to 1.
I am not sure what is wrong with my code or why it causes this issue.
Any comments or input would be greatly appreciated!
There are some bugs in your code. Could you check whether this version works? It runs on my computer with your toy example:
from torch.utils.data import Dataset, DataLoader
import numpy as np
import pandas as pd
import torch

class DiabetesDataset(Dataset):
    def __init__(self, csv):
        self.xy = pd.read_csv(csv)

    def __len__(self):
        return len(self.xy)

    def __getitem__(self, index):
        x_data = torch.Tensor(self.xy.iloc[:, 0:-1].values)
        y_data = torch.Tensor(self.xy.iloc[:, [-1]].values)
        return x_data[index], y_data[index]

dataset = DiabetesDataset("trial.csv")
train_loader = DataLoader(
    dataset=dataset,
    batch_size=1,
    shuffle=True,
    num_workers=2)

if __name__ == '__main__':
    for a in train_loader:
        print(a)
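Edit: Your code is not working because you are missing a self in the __getitem__ method (self.xy.iloc...) and because you do not have an if __name__ == '__main__': guard at the end of your script. With num_workers greater than 0 on platforms that start workers by spawning new processes (such as Windows), each worker re-imports the script, so the loop over the loader has to sit under that guard. For the second error, see "RuntimeError on windows trying python multiprocessing".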
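As a side note, the version above still rebuilds both tensors from the whole DataFrame on every __getitem__ call. Here is a minimal sketch of one way to avoid that by converting once in __init__, assuming the CSV fits in memory. The helpers make_toy_csv and collect_batches are hypothetical names added only for this illustration, and the column names "x" and "y" are made up to mimic the toy file from the question.

```python
import os
import tempfile

import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class DiabetesDataset(Dataset):
    def __init__(self, csv):
        xy = pd.read_csv(csv)
        # Convert once here instead of on every __getitem__ call.
        self.x_data = torch.tensor(xy.iloc[:, 0:-1].values, dtype=torch.float32)
        self.y_data = torch.tensor(xy.iloc[:, [-1]].values, dtype=torch.float32)

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, index):
        # Only index into the tensors that were built in __init__.
        return self.x_data[index], self.y_data[index]


def make_toy_csv(path, n_rows=10):
    # Hypothetical helper: one column of 1s and one of 2s, like the toy file.
    pd.DataFrame({"x": [1] * n_rows, "y": [2] * n_rows}).to_csv(path, index=False)


def collect_batches(csv_path, num_workers=0):
    # Hypothetical helper: build the loader and gather every batch in a list.
    dataset = DiabetesDataset(csv_path)
    loader = DataLoader(dataset, batch_size=1, shuffle=True,
                        num_workers=num_workers)
    return [batch for batch in loader]


if __name__ == "__main__":
    # Worker processes are only started from inside the guard.
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "trial.csv")
        make_toy_csv(csv_path)
        for x, y in collect_batches(csv_path, num_workers=2):
            print(x, y)
```

For a file this small, num_workers=0 (loading in the main process) is probably enough; worker processes tend to pay off only when each sample is expensive to load.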