I have a small problem, but I have been stuck on it for quite a while now and hope someone can help me. I am currently working on the KDD Cup 99 dataset, which I would like to train via deep learning (a CNN network).
I have a Dataset class that holds the pandas DataFrame, which I split into a training and a validation set. So far, no problem. I load it into a NumPy array, convert it to a tensor, and then pass it to the DataLoader.
The Dataset class has these two important methods for iterating:
def __len__(self):
    return len(self.val_df)

def __getitem__(self, index):
    img, target = self.val_df[index][:-1], self.val_df[index][-1]
    return img, target, index
Outside the class, the DataLoader is created:
test_dataloader = DataLoader(datat.val_df, batch_size=10, shuffle=True)
In my Trainer class I have a for loop that should iterate through the DataLoader:
with torch.no_grad():
    for data in dataloader:
        inputs, labels, idx = data
        inputs = inputs.to(self.device)
But it doesn't work: I can't unpack the inputs, labels, and index from the batches.
My question is: why not, and how can I access the labels and indices of the given dataset via the DataLoader?
Thank you all for your help! Much appreciated.
The first argument to DataLoader is the dataset from which you want to load the data. That is usually a Dataset, but it isn't restricted to instances of Dataset: as long as the object defines a length (__len__) and can be indexed (__getitem__), it is acceptable.
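To illustrate that protocol, here is a minimal sketch (the class name and data are made up for the example): any plain object with __len__ and __getitem__ works as a map-style dataset, and the DataLoader collates whatever the tuple __getitem__ returns into batched tensors.

```python
import torch
from torch.utils.data import DataLoader

class TinyDataset:
    """Minimal object satisfying the DataLoader protocol: __len__ and __getitem__."""

    def __init__(self, n=20):
        # 4 feature columns plus a label column, like a flattened dataframe row
        self.data = torch.arange(n * 5, dtype=torch.float32).reshape(n, 5)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        row = self.data[index]
        img, target = row[:-1], row[-1]
        return img, target, index

loader = DataLoader(TinyDataset(), batch_size=10, shuffle=True)
for inputs, labels, idx in loader:
    # inputs, labels, and idx are each batched tensors of 10 items
    pass
```

Because __getitem__ returns a three-element tuple, the default collate function stacks each element across the batch, which is why the three-way unpacking in the training loop works.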
You are passing datat.val_df to the DataLoader, which is presumably a NumPy array. A NumPy array has a length and can be indexed, so it can be used with the DataLoader. But since you pass that array directly, your dataset's __getitem__ is never called; the array itself is indexed, so every item is just datat.val_df[index].
Instead of passing the underlying data to the DataLoader, you have to pass the dataset itself (datat):
test_dataloader = DataLoader(datat, batch_size=10, shuffle=True)
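A sketch of the difference, using a random NumPy array and a small stand-in class (val_df, KddDataset, and the shapes are hypothetical, not taken from your code):

```python
import numpy as np
from torch.utils.data import DataLoader

# Hypothetical stand-in for datat.val_df: each row is features + label.
val_df = np.random.rand(50, 42).astype(np.float32)

class KddDataset:
    def __init__(self, val_df):
        self.val_df = val_df

    def __len__(self):
        return len(self.val_df)

    def __getitem__(self, index):
        img, target = self.val_df[index][:-1], self.val_df[index][-1]
        return img, target, index

# Wrong: passing the raw array. __getitem__ is never called, so each
# batch is a single (10, 42) tensor of whole rows, not a three-tuple.
raw_loader = DataLoader(val_df, batch_size=10, shuffle=True)
batch = next(iter(raw_loader))

# Right: passing the dataset object. __getitem__ splits each row,
# so every batch unpacks into (inputs, labels, idx).
ds_loader = DataLoader(KddDataset(val_df), batch_size=10, shuffle=True)
inputs, labels, idx = next(iter(ds_loader))
```

With the raw array, the loop body `inputs, labels, idx = data` fails because `data` is one tensor, not a three-element tuple; with the dataset object, the unpacking works as intended.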