I am trying to build a 1D dataset from a pandas DataFrame, but the output is not what I expected.
I wrote the following code to build the dataset from a DataFrame of size 8000x512:
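import pandas as pd
import torch
from torch.utils.data import Dataset

# create dataset
class carte_dataset(Dataset):
    def __init__(self, root):
        self.root = root
        self.df = pd.read_csv(root, index_col=0)
        # every column after the first holds the features
        self.X = torch.tensor(self.df.iloc[:, 1:].values)
        # the first column holds regi_no
        self.regi_no = self.df.iloc[:, 0].values

    def __len__(self):
        return len(self.regi_no)

    def __getitem__(self, idx):
        # a single sample: X[idx] is shaped (512,), with no batch dimension
        return self.X[idx], self.regi_no[idx]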
Then I checked the tensor size:
dataset = carte_dataset(root)
data, _ = dataset[0]
I expected the size to be torch.Size([1, 512]), but the shape was torch.Size([512]).
Is this an appropriate way to build a 1D dataset from a pandas DataFrame? If not, how should I revise the code?
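What you need to do is wrap the dataset with a data loader. The loader retrieves the individual (input, label) pairs from the underlying dataset, self.X[idx] and self.regi_no[idx], where the input is shaped (512,) and the label is a scalar. It then collates them into two batches of inputs and labels, shaped (bs, 512) and, for numeric labels, (bs,), where bs is the batch size. A single item taken directly from the dataset therefore has no leading batch dimension, which is why you see torch.Size([512]).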
The standard data loader utility in PyTorch is torch.utils.data.DataLoader:
>>> from torch.utils.data import DataLoader
>>> dataloader = DataLoader(dataset, batch_size=1, shuffle=False)
Then you can iterate through the dataset via the dataloader:
>>> for x, y in dataloader:
...     # x shaped (1, 512), corresponds to [X[0]] on the first iteration
...     # y shaped (1,) for numeric labels, corresponds to [regi_no[0]]
...     pass  # process the batch here
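To check the shapes without the original CSV, here is a minimal sketch that runs end to end. It makes a few assumptions that are not in the question: the class FrameDataset is a hypothetical stand-in for carte_dataset that takes a DataFrame instead of a file path, the data is random, regi_no is an integer column, and the features are cast to float32. It also shows unsqueeze(0), which is one way to get a (1, 512) tensor from a single sample if you do not want a data loader.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class FrameDataset(Dataset):
    """Hypothetical stand-in for carte_dataset that takes a DataFrame directly."""

    def __init__(self, df):
        # Assumption: first column is the ID, the remaining columns are features.
        self.regi_no = df.iloc[:, 0].values
        self.X = torch.tensor(df.iloc[:, 1:].values, dtype=torch.float32)

    def __len__(self):
        return len(self.regi_no)

    def __getitem__(self, idx):
        return self.X[idx], self.regi_no[idx]


# Synthetic data: 8 rows, one integer ID column plus 512 feature columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(8, 512)))
df.insert(0, "regi_no", np.arange(8))

dataset = FrameDataset(df)

# A single sample has no batch axis.
sample, _ = dataset[0]
single_shape = tuple(sample.shape)

# Adding the batch axis by hand gives the shape the question expected.
manual_shape = tuple(sample.unsqueeze(0).shape)

# The DataLoader stacks samples along a new leading batch axis.
loader = DataLoader(dataset, batch_size=4, shuffle=False)
x, y = next(iter(loader))
batch_x_shape = tuple(x.shape)
batch_y_shape = tuple(y.shape)

print(single_shape)   # (512,)
print(manual_shape)   # (1, 512)
print(batch_x_shape)  # (4, 512)
print(batch_y_shape)  # (4,)
```

So the dataset itself is fine as written: returning one (512,) sample per index is the usual convention, and the batch dimension is added either by the DataLoader or by an explicit unsqueeze(0).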