PyTorch: How to prepare a 1D dataset from a pandas DataFrame?

I am trying to build a 1D Dataset from a pandas DataFrame, but the shape of the output is not what I expected.

I wrote the following code to build the dataset from a pandas DataFrame of size 8000x512:

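import pandas as pd
import torch
from torch.utils.data import Dataset

# create dataset
class carte_dataset(Dataset):
    def __init__(self, root):
        self.root = root
        # the first CSV column becomes the index
        self.df = pd.read_csv(root, index_col=0)
        # column 0 holds regi_no; the remaining columns are the features
        self.X = torch.tensor(self.df.iloc[:, 1:].values)
        self.regi_no = self.df.iloc[:, 0].values

    def __len__(self):
        return len(self.regi_no)

    def __getitem__(self, idx):
        return self.X[idx], self.regi_no[idx]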

Then I checked the tensor size:

dataset = carte_dataset(root)
data, _ = dataset[0]
data.size()

I expected the size to be torch.Size([1, 512]), but it was torch.Size([512]).

Is this an appropriate way to build a 1D dataset from a pandas DataFrame? If not, how should I revise the code?

Solution

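Your Dataset is fine: a Dataset returns one sample at a time, and indexing the 2D tensor self.X with a single integer drops the leading dimension, so each sample has shape (512,). The batch dimension is added when you wrap the dataset with a dataloader, which has the effect of:

- retrieving the individual element tuple pairs from the underlying dataset: self.X[idx], self.regi_no[idx], shaped (512,) and () (a scalar) respectively,
- and collating them to form two batches of inputs/labels shaped (bs, 512) and (bs,), where bs is the batch size (assuming regi_no is numeric).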
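To make the collation step concrete, here is a minimal sketch that applies the default collate function to a handful of made-up samples. The random tensors and integer labels are placeholders, not the asker's data, and `default_collate` is imported from `torch.utils.data`, which assumes a reasonably recent PyTorch release.

```python
import torch
from torch.utils.data import default_collate

# Four made-up samples shaped like the dataset's items:
# a (512,) feature tensor plus a scalar label.
samples = [(torch.randn(512), i) for i in range(4)]

# default_collate is what DataLoader applies to each list of samples
# when no custom collate_fn is given.
x_batch, y_batch = default_collate(samples)

print(x_batch.shape)  # torch.Size([4, 512])
print(y_batch.shape)  # torch.Size([4])
```

The features are stacked along a new leading dimension, which is where the batch axis comes from.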

The standard dataloader utility in PyTorch is torch.utils.data.DataLoader:

>>> from torch.utils.data import DataLoader
>>> dataloader = DataLoader(dataset, batch_size=1, shuffle=False)

Then you can iterate through the dataset via the dataloader:

>>> for x, y in dataloader:
...     # first iteration: x shaped (1, 512), corresponds to [X[0]]
...     # first iteration: y shaped (1,), corresponds to [regi_no[0]]
...     pass
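For an end-to-end check that does not need the CSV file, the following sketch builds a small synthetic DataFrame and runs it through a Dataset and a DataLoader. `FrameDataset` is a hypothetical stand-in for `carte_dataset` that takes a DataFrame directly, and the 8 rows, the random features, and the integer `regi_no` values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset, DataLoader


class FrameDataset(Dataset):
    """Hypothetical stand-in for carte_dataset that takes a DataFrame directly."""

    def __init__(self, df):
        # Column 0 holds the identifier; the remaining columns are features.
        self.regi_no = df.iloc[:, 0].values
        self.X = torch.tensor(df.iloc[:, 1:].values, dtype=torch.float32)

    def __len__(self):
        return len(self.regi_no)

    def __getitem__(self, idx):
        return self.X[idx], self.regi_no[idx]


# Synthetic data: 8 rows, one id column plus 512 feature columns.
features = np.random.rand(8, 512)
df = pd.DataFrame(features)
df.insert(0, "regi_no", np.arange(8))

dataset = FrameDataset(df)

# A single sample has no batch dimension.
x0, _ = dataset[0]
print(x0.shape)               # torch.Size([512])

# If a lone sample really must be (1, 512), add the axis explicitly.
print(x0.unsqueeze(0).shape)  # torch.Size([1, 512])

# The DataLoader adds the batch dimension during collation.
loader = DataLoader(dataset, batch_size=4, shuffle=False)
for x, y in loader:
    print(x.shape, y.shape)   # torch.Size([4, 512]) torch.Size([4])
```

With batch_size=1 the same loop would yield x shaped (1, 512), which is the shape the question expected.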