python dataframe pytorch torch

Convert a list of two-dimensional DataFrames to a Torch tensor


Goal: I am working with RNNs in PyTorch, and my data is given as a list of DataFrames, where each DataFrame represents one observation, like this:

import numpy as np
import pandas as pd
data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]

which means 100 observations, each with 50 parameters and 5 timesteps. For my model, I need a tensor of shape (100,5,50).

Issue: I have tried many things, but nothing seems to work. Does anyone know how this is done? This approach doesn't work:

import torch
torch.tensor(np.array(data))

I think the problem is converting the DataFrames into arrays and the list into a tensor at the same time.


Solution

  • I don't think you can convert the list of DataFrames in a single command, but you can convert each DataFrame into an array (or a tensor) and then stack the results along a new leading dimension.

    For example:

    import pandas as pd
    import numpy as np
    import torch
    
    data = [pd.DataFrame(np.zeros((5,50))) for x in range(100)]
    
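    # Option 1: convert each DataFrame to a NumPy array, stack, then build one tensor
    list_of_arrays = [np.array(df) for df in data]
    torch.tensor(np.stack(list_of_arrays))  # shape: (100, 5, 50)

    # Option 2: convert each DataFrame to a tensor, then stack the tensors
    list_of_tensors = [torch.tensor(np.array(df)) for df in data]
    torch.stack(list_of_tensors)  # shape: (100, 5, 50)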
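
Since the goal is to feed the result into an RNN, it may help to wrap the stacking approach above in a small function and check the resulting shape and dtype. The following is only a sketch: `frames_to_tensor` is a hypothetical helper name (not a pandas or PyTorch API), and the cast to float32 is an assumption based on PyTorch layers using float32 parameters by default, whereas `np.zeros` produces float64.

```python
import numpy as np
import pandas as pd
import torch


def frames_to_tensor(frames, dtype=np.float32):
    """Stack equally shaped DataFrames into one (N, rows, cols) tensor.

    Hypothetical helper for illustration; the float32 default is an assumption.
    """
    # to_numpy(dtype=...) converts each frame to a plain array of one dtype.
    arrays = [df.to_numpy(dtype=dtype) for df in frames]
    # np.stack adds a new leading axis and raises ValueError if shapes differ.
    stacked = np.stack(arrays)
    # from_numpy shares memory with the stacked array instead of copying it.
    return torch.from_numpy(stacked)


data = [pd.DataFrame(np.zeros((5, 50))) for _ in range(100)]
batch = frames_to_tensor(data)
print(batch.shape, batch.dtype)  # torch.Size([100, 5, 50]) torch.float32
```

With this layout, a recurrent layer created with `batch_first=True` expects input of shape (batch, seq, feature), which matches (100, 5, 50).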