
Converting a pandas dataframe to a pytorch tensor


I am trying to convert a pandas DataFrame read from a CSV file to a PyTorch tensor, but I am getting a TypeError.

I tried doing this:

    import pandas
    import torch

    df = pandas.DataFrame({"spam": [1, 2, 3, 4], "eggs": [5, 6, 7, 8], "ham": [9, 10, 11, 12]})
    print(type(df))
    t = torch.from_numpy(df.values)

    dataframe = pandas.read_csv('dataset.csv')
    print(type(dataframe))
    tens = torch.from_numpy(dataframe.values)

This works perfectly for df, but raises a TypeError for dataframe:

TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.

Both objects have exactly the same type:

<class 'pandas.core.frame.DataFrame'>

What could be going wrong?


Solution

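  • This error arises when the DataFrame contains non-numeric or mixed-type columns. In that case .values returns a NumPy array of dtype object, which torch.from_numpy cannot convert; it accepts only the numeric and bool dtypes listed in the error message. A CSV column is typically read as non-numeric when at least one of its entries cannot be parsed as a number.

    1. Check the DataFrame dtypes: print(dataframe.dtypes). Every column should be numeric (int, float, or bool).
    2. Convert the non-numeric columns: dataframe = dataframe.astype(float) converts every column, or you can convert selected columns individually. Note that astype(float) raises a ValueError if a column holds strings that are not valid numbers.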

    Try something like this:

    # For specific columns
    dataframe['some_column'] = dataframe['some_column'].astype(float)
    
    # For all columns
    dataframe = dataframe.astype(float)
    
    # Then convert to tensor
    tens = torch.from_numpy(dataframe.values)
    

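    Make sure the conversion is meaningful for your application. For example, a column of numeric ID codes converts cleanly to float but carries no quantitative meaning.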
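If astype(float) fails because a column holds entries that are not numbers at all, one possible workaround is sketched below. It is not part of the answer above: it swaps in pandas.to_numeric with errors="coerce", which turns unparseable entries into NaN, and then drops the affected rows. The CSV text and the names csv_text, non_numeric, and cleaned are made up for illustration and stand in for the real dataset.csv; whether dropping rows is acceptable depends on your data.

```python
import io

import pandas as pd
import torch

# Hypothetical stand-in for dataset.csv: the "ham" column contains one
# entry that is not a number, so pandas does not read it as numeric.
csv_text = "spam,eggs,ham\n1,5,9\n2,6,unknown\n3,7,11\n"
dataframe = pd.read_csv(io.StringIO(csv_text))

# Step 1: inspect the dtypes and list the columns that are not numeric.
print(dataframe.dtypes)
non_numeric = dataframe.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)

# Step 2: coerce those columns to numbers; unparseable entries become NaN.
cleaned = dataframe.copy()
for col in non_numeric:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

# Assumption: rows with missing values can simply be discarded.
cleaned = cleaned.dropna()

# Step 3: build the tensor from a float32 NumPy array.
tens = torch.from_numpy(cleaned.to_numpy(dtype="float32"))
print(tens.dtype, tens.shape)
```

Requesting float32 explicitly matches the default dtype of most PyTorch layers; leave the dtype argument out if you want to keep float64.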