Convert column of lists to integer


I am trying to convert the labels column to integers after encoding. The values have dtype object, so I first cast them to strings:

train_df["labels"] = train_df["labels"].astype(str).astype(int)

I am getting this error

invalid literal for int() with base 10: '[0, 1, 0, 0]'

An example of a row from the dataset is

text                        labels
[word1,word2,word3,word4]    [1,0,1,0]

Solution

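  • Each element of train_df["labels"] is a list (or a string that looks like one), so train_df["labels"].astype(str) produces strings such as '[0, 1, 0, 0]'. int() cannot parse a string like that as a single integer, which is why you get the error.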

    If each element in train_df["labels"] is of type list, you can do:

    train_df["labels"] = train_df["labels"].apply(lambda x: [int(el) for el in x])
    

    If it's of type str, you can do:

    train_df["labels"] = train_df["labels"].apply(lambda x: [int(el) for el in x.strip("[]").split(",")])
    

    You presumably want to train a model, but you can't use a pd.Series of lists for that directly. You'll need to convert it into a DataFrame. I can't say how to do that without looking at more than one row of data.
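
As a rough sketch of both steps together, the snippet below normalizes each row to a list of ints and then spreads the labels across columns. The sample rows, the `to_int_list` helper, and the `label_0`, `label_1`, ... column names are made up for illustration, and the last step assumes every row has the same number of labels, which may not hold for your data. It uses `ast.literal_eval` as an alternative to the strip/split approach for the string case.

```python
import ast

import pandas as pd

# Hypothetical sample data: the first row holds a real list, the second holds
# the string form that astype(str) would produce.
train_df = pd.DataFrame(
    {
        "text": [
            ["word1", "word2", "word3", "word4"],
            ["word5", "word6", "word7", "word8"],
        ],
        "labels": [[1, 0, 1, 0], "[0, 1, 0, 0]"],
    }
)


def to_int_list(value):
    """Return a list of ints from either a list or its string representation."""
    if isinstance(value, str):
        # ast.literal_eval parses a Python literal such as "[0, 1, 0, 0]"
        # without executing arbitrary code.
        value = ast.literal_eval(value)
    return [int(el) for el in value]


train_df["labels"] = train_df["labels"].apply(to_int_list)

# Assumption: every row has the same number of labels, so each position
# can become its own integer column.
label_columns = pd.DataFrame(train_df["labels"].tolist(), index=train_df.index)
label_columns.columns = [f"label_{i}" for i in range(label_columns.shape[1])]
```

If the lists have different lengths, this expansion pads the shorter rows with NaN and the columns stop being integer-typed, so check the lengths before relying on it.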