python-3.x pandas list dataframe series

Convert a Series of comma-separated strings into lists of integers in pandas



My pandas DataFrame has a column (Series) of comma-separated string values that looks like this:

col_1,col_2,col_3
abc,123,49
bcd,234,"39,48"

I want to convert col_3 into lists of integers. I tried:

df["col_3"]=[[i] for i in df["col_3"]]

But I end up with this result:

col_1,col_2,col_3
abc,123,['49']
bcd,234,"['39,48']"

The desired output is:

col_1,col_2,col_3
abc,123,[49]
bcd,234,[39,48]

I also need it to be fast, because I will be running this on batches of 100k rows. Can someone suggest a solution? Thanks in advance.

  col_1  col_2    col_3
0   F18   9994      495
1   C35   9995  390,483
3   F60   9998  497,468

Solution

  • You can use a nested list comprehension that splits each string with str.split and converts every piece with int:

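    # Split each value on commas and cast every piece to int
    # (str(x) guards against values pandas already parsed as numbers)
    df['col_3'] = [[int(e) for e in str(x).split(",")] for x in df['col_3']]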
    

    # Output:

    print(df)
    
      col_1  col_2     col_3
    0   abc    123      [49]
    1   bcd    234  [39, 48]
    
    print(type(df.loc[0, 'col_3'][0]))
    <class 'int'>
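A self-contained sketch of the approach above, reading the sample CSV from the question through io.StringIO (an assumption standing in for the real file). Because the quoted "39,48" cannot be parsed as a number, pandas keeps col_3 as strings. The second part shows a variant that really does use pandas.Series.str.split; which of the two is faster on 100k-row batches is not established here and is worth timing on your own data.

```python
import io

import pandas as pd

# Sample data from the question (stand-in for the real CSV file)
csv_text = '''col_1,col_2,col_3
abc,123,49
bcd,234,"39,48"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Plain-Python nested list comprehension: split on commas, cast each piece to int
df['col_3'] = [[int(e) for e in str(x).split(',')] for x in df['col_3']]
print(df['col_3'].tolist())

# Variant: vectorized Series.str.split, then cast the pieces of each list to int
raw = pd.read_csv(io.StringIO(csv_text))['col_3']
split_lists = raw.astype(str).str.split(',').apply(lambda parts: [int(p) for p in parts])
print(split_lists.tolist())
```

Both approaches print `[[49], [39, 48]]`.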
    

    # Edit:

    If you need to save your df to a .csv file, every list with more than one element will always be wrapped in double quotes. The elements of a list are separated by commas, and the default separator of pandas.DataFrame.to_csv is also a comma, so the double quotes are there to escape the commas inside these lists. To get rid of the double quotes, choose another separator, for example a tab:

    df.to_csv(r'path_to_your_new_csv', sep='\t')
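A minimal sketch of the quoting behavior described above, writing to a string instead of a file (to_csv returns the CSV text when no path is given; index=False is used here only to keep the output short):

```python
import pandas as pd

df = pd.DataFrame({'col_1': ['abc', 'bcd'],
                   'col_2': [123, 234],
                   'col_3': [[49], [39, 48]]})

# Default comma separator: "[39, 48]" contains a comma, so it gets quoted
comma_csv = df.to_csv(index=False)

# Tab separator: no conflict with the commas inside the lists, so no quoting
tab_csv = df.to_csv(index=False, sep='\t')

print(comma_csv)
print(tab_csv)
```

Note that in both cases the lists are written as their string form, so reading the file back gives strings such as "[39, 48]" rather than lists of integers.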