My pandas DataFrame has a column (Series) with comma-separated string values that looks like this:
col_1,col_2,col_3
abc,123,49
bcd,234,"39,48"
I want to convert col_3 into lists of integers. I tried:
df["col_3"]=[[i] for i in df["col_3"]]
But I end up with this result:
col_1,col_2,col_3
abc,123,['49']
bcd,234,"['39,48']"
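Why the attempt fails: [i] wraps the whole string in a one-element list. It neither splits on the comma nor converts to int. Here is a minimal sketch with plain Python lists. It assumes col_3 holds strings such as "49" and "39,48", which is what pd.read_csv typically produces for a column that mixes plain numbers and quoted comma-separated values:

```python
# Assumption: col_3 was read as strings, one string per row
values = ["49", "39,48"]

# The original attempt: wraps each string, but never splits or converts it
wrapped = [[i] for i in values]
print(wrapped)  # [['49'], ['39,48']]

# Splitting on "," and converting each piece with int gives real integers
parsed = [[int(e) for e in x.split(",")] for x in values]
print(parsed)  # [[49], [39, 48]]
```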
The desired output is:
col_1,col_2,col_3
abc,123,[49]
bcd,234,[39,48]
I also need it to be fast, because I will be running it on batches of 100k rows. Can someone suggest a solution? Thanks in advance.
  col_1  col_2    col_3
0   F18   9994      495
1   C35   9995  390,483
3   F60   9998  497,468
You can split each string with Python's str.split and convert every piece with int inside a nested list comprehension:
df['col_3'] = [[int(e) for e in x.split(",")] for x in df['col_3']]
print(df)
col_1 col_2 col_3
0 abc 123 [49]
1 bcd 234 [39, 48]
print(type(df.loc[0, 'col_3'][0]))
<class 'int'>
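A self-contained version of the answer above, reading the question's sample through io.StringIO in place of a real file. Two assumptions are worth stating. First, the astype(str) call is an addition that guards against a batch where every col_3 value is a single number, because read_csv would then parse the column as int64 and x.split would fail. Second, there are no missing values (NaN would become "nan" and int() would raise). The second variant is closer to pandas.Series.str.split. Both are still Python-level loops, so time them on your own 100k-row batches rather than taking either as the faster one:

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for a real CSV file
csv_text = '''col_1,col_2,col_3
abc,123,49
bcd,234,"39,48"
'''
df = pd.read_csv(io.StringIO(csv_text))

# Variant 1: nested list comprehension over the column's values.
# astype(str) is an added guard in case read_csv parsed col_3 as integers.
df["col_3"] = [[int(e) for e in x.split(",")] for x in df["col_3"].astype(str)]
print(df)

# Variant 2: Series.str.split returns lists of strings; map converts each list to ints
df2 = pd.read_csv(io.StringIO(csv_text))
df2["col_3"] = (
    df2["col_3"]
    .astype(str)
    .str.split(",")
    .map(lambda parts: [int(p) for p in parts])
)

# Both variants should produce the same lists
print(df["col_3"].tolist() == df2["col_3"].tolist())
```

Assigning a plain list to a column matches rows by position, not by index label, so this also works with a non-contiguous index such as 0, 1, 3.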
If you need to save your df to a .csv, every list with more than one element will be wrapped in double quotes. Each list is written as its string representation (for example [39, 48]), whose elements are separated by a comma, and the default separator of pandas.DataFrame.to_csv is also the comma. The double quotes escape the commas inside these lists. To get rid of them, choose another separator, for example the tab:
df.to_csv(r'path_to_your_new_csv', sep='\t')
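A minimal round-trip sketch showing the quoting difference and how the lists could be read back. to_csv is called without a path so it returns a string, and index=False keeps the output short. Reading the lists back with ast.literal_eval through read_csv's converters parameter is one option among several, not something the answer prescribes:

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "col_1": ["abc", "bcd"],
    "col_2": [123, 234],
    "col_3": [[49], [39, 48]],
})

# Comma separator: "[39, 48]" contains a comma, so it gets quoted
csv_text = df.to_csv(index=False)
print(csv_text)

# Tab separator: no field contains a tab, so nothing is quoted
tsv_text = df.to_csv(index=False, sep="\t")
print(tsv_text)

# On reading, col_3 arrives as strings like "[39, 48]";
# ast.literal_eval turns each one back into a list of ints
df_back = pd.read_csv(
    io.StringIO(tsv_text),
    sep="\t",
    converters={"col_3": ast.literal_eval},
)
print(df_back["col_3"].tolist())
```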