I have a pandas DataFrame that looks like the following example:
        tags  tag1  tag2  tag3
0    [a,b,c]     0     0     0
1      [a,b]     0     0     0
2      [b,d]     0     0     0
...
n    [a,b,d]     0     0     0
I want to encode the tags as 1s in the tag1, tag2, tag3 columns if they are present in the tags array for that row index.
However, I can't quite figure out how to iterate over it properly; my idea so far is as follows:
for i, row in dataset.iterrows():
    for tag in row[0]:
        for column in range(1, 4):
            if dataset.iloc[:,column].index == tag:
                dataset.set_value(i, column, 1)
However, when the dataset is returned from this method, the columns are all still 0.
Thank you!
It seems you need:
astype to convert the column to strings if it contains lists
str.strip to remove the []
str.get_dummies to create the indicator columns
df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')
print(df1)
   'a'  'b'  'c'  'd'
0    1    1    1    0
1    1    1    0    0
2    0    1    0    1
3    1    1    0    1
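The quoted column labels come from the string form of each list: after astype(str) and str.strip('[]'), every token still carries its quotes. Below is a minimal sketch of this step that also strips the quotes from the labels. It assumes the tags column holds Python lists of strings; the sample frame is made up to match the question.

```python
import pandas as pd

# Sample data assumed from the question: each cell of 'tags' is a list of strings.
df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b'], ['b', 'd'], ['a', 'b', 'd']]})

# "['a', 'b', 'c']" -> "'a', 'b', 'c'" -> one indicator column per token
df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')

# The tokens still include their quotes, so strip them from the column labels.
df1.columns = df1.columns.str.strip("'")
print(df1)
```

If the column already holds plain strings such as "[a, b, c]", the tokens have no quotes and the last step does nothing.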
Finally, add df1 to the original DataFrame with concat:
df = pd.concat([df,df1], axis=1)
print(df)
        tags  tag1  tag2  tag3  'a'  'b'  'c'  'd'
0  [a, b, c]     0     0     0    1    1    1    0
1     [a, b]     0     0     0    1    1    0    0
2     [b, d]     0     0     0    0    1    0    1
3  [a, b, d]     0     0     0    1    1    0    1
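If the column really contains lists, an alternative sketch (not part of the approach above) is to join each list with a separator and let str.get_dummies split on it, which avoids the quoted labels. The sample frame and the names encoded and result are assumptions for illustration, and the sketch assumes no tag contains the '|' character.

```python
import pandas as pd

# Sample data assumed from the question: each cell of 'tags' is a list of strings.
df = pd.DataFrame({'tags': [['a', 'b', 'c'], ['a', 'b'], ['b', 'd'], ['a', 'b', 'd']]})

# Join each list into a string such as 'a|b|c', then split on '|' into indicator columns.
encoded = df['tags'].str.join('|').str.get_dummies(sep='|')

# Append the indicator columns to the original frame, aligned on the index.
result = pd.concat([df, encoded], axis=1)
print(result)
```

Here the new columns are labelled a, b, c, d directly, so no cleanup of the labels is needed.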