Tags: python, pandas, iteration, indices

Iterating over select cells in a pandas DataFrame and replacing a value


I have a pandas DataFrame that looks like the following example:

      tags      tag1      tag2      tag3
0     [a,b,c]     0         0         0
1     [a,b]       0         0         0
2     [b,d]       0         0         0
...
n     [a,b,d]     0         0         0

I want to encode the tags as 1s in the tag1, tag2, and tag3 columns if they are present in the tags array for that row.

However, I can't quite figure out how to iterate over the rows properly; my idea so far is as follows:

for i, row in dataset.iterrows():
    for tag in row[0]:
        for column in range (1,4):
            if dataset.iloc[:,column].index == tag:
                dataset.set_value(i, column, 1)

However, when I return the dataset from this method, the columns are still all 0.

Thank you!


Solution

  • It seems you need:


    df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')
    print (df1)
       'a'  'b'  'c'  'd'
    0    1    1    1    0
    1    1    1    0    0
    2    0    1    0    1
    3    1    1    0    1
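
The column labels above keep their quote characters because astype(str) turns each list into its printed form. If plain labels are preferred, one optional cleanup is to strip the quotes from the column index. This is only a sketch: the sample frame below is made up to mirror the question, and it assumes the tags column holds real Python lists of strings.

```python
import pandas as pd

# Hypothetical sample data mirroring the question (an assumption, not the asker's real data).
df = pd.DataFrame({
    'tags': [['a', 'b', 'c'], ['a', 'b'], ['b', 'd'], ['a', 'b', 'd']],
})

# Same approach as above: stringify the lists, drop the brackets, split on ', '.
df1 = df['tags'].astype(str).str.strip('[]').str.get_dummies(', ')

# The labels are still "'a'", "'b'", ...; strip the surrounding single quotes.
df1.columns = df1.columns.str.strip("'")
print(df1)
```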
    

    Finally, add df1 to the original DataFrame with concat:

    df = pd.concat([df,df1], axis=1)
    print (df)
            tags  tag1  tag2  tag3  'a'  'b'  'c'  'd'
    0  [a, b, c]     0     0     0    1    1    1    0
    1     [a, b]     0     0     0    1    1    0    0
    2     [b, d]     0     0     0    0    1    0    1
    3  [a, b, d]     0     0     0    1    1    0    1
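
If the tags column really contains Python lists (rather than strings that look like lists), a possible alternative is to join each list with a delimiter and call get_dummies on the result, which avoids the quoted column names. This is a minimal sketch under that assumption; the sample data and the '|' delimiter are illustrative choices, and the delimiter must not occur inside any tag.

```python
import pandas as pd

# Hypothetical sample data mirroring the question; 'tags' holds real Python lists.
df = pd.DataFrame({
    'tags': [['a', 'b', 'c'], ['a', 'b'], ['b', 'd'], ['a', 'b', 'd']],
})

# Join each list into one '|'-delimited string, then one-hot encode on that delimiter.
dummies = df['tags'].str.join('|').str.get_dummies(sep='|')

# Attach the indicator columns to the original frame.
result = pd.concat([df, dummies], axis=1)
print(result)
```

Here each distinct tag becomes its own indicator column, so the number of columns follows the data rather than a fixed tag1, tag2, tag3 layout.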