Tags: pandas, categorical-data, data-processing

Categorical variables to binary variables


I have a DataFrame that looks like this: [image: initial dataframe]

The 'Concepts_clean' column holds a list of tags for each row, and I want to fill the other columns automatically from it, like so: [image: resulting dataframe]

For example, in the fourth row the 'Concepts_clean' column contains ['Accueil Amabilité', 'Tarifs'], so I want to fill the 'Accueil Amabilité' and 'Tarifs' columns with ones and all the others with zeros.

What is the most effective way to do it?

Thank you


Solution

  • This is really an n-hot (multi-hot) encoding problem:

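    >>> def change_df(x):
    ...     # Assumes 'Concepts_clean' holds a string such as "[Tarifs, Informations]"
    ...     for i in x['Concepts_clean'].strip('[]').split(','):
    ...         tag = i.strip().strip("'\"")  # drop whitespace and any quotes
    ...         if tag:  # skip the empty string produced by "[]"
    ...             x[tag] = 1
    ...     return x
    ...
    >>> df.apply(change_df, axis=1).fillna(0)  # tags absent from a row become 0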
    

    Example Output

    Concepts_clean          Ecoute  Informations  Tarifs
    [Tarifs]                 0.0           0.0     1.0
    []                       0.0           0.0     0.0
    [Ecoute]                 1.0           0.0     0.0
    [Tarifs, Informations]   0.0           1.0     1.0
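
If 'Concepts_clean' holds actual Python lists rather than strings, a vectorized alternative may be simpler than a row-wise apply. The sketch below is only one possible approach: the sample frame is made up to mirror the example output above, and it assumes no tag contains the '|' character, which is used here as a temporary separator.

```python
import pandas as pd

# Hypothetical sample data mirroring the example output above;
# each cell is assumed to be a real Python list of tag strings.
df = pd.DataFrame(
    {"Concepts_clean": [["Tarifs"], [], ["Ecoute"], ["Tarifs", "Informations"]]}
)

# Join each list into one '|'-delimited string, then let pandas
# build one 0/1 indicator column per distinct tag.
dummies = df["Concepts_clean"].str.join("|").str.get_dummies(sep="|")

# Attach the indicator columns to the original frame.
result = df.join(dummies)
print(result)
```

An empty list becomes an empty string, so that row gets zeros in every tag column. If the column actually contains strings such as "['Tarifs', 'Informations']", they would first need to be parsed into lists, for example with ast.literal_eval.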