python, dataframe, prefix, dummy-variable

How to add a prefix to column names according to data in another column


I have a DataFrame like the one below:

import pandas as pd

data = {'ID': [1, 2, 3, 4, 5, 6, 7, 8],
        'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
        'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']}
df = pd.DataFrame(data)

I want to make dummy columns for the "cluster_label" column:

pd.get_dummies(df, columns=['cluster_label'])

However, I need to add a prefix based on the value in the LABEL column.

Basically, the columns must be text_c_0, logo_c_0, … How can I do that?

Many thanks in advance.


Solution

  • Try this:

    import pandas as pd
    
    data = {
        'ID': [1, 2, 3, 4, 5, 6, 7, 8],
        'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
        'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3']
    }
    
    df = pd.DataFrame(data)
    
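    # Plain dummies on cluster_label alone carry no LABEL prefix
    pd.get_dummies(df, columns=['cluster_label'])

    # Build a combined key such as 'text_c_0' from LABEL and cluster_label
    df['dummy'] = df.apply(lambda row: row['LABEL'] + '_' + row['cluster_label'], axis=1)

    # One dummy column per LABEL/cluster_label combination
    pd.get_dummies(df['dummy'])

    # If you want to keep ['ID', 'LABEL', 'cluster_label'] in your df:
    df = df.join(pd.get_dummies(df['dummy']))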
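
A possible variation on the row-wise apply above: since both columns hold strings, they can probably be concatenated directly, which avoids the lambda and the helper 'dummy' column. This is only a sketch under that assumption. The names `combined`, `dummies`, and `result` are illustrative and not part of the original answer, and `dtype=int` is an optional choice to get 0/1 values rather than the default dummy dtype.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'LABEL': ['text', 'logo', 'logo', 'person', 'text', 'text', 'person', 'logo'],
    'cluster_label': ['c_0', 'c_0', 'c_0', 'c_1', 'c_1', 'c_2', 'c_2', 'c_3'],
})

# Vectorized string concatenation: 'text' + '_' + 'c_0' -> 'text_c_0'
# (assumes both columns contain only strings, as in the example data)
combined = df['LABEL'] + '_' + df['cluster_label']

# One indicator column per LABEL/cluster_label combination, stored as 0/1 integers
dummies = pd.get_dummies(combined, dtype=int)

# Keep ID, LABEL and cluster_label next to the new dummy columns
result = df.join(dummies)

print(result.head())
```

Only the combinations that actually occur in the data get a column, so a pair such as person_c_0 will not appear unless some row has it.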