Search code examples
pythonpandasdataframeseries

Split data into columns in pandas


I have a df as

name category   dummy
USA   fx,ft,fe   1
INDIA fx         13

I need to convert this as

name category_fx categoty_ft category_fe  dummy 
USA  True         True        True         1
INDIA True        False       False        13

tried with series.explode() function but not getting this output.


Solution

  • Use Series.str.get_dummies by column category with converting 0,1 to boolean by DataFrame.astype and DataFrame.add_prefix:

    c = df.columns.difference(['category'], sort=False).tolist()
    df = (df.set_index(c)['category']
            .str.get_dummies(',')
            .astype(bool)
            .add_prefix('category_')
            .reset_index())
    print (df)
        name  category_fe  category_ft  category_fx
    0    USA         True         True         True
    1  INDIA        False        False         True
    

    EDIT: If need replace one column by multiple columns you can use:

    df1 = (df['category']
            .str.get_dummies(',')
            .astype(bool)
            .add_prefix('category_'))
    
    pos = df.columns.get_loc('category')
    df = pd.concat([df.iloc[:, :pos], df1, df.iloc[:, pos+1:]], axis=1)
    print (df)
        name  category_fe  category_ft  category_fx  dummy
    0    USA         True         True         True      1
    1  INDIA        False        False         True     13
    

    This solution is modifid for multiple columns:

    print (df)
        name  category  dummy category1
    0    USA  fx,ft,fe      1       a,f
    1  INDIA        fx     13       s,a
    
    cols = ['category','category1']
    
    dfs = [(df[c].str.get_dummies(',').astype(bool).add_prefix(f'{c}_')) for c in cols]
    
    df = pd.concat([df, *dfs], axis=1).drop(cols, axis=1)
    print (df)
        name  dummy  category_fe  category_ft  category_fx  category1_a  \
    0    USA      1         True         True         True         True   
    1  INDIA     13        False        False         True         True   
    
       category1_f  category1_s  
    0         True        False  
    1        False         True