Search code examples
pythonpandas

Rearrange and encode columns in pandas


i have data structured like this (working with pandas):

ID|comp_1_name|comp_1_percentage|comp_2_name|comp_2_percentage|
 1| name_1    | 13              | name_2    | 33              |
 2| name_3    | 15              | name_1    | 46              |

There are six comp_name/comp_percentage couple. Names are not equally distributed in all six name columns. I would like to obtain that kind of trasformation:

ID|name_1|name_2|name_3|
 1| 13   | 33   |  0   |      
 2| 46   | 0    |  15  |

I tried transposing (.T) both the entire dataframe and isolating the comp_name, comp_percentage couples, but to no avail.


Solution

  • You can try using pd.wide_to_long with a little column renaming and shaping dataframe:

    # Renaming columns to move name and percentage to the front for pd.wide_to_long
    dfr = df.rename(columns=lambda x: '_'.join(x.rsplit('_', 1)[::-1]))
    
    (pd.wide_to_long(dfr, ['name_', 'percentage_'], 'ID', 'No', suffix='.*')
      .reset_index()
      .pivot(index='ID', columns='name_', values='percentage_')
      .fillna(0)
      .reset_index()
      .rename_axis(None, axis=1)
      .astype(int))
    

    Output:

       ID  name_1  name_2  name_3
    0   1      13      33       0
    1   2      46       0      15