Search code examples
pythonpandasdataframeheaderunique

How can I create column headers from row values in a slice of columns resulting in the column row values matching the column headers?


I am wanting to convert dataframe row values into column headers, but I would only want one column of that same name. Then I want the row values under each new column to be the same as the column header. I have tried looking at get_dummies, pivot, pivot_table, group_by. Any help would be much appreciated.

Data:

import pandas as pd

In [1]: df = pd.DataFrame([['foo', 'bar', 'ham'], ['baz', 'foo', 'bar'], ['ham', 'baz', 'egg']], columns=['A', 'B', 'C'])


In [2]: df
Out[2]: 
     A    B    C
0  foo  bar  ham
1  baz  foo  bar
2  ham  baz  egg

I want this:

In [3]: 
Out[3]: 
   foo  bar  baz  ham  egg
0  foo  bar       ham
1  foo  bar  baz  
2            baz  ham  egg

Thank you.


Solution

  • The idea is to convert the DataFrame to the long form and then pivot:

    df = df.reset_index()
    df = pd.melt(df, id_vars=['index']).pivot(index = 'index', values='variable', columns='value')
    for c in df.columns:
        df.loc[df[c].notna(), c] = c
    
    value   bar baz egg foo ham
    index                   
    0       bar NaN NaN foo ham
    1       bar baz NaN foo NaN
    2       NaN baz egg NaN ham
    

    Detailed explanation:

    melt would convert the DataFrame to the following form:

       index    variable    value
    0   0   A   foo
    1   1   A   baz
    2   2   A   ham
    3   0   B   bar
    4   1   B   foo
    5   2   B   baz
    6   0   C   ham
    7   1   C   bar
    8   2   C   egg
    

    Then use pivot to make the columns to be all the unique values:

    value   bar baz egg foo ham
    index                   
    0   B   NaN NaN A   C
    1   C   A   NaN B   NaN
    2   NaN B   C   NaN A
    

    Then simply replace all non-na columns to be the column name.