I am wanting to convert dataframe row values into column headers, but I would only want one column of that same name. Then I want the row values under each new column to be the same as the column header. I have tried looking at get_dummies, pivot, pivot_table, group_by. Any help would be much appreciated.
Data:
import pandas as pd
In [1]: df = pd.DataFrame([['foo', 'bar', 'ham'], ['baz', 'foo', 'bar'], ['ham', 'baz', 'egg']], columns=['A', 'B', 'C'])
In [2]: df
Out[2]:
A B C
0 foo bar ham
1 baz foo bar
2 ham baz egg
I want this:
In [3]:
Out[3]:
foo bar baz ham egg
0 foo bar ham
1 foo bar baz
2 baz ham egg
Thank you.
The idea is to convert the DataFrame to the long form and then pivot:
df = df.reset_index()
df = pd.melt(df, id_vars=['index']).pivot(index = 'index', values='variable', columns='value')
for c in df.columns:
df.loc[df[c].notna(), c] = c
value bar baz egg foo ham
index
0 bar NaN NaN foo ham
1 bar baz NaN foo NaN
2 NaN baz egg NaN ham
Detailed explanation:
melt
would convert the DataFrame to the following form:
index variable value
0 0 A foo
1 1 A baz
2 2 A ham
3 0 B bar
4 1 B foo
5 2 B baz
6 0 C ham
7 1 C bar
8 2 C egg
Then use pivot to make the columns to be all the unique values:
value bar baz egg foo ham
index
0 B NaN NaN A C
1 C A NaN B NaN
2 NaN B C NaN A
Then simply replace all non-na columns to be the column name.