I have a dataframe:
df1=pd.DataFrame({
'ID':[101,102],
'Name':['Axel','Bob'],
'US':['GrA','GrC'],
'Europe':['GrB','GrD'],
'AsiaPac':['GrZ','GrF']
})
Which I want to change to this:
df2=pd.DataFrame({
'ID':[101,101,101,102,102,102],
'Name':['Axel','Axel','Axel','Bob','Bob','Bob'],
'Region':['US','Europe','AsiaPac','US','Europe','AsiaPac'],
'Group':['GrA','GrB','GrZ','GrC','GrD','GrF']
})
How do I do it? There is a crosstab function in pandas but it doesn't do this. In Qlik I would simply do
Crosstable(Region,Group,2)
LOAD
ID,
Name,
US,
Europe,
AsiaPac
And I would go from df1 to df2. How can I do this in python (pandas or otherwise)?
This is essentially reshaping your data from a wide format to a long format, as it's known in R parlance. In pandas, you can do this with pd.melt
:
pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group')
# ID Name Region Group
# 0 101 Axel AsiaPac GrZ
# 1 102 Bob AsiaPac GrF
# 2 101 Axel Europe GrB
# 3 102 Bob Europe GrD
# 4 101 Axel US GrA
# 5 102 Bob US GrC
If you need your columns sorted on ID
or Name
and Group
, as in your example output, you can add .sort_values()
to the expression:
pd.melt(df1, id_vars=['ID', 'Name'], var_name='Region', value_name='Group').sort_values(['ID', 'Group'])
# ID Name Region Group
# 4 101 Axel US GrA
# 2 101 Axel Europe GrB
# 0 101 Axel AsiaPac GrZ
# 5 102 Bob US GrC
# 3 102 Bob Europe GrD
# 1 102 Bob AsiaPac GrF