Search code examples
pythonpandasdataframepivotpivot-table

Pandas table reshape puzzle - list all items and fill in blanks with NaN or 0


I have a table I'm trying to reshape in a particular way.

import pandas as pd

df = pd.DataFrame({'race': ['one', 'one', 'one', 'two', 'two', 'two'],
                'type': ['D', 'K', 'G', 'D', 'D', 'K'],
                'item': ['x', 'y', 'z', 'q', 'x', 'y'],
                'level': [1, 2, 1, 6, 2, 3]})
df

The resulting dataframe:

    race    type    item    level
0   one     D       x       1
1   one     K       y       2
2   one     G       z       1
3   two     D       q       6
4   two     D       x       2
5   two     K       y       3

I would like to reshape it in this format:

        D               K               G   
        item    level   item    level   item    level
race
one     x       1       y       2       z       1
two     q       6       y       3       NaN     NaN
two     x       2       NaN     NaN     NaN     NaN
  • My goal is simply to lay out the information in a different format for a human to read.
  • There is no data aggregation.
  • item is unique within race but it can appear in multiple different races.
  • The tricky part is the race column or index must expand to fit the number of items within the race. In the example above, there are two 'D' items in race 'two' so race 'two' repeats twice in 2 rows to accommodate both items. If there were 5 'K' items in race 'two', race 'two' would need to repeat 5 times.
  • The number of 'D' 'K' 'G' items in each race is random and they are not related to each other. When there is not an item available, that cell is filled with 'NaN' or 0.

How can I achieve my desired table shape?

I've already tried:

df.pivot(index='race', columns='type', values=['level', 'item'])

which gives error:

ValueError: Index contains duplicate entries, cannot reshape

Is there another way with pd.pivot, pd.groupby, pd.pivot_table, or pd.crosstab or another pandas or dataframe method that can work?


Solution

  • You must first deduplicate by race/type using groupby.cumcount:

    (df.assign(n=df.groupby(['race', 'type']).cumcount())
       .pivot(index=('race', 'n'), columns='type', values=['level', 'item'])
       .sort_index(level='type', axis=1).swaplevel(axis=1)
     )
    

    Output:

    type      D          G          K      
           item level item level item level
    race n                                 
    one  0    x     1    z     1    y     2
    two  0    q     6  NaN   NaN    y     3
         1    x     2  NaN   NaN  NaN   NaN