Search code examples
pythonpandaspivot-tablemelt

Pivot table to "tidy" data frame in Pandas


I have an array of numbers (I think the format makes it a pivot table) that I want to turn into a "tidy" data frame. For example, I start with variable 1 down the left, variable 2 across the top, and the value of interest in the middle, something like this:

  X Y
A 1 2
B 3 4

I want to turn that into a tidy data frame like this:

V1 V2 value
A  X    1
A  Y    2
B  X    3
B  Y    4

The row and column order don't matter to me, so the following is totally acceptable:

value V1 V2
  2    A  Y
  4    B  Y
  3    B  X
  1    A  X

For my first go at this, which was able to get me the correct final answer, I looped over the rows and columns. This was terribly slow, and I suspected that some machinery in Pandas would make it go faster.

It seems that melt is close to the magic I seek, but it doesn't get me all the way there. That first array turns into this:

   V2 value
0  X    1
1  X    2
2  Y    3
3  Y    4

It gets rid of my V1 variable!

Nothing is special about melt, so I will be happy to read answers that use other approaches, particularly if melt is not much faster than my nested loops and another solution is. Nonetheless, how can I go from that array to the kind of tidy data frame I want as the output?

Example dataframe:

df = pd.DataFrame({"X":[1,3], "Y":[2,4]},index=["A","B"])

Solution

  • Use DataFrame.reset_index with DataFrame.rename_axis and then DataFrame.melt. If you want order columns we could use DataFrame.reindex.

    new_df = (df.rename_axis(index = 'V1')
                .reset_index()
                .melt('V1',var_name='V2')
                .reindex(columns = ['value','V1','V2']))
    print(new_df)
    

    Another approach DataFrame.stack:

    new_df = (df.stack()
                .rename_axis(index = ['V1','V2'])
                .rename('value')
                .reset_index()
                .reindex(columns = ['value','V1','V2']))
    print(new_df)
    

       value V1 V2
    0      1  A  X
    1      3  B  X
    2      2  A  Y
    3      4  B  Y
    

    to names names there is another alternative like commenting @Scott Boston in the comments