Search code examples
pythonpandasdataframegroup-bypandas-melt

Using stack, melt, pivot and other operators together to convert a messy dataframe to an organized one


I'm fairly new to Pandas and often find myself stuck in situations that I have no solution to them rather than asking for help. I have read so many tutorials and checked many Stackoverflow topics but none of them asnwered mine since it is very difficult to search for. Would some please help me?

I have an excel file that I read as df. The df is something like this (I simplified it):

pd.DataFrame({'date': ['type', 'model', '2020', '2021', '2022', '2023'],
              'factor1': ['type_1', 'model_1', '1', '2', '3', '4'],
              'factor2': ['type_2', 'model_1', '5', '6', '7', '8'],
              'factor3': ['type_3', 'model_2', '9', '10', '11', '12']
             })

The result is:

    date  factor1  factor2  factor3
0   type   type_1   type_2   type_3
1  model  model_1  model_1  model_2
2   2020        1        5        9
3   2021        2        6       10
4   2022        3        7       11
5   2023        4        8       12

and the original data in the excel file that I based my dataframe definition off of is: Original excel


What I want to produce is something like this:

                     2020  2021  2022  2023
0  model_1   type_1     1     2     3     4
1            type_2     5     6     7     8
2  model_2   type_3     9    10    11    12

I tried any combo of melt, pivot and stack but the problem is for instance 'type' is horizontal but vertically listed under date! When melting, it looks at 'type' as a 'date' which is clearly wrong. Any help would be appreciated! Thanks!


Solution

  • I tried any combo of melt, pivot and stack..

    Your forgot about T :

    tmp = df.T # or df.transpose()
    
    out = (
        tmp.iloc[1:]
            .set_axis(tmp.iloc[0], axis=1)
            .set_index(["model", "type"])
            .rename_axis(index=[None]*2, columns=None) # optional
    )
    

    Output :

    print(out)
    
                   2020 2021 2022 2023
    model_1 type_1    1    2    3    4
            type_2    5    6    7    8
    model_2 type_3    9   10   11   12