Search code examples
pythonpandasnumpydata-analysis

How do I pivot my dataframe multiple times in pandas while creating a new column merging multiple columns?


I find this a rather complex challenge, as I need to merge multiple columns in a dataframe together and then pivot the table multiple times (I think).

So the provided input is this dataframe which I have filtered out:

                       name      X      Y
0      Mathematics House AB  0.123  0.111
2111   Physical Science Hut  0.124  0.112
9412   Primary Education LO  0.125  0.113
1234   Tertiary Universitas  0.126  0.114
12411  Principle of Physics  0.127  0.115
12373  Chemical Industry A1  0.128  0.116

and the output should be something that looks like this:

                  label  Z      
   Mathematics House AB  X  0.123
                         Y  0.111
   Physical Science Hut  X  0.124
                         Y  0.112
   Primary Education LO  X  0.125
                         Y  0.113
   Tertiary Universitas  X  0.126
                         Y  0.114
   Principle of Physics  X  0.127
                         Y  0.115
   Chemical Industry A1  X  0.128
                         Y  0.116

Where Z stands for a new column that has yet to be created. I'm currently using a very hacky technique that takes some columns as numpy arrays and try to reconstruct it. And the results aren't pretty and what is expected. Is there a way to manipulate the dataframe directly without using numpy? It seems like a tool for pivoting multiple times. The current method I have is df.pivot(index='name', columns='Z').T.unstack().T, where I previously let df['Z'] = '' -- very ugly and hacky, indeed, and it doesn't get what I want to present in.


Solution

  • This is stack not pivot

    df.set_index('name').stack()
    Out[186]: 
    name                  
    MathematicsHouseAB   X    0.123
                         Y    0.111
    PhysicalScienceHut   X    0.124
                         Y    0.112
    PrimaryEducationLO   X    0.125
                         Y    0.113
    TertiaryUniversitas  X    0.126
                         Y    0.114
    PrincipleofPhysics   X    0.127
                         Y    0.115
    ChemicalIndustryA1   X    0.128
                         Y    0.116
    dtype: float64
    

    EDIT:

    df=df.set_index('name').stack()
    df.index.names=['name', 'Z']
    df
    Out[263]: 
                               0
    name                Z       
    MathematicsHouseAB  X  0.123
                        Y  0.111
    PhysicalScienceHut  X  0.124
                        Y  0.112
    PrimaryEducationLO  X  0.125
                        Y  0.113
    TertiaryUniversitas X  0.126
                        Y  0.114
    PrincipleofPhysics  X  0.127
                        Y  0.115
    ChemicalIndustryA1  X  0.128
                        Y  0.116