Search code examples
pandasdataframevectorization

Manipulating a given DataFrame in order to recreate it in a different structure, Pandas Python


Left DataFrame is the given one, and i want to recreate it to the right DataFrame.

Hi, So suppose i have a given DataFrame (the left one), and i want to create a new dataframe (the right one). I created the new DataFrame with the indexes and columns of the right one, and now i want to "fill" the cells. Any ideas how can i do it simplest, with priority of vectorize way? Thank you in advance!

I did it with loops not "classic" way. I would like to write it with apply method or another smart solution.

edit: this is an approcah that i tried:

This is the original DataFrame with filled data 1

This is the DataFrame i want to reach 2

This is a step i tried to do in order to reach the solution3

the output should be (also in context for Scot's answer): in 4

Thank you in advance!


Solution

  • I am little confused by your question, but I will attempt an answer:

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame(data=np.arange(1, 76).reshape(-1,5, order='F'), 
                      index=pd.MultiIndex.from_product([[1,2,3,4,5],[*'XYZ']]), 
                      columns='Shirt Bottomware Shoes Sunglasses Earrings'.split())    
    df = df.rename_axis(['Number', 'Variable'])
    df
    

    Input Dataframe:

                     Shirt  Bottomware  Shoes  Sunglasses  Earrings
    Number Variable                                                
    1      X             1          16     31          46        61
           Y             2          17     32          47        62
           Z             3          18     33          48        63
    2      X             4          19     34          49        64
           Y             5          20     35          50        65
           Z             6          21     36          51        66
    3      X             7          22     37          52        67
           Y             8          23     38          53        68
           Z             9          24     39          54        69
    4      X            10          25     40          55        70
           Y            11          26     41          56        71
           Z            12          27     42          57        72
    5      X            13          28     43          58        73
           Y            14          29     44          59        74
           Z            15          30     45          60        75
    

    Reshape and filter:

    df_out = df.unstack().stack(0, dropna=False).loc[[(1,'Shirt'),(3,'Shoes'),(5,'Earrings')]]    
    df_out
    

    Output dataframe:

    Variable          X   Y   Z
    Number                     
    1      Shirt      1   2   3
    3      Shoes     37  38  39
    5      Earrings  73  74  75