Search code examples
pythonpandasgroup-bypivot-tablereshape

Reshape Pandas Dataframe and group by 2 level columns


I have a dataframe with flat structure of having unique Rows as follow. Original Df

I need to reshape it as shown below. Desired reshaped format

Using Pivot table and swapping levels, I managed to obtain somewhat closer to the result, but it has inadvertently sorted the level1 sub columns. incorrect output

data = {
    "Case_ID": ["1-1 Max Export", "1-1 Max Export", "1-2 Max Export", "1-2 Max Export", "1-3 Max Export", "1-3 Max Export"],
    "Item": ["3-Winding TX", "R1-SUT1", "3-Winding TX", "R1-SUT1", "3-Winding TX", "R1-SUT1"],
    "HV Current": [0.5, 0.1, 0.4, 0.1, 0.5, 0.1],
    "Total Power": [114.5, 2.2, 113.4, 2.2, 100.0, 1.8],
    "Tap Pos.": [15, 3, 1, 3, 20, 3]
}


df = pd.DataFrame(data)   # Original Dataframe Format with Flat Structure

item_order = list (df.columns[2:])  # The second Level columns must be in same order as per the original df

# Pivot the DataFrame
reshaped_df = df.pivot_table(index='Case_ID', 
                             columns='Item', 
                             values=list (df.columns[2:]), 
                             aggfunc='first')

# Swap level 0 and level 1 columns
reshaped_df.columns = reshaped_df.columns.swaplevel(0, 1)

# Without.sort_index(axis=1) the code doesn't work. 
# The Level 0 and Level 1 colums shallbe in the same order as per the original df
reshaped_df = reshaped_df.sort_index(axis=1)

reshaped_df

The Tap Pos. sub column needs to be the last in each category The Sub columns sequence should be as per the original df (ie HV Current, Total Power , Tap Pos.).

  • a) I'm looking to fix the above code.

  • b) Also interested to see there is another way to achieve this instead of using pivot table.


Solution

  • Code

    using sort_index with key parameter, you can use category dtype when sort.

    out = (df
           .pivot_table(index='Case_ID', columns='Item', aggfunc='first')
           .swaplevel(0, 1, axis=1)
           .sort_index(
               axis=1, 
               key=lambda x: pd.Categorical(
                   x, 
                   categories=df.get(x.name, df.columns[2:]).unique()
               )
           )
    )
    

    out:

    enter image description here