Search code examples
pythonpandaspivotstackmulti-index

Unstack multilpe columns to rows - multindex


I have a dataframe as per the below with a multi index

mx_dict = pd.read_excel('ABC Bookings.xlsx', header=[1,2], sheet_name=None)

Excel file if useful: excel file

mx = pd.concat(mx_dict.values(), axis=0)
print(mx)

Unnamed: 0_level_0 Unnamed: 1_level_0 January                 February  \
              Client   Client Full Name  Gross     Net Billable   Gross    
0                ABC         Client ABC   500.0  400.0      800      NaN   
1                ABC         Client ABC     NaN    NaN        0    300.0   

                   
     Net Billable  
0    NaN      NaN  
1  100.0    600.0  

dict of dataframe below

mx.to_dict('list')

{('Unnamed: 0_level_0', 'Client'): ['ABC', 'ABC'],
 ('Unnamed: 1_level_0', 'Client Full Name'): ['Client ABC', 'Client ABC'],
 ('January', 'Gross '): [500.0, nan],
 ('January', 'Net'): [400.0, nan],
 ('January', 'Billable'): [800, 0],
 ('February', 'Gross '): [nan, 300.0],
 ('February', 'Net'): [nan, 100.0],
 ('February', 'Billable'): [nan, 600.0]}

Level 0 - The first 2 columns dont have names, but the rest will be months of the year - Jan - December

Level 1 - this will include client code and names as well as Gross/Net/Billable for each month of the year

I want to 'unstack' both the month and values and add as extra rows, as per the below

|Client | Client Full Name | Month   | Gross | Net | Billable
|ABC    | Client ABC       | January | 500   | 400 | 800
|ABC    | Client ABC       | February| 300   | 100 | 600

I have tried the below as suggested but it isnt working, probably because I didnt include how my df was structured on my initial post

mx1 = mx.stack(level=0).reset_index().rename(columns={"level_1": "Month"})

I have tried various other things including unstack and pivot but can't seem to get my desired result. Any ideas?

This is just a sample as I cant share the full details but the excel file will end up with hundreds of different clients and values

(Let me know if you need me to share any more code)

Thanks


Solution

  • Assuming this is your dataframe:

    nan = np.nan
    
    d = {
        ("Unnamed: 0_level_0", "Client"): ["ABC", "ABC"],
        ("Unnamed: 1_level_0", "Client Full Name"): ["Client ABC", "Client ABC"],
        ("January", "Gross "): [500.0, nan],
        ("January", "Net"): [400.0, nan],
        ("January", "Billable"): [800, 0],
        ("February", "Gross "): [nan, 300.0],
        ("February", "Net"): [nan, 100.0],
        ("February", "Billable"): [nan, 600.0],
    }
    
    df = pd.DataFrame(d)
    
      Unnamed: 0_level_0 Unnamed: 1_level_0 January                 February                
                  Client   Client Full Name  Gross     Net Billable   Gross     Net Billable
    0                ABC         Client ABC   500.0  400.0      800      NaN    NaN      NaN
    1                ABC         Client ABC     NaN    NaN        0    300.0  100.0    600.0
    

    You can first .set_index with the first two columns, that relate to the client name. Then use .stack and .reset_index:

    client_tuple_col = ("Unnamed: 0_level_0", "Client")
    client_full_tuple_col = ("Unnamed: 1_level_0", "Client Full Name")
    
    df = df.set_index([client_tuple_col, client_full_tuple_col])
    df = (
        df.stack(level=0)
        .reset_index()
        .rename(
            columns={
                "level_2": "Month",
                client_tuple_col: client_tuple_col[1],
                client_full_tuple_col: client_full_tuple_col[1],
            }
        )
        .dropna()
    )
    
      Client Client Full Name     Month  Gross     Net  Billable
    0    ABC       Client ABC   January   500.0  400.0     800.0
    1    ABC       Client ABC  February   300.0  100.0     600.0