I have a dataframe as per the below with a multi index
mx_dict = pd.read_excel('ABC Bookings.xlsx', header=[1,2], sheet_name=None)
Excel file if useful: excel file
mx = pd.concat(mx_dict.values(), axis=0)
print(mx)
Unnamed: 0_level_0 Unnamed: 1_level_0 January February \
Client Client Full Name Gross Net Billable Gross
0 ABC Client ABC 500.0 400.0 800 NaN
1 ABC Client ABC NaN NaN 0 300.0
Net Billable
0 NaN NaN
1 100.0 600.0
dict of dataframe below
mx.to_dict('list')
{('Unnamed: 0_level_0', 'Client'): ['ABC', 'ABC'],
('Unnamed: 1_level_0', 'Client Full Name'): ['Client ABC', 'Client ABC'],
('January', 'Gross '): [500.0, nan],
('January', 'Net'): [400.0, nan],
('January', 'Billable'): [800, 0],
('February', 'Gross '): [nan, 300.0],
('February', 'Net'): [nan, 100.0],
('February', 'Billable'): [nan, 600.0]}
Level 0 - The first 2 columns dont have names, but the rest will be months of the year - Jan - December
Level 1 - this will include client code and names as well as Gross/Net/Billable for each month of the year
I want to 'unstack' both the month and values and add as extra rows, as per the below
|Client | Client Full Name | Month | Gross | Net | Billable
|ABC | Client ABC | January | 500 | 400 | 800
|ABC | Client ABC | February| 300 | 100 | 600
I have tried the below as suggested but it isnt working, probably because I didnt include how my df was structured on my initial post
mx1 = mx.stack(level=0).reset_index().rename(columns={"level_1": "Month"})
I have tried various other things including unstack and pivot but can't seem to get my desired result. Any ideas?
This is just a sample as I cant share the full details but the excel file will end up with hundreds of different clients and values
(Let me know if you need me to share any more code)
Thanks
Assuming this is your dataframe:
nan = np.nan
d = {
("Unnamed: 0_level_0", "Client"): ["ABC", "ABC"],
("Unnamed: 1_level_0", "Client Full Name"): ["Client ABC", "Client ABC"],
("January", "Gross "): [500.0, nan],
("January", "Net"): [400.0, nan],
("January", "Billable"): [800, 0],
("February", "Gross "): [nan, 300.0],
("February", "Net"): [nan, 100.0],
("February", "Billable"): [nan, 600.0],
}
df = pd.DataFrame(d)
Unnamed: 0_level_0 Unnamed: 1_level_0 January February
Client Client Full Name Gross Net Billable Gross Net Billable
0 ABC Client ABC 500.0 400.0 800 NaN NaN NaN
1 ABC Client ABC NaN NaN 0 300.0 100.0 600.0
You can first .set_index
with the first two columns, that relate to the client name. Then use .stack
and .reset_index
:
client_tuple_col = ("Unnamed: 0_level_0", "Client")
client_full_tuple_col = ("Unnamed: 1_level_0", "Client Full Name")
df = df.set_index([client_tuple_col, client_full_tuple_col])
df = (
df.stack(level=0)
.reset_index()
.rename(
columns={
"level_2": "Month",
client_tuple_col: client_tuple_col[1],
client_full_tuple_col: client_full_tuple_col[1],
}
)
.dropna()
)
Client Client Full Name Month Gross Net Billable
0 ABC Client ABC January 500.0 400.0 800.0
1 ABC Client ABC February 300.0 100.0 600.0