I have a MultiIndex
dataframe in pandas. I can't seem to get rid of the blank row under the column headers in the resulting Excel file while maintaining the multi-indices.
Here is what I have:
3 Month Total | 3 Month Total | 3 Month Total | August | August | August | July | July | July | June | June | June | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
variable | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP | |||
Division | Region | Store | Employee | ||||||||||||
Commerical | Texas | 0001 | Bob | 1000 | 600 | 400 | 450 | 200 | 250 | 150 | 100 | 50 | 400 | 300 | 100 |
Retail | West | 0406 | Sally | 2400 | 1700 | 700 | 1100 | 800 | 300 | 750 | 500 | 250 | 550 | 400 | 150 |
Here is what I want:
3 Month Total | 3 Month Total | 3 Month Total | August | August | August | July | July | July | June | June | June | ||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Division | Region | Store | Employee | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP |
Commerical | Texas | 0001 | Bob | 1000 | 600 | 400 | 450 | 200 | 250 | 150 | 100 | 50 | 400 | 300 | 100 |
Retail | West | 0406 | Sally | 2400 | 1700 | 700 | 1100 | 800 | 300 | 750 | 500 | 250 | 550 | 400 | 150 |
Can someone please point me in the right direction ?
I found this but hard coding values won't work for me as the columns headers change.
Edit: here is the df.head().to_dict('tight') as requested.
{'index': [('Commerical',
'Service',
426,
'Los Angeles, CA',
2767,
78910,
'Steve'),
('Retail', 'Service', 1, 'Houston, TX', 2706, 123456, 'Bob')],
'columns': [('3 Months Combined', 'Adjusted Hrs'),
('3 Months Combined', 'Clocked Hrs'),
('3 Months Combined', 'GP'),
('3 Months Combined', 'GP-Supp'),
('3 Months Combined', 'Mobile'),
('3 Months Combined', 'Prd Hrs'),
('3 Months Combined', 'Sum of Days Worked'),
('3 Months Combined', 'Supp $'),
('3 Months Combined', 'Total Billed Hrs'),
('3 Months Combined', 'Total Billed Revenue'),
('3 Months Combined', 'Total Clocked Hrs'),
('3 Months Combined', 'Total External Cost'),
('3 Months Combined', 'Total Non Billed Hrs'),
('3 Months Combined', 'Rev/Day'),
('3 Months Combined', 'Hrs Billed/Day'),
('3 Months Combined', 'Prof%'),
('3 Months Combined', 'GP/Day'),
('3 Months Combined', 'GP-Supp/Day'),
('3 Months Combined', 'Avg Supp/Day'),
('3 Months Combined', 'Eff Rate'),
('3 Months Combined', 'Prod%'),
('3 Months Combined', 'Eff%'),
('August', 'Title'),
('August', 'Level'),
('August', '40-30-20-10'),
('August', 'Rev/Day'),
('August', 'Hrs Billed/Day'),
('August', 'Prof%'),
('August', 'GP/Day'),
('August', 'GP-Supp/Day'),
('August', 'GP%'),
('August', 'Total Supp $/Day'),
('August', 'Eff Rate'),
('August', 'Prod%'),
('August', 'Eff%'),
('August', 'Labor Rate'),
('August', 'Tech Count'),
('July', 'Title'),
('July', 'Level'),
('July', '40-30-20-10'),
('July', 'Rev/Day'),
('July', 'Hrs Billed/Day'),
('July', 'Prof%'),
('July', 'GP/Day'),
('July', 'GP-Supp/Day'),
('July', 'GP%'),
('July', 'Total Supp $/Day'),
('July', 'Eff Rate'),
('July', 'Prod%'),
('July', 'Eff%'),
('July', 'Labor Rate'),
('July', 'Tech Count'),
('June', 'Title'),
('June', 'Level'),
('June', '40-30-20-10'),
('June', 'Rev/Day'),
('June', 'Hrs Billed/Day'),
('June', 'Prof%'),
('June', 'GP/Day'),
('June', 'GP-Supp/Day'),
('June', 'GP%'),
('June', 'Total Supp $/Day'),
('June', 'Eff Rate'),
('June', 'Prod%'),
('June', 'Eff%'),
('June', 'Labor Rate'),
('June', 'Tech Count')],
'data': [[-81.30000000000001,
501.06,
32378.5,
31716.28,
0,
473.83,
60,
662.22,
497.8,
48308.100000000006,
579.3,
15929.6,
2.5,
805.1350000000001,
8.296666666666667,
0.8556204881402545,
539.6416666666667,
528.6046666666666,
11.037,
97.0431900361591,
0.9409603622209866,
0.8284194753392566,
'VEH MOD TECH 4',
4,
0.863313125913714,
422.890476190476,
5.03809523809524,
1.01584253480557,
261.671428571429,
261.671428571429,
0.618768790748477,
0.0,
83.9385633270321,
0.954909470752089,
1.02035855363112,
32,
1,
'VEH MOD TECH 4',
4,
1.9189405558250532,
1547.32470588235,
15.2764705882353,
0.829235583370586,
1058.47764705882,
1019.52352941176,
0.684069505925217,
38.9541176470588,
101.288101655757,
0.976361031518625,
0.625972120322817,
32,
1,
'VEH MOD TECH',
4,
0.9339213695893974,
596.494545454546,
6.01363636363636,
0.804402018605217,
404.058181818182,
404.058181818182,
0.677387890463069,
0.0,
99.1903250188964,
0.902629931120852,
0.805353220025436,
32,
1],
[18.8,
462.45,
27412.2,
27214.64,
'XXX',
462.45,
32,
197.56,
476.01,
45859.7,
470.95,
18447.5,
1.0,
1433.115625,
14.8753125,
1.0086026062082847,
856.63125,
850.4575,
6.17375,
96.3418835738745,
0.9978422699320315,
1.040653043572278,
'MBL SVC TECH 5',
5,
1.0367635962028665,
758.888888888889,
9.61111111111111,
1.02366863905325,
278.333333333333,
256.382222222222,
0.366764275256223,
21.9511111111111,
78.9595375722543,
0.992994746059545,
1.0,
50,
1,
'MBL SVC TECH 5',
5,
1.5748790164805966,
1449.45,
16.0227272727273,
1.0,
1149.45,
1149.45,
0.79302494049467,
0.0,
90.4621276595745,
1.0,
1.0,
50,
1,
'MBL SVC TECH 5',
5,
1.8559248654501908,
1923.8125,
17.7716666666667,
1.00975378787879,
1021.9375,
1021.9375,
0.531204314349761,
0.0,
108.251664634718,
1.0,
1.1544782251438,
50,
1]],
'index_names': ['Division',
'Department',
'Profit Center',
'Branch',
'Tech ID',
'Employee No',
'Tech Name'],
'column_names': [None, 'variable']}
Edit 2: I am reading several data files into data frames, merging them, and do a melt so I can do some calculations and group by month. I then take the dataframe and put it in a pivot table so everything is grouped by month. This new pivot table is df1. I do another pivot table, from the original data frame, where it is the 3 month combined totals, called df2. I then merge these two pivot tables to create a new pivot table, df3, which is what you see here.
If you expect such output for visual purposes, you can try something like below :
def shift_header(df, cols):
return df.T.reset_index().T.reset_index(drop=True).set_axis(cols, axis=1)
l0, l1 = zip(*df.columns)
left = df.index.to_frame(index=False).pipe(shift_header, cols=[""]*len(df.index.names))
right = (
pd.DataFrame(df.reset_index(drop=True).to_numpy(), columns=l1)
.pipe(shift_header, cols=l0)
)
out = pd.concat([left, right], axis=1)
Output :
3 Month Total | 3 Month Total | 3 Month Total | August | August | August | July | July | July | June | June | June | |||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Division | Region | Store | Employee | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP | Sales | Cost | GP |
1 | Commerical | Texas | 1 | Bob | 1000 | 600 | 400 | 450 | 200 | 250 | 150 | 100 | 50 | 400 | 300 | 100 |
2 | Retail | West | 406 | Sally | 2400 | 1700 | 700 | 1100 | 800 | 300 | 750 | 500 | 250 | 550 | 400 | 150 |
Used input :
df = pd.read_clipboard(header=[0, 1]).rename_axis([None, "variable"], axis=1)
UPDATE :
If you need to make an Excel spreadsheet, you can use :
df = df.rename_axis([None]*df.columns.nlevels, axis=1)
CORNER = len(df.index.names)
borders = {"border": "1px solid black"}
fill_data = pd.DataFrame(df.to_numpy()).style.set_properties(**borders)
with pd.ExcelWriter("output.xlsx", engine="openpyxl") as writer:
df.reset_index(drop=True).to_excel(writer, startcol=CORNER-1) # bottom part
df.index.to_frame().iloc[:, 0:0].to_excel(writer, startrow=0) # left part
fill_data.to_excel(writer, startrow=2, startcol=CORNER, header=False, index=False)
writer.sheets["Sheet1"].delete_rows(len(df)+3) # clean up
Output (output.xlsx) :