Search code examples
pythonpandasexceldataframemulti-index

Pandas Remove Blank Row in Column Multi Index


I have a MultiIndex dataframe in pandas. I can't seem to get rid of the blank row under the column headers in the resulting Excel file while maintaining the multi-indices.

Here is what I have:

3 Month Total 3 Month Total 3 Month Total August August August July July July June June June
variable Sales Cost GP Sales Cost GP Sales Cost GP Sales Cost GP
Division Region Store Employee
Commerical Texas 0001 Bob 1000 600 400 450 200 250 150 100 50 400 300 100
Retail West 0406 Sally 2400 1700 700 1100 800 300 750 500 250 550 400 150

Here is what I want:

3 Month Total 3 Month Total 3 Month Total August August August July July July June June June
Division Region Store Employee Sales Cost GP Sales Cost GP Sales Cost GP Sales Cost GP
Commerical Texas 0001 Bob 1000 600 400 450 200 250 150 100 50 400 300 100
Retail West 0406 Sally 2400 1700 700 1100 800 300 750 500 250 550 400 150

Can someone please point me in the right direction ?

I found this but hard coding values won't work for me as the columns headers change.

Edit: here is the df.head().to_dict('tight') as requested.

{'index': [('Commerical',
   'Service',
   426,
   'Los Angeles, CA',
   2767,
   78910,
   'Steve'),
  ('Retail', 'Service', 1, 'Houston, TX', 2706, 123456, 'Bob')],
 'columns': [('3 Months Combined', 'Adjusted Hrs'),
  ('3 Months Combined', 'Clocked Hrs'),
  ('3 Months Combined', 'GP'),
  ('3 Months Combined', 'GP-Supp'),
  ('3 Months Combined', 'Mobile'),
  ('3 Months Combined', 'Prd Hrs'),
  ('3 Months Combined', 'Sum of Days Worked'),
  ('3 Months Combined', 'Supp $'),
  ('3 Months Combined', 'Total Billed Hrs'),
  ('3 Months Combined', 'Total Billed Revenue'),
  ('3 Months Combined', 'Total Clocked Hrs'),
  ('3 Months Combined', 'Total External Cost'),
  ('3 Months Combined', 'Total Non Billed Hrs'),
  ('3 Months Combined', 'Rev/Day'),
  ('3 Months Combined', 'Hrs Billed/Day'),
  ('3 Months Combined', 'Prof%'),
  ('3 Months Combined', 'GP/Day'),
  ('3 Months Combined', 'GP-Supp/Day'),
  ('3 Months Combined', 'Avg Supp/Day'),
  ('3 Months Combined', 'Eff Rate'),
  ('3 Months Combined', 'Prod%'),
  ('3 Months Combined', 'Eff%'),
  ('August', 'Title'),
  ('August', 'Level'),
  ('August', '40-30-20-10'),
  ('August', 'Rev/Day'),
  ('August', 'Hrs Billed/Day'),
  ('August', 'Prof%'),
  ('August', 'GP/Day'),
  ('August', 'GP-Supp/Day'),
  ('August', 'GP%'),
  ('August', 'Total Supp $/Day'),
  ('August', 'Eff Rate'),
  ('August', 'Prod%'),
  ('August', 'Eff%'),
  ('August', 'Labor Rate'),
  ('August', 'Tech Count'),
  ('July', 'Title'),
  ('July', 'Level'),
  ('July', '40-30-20-10'),
  ('July', 'Rev/Day'),
  ('July', 'Hrs Billed/Day'),
  ('July', 'Prof%'),
  ('July', 'GP/Day'),
  ('July', 'GP-Supp/Day'),
  ('July', 'GP%'),
  ('July', 'Total Supp $/Day'),
  ('July', 'Eff Rate'),
  ('July', 'Prod%'),
  ('July', 'Eff%'),
  ('July', 'Labor Rate'),
  ('July', 'Tech Count'),
  ('June', 'Title'),
  ('June', 'Level'),
  ('June', '40-30-20-10'),
  ('June', 'Rev/Day'),
  ('June', 'Hrs Billed/Day'),
  ('June', 'Prof%'),
  ('June', 'GP/Day'),
  ('June', 'GP-Supp/Day'),
  ('June', 'GP%'),
  ('June', 'Total Supp $/Day'),
  ('June', 'Eff Rate'),
  ('June', 'Prod%'),
  ('June', 'Eff%'),
  ('June', 'Labor Rate'),
  ('June', 'Tech Count')],
 'data': [[-81.30000000000001,
   501.06,
   32378.5,
   31716.28,
   0,
   473.83,
   60,
   662.22,
   497.8,
   48308.100000000006,
   579.3,
   15929.6,
   2.5,
   805.1350000000001,
   8.296666666666667,
   0.8556204881402545,
   539.6416666666667,
   528.6046666666666,
   11.037,
   97.0431900361591,
   0.9409603622209866,
   0.8284194753392566,
   'VEH MOD TECH 4',
   4,
   0.863313125913714,
   422.890476190476,
   5.03809523809524,
   1.01584253480557,
   261.671428571429,
   261.671428571429,
   0.618768790748477,
   0.0,
   83.9385633270321,
   0.954909470752089,
   1.02035855363112,
   32,
   1,
   'VEH MOD TECH 4',
   4,
   1.9189405558250532,
   1547.32470588235,
   15.2764705882353,
   0.829235583370586,
   1058.47764705882,
   1019.52352941176,
   0.684069505925217,
   38.9541176470588,
   101.288101655757,
   0.976361031518625,
   0.625972120322817,
   32,
   1,
   'VEH MOD TECH',
   4,
   0.9339213695893974,
   596.494545454546,
   6.01363636363636,
   0.804402018605217,
   404.058181818182,
   404.058181818182,
   0.677387890463069,
   0.0,
   99.1903250188964,
   0.902629931120852,
   0.805353220025436,
   32,
   1],
  [18.8,
   462.45,
   27412.2,
   27214.64,
   'XXX',
   462.45,
   32,
   197.56,
   476.01,
   45859.7,
   470.95,
   18447.5,
   1.0,
   1433.115625,
   14.8753125,
   1.0086026062082847,
   856.63125,
   850.4575,
   6.17375,
   96.3418835738745,
   0.9978422699320315,
   1.040653043572278,
   'MBL SVC TECH 5',
   5,
   1.0367635962028665,
   758.888888888889,
   9.61111111111111,
   1.02366863905325,
   278.333333333333,
   256.382222222222,
   0.366764275256223,
   21.9511111111111,
   78.9595375722543,
   0.992994746059545,
   1.0,
   50,
   1,
   'MBL SVC TECH 5',
   5,
   1.5748790164805966,
   1449.45,
   16.0227272727273,
   1.0,
   1149.45,
   1149.45,
   0.79302494049467,
   0.0,
   90.4621276595745,
   1.0,
   1.0,
   50,
   1,
   'MBL SVC TECH 5',
   5,
   1.8559248654501908,
   1923.8125,
   17.7716666666667,
   1.00975378787879,
   1021.9375,
   1021.9375,
   0.531204314349761,
   0.0,
   108.251664634718,
   1.0,
   1.1544782251438,
   50,
   1]],
 'index_names': ['Division',
  'Department',
  'Profit Center',
  'Branch',
  'Tech ID',
  'Employee No',
  'Tech Name'],
 'column_names': [None, 'variable']}

Edit 2: I am reading several data files into data frames, merging them, and do a melt so I can do some calculations and group by month. I then take the dataframe and put it in a pivot table so everything is grouped by month. This new pivot table is df1. I do another pivot table, from the original data frame, where it is the 3 month combined totals, called df2. I then merge these two pivot tables to create a new pivot table, df3, which is what you see here.


Solution

  • If you expect such output for visual purposes, you can try something like below :

    def shift_header(df, cols):
        return df.T.reset_index().T.reset_index(drop=True).set_axis(cols, axis=1)
    
    l0, l1 = zip(*df.columns)
    
    left = df.index.to_frame(index=False).pipe(shift_header, cols=[""]*len(df.index.names))
    right = (
        pd.DataFrame(df.reset_index(drop=True).to_numpy(), columns=l1)
            .pipe(shift_header, cols=l0)
    )
      
    out = pd.concat([left, right], axis=1)
    

    Output :

    3 Month Total 3 Month Total 3 Month Total August August August July July July June June June
    0 Division Region Store Employee Sales Cost GP Sales Cost GP Sales Cost GP Sales Cost GP
    1 Commerical Texas 1 Bob 1000 600 400 450 200 250 150 100 50 400 300 100
    2 Retail West 406 Sally 2400 1700 700 1100 800 300 750 500 250 550 400 150

    Used input :

    df = pd.read_clipboard(header=[0, 1]).rename_axis([None, "variable"], axis=1)
    

    UPDATE :

    If you need to make an Excel spreadsheet, you can use :

    df = df.rename_axis([None]*df.columns.nlevels, axis=1)
    
    CORNER = len(df.index.names)
            
    borders = {"border": "1px solid black"}
    fill_data = pd.DataFrame(df.to_numpy()).style.set_properties(**borders)
    
    with pd.ExcelWriter("output.xlsx", engine="openpyxl") as writer: 
        df.reset_index(drop=True).to_excel(writer, startcol=CORNER-1) # bottom part
        df.index.to_frame().iloc[:, 0:0].to_excel(writer, startrow=0) # left part
        fill_data.to_excel(writer, startrow=2, startcol=CORNER, header=False, index=False)
        writer.sheets["Sheet1"].delete_rows(len(df)+3) # clean up
    

    Output (output.xlsx) :

    enter image description here