Search code examples
pythonpandasdataframeopenpyxl

Delete empty rows before column names start on excel spreadsheet using python


My column names on a spreadsheet start on row 2, row 1 is completely blank.
When I try to delete it, it also deletes column names on row 2 and only keeps first column name.
Any idea of what I am doing wrong?

Original file

|          |          |
| Column A | Column B |
| Cell 1   | Cell 2   |
| Cell 3   | Cell 4   |

Expected output

Column A Column B
Cell 1 Cell 2
Cell 3 Cell 4

Current output

Column A
Cell 1
Cell 3

I have tried:
openpyxl

sheet.delete_rows(1)

pandas

df = df.drop(index=0)

Solution

  • If you have the following data layout and want to delete the first row;
    enter image description here

    The following code shown for using Openpyxl and Pandas, is all that's needed.

    from openpyxl import load_workbook
    import pandas as pd
    
    # Openpyxl
    wb = load_workbook('foo.xlsx')
    
    wb['Sheet1'].delete_rows(1)
    
    wb.save('foo_openpyxl.xlsx')
    
    
    # Pandas
    df = pd.read_excel('foo.xlsx')
    df.to_excel('foo_pandas.xlsx', header=False, index=False)
    
    

    For Pandas since row 1 will be the default header, writing back to Excel without the header will remove row 1.

    The output from both is the same;
    enter image description here