Search code examples
pythonexcelpandasdataframegroup-by

pandas combine specific excel sheets into one


I have an excel sheet named output.xlsx with multiple sheets in it.

Example, the sheets within it are named as P1,P2,P3,P4

I would like to do the below

a) Combine sheet P1 and sheet P2 into one single sheet name it as P1&P2

b) retain the P3 and P4 sheets in output.xlsx as it is. No changes

c) Overwrite the same file output.xlsx but don't touch P3 and P4

So, I tried the below

df = pd.concat(pd.read_excel('output.xlsx', sheet_name=['P1','P2']), ignore_index=True)

with pd.ExcelWriter('output.xlsx', engine='xlsxwriter') as writer:
    df.to_excel(writer, sheet_name="P1&P2",index=False)

But the above code ovcerwrites the file and deletes other sheet

How can I combine only the sheets P1 and P2 and keep P3 and P4 as it is


Solution

  • You can use openpyxl instead of pandas:

    from openpyxl import load_workbook
    
    # Load workbook
    wb = load_workbook('output.xlsx')
    
    # Copy rows from P2 to P1
    for row in wb['P2'].iter_rows(2):  # skip header
        wb['P1'].append([cell. value for cell in row])
    
    # Remove P2
    wb.remove(wb['P2'])
    
    # Rename P1 to P1P2
    wb['P1'].title = 'P1P2'
    
    # Save to another file, just to be sure
    wb.save('output2.xlsx')