Search code examples
openpyxlexport-to-excelpandas.excelwriter

save dataframe to an existing excel sheet without overwriting other information on that sheet


How can I save many dataframes to different existing excel sheets without overwriting other information on those sheets and also change the date format

I tried to save first dataframe

       with pd.ExcelWriter('nameofmyexistingexcel.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
       mydataframe.to_excel(writer, sheet_name="nameofexistingsheet")

(I tried other various options too from pandas doc and from this forum)

It does some interaction with excel, but just breaks excel file , so it can't be open anymore.

When I choose simple option of saving to a new excel, it works fine.


Solution

  • sheet["A1"] would define a cell object for cell 'A1' (first cell in excel).
    To use it you need to define 'sheet' first as a Openpyxl worksheet object like below. Then you can set its attributes like 'number_format'.

    import pandas as pd
    
    
    mydataframe = pd.DataFrame({'Headers': ['Date1', 'Date2'],
                                'DatesA': ['20.10.2023', '21/10/2023'],
                                'DatesB': ['20.10.2023', '21/10/2023']
                                })
    
    mydataframe["DatesA"] = mydataframe["DatesA"].apply(pd.to_datetime, dayfirst=True)
    print(mydataframe.dtypes)
    
    with pd.ExcelWriter('nameofmyexistingexcel.xlsx', mode='a', engine='openpyxl', if_sheet_exists='overlay') as writer:
        mydataframe.to_excel(writer, sheet_name="nameofexistingsheet")
    
        sheet = writer.sheets['nameofexistingsheet'] # Openpyxl worksheet
        # sheet["A1"].number_format = 'dd.mm.yyyy'
        for cell in sheet['B']: # Loop column B 
            if cell.row == 1:  # Skip row 1 
                continue
            ### Apply number format to all cells in Column B to max used row
            cell.number_format = 'dd.mm.yyyy' 
    
    

    UPDATE
    I've added the 'mydataframe' dataframe creation to my code example and updated to include two date columns. Column 'DatesA' is datetime whereas the same dates in Column "DatesB' are default (strings)

           DatesA      DatesB
    0  20.10.2023  20.10.2023
    1  21/10/2023  21/10/2023
    

    The data type for the columns is a follows

    DatesA    datetime64[ns]
    DatesB            object
    
    

    Although I didn't include it in the example code all the Dates in columns B and C are formatted as cell.number_format = 'dd.mm.yyyy' using the column loop.

    When the dataframe is written to Excel, the 'DatesA' Column is interpreted as dates by Excel so they take on the format of the date 'dd.mm.yyyy' whether they were that originally or using forward slash as the separator.
    The 'DatesB' column is string so its recognised by Excel as a string (you can see the orientation is to the left of the cell) and the number formatting does not change the date '21/10/2023' even though its applied.

    enter image description here