
Putting many python pandas dataframes to one excel worksheet


It is quite easy to add many pandas dataframes to an Excel workbook as long as each one goes to a different worksheet. But it is somewhat tricky to get many dataframes into one worksheet if you want to use pandas' built-in df.to_excel functionality.

import pandas as pd

# Creating Excel Writer Object from Pandas
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
workbook = writer.book
worksheet = workbook.add_worksheet('Validation')
df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)

The above code won't work. You will get this error:

 Sheetname 'Validation', with case ignored, is already in use.

After some experimenting, I found a way to make it work.

import pandas as pd
writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')  # Creating Excel Writer Object from Pandas
workbook = writer.book
df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)
writer.close()  # write the workbook to disk

This will work. So, my purpose in posting this question on Stack Overflow is twofold. Firstly, I hope this will help anyone who is trying to put many dataframes into a single Excel worksheet.

Secondly, can someone help me understand the difference between those two blocks of code? They look pretty much the same to me, except that the first block creates the worksheet called "Validation" in advance while the second does not. I get that part.

What I don't understand is why it should make any difference. Even if I don't create the worksheet in advance, this line (the second-to-last one)

 df.to_excel(writer,sheet_name='Validation',startrow=0 , startcol=0)  

will create the worksheet anyway. Consequently, by the time we reach the last line of code, the worksheet "Validation" already exists in the second block as well. So my question is basically: why does the second block of code work while the first doesn't?

Please also share if there is another way to put many dataframes into one Excel worksheet using the built-in df.to_excel functionality!


Solution

  • To create the worksheet in advance, you need to add the created sheet to the writer's sheets dict:

    writer.sheets['Validation'] = worksheet

    Using your original code:

    import pandas as pd

    # Creating Excel Writer Object from Pandas
    writer = pd.ExcelWriter('test.xlsx', engine='xlsxwriter')
    workbook = writer.book
    worksheet = workbook.add_worksheet('Validation')
    # Register the pre-created sheet so to_excel reuses it
    writer.sheets['Validation'] = worksheet
    df.to_excel(writer, sheet_name='Validation', startrow=0, startcol=0)
    another_df.to_excel(writer, sheet_name='Validation', startrow=20, startcol=0)
    writer.close()  # write the workbook to disk
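    The hard-coded startrow=20 only works while df takes up fewer than 20 rows, header included. A minimal sketch of computing the offsets instead, assuming each frame is written with a single header row (the to_excel default for plain, non-MultiIndex columns). The function stacked_start_rows and its gap and header_rows parameters are hypothetical names for illustration, not part of pandas:

```python
def stacked_start_rows(row_counts, gap=1, header_rows=1):
    """Return the startrow for each frame when stacking frames vertically.

    row_counts  -- number of data rows in each frame, e.g. len(df)
    gap         -- blank rows left between consecutive frames (assumption)
    header_rows -- rows used by each frame's header (1 is assumed here)
    """
    start_rows = []
    next_row = 0
    for n_rows in row_counts:
        start_rows.append(next_row)
        # Skip past this frame's header, its data rows, and the blank gap.
        next_row += header_rows + n_rows + gap
    return start_rows


print(stacked_start_rows([3, 5]))  # [0, 5]
```

    Each frame could then be written with df.to_excel(writer, sheet_name='Validation', startrow=row, startcol=0), taking row from stacked_start_rows([len(f) for f in frames]).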
    

    Explanation

    If we look at the pandas function to_excel, it uses the writer's write_cells function:

    excel_writer.write_cells(formatted_cells, sheet_name, startrow=startrow, startcol=startcol)
    

    So looking at the write_cells function for xlsxwriter:

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        # Write the frame cells using xlsxwriter.
        sheet_name = self._get_sheet_name(sheet_name)
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.add_worksheet(sheet_name)
            self.sheets[sheet_name] = wks
    

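    Here we can see that it checks for sheet_name in self.sheets. If the name is not there, it calls self.book.add_worksheet, which fails because the workbook already contains a sheet with that name. That is why the pre-created sheet needs to be added to self.sheets as well. In your second block, the first to_excel call creates the sheet and registers it, so the second call finds it in self.sheets and reuses it.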
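    The lookup can be reproduced without pandas at all. The sketch below uses two hypothetical stand-in classes, FakeWorkbook and FakeWriter, that only mimic the behaviour described above: the workbook rejects duplicate names (ignoring case) and the writer keeps its own sheets dict. They are not the real pandas or xlsxwriter classes, and ValueError is an assumed exception type chosen for the illustration:

```python
class FakeWorkbook:
    """Hypothetical stand-in for the workbook: rejects duplicate sheet names."""

    def __init__(self):
        self._names = set()

    def add_worksheet(self, name):
        key = name.lower()  # names are compared with case ignored
        if key in self._names:
            raise ValueError(
                f"Sheetname '{name}', with case ignored, is already in use."
            )
        self._names.add(key)
        return {"name": name}  # a dict stands in for a worksheet object


class FakeWriter:
    """Hypothetical stand-in for the writer: tracks sheets in its own dict."""

    def __init__(self):
        self.book = FakeWorkbook()
        self.sheets = {}

    def write_cells(self, sheet_name):
        # Same branch as the excerpt above: reuse a known sheet, else create it.
        if sheet_name in self.sheets:
            wks = self.sheets[sheet_name]
        else:
            wks = self.book.add_worksheet(sheet_name)
            self.sheets[sheet_name] = wks
        return wks


# First block: the sheet exists in the workbook but not in writer.sheets.
writer = FakeWriter()
writer.book.add_worksheet("Validation")
try:
    writer.write_cells("Validation")
except ValueError as exc:
    print(exc)

# Fixed version: register the sheet, and the writer reuses it.
writer = FakeWriter()
worksheet = writer.book.add_worksheet("Validation")
writer.sheets["Validation"] = worksheet
print(writer.write_cells("Validation") is worksheet)
```

    The first call goes down the else branch and hits the duplicate-name check, while the registered sheet is returned directly from the dict.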