Search code examples
pythonexcelxmllxmlopenpyxl

openpyxl: get the xml source code of a worksheet WITHOUT zipfile


from openpyxl import load_workbook

wb = load_workbook('file.xlsx')
ws = wb['Sheet1']

Is there any way to retrieve the xml code representing the ws object? NOTE: I want to avoid using the zipfile module. Instead, I'm trying to extract the xml directly from ws.

I read the openpyxl source code and was playing around with lxml - with no success.


Solution

  • I figured it out myself. Instead of extracting the xml by unzipping the saved workbook, you can capture it while the workbook is being saved. The wb.save method makes use of the ExcelWriter class, which I modified to suit this purpose:

    import openpyxl
    from openpyxl.writer.excel import *
    
    class MyExcelWriter(openpyxl.writer.excel.ExcelWriter):
        def write_worksheet(self, ws):
            ws._drawing = SpreadsheetDrawing()
            ws._drawing.charts = ws._charts
            ws._drawing.images = ws._images
            if self.workbook.write_only:
                if not ws.closed:
                    ws.close()
                writer = ws._writer
            else:
                writer = WorksheetWriter(ws)
                writer.write()
            ws._rels = writer._rels
            
            # my addition starts here
            if ws.title == 'My Sheet':
                with open(writer.out, 'r', encoding='utf8') as file:
                    xml_code = file.read()
    
                # my code continues here...
            # my addition ends here
    
            self._archive.write(writer.out, ws.path[1:])
            self.manifest.append(ws)
            writer.cleanup()
    
    openpyxl.writer.excel.ExcelWriter = MyExcelWriter
    

    The write_worksheet function creates a temporary xml file, whose path is stored in writer.out.
    Remember that from <module> import * is considered bad practice – use with caution.