from openpyxl import load_workbook
wb = load_workbook('file.xlsx')
ws = wb['Sheet1']
Is there any way to retrieve the xml code representing the ws
object?
NOTE: I want to avoid using the zipfile
module. Instead, I'm trying to extract the xml directly from ws
.
I read the openpyxl
source code and was playing around with lxml
- with no success.
I figured it out myself. Instead of extracting the xml by unzipping the saved workbook, you can capture it while the workbook is being saved. The wb.save
method makes use of the ExcelWriter
class, which I modified to suit this purpose:
import openpyxl
from openpyxl.writer.excel import *
class MyExcelWriter(openpyxl.writer.excel.ExcelWriter):
def write_worksheet(self, ws):
ws._drawing = SpreadsheetDrawing()
ws._drawing.charts = ws._charts
ws._drawing.images = ws._images
if self.workbook.write_only:
if not ws.closed:
ws.close()
writer = ws._writer
else:
writer = WorksheetWriter(ws)
writer.write()
ws._rels = writer._rels
# my addition starts here
if ws.title == 'My Sheet':
with open(writer.out, 'r', encoding='utf8') as file:
xml_code = file.read()
# my code continues here...
# my addition ends here
self._archive.write(writer.out, ws.path[1:])
self.manifest.append(ws)
writer.cleanup()
openpyxl.writer.excel.ExcelWriter = MyExcelWriter
The write_worksheet
function creates a temporary xml file, whose path is stored in writer.out
.
Remember that from <module> import *
is considered bad practice – use with caution.