Search code examples
pythonexcelms-officeopenpyxl

openpyxl and MS Excel: Problem with content, recover as much as we can


I have an excel with a named/excel-table (B22:O25) in it with header row + 3 rows of empty data-cells. I make a copy of this file, then open the copy using openpyxl.open(), insert 17 more rows (ws.insert_rows(idx=23, amount=17)), update table.ref (+copy formats, update autofilters etc), write a dataframe into it (dataframe_to_rows()).

When I save it and open it using MS Excel (Professional Plus 2016), Excel says: We found some problem with some content in 'my file'. Do you want us to try to recover as much as we can? If you trust the source of this workbook, click Yes.

enter image description here

On clicking yes, it shows:

enter image description here

Linked log file (error0340...xml) has same info: <repairedRecord>Repaired Records: Table from /xl/tables/table1.xml part (Table)</repairedRecord> and nothing else useful.

How do I debug this? How do I find out what was wrong/messed-up in my xlsx file? Question is more generic, where do I find some debug info from Excel?

I can post my code and example xlsx files if it helps.


edit: after using Open XML SDK Tools to compare the xlsx file with error and the one after being repaired by MS Excel. Ignoring the <xml> tag, only diff is missing headerRowCount="1" in repaired xlsx's /xl/tables/table1.xml

-  
- <table id="1" name="hit_ratio" displayName="hit_ratio" ref="B22:O42" headerRowCount="1" totalsRowShown="0" headerRowDxfId="17" headerRowBorderDxfId="16" tableBorderDxfId="15" totalsRowBorderDxfId="14" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">
+ <?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
+ <table id="1" name="hit_ratio" displayName="hit_ratio" ref="B22:O42" totalsRowShown="0" headerRowDxfId="17" headerRowBorderDxfId="16" tableBorderDxfId="15" totalsRowBorderDxfId="14" xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">

Solution

  • If it helps anyone. Use Open XML SDK to view/compare xlsx files. It's tedious, but seems to be the only way.

    More options -> Ignore attribute can go a long way. enter image description here