Tags: .net, ms-word, openxml, office-interop

Fixing "Word found unreadable content in corrupt..." programmatically


I'm getting an Open XML-generated docx file from another system. When I try to open the file in my application using Microsoft.Office.Interop.Word.Application.Open(filename), I get a "The file appears to be corrupted" exception.

When I open the docx file manually, I'm greeted with a "Word found unreadable content in corrupt xxx.docx. Do you want to recover the contents of this document? If you trust the source of this document, click Yes." prompt. When I click Yes, Word recovers the document into a new, unsaved Word file.

I have tried comparing the document.xml of the original corrupt.docx with the document.xml of the recovered.docx. While there are many formatting changes between the two files (such as extra space between closing XML tags), the main differences are that the AltChunk was actually embedded into recovered.docx and that several empty "run" tags were removed. I'm not sure what is causing the file to be considered corrupt, since neither of those differences seems like it should.

That said, is there a way to programmatically run, from my application, whatever process happens when I click Yes at that "...Do you want to recover the contents of this document?..." prompt? That would be ideal. Less preferably, is there a way to tell which parts of the XML are actually corrupting a Word document?


Solution

  • That said, is there a way to programmatically run, from my application, whatever process happens when I click Yes at that "...Do you want to recover the contents of this document?..." prompt? That would be ideal. Less preferably, is there a way to tell which parts of the XML are actually corrupting a Word document?

    1. No, that recovery process isn't exposed to outside code.
    2. Theoretically, validation is possible. But since an AltChunk is involved, validation might not turn up the problem. The content of an AltChunk isn't integrated into the document until Word processes it, so if the incoming content "breaks" something at that point, validation won't pick that up.
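    As a rough first check before any real validation, something like the sketch below could at least tell you whether every XML part in the package is well-formed. It is Python rather than .NET, it uses only the standard library, and `find_malformed_parts` is a name made up for this example. It is far weaker than schema validation: a file can pass this check and still be rejected by Word, and it never looks inside the AltChunk content.

```python
import zipfile
import xml.etree.ElementTree as ET


def find_malformed_parts(docx_path):
    """Return (part_name, error_message) for every XML part that fails to parse.

    This only checks XML well-formedness. It does not validate against the
    Open XML schemas and does not inspect the content of any AltChunk.
    """
    problems = []
    with zipfile.ZipFile(docx_path) as package:
        for name in package.namelist():
            # Only XML parts and relationship parts are checked here.
            if not name.endswith((".xml", ".rels")):
                continue
            try:
                ET.fromstring(package.read(name))
            except ET.ParseError as exc:
                problems.append((name, str(exc)))
    return problems
```

    Calling `find_malformed_parts("corrupt.docx")` returns an empty list when every part parses. That rules out broken XML but not much else.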

    In this particular case, I might try removing the AltChunk manually (its pieces are in a few places in the zip file) and see if the file can open without it. But if you're not intimately familiar with the Word Open XML zip package, it might be better to ask the producer/source of the document.
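
    To find those pieces, a minimal sketch along these lines might help. It is again Python with only the standard library, and `list_alt_chunks` is a hypothetical helper name. It assumes the main document part lives at word/document.xml with its relationships in word/_rels/document.xml.rels, which is the usual layout but not guaranteed. It only reports what it finds and does not modify the file.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and the package relationship parts.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_alt_chunks(docx_path):
    """Return (relationship_id, target) for each w:altChunk in the main part.

    Assumption: the main part is word/document.xml and its relationships
    are stored in word/_rels/document.xml.rels.
    """
    with zipfile.ZipFile(docx_path) as package:
        document = ET.fromstring(package.read("word/document.xml"))
        rels = ET.fromstring(package.read("word/_rels/document.xml.rels"))

    # Map each relationship Id to the part it points at.
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in rels.iter(f"{{{PKG_REL}}}Relationship")
    }
    # Each w:altChunk element refers to its content through an r:id attribute.
    return [
        (chunk.get(f"{{{R}}}id"), targets.get(chunk.get(f"{{{R}}}id")))
        for chunk in document.iter(f"{{{W}}}altChunk")
    ]
```

    Each result names the relationship Id and the part that holds the imported content. On a copy of the file, the places to look are then the w:altChunk element in document.xml, the matching Relationship entry, the target part itself, and possibly an entry for that part in [Content_Types].xml.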