xml zip docx

Reconstructing docx from xml files


From what I have read, docx files are zipped collections of xml files. On Windows 7 (the only OS on which I have tried this), if I save a file, say f.docx, from Word, then exit Word and change the file name to f.zip, I can unzip the bundle and read the component files. But if I then re-zip the f folder (without any modifications) and change the extension back to docx, I get an error saying that "The file f.docx cannot be opened because there are problems with the contents". And when I look at the details, it says, "Microsoft Office cannot open this file because some parts are missing or invalid."

Question: Why is that? And how can the component pieces be reassembled into a valid docx file?

A similar question is asked here but the offered solution does not work. As I've noted above, I'm not altering anything in the folders, nor even opening the files. Although I cannot see why it would be of relevance, my method for rezipping the file is to use the context-menu command "Send to compressed (zipped) folder".


Solution

  • As @Pawel noted in his comment, the thing to do is to make sure that the rezipping is done from the command line. In the absence of a built-in zip command in Windows 7 (I was unable to get the PowerShell solution mentioned here to work), one can use 7-zip to recreate the zipped archive; unzipping with the Windows 7 context menu appears not to be the problem. There is one thing to be careful of when using 7-zip. Assume that foo.docx has been renamed to foo.zip and uncompressed with the context menu to a folder foo. Then, when it comes time to rezip the component files with 7-zip, do not zip the foo folder itself. Instead, descend into the foo folder, select the component files and folders, and then use 7-zip to zip those components into a foo.zip archive that can be renamed back to foo.docx. The likely reason is that zipping the folder itself places everything under a top-level foo/ entry, whereas Word expects [Content_Types].xml and the other parts to sit at the root of the archive.
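
For anyone who would rather script the rezipping step, here is a minimal Python sketch of the same idea: it zips the contents of the unpacked folder, not the folder itself, so that the component files end up at the root of the archive. The function name rebuild_docx and the paths in the usage note are only illustrative, and this assumes the output file is written outside the folder being zipped.

```python
import os
import zipfile


def rebuild_docx(src_dir, out_path):
    """Zip the *contents* of src_dir (not src_dir itself) into out_path.

    rebuild_docx is a hypothetical helper name; out_path is assumed
    to live outside src_dir so the archive does not include itself.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store each entry relative to src_dir, so that e.g.
                # foo/word/document.xml is archived as word/document.xml
                # with no leading foo/ component.
                arcname = os.path.relpath(full_path, src_dir)
                zf.write(full_path, arcname)
    return out_path
```

With the layout described above, calling rebuild_docx("foo", "foo.docx") from the directory that contains the foo folder should produce an archive with no top-level foo/ entry.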