macos ms-word wordml

Word for Mac not generating the expected WordML


I'm writing a parser for WordML. Going through the spec I read that the way to count the number of pages in a document is to read the element Pages in DocumentProperties. If I read the spec correctly, DocumentProperties should always be there.

While creating a test document on my Mac, I noticed that there is no Pages or DocumentProperties element in the generated XML. I have a w:document, and inside it a w:body with the content.

Is DocumentProperties mandatory or is this a Mac thing?


Solution

  • There are two different Word XML formats: the old Word 2003 XML format and the Office Open XML (OOXML) format. OOXML can be saved either as a .docx, which is a .zip container holding a set of XML parts and potentially other file types, or in the "Flat OPC" format, which is a single-file XML representation of the same thing.

    Each format stores properties in a different place.

    If you are seeing an element called w:document, then you are actually saving in the OOXML format (a Word 2003 XML file has w:wordDocument as its root instead). In OOXML, the "built-in" properties are saved in at least two "parts". You would normally find the Pages element within a Properties element in a pkg:part named /docProps/app.xml.

    There are at least three complications:

    1. The page count is really the last page count that Word saved (assuming that it was Word that saved the file). It is only correct for a particular paper size, printer driver, etc.
    2. I don't think this element is mandatory in either of the two XML representations I mentioned. I'm not sure, though. As far as I know, Word will always save it.
    3. In the general case, you can't assume that this properties part is actually going to be called /docProps/app.xml. In practice, Word should always save it with that name. But in theory, you have to look either for an element with a particular URI, or follow a relationship with a particular type. I forget the details in this case.
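
As a rough illustration of the "follow a relationship" approach for the .docx (zip) case, here is a minimal Python sketch. It is not taken from the spec text: the function name read_saved_page_count is made up for this example, and the namespace and relationship-type URIs are the ones used by transitional OOXML files as far as I know (a file saved as Strict OOXML would use different URIs). It returns None rather than failing when the part or the Pages element is missing, since neither is guaranteed to be present.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

# Assumed URIs (transitional OOXML); Strict files use different ones.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
EXT_PROPS_REL_TYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/"
    "relationships/extended-properties"
)
EXT_PROPS_NS = (
    "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
)


def read_saved_page_count(docx):
    """Return the page count stored in the extended-properties part, or None.

    `docx` is a path (or file-like object) for a .docx package. The value is
    only whatever the saving application last wrote, not a fresh layout.
    """
    with zipfile.ZipFile(docx) as package:
        # Package-level relationships live in /_rels/.rels.
        rels = ET.fromstring(package.read("_rels/.rels"))
        for rel in rels.findall(f"{{{REL_NS}}}Relationship"):
            if rel.get("Type") == EXT_PROPS_REL_TYPE:
                # Targets are relative to the package root; zip member
                # names carry no leading slash.
                part_name = posixpath.normpath(rel.get("Target")).lstrip("/")
                break
        else:
            return None  # no extended-properties part in this package

        props = ET.fromstring(package.read(part_name))
        pages = props.find(f"{{{EXT_PROPS_NS}}}Pages")
        if pages is None or not (pages.text or "").strip():
            return None
        return int(pages.text)
```

Calling read_saved_page_count("test.docx") on a file saved by Word should give the stored count. For a Flat OPC file the same idea applies, except that the parts are pkg:part elements inside one XML document rather than zip members, so the lookup would have to be adapted.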