c# openxml libreoffice

Open XML parts are missing in dynamically created Word document


I'm creating WordprocessingDocuments in C# with the Open XML SDK and then converting them to PDF. Initially, I used Interop to save the documents as PDF, but that is no longer an option. I found that LibreOffice can convert documents by calling soffice.exe from the command line, and it worked well with regular documents. However, when I tested the LibreOffice converter with my dynamically created documents, it crashed.

I copied one of these documents and opened it with LibreOffice Writer; its structure was wrong. Then I opened the same document with Microsoft Word, and its structure was fine. Finally, I saved it with Microsoft Word and opened both documents as ZIP files, as shown below:

This is the good one:

Good document structure

And this is the bad one:

Bad document structure

I noticed that when I save the document in Microsoft Word, these Open XML parts (which I called "files" in an earlier version of this question) appear. When I open the document saved by Microsoft Word in LibreOffice, it displays correctly again.

Thus, is there a way to generate these Open XML parts (inside the Word document) without opening Microsoft Word?

I use the following code (to check if it is creating all the files):

        using (MemoryStream mem = new MemoryStream())
        {
            // Create Document
            using (WordprocessingDocument wordDocument =
                WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
            {
                // Add a main document part. 
                MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();

                // Create the document structure and add some text.
                mainPart.Document = new Document();
                Body docBody = new Body();

                // Add your docx content here
                CreateParagraph(docBody);
                CreateStyledParagraph(docBody);
                CreateTable(docBody);
                CreateList(docBody);

                Paragraph pImg = new Paragraph();
                // The image downloaded below is a PNG, so the part type must match.
                ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Png);
                string imgPath = "https://cdn.pixabay.com/photo/2019/11/15/05/23/dog-4627679_960_720.png";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(imgPath);
                req.UseDefaultCredentials = true;
                req.PreAuthenticate = true;
                req.Credentials = CredentialCache.DefaultCredentials;
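                // Dispose the response and its stream once the image data is copied.
                using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
                using (Stream imgStream = resp.GetResponseStream())
                {
                    imagePart.FeedData(imgStream);
                }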

                // 1500000 and 1092000 are img width and height
                Run rImg = new Run(DrawingManager(mainPart.GetIdOfPart(imagePart), "PictureName", 1500000, 1092000, string.Empty));
                pImg.Append(rImg);
                docBody.Append(pImg);

                Paragraph pLink = new Paragraph();
                // For the mainpart see above
                pLink.Append(HyperLinkManager("http://YourLink", "My awesome link", mainPart));
                docBody.Append(pLink);

                mainPart.Document.Append(docBody);
                mainPart.Document.Save();
                // No explicit Close() is needed: the using block disposes the package and flushes it to the stream.
            }

            result = Convert.ToBase64String(mem.ToArray());
        }

The code above creates a Word document named Result.docx with the following structure:

Result.docx structure

But there aren't any other Open XML parts (like app.xml or styles.xml).


Solution

  • You need to distinguish between:

    • the Open XML standard and its minimum requirements on a WordprocessingDocument and
    • the "minimum" document created by Microsoft Word or other applications.

    As per the standard, the minimum WordprocessingDocument only needs a main document part (MainDocumentPart, document.xml) with the following content:

    <w:document xmlns:w="...">
      <w:body>
        <w:p />
      </w:body>
    </w:document>
    

    Further parts such as the StyleDefinitionsPart (styles.xml) or the NumberingDefinitionsPart (numbering.xml) are only required if you have styles or numbering, in which case you must explicitly create them in your code.

    Next, looking at your sample code, it seems you are creating:

    1. paragraphs that reference styles (see CreateStyledParagraph(docBody)), which would have to be defined in the StyleDefinitionsPart (styles.xml); and
    2. numbered lists (e.g., CreateList(docBody)), which would have to be defined in the NumberingDefinitionsPart (numbering.xml).

    However, your code creates neither a StyleDefinitionsPart nor a NumberingDefinitionsPart, which means your document is likely not a valid Open XML document.
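
    One way to confirm this before handing the file to a converter is to compare what the body references with the parts that exist. The following is a minimal sketch, not an exhaustive validity check. The class name PartChecker and the method FindMissingParts are hypothetical, and the sketch assumes only the two cases discussed here (style references and numbering references).

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper: reports parts the body refers to but the package lacks.
public static class PartChecker
{
    public static List<string> FindMissingParts(MainDocumentPart mainPart)
    {
        var missing = new List<string>();
        Document document = mainPart.Document;

        // w:pStyle, w:rStyle and w:tblStyle all point into styles.xml.
        bool usesStyles =
            document.Descendants<ParagraphStyleId>().Any() ||
            document.Descendants<RunStyle>().Any() ||
            document.Descendants<TableStyle>().Any();
        if (usesStyles && mainPart.StyleDefinitionsPart == null)
        {
            missing.Add("styles.xml");
        }

        // w:numPr points into numbering.xml.
        bool usesNumbering = document.Descendants<NumberingProperties>().Any();
        if (usesNumbering && mainPart.NumberingDefinitionsPart == null)
        {
            missing.Add("numbering.xml");
        }

        return missing;
    }
}
```

    In the code from the question, calling PartChecker.FindMissingParts(mainPart) just before mainPart.Document.Save() would show whether CreateStyledParagraph or CreateList introduced references that nothing defines.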

    Now, Word is very forgiving and fixes various issues silently, ignoring parts of your Open XML markup (e.g., the styles you might have assigned to your paragraphs).

    By contrast, depending on how fault-tolerant LibreOffice is, invalid Open XML markup might lead to a crash. For example, if LibreOffice simply assumes that a StyleDefinitionsPart exists when it finds an element like <w:pStyle w:val="MyStyleName" /> in your w:document and then does not check whether it gets a null reference when asking for the StyleDefinitionsPart, it could crash.

    Finally, to add parts to your Word document, you would use the Open XML SDK as follows:

    [Fact]
    public void CanAddParts()
    {
        const string path = "Document.docx";
        const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
    
        using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
    
        // Create minimum main document part.
        MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
        mainDocumentPart.Document = new Document(new Body(new Paragraph()));
    
        // Create empty style definitions part.
        var styleDefinitionsPart = mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
        styleDefinitionsPart.Styles = new Styles();
    
        // Create empty numbering definitions part.
        var numberingDefinitionsPart = mainDocumentPart.AddNewPart<NumberingDefinitionsPart>();
        numberingDefinitionsPart.Numbering = new Numbering();
    }
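
    The empty parts above are only containers; each style ID or numbering ID used in the body still needs a matching definition inside them. The following is a rough sketch of how such definitions might be added. The class DefinitionsSketch, its method names, and the choice of bold text and a single-level decimal list are assumptions for illustration, not something taken from the question.

```csharp
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helpers that define what w:pStyle and w:numPr refer to.
public static class DefinitionsSketch
{
    // Adds a paragraph style whose ID matches <w:pStyle w:val="styleId" />.
    public static void AddParagraphStyle(
        MainDocumentPart mainPart, string styleId, string displayName)
    {
        StyleDefinitionsPart part = mainPart.StyleDefinitionsPart;
        if (part == null)
        {
            part = mainPart.AddNewPart<StyleDefinitionsPart>();
            part.Styles = new Styles();
        }

        var style = new Style(
            new StyleName { Val = displayName },
            new StyleRunProperties(new Bold())) // bold is an arbitrary example
        {
            Type = StyleValues.Paragraph,
            StyleId = styleId,
            CustomStyle = true
        };
        part.Styles.Append(style);
    }

    // Adds a one-level decimal list ("1.", "2.", ...) under the given ID.
    // Assumes it is called once on a fresh part: in numbering.xml, all
    // w:abstractNum elements must come before any w:num element.
    public static void AddDecimalList(MainDocumentPart mainPart, int numberingId)
    {
        NumberingDefinitionsPart part = mainPart.NumberingDefinitionsPart;
        if (part == null)
        {
            part = mainPart.AddNewPart<NumberingDefinitionsPart>();
            part.Numbering = new Numbering();
        }

        var abstractNum = new AbstractNum(
            new Level(
                new StartNumberingValue { Val = 1 },
                new NumberingFormat { Val = NumberFormatValues.Decimal },
                new LevelText { Val = "%1." })
            { LevelIndex = 0 })
        { AbstractNumberId = numberingId };

        var instance = new NumberingInstance(
            new AbstractNumId { Val = numberingId })
        { NumberID = numberingId };

        part.Numbering.Append(abstractNum, instance);
    }
}
```

    A paragraph would then refer to these definitions through its ParagraphProperties, for example with new ParagraphStyleId { Val = "MyStyleName" } for the style, or with new NumberingProperties(new NumberingLevelReference { Val = 0 }, new NumberingId { Val = 1 }) for the list.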