I'm creating WordprocessingDocument
s in C# with the Open XML SDK and then converting them to pdf. Initially, I was using Interop to save the document in PDF format, but now that is not an option. I found that LibreOffice can convert documents calling soffice.exe from cmd, and I had wonderful results with normal documents. Still, then, when I tested LibreOffice converter with my dynamic documents, the converter crashed.
I copied one of these documents and opened it with LibreOffice Writer, its structure was wrong, then I opened the same document with Microsoft Word and its structure was fine. Finally, I saved it with Microsoft Word and opened both documents as ZIP files as below:
This is the good one:
And this is the bad one:
I noticed that when I save the document in Microsoft Word, these Open XML parts (which I called "files" in an earlier version of this question) are appearing. When I open the document previously saved with Microsoft Word in LibreOffice, the document is fine again.
Thus, is there a way to generate these Open XML parts (inside the Word document) without opening Microsoft Word?
I use the following code (to check if it is creating all the files):
using (MemoryStream mem = new MemoryStream())
{
// Create Document
using (WordprocessingDocument wordDocument =
WordprocessingDocument.Create(mem, WordprocessingDocumentType.Document, true))
{
// Add a main document part.
MainDocumentPart mainPart = wordDocument.AddMainDocumentPart();
// Create the document structure and add some text.
mainPart.Document = new Document();
Body docBody = new Body();
// Add your docx content here
CreateParagraph(docBody);
CreateStyledParagraph(docBody);
CreateTable(docBody);
CreateList(docBody);
Paragraph pImg = new Paragraph();
ImagePart imagePart = mainPart.AddImagePart(ImagePartType.Jpeg);
string imgPath = "https://cdn.pixabay.com/photo/2019/11/15/05/23/dog-4627679_960_720.png";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(imgPath);
req.UseDefaultCredentials = true;
req.PreAuthenticate = true;
req.Credentials = CredentialCache.DefaultCredentials;
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
imagePart.FeedData(resp.GetResponseStream());
// 1500000 and 1092000 are img width and height
Run rImg = new Run(DrawingManager(mainPart.GetIdOfPart(imagePart), "PictureName", 1500000, 1092000, string.Empty));
pImg.Append(rImg);
docBody.Append(pImg);
Paragraph pLink = new Paragraph();
// For the mainpart see above
pLink.Append(HyperLinkManager("http://YourLink", "My awesome link", mainPart));
docBody.Append(pLink);
mainPart.Document.Append(docBody);
mainPart.Document.Save();
wordDocument.Close();
}
result = Convert.ToBase64String(mem.ToArray());
}
The code above creates a Word document named Result.docx with the following structure:
But there aren't any other Open XML parts (like app.xml
or styles.xml
)
You need to make a difference between:
WordprocessingDocument
andAs per the standard, the minimum WordprocessingDocument
only needs a main document part (MainDocumentPart
, document.xml
) with the following content:
<w:document xmlns:w="...">
<w:body>
<w:p />
</w:body>
</w:document>
Further parts such as the StyleDefinitionsPart
(styles.xml
) or the NumberingDefintionsPart
(numbering.xml
) are only required if you have styles or numbering, in which case you must explicitly create them in your code.
Next, looking at your sample code, it seems you are creating:
CreateStyledParagraph(docBody)
), which would have to be defined in the StyleDefinitionsPart
(styles.xml
); andCreateList(docBody)
), which would have to be defined in the NumberingDefinitionsPart
(numbering.xml
).However, your code neither creates a StyleDefinitionsPart
nor a NumberingDefintionsPart
, which means your document is likely not a valid Open XML document.
Now, Word is very forgiving and fixes various issues silently, ignoring parts of your Open XML markup (e.g., the styles you might have assigned to your paragraphs).
By contrast, depending on how fault-tolerant LibreOffice is, invalid Open XML markup might lead to a crash. For example, if LibreOffice simply assumes that a StyleDefinitionsPart
exists when it finds an element like <w:pStyle w:val="MyStyleName" />
in your w:document
and then does not check whether it gets a null
reference when asking for the StyleDefinitionsPart
, it could crash.
Finally, to add parts to your Word document, you would use the Open XML SDK as follows:
[Fact]
public void CanAddParts()
{
const string path = "Document.docx";
const WordprocessingDocumentType type = WordprocessingDocumentType.Document;
using WordprocessingDocument wordDocument = WordprocessingDocument.Create(path, type);
// Create minimum main document part.
MainDocumentPart mainDocumentPart = wordDocument.AddMainDocumentPart();
mainDocumentPart.Document = new Document(new Body(new Paragraph()));
// Create empty style definitions part.
var styleDefinitionsPart = mainDocumentPart.AddNewPart<StyleDefinitionsPart>();
styleDefinitionsPart.Styles = new Styles();
// Create empty numbering definitions part.
var numberingDefinitionsPart = mainDocumentPart.AddNewPart<NumberingDefinitionsPart>();
numberingDefinitionsPart.Numbering = new Numbering();
}