
Adding Runs to Paragraphs


I'm trying to convert XML that contains inline formatting tags into a DOCX file. I'm not generating a new document but inserting the text into a template document. A sample paragraph:

<p id="_fab91699-6d85-4ce5-b0b5-a17197520a7f">This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, <em>Information and documentation</em>, WG 3 <em>Conversion of written languages</em>.</p>

I collected the text fragments in a two-dimensional array (the style name in row 0, the text in row 1), then tried to process them with code like this:

foreach (var bkmkStart in wordDoc.MainDocumentPart.RootElement.Descendants<BookmarkStart>())
{
        if (bkmkStart.Name == "ForewordText")
        {
                forewordbkmkParent = bkmkStart.Parent;
                for (var y = 0; y <= ForewordArray.Length / (double)2 - 1; y++)
                {
                        if (ForewordArray[0, y] == "Normal")
                        {
                                if (y < ForewordArray.Length / (double)2 - 1)
                                {
                                        if (ForewordArray[0, y + 1] == "Normal")
                                        {
                                                forewordbkmkParent.InsertBeforeSelf(new Paragraph(new Run(new Text(ForewordArray[1, y]))));
                                        }
                                        else
                                        {
                                                fPara = forewordbkmkParent.InsertBeforeSelf(new Paragraph(new Run(new Text(ForewordArray[1, y]))));
                                        }
                                }
                                else
                                {
                                        fPara.InsertAfter(new Run(new Text(ForewordArray[1, y])), fPara.GetFirstChild<Run>());
                                }
                        }
                        else
                        {
                                NewRun = forewordbkmkParent.InsertBeforeSelf(new Run());
                                NewRunProps = new RunProperties();
                                NewRunProps.AppendChild<Italic>(new Italic());
                                NewRun.AppendChild<RunProperties>(NewRunProps);
                                NewRun.AppendChild(new Text(ForewordArray[1, y]));
                        }
                }
        }
}

but I end up with invalid WordprocessingML because the italic runs are inserted as siblings of the paragraphs instead of inside them:

<w:p>
    <w:r>
        <w:t>This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, </w:t>
    </w:r>
</w:p>
<w:r>
    <w:rPr>
        <w:i />
    </w:rPr>
    <w:t>Information and documentation</w:t>
</w:r>
<w:p>
    <w:r>
        <w:t>, WG 3 </w:t>
    </w:r>
    <w:r>
        <w:t>.</w:t>
    </w:r>
</w:p>
<w:r>
    <w:rPr>
        <w:i />
    </w:rPr>
    <w:t>Conversion of written languages</w:t>
</w:r>

Doing this the right way, using the SDK, would be best. As an alternative, I was able to create a string with all the correct XML and text using regexes, but I can't find a WordprocessingDocument method to turn that into an XML fragment that I can insert.


Solution

  • The solution for this kind of problem is to build each paragraph in a single pure functional transformation of the source XML, rather than mutating the document one node at a time, as shown in the following code example.

    The code example uses the sample XML element <p> given in the question (see Xml constant below). It transforms it into a corresponding Open XML w:p element, i.e., a Paragraph instance in terms of the strongly-typed classes provided by the Open XML SDK. The expected outer XML of that w:p or Paragraph is defined by the OuterXml constant.

    using System;
    using System.Linq;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml;
    using DocumentFormat.OpenXml.Wordprocessing;
    using Xunit;
    
    namespace CodeSnippets.Tests.OpenXml.Wordprocessing
    {
        public class XmlTransformationTests
        {
            private const string Xml =
                @"<p id=""_fab91699-6d85-4ce5-b0b5-a17197520a7f"">" +
                @"This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, " +
                @"<em>Information and documentation</em>" +
                @", WG 3 " +
                @"<em>Conversion of written languages</em>" +
                @"." +
                @"</p>";
    
            private const string OuterXml =
                @"<w:p xmlns:w=""http://schemas.openxmlformats.org/wordprocessingml/2006/main"">" +
                @"<w:r><w:t xml:space=""preserve"">This document is amongst a series of International Standards dealing with the conversion of systems of writing produced by Technical Committee ISO/TC 46, </w:t></w:r>" +
                @"<w:r><w:rPr><w:i /></w:rPr><w:t>Information and documentation</w:t></w:r>" +
                @"<w:r><w:t xml:space=""preserve"">, WG 3 </w:t></w:r>" +
                @"<w:r><w:rPr><w:i /></w:rPr><w:t>Conversion of written languages</w:t></w:r>" +
                @"<w:r><w:t>.</w:t></w:r>" +
                @"</w:p>";
    
            [Fact]
            public void CanTransformXmlToOpenXml()
            {
                // Arrange, creating an XElement based on the given XML.
                var xmlParagraph = XElement.Parse(Xml);
    
                // Act, transforming the XML into Open XML.
                var paragraph = (Paragraph) TransformElementToOpenXml(xmlParagraph);
    
                // Assert, demonstrating that we have indeed created an Open XML Paragraph instance.
                Assert.Equal(OuterXml, paragraph.OuterXml);
            }
    
            private static OpenXmlElement TransformElementToOpenXml(XElement element)
            {
                return element.Name.LocalName switch
                {
                    "p" => new Paragraph(element.Nodes().Select(TransformNodeToOpenXml)),
                    "em" => new Run(new RunProperties(new Italic()), CreateText(element.Value)),
                    "b" => new Run(new RunProperties(new Bold()), CreateText(element.Value)),
                    _ => throw new ArgumentOutOfRangeException(nameof(element), element.Name.LocalName, "Unsupported element.")
                };
            }
    
            private static OpenXmlElement TransformNodeToOpenXml(XNode node)
            {
                return node switch
                {
                    XElement element => TransformElementToOpenXml(element),
                    XText text => new Run(CreateText(text.Value)),
                    _ => throw new ArgumentOutOfRangeException(nameof(node), node.NodeType, "Unsupported node type.")
                };
            }
    
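            // Leading or trailing whitespace in a w:t element may be dropped by consumers
            // unless xml:space="preserve" is set, so set it only where it is needed.
            private static Text CreateText(string text)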
            {
                return new Text(text)
                {
                    Space = text.Length > 0 && (char.IsWhiteSpace(text[0]) || char.IsWhiteSpace(text[^1]))
                        ? new EnumValue<SpaceProcessingModeValues>(SpaceProcessingModeValues.Preserve)
                        : null
                };
            }
        }
    }
    

    The above sample deals with <p> (paragraph), <em> (emphasis / italic), and <b> (bold) elements. Supporting a further formatting element (e.g., underlining) only takes another arm in the switch expression of TransformElementToOpenXml.
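
    As a minimal sketch of such an extension, the following helper maps a single, non-nested inline element to a formatted Run and adds a <u> (underline) case. The class name InlineFormatting, the method name ToRun, and the <u> tag itself are assumptions made for illustration; they do not appear in the question. For brevity, the sketch always sets xml:space="preserve" instead of checking for leading or trailing whitespace.

```csharp
using System;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class InlineFormatting
{
    // Hypothetical helper: maps one non-nested inline element to a formatted Run.
    public static Run ToRun(XElement element)
    {
        // Pick the run property that corresponds to the source tag.
        OpenXmlElement format = element.Name.LocalName switch
        {
            "em" => new Italic(),
            "b" => new Bold(),
            // Assumed tag name for underlined text.
            "u" => new Underline { Val = UnderlineValues.Single },
            _ => throw new ArgumentOutOfRangeException(
                nameof(element), element.Name.LocalName, "Unsupported inline element.")
        };

        // Simplification: always preserve whitespace in the text.
        var text = new Text(element.Value) { Space = SpaceProcessingModeValues.Preserve };

        return new Run(new RunProperties(format), text);
    }
}
```

    Calling InlineFormatting.ToRun(XElement.Parse("<u>text</u>")) should yield a run whose run properties contain a w:u element with a single underline.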

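    Note that the sample code makes the simplifying assumption that <em>, <b>, and any further formatting elements are not nested. Because element.Value concatenates all descendant text, a nested element such as <em>a <b>b</b></em> would not fail but would silently lose its inner formatting. Supporting nesting makes the sample code a little more complicated, since the formatting of the enclosing elements has to be carried down to each text node.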
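
    One possible way to support nesting, sketched here under the assumption that only <em> and <b> occur, is to walk the node tree recursively and pass the formatting of the enclosing elements down to each text node. The names NestedTransformation, ToParagraph, ToRuns, and CreateRun are hypothetical, and the sketch again sets xml:space="preserve" on every text element for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class NestedTransformation
{
    // Hypothetical entry point: transforms a <p> element into a Paragraph.
    public static Paragraph ToParagraph(XElement p)
    {
        return new Paragraph(ToRuns(p, false, false));
    }

    // Walks the nodes depth-first and emits one Run per text node,
    // formatted according to all enclosing <em> and <b> elements.
    private static IEnumerable<OpenXmlElement> ToRuns(XElement parent, bool bold, bool italic)
    {
        foreach (XNode node in parent.Nodes())
        {
            switch (node)
            {
                case XText text:
                {
                    yield return CreateRun(text.Value, bold, italic);
                    break;
                }
                case XElement child:
                {
                    // Recurse with the child's formatting added to the inherited one.
                    IEnumerable<OpenXmlElement> childRuns = child.Name.LocalName switch
                    {
                        "em" => ToRuns(child, bold, true),
                        "b" => ToRuns(child, true, italic),
                        _ => throw new ArgumentOutOfRangeException(
                            nameof(parent), child.Name.LocalName, "Unsupported element.")
                    };

                    foreach (OpenXmlElement run in childRuns)
                    {
                        yield return run;
                    }

                    break;
                }
                default:
                    throw new ArgumentOutOfRangeException(
                        nameof(parent), node.NodeType, "Unsupported node type.");
            }
        }
    }

    private static Run CreateRun(string value, bool bold, bool italic)
    {
        // Simplification: always preserve whitespace in the text.
        var text = new Text(value) { Space = SpaceProcessingModeValues.Preserve };

        if (!bold && !italic)
        {
            return new Run(text);
        }

        // The typed properties place w:b and w:i in schema order.
        var properties = new RunProperties();
        if (bold) properties.Bold = new Bold();
        if (italic) properties.Italic = new Italic();

        return new Run(properties, text);
    }
}
```

    For an input such as <p>a<em>b<b>c</b></em>d</p>, this should produce four runs, of which the third is both bold and italic. Whichever transformation is used, the resulting Paragraph can then be placed in the template as a whole, for example with forewordbkmkParent.InsertBeforeSelf(paragraph) as in the question's code, so that no run ends up outside a paragraph.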