
Replace ContentControl With Contents OpenXML


I am using C# and the Open XML SDK to write XML data into a Word document. I am currently using XML mapping to populate content controls in the document with data from the XML file. However, after the content controls have been populated, I want to get rid of them in the final document, since they are no longer needed and cause errors when the document is parsed with other software.

Here's the relevant for loop:

foreach (var control in mainPart.RootElement.Descendants<SdtElement>().ToList())
{
    var text = control.Descendants<Text>().FirstOrDefault()?.Text;

    if (!string.IsNullOrEmpty(text))
    {
        var run = control.Descendants<Run>().FirstOrDefault();
        var newRun = new Run(new Text(text));
        if (run != null)
        {
            // Replace the content control with the new run containing the text
            run.Parent.ReplaceChild(newRun, run);
        }
        else
        {
            // Insert the new run after the content control
            var paragraph = control.Ancestors<Paragraph>().FirstOrDefault();
            if (paragraph != null)
            {
                paragraph.InsertAfter(newRun, control);
            }
        }
    }

    // Remove the content control
    control.Remove();
}

The full function:

        public static void CreateWordDocument(string xmlData)
        {
            Console.WriteLine("Setting up paths...");

            // Set up output file name and path
            int index = 1;
            string outputFileName;
            string outputDocumentPath;
            do
            {
                outputFileName = $"GeneratedCV_{index}.docx";
                outputDocumentPath = Path.Combine(Path.GetDirectoryName(outputDirectoryPath), outputFileName);

                index++;
            } while (File.Exists(outputDocumentPath));

            // Copy template document to output path and replace custom XML parts with new XML data

            using var sourceDoc = WordprocessingDocument.Open(templateDocumentPath, false);
            File.Copy(templateDocumentPath, outputDocumentPath);
            using var newDoc = WordprocessingDocument.Open(outputDocumentPath, true);
            newDoc.MainDocumentPart.DocumentSettingsPart.Settings = (DocumentFormat.OpenXml.Wordprocessing.Settings)sourceDoc.MainDocumentPart.DocumentSettingsPart.Settings.Clone();
            newDoc.MainDocumentPart.StyleDefinitionsPart.Styles = (DocumentFormat.OpenXml.Wordprocessing.Styles)sourceDoc.MainDocumentPart.StyleDefinitionsPart.Styles.Clone();
            newDoc.MainDocumentPart.StyleDefinitionsPart.Styles.Save();

            var mainPart = newDoc.MainDocumentPart;
            mainPart.DeleteParts<CustomXmlPart>(mainPart.CustomXmlParts);

            // Write the XML data to a file
            string xmlFilePath = Path.Combine(Path.GetDirectoryName(outputDocumentPath), "CustomXml.xml");
            File.WriteAllText(xmlFilePath, xmlData);

            var customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml, "rId1");
            using (var stream = new FileStream(xmlFilePath, FileMode.Open))
            {
                customXmlPart.FeedData(stream);

                
                foreach (var control in mainPart.RootElement.Descendants<SdtElement>().ToList())
                {
                    var text = control.Descendants<Text>().FirstOrDefault()?.Text;

                    if (!string.IsNullOrEmpty(text))
                    {
                        var run = control.Descendants<Run>().FirstOrDefault();
                        var newRun = new Run(new Text(text));
                        if (run != null)
                        {
                            // Replace the content control with the new run containing the text
                            run.Parent.ReplaceChild(newRun, run);
                        }
                        else
                        {
                            // Insert the new run after the content control
                            var paragraph = control.Ancestors<Paragraph>().FirstOrDefault();
                            if (paragraph != null)
                            {
                                paragraph.InsertAfter(newRun, control);
                            }
                        }
                    }

                    // Remove the content control
                    control.Remove();
                }

                



            }

            newDoc.Save();
            newDoc.Close();

            Console.WriteLine("CV generated!");
        }

The problem is that this loop isn't properly replacing the content controls with their text contents. Instead, it just removes the content controls entirely.

The template document I use

How can I modify this loop to properly replace the content controls with their text contents?

I have tried using the "Remove Content Control when Populated" feature in Word and copying the setting over from the template document. However, streaming the XML does not count as "populating", so the content controls remain. I have also tried using a for loop to iterate through each content control and delete it while retaining the text inside, but this removes all of the information.


Solution

  • To resolve the issue of content controls causing errors when parsing the final Word document, it may be possible to replace the placeholders used in the document with the appropriate values from the XML file. This can be achieved using XPath to locate the data in the XML file, and then replacing the placeholder text with this data.

    Here is an example of how you could modify the existing code to replace placeholders in the Word document with data from the XML file:

    using var doc = WordprocessingDocument.Open(documentPath, true);
    var mainPart = doc.MainDocumentPart;

    // Get the XML string from the main part and trim leading whitespace
    string xmlString = mainPart.Document.InnerXml.TrimStart();

    // Load the trimmed XML string into an XmlDocument
    var xmlDocument = new XmlDocument();
    xmlDocument.LoadXml(xmlString);

    var namespaceManager = new XmlNamespaceManager(xmlDocument.NameTable);
    namespaceManager.AddNamespace("w", "http://schemas.openxmlformats.org/wordprocessingml/2006/main");

    foreach (var control in mainPart.RootElement.Descendants<SdtElement>().ToList())
    {
        // Text to write into the control; stays empty when there is no data binding
        var text = "";

        var dataBinding = control.Descendants<DataBinding>().FirstOrDefault();
        if (dataBinding != null)
        {
            // Look up the node that the binding's XPath points to
            var xpath = dataBinding.XPath.Value;
            var node = xmlDocument.SelectSingleNode(xpath, namespaceManager);

            // Fall back to an empty string when no node matches
            text = node?.InnerText ?? "";
        }

        // Overwrite every text element inside the control with the looked-up value
        foreach (var t in control.Descendants<Text>())
        {
            t.Text = text;
        }
    }
    

    In this code, the WordprocessingDocument.Open() method opens the Word document at documentPath in edit mode (the true argument). It returns a WordprocessingDocument object, whose MainDocumentPart property gives access to the main document part of the Word document.

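    The InnerXml property of the main part's Document is used to obtain the XML string representation of the Word document's contents. Leading whitespace is trimmed from this string before it is loaded into an XmlDocument object. Note that a data binding's XPath normally addresses the bound custom XML part rather than the document body, so the lookup only finds a value if the loaded XML actually contains the bound data.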

    An XmlNamespaceManager object is created to manage the namespaces used in the XML document. In this case, it adds the namespace "w" with the value "http://schemas.openxmlformats.org/wordprocessingml/2006/main".

    A foreach loop iterates through all SdtElement descendants of the root element of the main part. SdtElement represents a content control in the Word document.

    For each content control, the code checks whether it has a DataBinding descendant element. If it does, it extracts the XPath expression from the DataBinding element, then uses the SelectSingleNode() method of the XmlDocument object to retrieve the XML node that matches the XPath expression. The inner text of that node is then assigned to the text variable. If the content control has no DataBinding element, text is left empty.
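
    The XPath lookup described above can be sketched in isolation. This is only an illustration, assuming the XPath is evaluated against the custom XML data that the binding points at, and that the binding's prefix mappings arrive as a string of xmlns:prefix='uri' pairs. BindingLookup, AddPrefixMappings and Resolve are hypothetical names, not part of the Open XML SDK.

```csharp
using System;
using System.Text.RegularExpressions;
using System.Xml;

public static class BindingLookup
{
    // Hypothetical helper: registers each xmlns:prefix='uri' pair found in a
    // prefix-mappings string so that prefixes such as ns0 resolve in XPath.
    // Assumption: prefixes consist of word characters only.
    public static void AddPrefixMappings(XmlNamespaceManager manager, string prefixMappings)
    {
        if (string.IsNullOrWhiteSpace(prefixMappings))
        {
            return;
        }

        const string pattern = @"xmlns:(\w+)\s*=\s*['""]([^'""]*)['""]";
        foreach (Match match in Regex.Matches(prefixMappings, pattern))
        {
            manager.AddNamespace(match.Groups[1].Value, match.Groups[2].Value);
        }
    }

    // Hypothetical helper: evaluates a binding XPath against custom XML text
    // and returns the matched node's inner text, or "" when nothing matches.
    public static string Resolve(string customXml, string xpath, string prefixMappings)
    {
        var document = new XmlDocument();
        document.LoadXml(customXml);

        var manager = new XmlNamespaceManager(document.NameTable);
        AddPrefixMappings(manager, prefixMappings);

        return document.SelectSingleNode(xpath, manager)?.InnerText ?? "";
    }
}
```

    For example, with the custom XML <cv xmlns='urn:example:cv'><name>Ada</name></cv>, calling BindingLookup.Resolve(xml, "/ns0:cv[1]/ns0:name[1]", "xmlns:ns0='urn:example:cv'") should return "Ada". An XPath that uses a prefix that was never registered makes SelectSingleNode throw, so the mappings need to be added before the lookup.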

    Finally, the code iterates through all Text descendants of the content control and sets their Text property to the text variable. This effectively populates the content control with data from the XML file.
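
    The code above fills the content controls but leaves them in the document. A minimal sketch of removing the controls while keeping their contents, which is what the question asks for, might look like the following. It assumes the DocumentFormat.OpenXml package is referenced and that the controls already hold their final text, since a removed control is no longer refreshed from the custom XML. ContentControlUnwrapper and UnwrapAll are hypothetical names.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

public static class ContentControlUnwrapper
{
    // Hypothetical helper: replaces every content control under root with the
    // elements held in its content container, keeping text and run formatting.
    public static void UnwrapAll(OpenXmlElement root)
    {
        // Snapshot the controls first, because the tree is modified in the loop.
        foreach (var control in root.Descendants<SdtElement>().ToList())
        {
            // The container type differs by position (SdtContentRun,
            // SdtContentBlock, ...), so it is matched by its local name.
            var content = control.ChildElements
                .FirstOrDefault(e => e.LocalName == "sdtContent");

            if (content != null)
            {
                foreach (var child in content.ChildElements.ToList())
                {
                    // An element must be detached before it can be re-inserted.
                    child.Remove();
                    control.InsertBeforeSelf(child);
                }
            }

            // Drop the now-empty control together with its properties.
            control.Remove();
        }
    }
}
```

    Calling ContentControlUnwrapper.UnwrapAll(mainPart.Document) before saving would then leave the runs and paragraphs in place without their surrounding controls. Matching the container by local name is a design choice made here to avoid one branch per control type (run, block, cell, row).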