c# .net ms-word openxml

Replace text in docx file with content of another docx file


I'm trying to use the Open XML SDK to replace the text "Veteran" in A.docx with the content of B.docx. If B.docx contains only text or paragraphs, it works fine and I get a modified A.docx. However, if B.docx contains a table, the code doesn't work.

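using System.Collections.Generic;
using System.Linq;
using System.Text;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Placeholder namespace and class names.
namespace ReplaceDocxContent
{
    class Program
    {
        static void Main(string[] args)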
        {
            SearchAndReplace(@"C:\A.docx", @"C:\B.docx");
        }

        public static void SearchAndReplace(string docTo, string docFrom)
        {
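            List<WordprocessingDocument> docList = new List<WordprocessingDocument>();
            // Buffer passed to ParseTable and ParseText below.
            StringBuilder textBuilder = new StringBuilder();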
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(docTo, true))
            using (WordprocessingDocument wordDoc1 = WordprocessingDocument.Open(docFrom, true))
            {
                var parts = wordDoc1.MainDocumentPart.Document.Descendants().FirstOrDefault();
                docList.Add(wordDoc);
                docList.Add(wordDoc1);

                if (parts != null)
                {
                    foreach (var node in parts.ChildElements)
                    {
                        if (node is Table)
                        {
                            ParseTable(docList, (Table)node, textBuilder);
                        }
                    }
                }
            }
        }

        public static void ParseText(List<WordprocessingDocument> wpd, Paragraph node, StringBuilder textBuilder)
        {
            Body body = wpd[0].MainDocumentPart.Document.Body;
            Body body1 = wpd[1].MainDocumentPart.Document.Body;

            string content = body1.InnerXml;
            var paras = body.Elements<Paragraph>();

            foreach (var para in paras)
            {
                foreach (var run in para.Elements<Run>())
                {
                    foreach (var text in run.Elements<Text>())
                    {
                        if (text.Text.Contains("Veteran"))
                        {
                            run.InnerXml.Replace(run.InnerXml, content);
                            break;
                        }
                    }
                }
            }
        }

        public static void ParseTable(List<WordprocessingDocument> wpd, Table node, StringBuilder textBuilder)
        {
            foreach (var row in node.Descendants<TableRow>())
            {
                textBuilder.Append("| ");
                foreach (var cell in row.Descendants<TableCell>())
                {
                    foreach (var para in cell.Descendants<Paragraph>())
                    {
                        ParseText(wpd, para, textBuilder);
                    }
                    textBuilder.Append(" | ");
                }
                textBuilder.AppendLine("");
            }
        }
    }
}

How can I make this work? Is there a better way to replace text with the content of another .docx file?


Solution

  • Without enough detail for a specific answer, here is how to approach such problems in general:

    1. Ensure you understand the Open XML specification and valid Open XML markup at an appropriate level of detail.

    2. Understand that most Open XML-related code transforms some source markup into some target markup. Therefore, you must:

      • understand the source and target markup first and then
      • define the transformation required to create the target from the source.

    Depending on what you need to do, the Open XML Productivity Tool can help create the transforming code. If you have a source and target document, you can use the Productivity Tool to compare those documents. This shows the difference in the markup, so you see what markup is created, deleted, or changed. It even shows you the Open XML SDK-based code required to effect the change.
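If the Productivity Tool is not at hand, the markup can also be inspected directly, because a .docx file is a ZIP package. The following is a minimal sketch using only the .NET base class library. The class and method names (`MarkupDump`, `ReadMainDocumentXml`) are made up for illustration, and the code assumes the main document part sits at the conventional path `word/document.xml`, which a package is not strictly required to use.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class MarkupDump
{
    // Returns the indented XML of the main document part of a .docx file.
    // Assumption: the part is stored at word/document.xml (the usual location).
    public static string ReadMainDocumentXml(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
            {
                throw new FileNotFoundException(
                    "word/document.xml not found in package", docxPath);
            }

            using (Stream stream = entry.Open())
            {
                // XDocument.ToString() indents the markup, which makes it
                // easier to diff the source and target documents by eye.
                return XDocument.Load(stream).ToString();
            }
        }
    }
}
```

Writing the result for the source and the hand-made target document to two text files and running them through any diff tool gives a rough equivalent of the comparison described above.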

    In my own use cases, I typically prefer to write recursive, pure functional transformations. While you need to wrap your head around the concept, this is an extremely powerful approach.
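To make the idea of a recursive, pure functional transformation concrete, here is a small sketch using LINQ to XML. It is not the answerer's own code: `MarkupTransform`, `Transform`, and `ParagraphText` are hypothetical names, and the rule "a paragraph whose text contains the placeholder is replaced by a list of block elements" is an assumption about what the target markup should look like. The function never modifies its input; it builds a new tree.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MarkupTransform
{
    // Main WordprocessingML namespace.
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Maps one node of the source tree to zero or more nodes of the target tree.
    public static IEnumerable<XNode> Transform(
        XNode node, string placeholder, IReadOnlyList<XElement> replacement)
    {
        var element = node as XElement;
        if (element == null)
        {
            // Text and other non-element nodes are kept; LINQ to XML clones
            // a node that already has a parent when it is added to a new tree.
            return new[] { node };
        }

        // Rule: a paragraph containing the placeholder becomes the replacement blocks.
        if (element.Name == W + "p" && ParagraphText(element).Contains(placeholder))
        {
            return replacement.Select(r => new XElement(r));
        }

        // Default: rebuild the element with the same name and attributes
        // and recursively transformed children.
        return new[]
        {
            new XElement(
                element.Name,
                element.Attributes(),
                element.Nodes().SelectMany(n => Transform(n, placeholder, replacement)))
        };
    }

    // Concatenates all w:t descendants, so a placeholder that Word has
    // split across several runs is still found.
    private static string ParagraphText(XElement paragraph) =>
        string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value));
}
```

Called as `Transform(body, "Veteran", blocks).Single()`, where `blocks` holds the block-level children of B.docx's `w:body` (without its `w:sectPr`), this yields a new body element. The sketch only handles markup; styles, numbering definitions, and images referenced from B.docx would have to be brought over separately.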

    In your case, you should:

    • take a few representative, manually-created samples of source (A.docx with "Veteran" still to be replaced) and target (A.docx with "Veteran" replaced as desired) documents;
    • look at the Open XML markup of the source and target documents; and
    • write code that creates the target markup.
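As an illustration of the last step, here is one possible sketch with the Open XML SDK. It assumes that the target markup is "the paragraph containing the placeholder is replaced by all block-level elements of B.docx", which may not match what you actually want. `PlaceholderReplacer` and `ReplaceParagraphWithBody` are hypothetical names; the SDK calls used (`InsertBeforeSelf`, `CloneNode`, `Remove`, `InnerText`) are part of `OpenXmlElement`.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static class PlaceholderReplacer
{
    // Replaces the first paragraph in targetPath whose text contains
    // `placeholder` with copies of the body content of sourcePath.
    public static void ReplaceParagraphWithBody(
        string targetPath, string sourcePath, string placeholder)
    {
        using (WordprocessingDocument target = WordprocessingDocument.Open(targetPath, true))
        using (WordprocessingDocument source = WordprocessingDocument.Open(sourcePath, false))
        {
            Body targetBody = target.MainDocumentPart.Document.Body;
            Body sourceBody = source.MainDocumentPart.Document.Body;

            // InnerText joins the text of all runs, so a placeholder that
            // Word split across several runs is still matched.
            Paragraph marker = targetBody.Descendants<Paragraph>()
                .FirstOrDefault(p => p.InnerText.Contains(placeholder));
            if (marker == null)
            {
                return; // nothing to replace
            }

            // Copy paragraphs and tables alike; skip the section properties
            // of the source body, which do not belong in the middle of a document.
            foreach (OpenXmlElement block in
                     sourceBody.ChildElements.Where(e => !(e is SectionProperties)))
            {
                // An element can only have one parent, so insert a deep clone.
                marker.InsertBeforeSelf(block.CloneNode(true));
            }

            marker.Remove();
            target.MainDocumentPart.Document.Save();
        }
    }
}
```

A call such as `PlaceholderReplacer.ReplaceParagraphWithBody(@"C:\A.docx", @"C:\B.docx", "Veteran")` would then replace the whole paragraph rather than a single run, which is what allows a table to be inserted: a table is a block-level element and cannot be placed inside a run. Known gaps in this sketch: styles, numbering, images, and other parts referenced from B.docx are not copied, and if the placeholder paragraph is the last element of a table cell, a trailing empty paragraph has to be kept for the markup to stay valid.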

    Once you have code that at least attempts to produce valid target Open XML markup, you can come back with further questions if you run into issues.