
Office Open XML SDK word replacing


To create Word documents from data in an SQL database, I'm using the Office Open XML SDK so that I don't need Office interop. This speeds up the process and removes the requirement for Microsoft Office to be installed on the client system.

While this works very well, I'm having a problem when replacing certain text in the document. To keep the final document customizable, I've created a template document that contains tags such as [TagHere]. Because the tag names should be easily readable, the same words could also appear as ordinary text elsewhere in the document, which is why I've surrounded each tag with square brackets [].

This works quite well, but sometimes an issue comes up: while you type in a .docx document, Word can split the text into multiple runs (separate XML elements), even within a single word. A tag like [TagHere] can end up split as

<tag>[</tag><tag>TagHere</tag><tag>]</tag>

When this happens, a plain text replacement no longer finds the tag, so it isn't replaced.
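If the plain-text [Tag] approach is kept, one commonly used workaround is to match tags against the concatenated text of each paragraph instead of against individual runs. The sketch below is a swapped-in technique, not something from the question: it assumes the document body is available as a System.Xml.Linq `XElement` (for example, parsed from the main document part's XML), and the class name `SplitTagReplacer` and the `\[(\w+)\]` tag pattern are hypothetical. It only rewrites paragraphs that actually contain a known tag, and it moves the whole result into the first text element, so any formatting that differed between the merged runs is lost.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical sketch: replace [Tag] placeholders even when Word has split
// them across several runs, by working on each paragraph's joined text.
public static class SplitTagReplacer
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Assumed tag syntax: a word in square brackets, e.g. [TagHere].
    static readonly Regex TagPattern = new Regex(@"\[(\w+)\]");

    public static void ReplaceTags(XElement body, IDictionary<string, string> values)
    {
        foreach (XElement paragraph in body.Descendants(W + "p"))
        {
            // All w:t elements of this paragraph, in document order.
            List<XElement> texts = paragraph.Descendants(W + "t").ToList();
            if (texts.Count == 0) continue;

            string joined = string.Concat(texts.Select(t => t.Value));
            string replaced = TagPattern.Replace(joined, m =>
                values.TryGetValue(m.Groups[1].Value, out string v) ? v : m.Value);
            if (replaced == joined) continue; // no known tag: leave the runs untouched

            // Put the whole result in the first text element and empty the rest.
            // Trade-off: the text now takes the first run's formatting.
            texts[0].Value = replaced;
            texts[0].SetAttributeValue(XNamespace.Xml + "space", "preserve");
            foreach (XElement t in texts.Skip(1)) t.Value = string.Empty;
        }
    }

    public static void Main()
    {
        // A paragraph in which "[TagHere]" has been split into three runs.
        XElement body = new XElement(W + "body",
            new XElement(W + "p",
                new XElement(W + "r", new XElement(W + "t", "Dear [")),
                new XElement(W + "r", new XElement(W + "t", "TagHere")),
                new XElement(W + "r", new XElement(W + "t", "],"))));

        ReplaceTags(body, new Dictionary<string, string> { ["TagHere"] = "Alexander" });

        Console.WriteLine(string.Concat(body.Descendants(W + "t").Select(t => t.Value))); // Dear Alexander,
    }
}
```

Emptying the leftover w:t elements instead of deleting their runs keeps the change small and avoids touching run properties, bookmarks or other siblings.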

The .docx format does offer alternatives for this kind of operation, such as Content Controls, but these make creating the template more complex. Furthermore, in these documents it is common to take a table row containing tags and copy it multiple times, which would probably break the content control approach. Hence I've chosen not to use them.

I'd be grateful if anyone has a solution to this problem.


Solution

  • Instead of typing the plain text "[TagHere]", insert a merge field. (In Word, click Insert > Quick Parts > Field, choose "MergeField", and type "TagHere" in the "Field name" box.)

    Then, instead of doing a text find-and-replace, scan the document for merge fields and set their inner text.

    using System;
    using System.Collections.Generic;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    class Program
    {
        static void Main(string[] args)
        {
            string document = args[0];
            using (WordprocessingDocument wordDoc = WordprocessingDocument.Open(document, true))
            {
                var replaceOperations = new Dictionary<string, string>();
    
                replaceOperations.Add("company", "alex's applications");
                replaceOperations.Add("first_name", "alexander");
                replaceOperations.Add("last_name", "taylor");
                //etc
    
                Replace(wordDoc, replaceOperations);
            } //the changes are saved when the document is disposed
        }
    
        private static readonly char[] splitChar = new char[] { ' ' };

        public static void Replace(WordprocessingDocument document, Dictionary<string, string> replaceOperations)
        {
            //find all the simple fields (w:fldSimple)
            foreach (var field in document.MainDocumentPart.Document.Body.Descendants<SimpleField>())
            {
                //parse the instruction, e.g. " MERGEFIELD TagHere \* MERGEFORMAT "
                string[] instruction = field.Instruction.Value.Split(splitChar, StringSplitOptions.RemoveEmptyEntries);
    
                //check if it's a merge field that has a name, and if so...
                if (instruction.Length >= 2 &&
                    instruction[0].Equals("MERGEFIELD", StringComparison.OrdinalIgnoreCase))
                {
                    //get the field name, stripping any surrounding quotes
                    string fieldname = instruction[1].Trim('"');
    
                    //skip fields we don't have a value for
                    if (!replaceOperations.TryGetValue(fieldname, out string value))
                        continue;
    
                    //set the first text inside the field (normally there is only one)
                    foreach (var fieldtext in field.Descendants<Text>())
                    {
                        fieldtext.Text = value;
                        break;
                    }
                }
            }
        }
    }
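One caveat: the code above only finds fields stored as `w:fldSimple` (the SDK's `SimpleField`). Word can also save a field as a complex field: a `w:fldChar` "begin" marker, the instruction in one or more `w:instrText` runs, a `w:fldChar` "separate" marker, the displayed result in ordinary `w:t` runs, and a `w:fldChar` "end" marker. A minimal sketch for that form follows. It works on a System.Xml.Linq `XElement` rather than the SDK's typed classes, assumes fields are not nested inside other fields, and the class name `ComplexMergeFieldFiller` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml.Linq;

// Hypothetical sketch: fill MERGEFIELDs stored as complex fields
// (w:fldChar + w:instrText). Assumption: fields are not nested.
public static class ComplexMergeFieldFiller
{
    public static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    enum FieldState { None, Instruction, Result }

    public static void Fill(XElement body, IDictionary<string, string> values)
    {
        var instruction = new StringBuilder();
        FieldState state = FieldState.None;
        string value = null;      // replacement for the current field, if known
        bool wroteValue = false;  // has the first result text been set yet?

        foreach (XElement el in body.Descendants().ToList())
        {
            if (el.Name == W + "fldChar")
            {
                string type = (string)el.Attribute(W + "fldCharType");
                if (type == "begin") { state = FieldState.Instruction; instruction.Clear(); }
                else if (type == "separate" && state == FieldState.Instruction)
                {
                    value = LookUp(instruction.ToString(), values);
                    wroteValue = false;
                    state = FieldState.Result;
                }
                else if (type == "end") { state = FieldState.None; value = null; }
            }
            else if (el.Name == W + "instrText" && state == FieldState.Instruction)
            {
                instruction.Append(el.Value); // the instruction may span several runs
            }
            else if (el.Name == W + "t" && state == FieldState.Result && value != null)
            {
                // The first result text gets the value; any further pieces are cleared.
                el.Value = wroteValue ? string.Empty : value;
                wroteValue = true;
            }
        }
    }

    // Returns the replacement for e.g. " MERGEFIELD first_name \* MERGEFORMAT ", or null.
    static string LookUp(string instruction, IDictionary<string, string> values)
    {
        string[] parts = instruction.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2 ||
            !parts[0].Equals("MERGEFIELD", StringComparison.OrdinalIgnoreCase))
            return null;
        return values.TryGetValue(parts[1].Trim('"'), out string v) ? v : null;
    }

    public static void Main()
    {
        XElement body = XElement.Parse(
            "<w:body xmlns:w='http://schemas.openxmlformats.org/wordprocessingml/2006/main'><w:p>" +
            "<w:r><w:fldChar w:fldCharType='begin'/></w:r>" +
            "<w:r><w:instrText xml:space='preserve'> MERGEFIELD first_name </w:instrText></w:r>" +
            "<w:r><w:fldChar w:fldCharType='separate'/></w:r>" +
            "<w:r><w:t>«first_name»</w:t></w:r>" +
            "<w:r><w:fldChar w:fldCharType='end'/></w:r>" +
            "</w:p></w:body>");

        Fill(body, new Dictionary<string, string> { ["first_name"] = "alexander" });
        Console.WriteLine(body.Descendants(W + "t").Single().Value); // alexander
    }
}
```

Running both this and the `SimpleField` loop covers either storage form; fields without a matching dictionary entry keep their original display text.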