How to highlight text in a sentence using OpenXML?


I am using the code below to search for and highlight text in an MS Word document. It works for point 1 but not for point 2:

1. John Alter 

I search for Alter or John, and it highlights John/Alter. This works.

2. I am going to school

I search for going, and it highlights going, but the word order changes to "I am to school going". This does not work.

How can I fix point 2? My code is below.

private void HighLightText(Paragraph paragraph, string text)
{
    string textOfRun = string.Empty;
    var runCollection = paragraph.Descendants<DocumentFormat.OpenXml.Wordprocessing.Run>();
    DocumentFormat.OpenXml.Wordprocessing.Run runAfter = null;

    //find the run part which contains the characters
    foreach (DocumentFormat.OpenXml.Wordprocessing.Run run in runCollection)
    {
        if (!string.IsNullOrWhiteSpace(paragraph.InnerText) &&  paragraph.InnerText != "\\s")
            textOfRun = run.GetFirstChild<DocumentFormat.OpenXml.Wordprocessing.Text>().Text;                                  

         if (textOfRun.IndexOf(text, StringComparison.OrdinalIgnoreCase) >= 0)
         {    
             //remove the character from this run part
             run.GetFirstChild<DocumentFormat.OpenXml.Wordprocessing.Text>().Text = Regex.Replace(textOfRun, text, string.Empty, RegexOptions.IgnoreCase);//textOfRun.Replace(text, string.Empty);
             runAfter = run;
             break;    
         }    
     }

     //create a new run with your customization font and the character as its text
     DocumentFormat.OpenXml.Wordprocessing.Run HighLightRun = new DocumentFormat.OpenXml.Wordprocessing.Run();
     DocumentFormat.OpenXml.Wordprocessing.RunProperties runPro = new DocumentFormat.OpenXml.Wordprocessing.RunProperties();
     Highlight highlight = new Highlight() { Val = HighlightColorValues.Yellow };
     DocumentFormat.OpenXml.Wordprocessing.Text runText = new DocumentFormat.OpenXml.Wordprocessing.Text() { Text = text };

     runPro.Append(highlight);
     HighLightRun.Append(runPro);
     HighLightRun.Append(runText);

     //insert the new created run part
     paragraph.InsertAfter(HighLightRun, runAfter);    
}

Solution

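  • You need to split up your Run if you want to highlight text in the middle of it. Replacing the search text with an empty string and inserting a new run after the original won't work, because the highlighted run then lands after all of the remaining text.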

    Your original text structure looks like:

    <Run>
        <Text>
            I am going to school
        </Text>
    </Run>
    

    If you want to highlight the word going, you need to turn it into a more complex structure:

    <Run>
        <Text>
            I am 
        </Text>
    </Run>
    <Run>
        <Text>
            going
        </Text>
    </Run>
    <Run>
        <Text>
             to school
        </Text>
    </Run>
    

    Then the Run in the middle can be set up for highlighting.
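
    The split itself is plain string work. As a minimal sketch, separate from any OpenXML calls, the text of a run could be cut into the three pieces shown above like this. RunTextSplitter and SplitAroundFirstMatch are hypothetical names used only for illustration; they are not part of the Open XML SDK.

```csharp
using System;

public static class RunTextSplitter
{
    // Hypothetical helper: splits 'source' around the first case-insensitive
    // occurrence of 'search'. Returns null when there is no match.
    public static (string Before, string Match, string After)? SplitAroundFirstMatch(
        string source, string search)
    {
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(search))
            return null;

        int index = source.IndexOf(search, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
            return null;

        // Text that stays in the original run
        string before = source.Substring(0, index);
        // Taken from the source (not from 'search') so the original casing is kept
        string match = source.Substring(index, search.Length);
        // Text that goes into a new run after the highlighted one
        string after = source.Substring(index + search.Length);

        return (before, match, after);
    }
}
```

    For "I am going to school" and the search text "GOING", this yields "I am ", "going", and " to school". Note that the first and last pieces carry a trailing or leading space, which is why the Text nodes holding them need space preservation when written back to the document.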

    Here is a working code sample. Please note that there is no error handling in this code! It should give you some idea of how to solve your task. Implement proper exception handling for production use!

    Also note that this sample only handles the first occurrence, as your code does. If you need to highlight multiple matches, you will have to extend this code.

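    // Requires: using System; using System.Linq;
    //           using DocumentFormat.OpenXml; using DocumentFormat.OpenXml.Wordprocessing;
    void HighLightText(Paragraph paragraph, string text)
    {
        // Search for the first occurrence of the text in the text runs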
        var found = paragraph
            .Descendants<Run>()
            // Skip runs without a Text child (e.g. breaks or drawings) so the Select below cannot hit a null
            .Where(r => r.GetFirstChild<Text>() != null)
            .Select(r =>
            {
                var runText = r.GetFirstChild<Text>();
                int index = runText.Text.IndexOf(text, StringComparison.OrdinalIgnoreCase);
    
                // 'Run' is a reference to the text run we found,
                // 'TextNode' is a reference to the run's Text object,
                // 'TokenIndex' is the index of the search string in the run's text
                return new { Run = r, TextNode = runText, TokenIndex = index };
            })                    
            .FirstOrDefault(o => o.TokenIndex >= 0);
    
        // Nothing found -- escape
        if (found == null)
        {
            return;
        }
    
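        // Create a node for highlighted text as a clone (to preserve formatting etc)
        var highlightRun = (Run)found.Run.CloneNode(true);

        // Add the highlight node after the found text run and set up the highlighting.
        // InsertAfterSelf also works when the run is nested (e.g. inside a hyperlink)
        // and is not a direct child of the paragraph.
        found.Run.InsertAfterSelf(highlightRun);
        // Take the matched text from the original run so its casing is preserved
        highlightRun.GetFirstChild<Text>().Text = found.TextNode.Text.Substring(found.TokenIndex, text.Length);

        // Reuse the cloned run's properties if it has any; a run may hold only one RunProperties element
        RunProperties runPro = highlightRun.RunProperties ?? highlightRun.PrependChild(new RunProperties());
        runPro.Highlight = new Highlight { Val = HighlightColorValues.Yellow };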
    
        // Check if there's some text in the text run *after* the found text
        int remainderLength = found.TextNode.Text.Length - found.TokenIndex - text.Length;
        if (remainderLength > 0)
        {
            // There is some text after the highlighted section --
            // insert it in a separate text run after the highlighted text run
            var remainderRun = found.Run.CloneNode(true);
            highlightRun.InsertAfterSelf(remainderRun);
            var textNode = remainderRun.GetFirstChild<Text>();
            textNode.Text = found.TextNode.Text.Substring(found.TokenIndex + text.Length);
    
            // We need to set up this to preserve the spaces between text runs
            textNode.Space = new EnumValue<SpaceProcessingModeValues>(SpaceProcessingModeValues.Preserve);
        }
    
        // Check if there's some text *before* the found text
        if (found.TokenIndex > 0)
        {
            // Something is left before the highlighted text,
            // so make the original text run contain only that portion
            found.TextNode.Text = found.TextNode.Text.Remove(found.TokenIndex);
    
            // We need to set up this to preserve the spaces between text runs
            found.TextNode.Space = new EnumValue<SpaceProcessingModeValues>(SpaceProcessingModeValues.Preserve);  
        }
        else
        {
            // There's nothing before the highlighted text -- remove the unneeded text run
            found.Run.Remove();
        }
    }
    

    This code works for highlighting I, going, or school in the sentence I am going to school.
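
    For the multiple-match case mentioned earlier, one possible starting point is to collect the offsets of all matches in a run's text first and only then split the run. The sketch below covers just that string step; MatchFinder and FindAllIndexes are hypothetical names, not SDK members, and the choice to skip overlapping matches is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class MatchFinder
{
    // Hypothetical helper: returns the start index of every non-overlapping,
    // case-insensitive occurrence of 'search' in 'source'.
    public static List<int> FindAllIndexes(string source, string search)
    {
        var indexes = new List<int>();
        if (string.IsNullOrEmpty(source) || string.IsNullOrEmpty(search))
            return indexes;

        int start = 0;
        while (start <= source.Length - search.Length)
        {
            int index = source.IndexOf(search, start, StringComparison.OrdinalIgnoreCase);
            if (index < 0)
                break;

            indexes.Add(index);
            // Continue after this match so matches never overlap
            start = index + search.Length;
        }

        return indexes;
    }
}
```

    When applying such offsets to a run, it is probably easiest to work from the last match to the first, so that splitting off the tail of the text does not shift the offsets that are still to be processed.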