c# regex rtf

Using a regular expression in C# to match RTF-formatted text


I need to use a regular expression to extract the bold text from RTF-formatted text. For example: The \b brown fox\b0 jumped over the \b lazy dog\b0.

How can I get only the text enclosed between \b and \b0? I tried this expression but it returned only the first match: (\\b.+\b0[^\\b])


Solution

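  • using System;
    using System.Text.RegularExpressions;

    string s = @"The \b brown fox\b0 jumped over the \b lazy dog\b0";

    // The lazy (.*?) stops at the nearest \b0, so each bold run is a separate match.
    Regex rgx = new Regex(@"\\b(.*?)\\b0");
    foreach (Match m in rgx.Matches(s))
    {
        // Group 1 is the text between \b and \b0, including the space after \b.
        Console.WriteLine(m.Groups[1].Value);
    }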
    

    Alternatively, you can run a single match and read the captures of the inner group:

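    using System;
    using System.Text.RegularExpressions;

    string s = @"The \b brown fox\b0 jumped over the \b lazy dog\b0";

    // The outer group repeats across the string; group 2 records one capture per repetition.
    Regex rgx = new Regex(@"(.*?\\b(.*?)\\b0)*");
    foreach (Capture c in rgx.Match(s).Groups[2].Captures)
    {
        Console.WriteLine(c.Value);
    }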
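
One likely reason the expression in the question returned a single result is that `.+` is greedy: it runs to the last `\b0` in the input, so both bold runs end up inside one match. The sketch below contrasts that with the lazy `(.*?)` from the answer. The class `RtfBold` and its methods `ExtractBold` and `GreedyMatchCount` are hypothetical names introduced here for illustration. Trimming the captured text is also an assumption: it drops the space that follows `\b`, which RTF treats as the control word's delimiter, but it would also drop any trailing whitespace inside a bold run.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original answer.
public static class RtfBold
{
    // Lazy quantifier: each match ends at the nearest \b0.
    private static readonly Regex Lazy = new Regex(@"\\b(.*?)\\b0");

    // Greedy quantifier, kept only for contrast: the match extends to the last \b0.
    private static readonly Regex Greedy = new Regex(@"\\b(.+)\\b0");

    // Returns the text of every \b ... \b0 run, with surrounding whitespace trimmed
    // (assumption: the whitespace is not wanted in the result).
    public static List<string> ExtractBold(string rtf)
    {
        var runs = new List<string>();
        foreach (Match m in Lazy.Matches(rtf))
        {
            runs.Add(m.Groups[1].Value.Trim());
        }
        return runs;
    }

    // Number of matches the greedy pattern finds in the same input.
    public static int GreedyMatchCount(string rtf)
    {
        return Greedy.Matches(rtf).Count;
    }
}
```

For the sample string from the question, `RtfBold.ExtractBold` returns the two runs `brown fox` and `lazy dog`, while `RtfBold.GreedyMatchCount` returns 1. Neither pattern understands RTF groups or other control words that begin with `\b`, so this is a sketch for simple input rather than a general RTF parser.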