C# Regex find string between two different pairs of strings


Using C# RegEx, I am trying to find text enclosed by two distinct pairs of words, say, start1....end1, and start2...end2. In my example below I would like to get: text1, text2, text11, text22.

string str = "This start1 text1 end1. And start2 text2 end2 is a test. This start1 text11 end1. And start2 text22 end2 is a test.";

Regex oRegEx = new Regex(@"start1(.*?)end1|start2(.*?)end2", RegexOptions.IgnoreCase);
MatchCollection oMatches = oRegEx.Matches(str);
if (oMatches.Count > 0)
{
    foreach (Match mt in oMatches)
    {
        Console.WriteLine(mt.Value);     //the display includes the start1 and end1 (or start2 and end2)
        Console.WriteLine(mt.Groups[1].Value); //the display excludes the start1 and end1 (or start2 and end2) or displays an empty string depending on the order of pattern.
    }
}

mt.Groups[1].Value in the above code correctly displays text1 and text11 if the pattern is @"start1(.*?)end1|start2(.*?)end2", but it displays empty strings for text2 and text22. On the other hand, if I change the order in the pattern to @"start2(.*?)end2|start1(.*?)end1", it correctly displays text2 and text22 but displays empty strings for text1 and text11. What needs to change in my code? This MSDN article explains something about when a group returns an empty string, but I am still not getting the desired results.


Solution

  • Give both groups the same name:

    start1(?<val>.*?)end1|start2(?<val>.*?)end2
    

    And get the value as:

    mt.Groups["val"].Value
    

    The original problem is that, without names, the group between start1 and end1 has index 1 and the group between start2 and end2 has index 2, so mt.Groups[1] is empty whenever the second alternative is the one that matched. [Figure: regular expression visualization]
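
As a minimal sketch of the named-group approach, the following program runs the pattern against the sample string from the question. The class name `NamedGroupDemo`, the helper `Extract`, and the `Trim()` call are assumptions added for illustration; they are not part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Hypothetical helper: collects the text captured by the shared group "val".
    public static List<string> Extract(string input)
    {
        // Both alternatives use the same group name, so whichever branch
        // matches, its capture is available as Groups["val"].
        var regex = new Regex(@"start1(?<val>.*?)end1|start2(?<val>.*?)end2",
                              RegexOptions.IgnoreCase);
        var results = new List<string>();
        foreach (Match mt in regex.Matches(input))
        {
            // Trim() (an addition) drops the spaces around each value in the sample.
            results.Add(mt.Groups["val"].Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. " +
                     "This start1 text11 end1. And start2 text22 end2 is a test.";
        // Prints: text1, text2, text11, text22
        Console.WriteLine(string.Join(", ", Extract(str)));
    }
}
```

.NET allows the same group name to appear more than once in a pattern, which is what lets a single `Groups["val"]` lookup cover both alternatives.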

    Another solution is to use a lookbehind and a lookahead tied together by a backreference:

    (?<=start([12])).*?(?=end\1)
    

    And then in your code:

    Console.WriteLine(mt.Value);
    

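    will display the required content, because the lookbehind and the lookahead check for the delimiters without including them in the match.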
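
The lookaround variant can be sketched the same way. Here too the class name `LookaroundDemo`, the helper `Extract`, and the `Trim()` call are illustrative assumptions rather than part of the original answer.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Hypothetical helper: collects the text found between matching delimiters.
    public static List<string> Extract(string input)
    {
        // (?<=start([12])) requires "start1" or "start2" immediately before the
        // match and captures the digit; (?=end\1) requires "end" followed by the
        // same digit immediately after it. Neither delimiter ends up in mt.Value.
        var regex = new Regex(@"(?<=start([12])).*?(?=end\1)", RegexOptions.IgnoreCase);
        var results = new List<string>();
        foreach (Match mt in regex.Matches(input))
        {
            // Trim() (an addition) drops the spaces around each value in the sample.
            results.Add(mt.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. " +
                     "This start1 text11 end1. And start2 text22 end2 is a test.";
        Console.WriteLine(string.Join(", ", Extract(str)));
    }
}
```

The backreference `\1` is what keeps the pairs distinct: text that starts after start1 is only accepted if it is followed by end1, and likewise for start2 and end2.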