Using C# Regex, I am trying to find the text enclosed by two distinct pairs of words, say start1...end1 and start2...end2. In my example below I would like to get: text1, text2, text11, text22.
string str = "This start1 text1 end1. And start2 text2 end2 is a test. This start1 text11 end1. And start2 text22 end2 is a test.";
Regex oRegEx = new Regex(@"start1(.*?)end1|start2(.*?)end2", RegexOptions.IgnoreCase);
MatchCollection oMatches = oRegEx.Matches(str);
if (oMatches.Count > 0)
{
    foreach (Match mt in oMatches)
    {
        Console.WriteLine(mt.Value); // the display includes start1 and end1 (or start2 and end2)
        Console.WriteLine(mt.Groups[1].Value); // the display excludes start1 and end1 (or start2 and end2), or is an empty string, depending on the order of the pattern
    }
}
mt.Groups[1].Value in the above code correctly displays text1 and text11 if the pattern is @"start1(.*?)end1|start2(.*?)end2", but it displays empty strings for text2 and text22. On the other hand, if I change the order in the pattern to @"start2(.*?)end2|start1(.*?)end1", it correctly displays text2 and text22 but displays empty strings for text1 and text11. What needs to change in my code?
This MSDN article explains something about when a group returns an empty string, but I am still not getting the desired results.
Give both groups the same name:
start1(?<val>.*?)end1|start2(?<val>.*?)end2
And get the value as:
mt.Groups["val"].Value
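For illustration, here is a minimal sketch of the named-group approach applied to the string from the question. The class name NamedGroupDemo and the method Extract are hypothetical, and the Trim() call is an assumption that the spaces around each captured value are not wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class NamedGroupDemo
{
    public static List<string> Extract(string input)
    {
        // Both alternatives capture into the same named group "val".
        // .NET allows a group name to be reused inside one pattern.
        var regex = new Regex(
            @"start1(?<val>.*?)end1|start2(?<val>.*?)end2",
            RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match mt in regex.Matches(input))
        {
            // Trim() drops the spaces around the captured text (assumption:
            // they are not part of the wanted value).
            results.Add(mt.Groups["val"].Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        // prints: text1, text2, text11, text22
        Console.WriteLine(string.Join(", ", Extract(str)));
    }
}
```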
The original problem is that, without names, the group between start1 and end1 has index 1 and the group between start2 and end2 has index 2, so mt.Groups[1] is empty whenever the second alternative is the one that matched.
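To make the numbering concrete, the sketch below keeps the original unnamed pattern and reads whichever of the two numbered groups took part in each match. The class name NumberedGroupDemo and the method Extract are hypothetical; Group.Success is the standard way to check whether a group captured anything.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class NumberedGroupDemo
{
    public static List<string> Extract(string input)
    {
        // Unnamed groups are numbered left to right:
        // group 1 belongs to start1...end1, group 2 to start2...end2.
        var regex = new Regex(
            @"start1(.*?)end1|start2(.*?)end2",
            RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match mt in regex.Matches(input))
        {
            // Only the group of the alternative that matched is successful;
            // the other one has Success == false and an empty Value.
            Group captured = mt.Groups[1].Success ? mt.Groups[1] : mt.Groups[2];
            results.Add(captured.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        // prints: text1, text2, text11, text22
        Console.WriteLine(string.Join(", ", Extract(str)));
    }
}
```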
Another solution is to use lookarounds, with a regex like:
(?<=start([12])).*?(?=end\1)
And then in your code:
Console.WriteLine(mt.Value);
will display the required content.
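A minimal sketch of the lookaround variant follows, again using the string from the question. The class name LookaroundDemo and the method Extract are hypothetical. Note that mt.Value still contains the spaces that sit between the delimiters and the text, so the sketch trims them, assuming they are not wanted.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, used only for this illustration.
public static class LookaroundDemo
{
    public static List<string> Extract(string input)
    {
        // The lookbehind requires "start1" or "start2" just before the match
        // and captures the digit; the lookahead requires "end" followed by
        // the same digit. Neither delimiter becomes part of mt.Value.
        var regex = new Regex(
            @"(?<=start([12])).*?(?=end\1)",
            RegexOptions.IgnoreCase);

        var results = new List<string>();
        foreach (Match mt in regex.Matches(input))
        {
            // Trim() removes the surrounding spaces (assumption: unwanted).
            results.Add(mt.Value.Trim());
        }
        return results;
    }

    public static void Main()
    {
        string str = "This start1 text1 end1. And start2 text2 end2 is a test. "
                   + "This start1 text11 end1. And start2 text22 end2 is a test.";

        // prints: text1, text2, text11, text22
        Console.WriteLine(string.Join(", ", Extract(str)));
    }
}
```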