I just cannot seem to get it.... I have a string of text that I need to extract a repeating pattern from but I can only get a small part of it, or I get a single match of the whole string...
The string is a concatenation of "markers" plus "content" and I need to extract each marker, and its content.
string s = "T: 2 YE I: 4 YE";
Match m = Regex.Match(s, "(?'marker'(T|I)):(?'content'.+)");
while (m.Success)
{
string Marker = m.Groups["marker"].Value; // (T: or I:)
string Content = m.Groups["content"].Value; // (2 YE or 4 YE)
m = m.NextMatch();
}
I've tried both ".+" and ".+?" for max/min capture but I either get 2 matches that have markers but no content, or one match with WHOLE input string.
Any pointers please :)
(?'marker'(T|I)):(?'content'.+)
Won't work because .+ will consume the entire rest of the line (it matches greedily, and nothing stops it from consuming the rest of the line).
(?'marker'(T|I)):(?'content'.+?)
The .+? will only consume a single character and then stop, since it matches reluctantly and nothing after it in the pattern forces it to take more.
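To see both behaviours side by side, here is a small sketch run against the string from the question. The class and method names (QuantifierDemo, Describe) are made up for illustration; only the two patterns and the input come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: returns each match as "marker|content"
    // so that whitespace inside the content stays visible.
    public static string[] Describe(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern);
        var lines = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            lines[i] = matches[i].Groups["marker"].Value + "|" +
                       matches[i].Groups["content"].Value;
        }
        return lines;
    }

    public static void Main()
    {
        string s = "T: 2 YE I: 4 YE";

        // Greedy .+ : a single match whose content runs to the end of the input.
        foreach (string line in Describe(s, @"(?'marker'(T|I)):(?'content'.+)"))
        {
            Console.WriteLine("greedy: [" + line + "]");
        }

        // Lazy .+? : two matches, each with one character of content (a space).
        foreach (string line in Describe(s, @"(?'marker'(T|I)):(?'content'.+?)"))
        {
            Console.WriteLine("lazy:   [" + line + "]");
        }
    }
}
```

The greedy pattern yields one match with the content " 2 YE I: 4 YE", and the lazy pattern yields two matches whose content is just the single space after the colon, which is why it looks like there is no content at all.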
You'll need to be able to specify when "content" ends. I really don't understand the format you've provided well enough to be sure I know the correct way to do this, but assuming that one or more capital letters followed by a colon (like "T:", "ST:", or "ORANGUTANS:") qualifies as a marker, this should work:
([A-Z]+:)(((?![A-Z]+:).)+)
This uses a negative lookahead to recognize where the next marker begins: each character is added to the content only if a marker does not start at that position. The first and second capturing groups should capture the marker and content respectively.
I'm not so familiar with the syntax you're using to name the capturing groups, but I believe this should work:
(?'marker'[A-Z]+:)(?'content'((?![A-Z]+:).)+)
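As a rough sketch of how that pattern could be used from C#, something like the following should do. MarkerParser and Parse are hypothetical names, and it still assumes a marker is one or more capital letters followed by a colon. The content is trimmed because the raw capture keeps the spaces around it.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MarkerParser
{
    // Assumption: a marker is one or more capital letters followed by a colon.
    // The content is every following character at which no new marker starts.
    private static readonly Regex Pattern =
        new Regex(@"(?'marker'[A-Z]+:)(?'content'((?![A-Z]+:).)+)");

    // Hypothetical helper: returns (marker, content) pairs in input order.
    public static List<KeyValuePair<string, string>> Parse(string input)
    {
        var result = new List<KeyValuePair<string, string>>();
        foreach (Match m in Pattern.Matches(input))
        {
            result.Add(new KeyValuePair<string, string>(
                m.Groups["marker"].Value,
                m.Groups["content"].Value.Trim())); // drop surrounding spaces
        }
        return result;
    }

    public static void Main()
    {
        foreach (var pair in Parse("T: 2 YE I: 4 YE"))
        {
            Console.WriteLine("[" + pair.Key + "] [" + pair.Value + "]");
        }
        // Prints:
        // [T:] [2 YE]
        // [I:] [4 YE]
    }
}
```

Note that with this pattern the marker group includes the colon ("T:", "I:"). If the content itself can contain capital letters followed by a colon, the marker definition would need to be tightened.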