Given the following string:
1080s: 33, 6'2" meg: test. 1748s: I THINK IM GONNA <span class="highlight" >PICK</span> 1749s: TWO COMPLETE OPPOSITES.
I want to run a regex over it and get the following matches:
1st match : 1080s: 33, 6'2" meg: test.
2nd match : 1748s: I THINK IM GONNA <span class="highlight" >PICK</span>
3rd match : 1749s: TWO COMPLETE OPPOSITES.
I am using the following regular expression in ASP.NET to perform the match:
MatchCollection mcs = Regex.Matches(txtData, "(\\d*)(s:)([^(\\d*)](s:){0})*");
The regex matches, but the captures are incorrect. It stops matching as soon as it finds \d* or s: on its own. I want it to stop if and only if \d*s: is found together.
I tried a few different ways but still haven't found how to combine \d* and s: into a single "not" operator.
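To see why the original pattern behaves this way: inside square brackets, (, \d, * and ) lose their grouping and quantifier meaning, so [^(\d*)] is a single-character class meaning "any one character that is not (, a digit, * or )", and (s:){0} always matches the empty string. A character class cannot negate a multi-character sequence such as \d*s:. The sketch below (class and method names are hypothetical, made up for illustration) runs the question's pattern unchanged; on the sample string the first match should stop right before the digits of 33, giving "1080s: ".

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, only for illustrating the question's pattern.
public static class CharacterClassDemo
{
    // Inside [...] the characters ( \d * ) have no grouping or quantifier meaning:
    // [^(\d*)] matches exactly ONE character that is not '(', a digit, '*' or ')'.
    public static bool IsAllowedChar(string oneChar) =>
        Regex.IsMatch(oneChar, @"^[^(\d*)]$");

    // Runs the pattern from the question unchanged and returns the matched text.
    public static string[] OriginalMatches(string text) =>
        Regex.Matches(text, "(\\d*)(s:)([^(\\d*)](s:){0})*")
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

For example, CharacterClassDemo.IsAllowedChar("d") returns true while IsAllowedChar("7") returns false, which shows that \d inside the brackets is treated as "a digit", not as the literal text \d followed by a quantifier.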
You can use a regex positive lookahead, as suggested by @Ilya:
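using System.Linq;
using System.Text.RegularExpressions;
var pattern = @"\b(?=\s*\d{0,4}s:)";
var lines = new Regex(pattern).Split(input)
    .Select(s => s.Trim())                  // trim the whitespace left around each split point
    .Where(s => !string.IsNullOrEmpty(s))   // drop the empty entries
    .ToArray();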
Explanation
\b(?=\s*\d{0,4}s:)
-> Starting at a word boundary, assert (without consuming anything) that what follows is any amount of whitespace, then 0 to 4 digits, then s, then :. Because the lookahead is zero-width, Split cuts the string at that position and removes nothing, so each timestamp stays at the start of its piece.
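The zero-width part is what keeps the timestamps in the output. A small comparison sketch (hypothetical class and method names; trimming and empty-entry removal added so the results are easy to compare): splitting on the same prefix without the lookahead turns the timestamp into the delimiter, so Regex.Split throws it away.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class comparing a zero-width split point with a consuming one.
public static class LookaheadDemo
{
    // Split, then trim each piece and drop the empty ones.
    private static string[] SplitClean(string input, string pattern) =>
        Regex.Split(input, pattern)
             .Select(s => s.Trim())
             .Where(s => s.Length > 0)
             .ToArray();

    // The lookahead only checks that "<digits>s:" comes next; nothing is consumed,
    // so each timestamp stays at the start of the following piece.
    public static string[] WithLookahead(string input) =>
        SplitClean(input, @"\b(?=\s*\d{0,4}s:)");

    // Same text without the lookahead: the timestamp itself becomes the delimiter,
    // and Regex.Split discards it.
    public static string[] WithoutLookahead(string input) =>
        SplitClean(input, @"\b\s*\d{0,4}s:");
}
```

On the sample string, WithLookahead should return the three entries from the question, while WithoutLookahead should return the same three pieces with "1080s:", "1748s:" and "1749s:" missing.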
After splitting, remove the empty and whitespace-only entries (the split point at the very start of the input, for example, produces an empty first element).
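If you would rather keep Regex.Matches as in the question, a lazy quantifier followed by a lookahead can express "stop only where a whole timestamp starts". This is a swapped-in alternative, not the Split approach above, and it rests on two assumptions: each entry begins with one or more digits followed by s: (\d+ rather than \d{0,4}), and consecutive entries are separated by whitespace. The class name is hypothetical.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class: extracts entries with Matches instead of Split.
public static class TimestampMatcher
{
    // One entry = a timestamp such as "1748s:" followed, lazily, by everything up to
    // (but not including) the whitespace before the next timestamp, or the end of the text.
    private static readonly Regex Entry =
        new Regex(@"\b\d+s:.*?(?=\s+\d+s:|$)");

    public static string[] Entries(string text) =>
        Entry.Matches(text)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

The lazy .*? grows one character at a time and stops at the first position where the lookahead sees whitespace plus "<digits>s:" (or the end of the string), which is exactly the "skip only if \d*s: appears together" behaviour asked for. For the sample string, TimestampMatcher.Entries should return the three entries listed in the question. Note that . does not match a newline, so multi-line entries would need RegexOptions.Singleline.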