
Not operator in regular expression


Given the following string

1080s: 33, 6&apos;2&quot; meg: test. 1748s: I THINK IM GONNA <span class="highlight" >PICK</span> 1749s: TWO COMPLETE OPPOSITES.

I want to run a regex over it and get the following matches:

1st match : 1080s: 33, 6&apos;2&quot; meg: test. 
2nd match : 1748s: I THINK IM GONNA <span class="highlight" >PICK</span> 
3rd match : 1749s: TWO COMPLETE OPPOSITES.

I am using the following regular expression in ASP.NET to perform the match:

MatchCollection mcs = Regex.Matches(txtData, "(\\d*)(s:)([^(\\d*)](s:){0})*");

The regex matches, but the captures are incorrect. It stops capturing text as soon as it finds \d* or s: on its own. I want it to stop if and only if \d*s: appears as a whole.
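The behaviour comes from the character class. [^(\d*)] is not a "not \d*" operator: inside square brackets it matches a single character that is not (, ), * or a digit, and (s:){0} repeats s: zero times, so it matches nothing. Every match therefore ends at the next digit. A minimal sketch, written as C# 9+ top-level statements and assuming the sample string from the question, that prints each match of the original pattern:

```csharp
using System;
using System.Text.RegularExpressions;

// Sample input from the question.
string txtData = "1080s: 33, 6&apos;2&quot; meg: test. 1748s: I THINK IM GONNA "
    + "<span class=\"highlight\" >PICK</span> 1749s: TWO COMPLETE OPPOSITES.";

// The question's pattern, unchanged. [^(\d*)] matches ONE character that is
// not '(', ')', '*' or a digit; (s:){0} matches the empty string.
MatchCollection mcs = Regex.Matches(txtData, "(\\d*)(s:)([^(\\d*)](s:){0})*");

foreach (Match m in mcs)
{
    // Brackets make the trailing spaces visible.
    Console.WriteLine($"[{m.Value}]");
}
```

On this input the first match is only "1080s: ", because the digits in 33 end it; the other two matches happen to be complete only because their text contains no digits.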

I have tried a few different approaches, but I still haven't found how to combine \d* and s: into a single not operator.
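A "not" over a whole sequence such as \d*s: cannot be expressed inside a character class. The usual .NET substitute is a negative lookahead, (?!...), tested before each consumed character (a pattern often called a tempered greedy token). The following is a minimal sketch, not taken from the original thread; it assumes a marker is one or more digits followed by s: (\d+ rather than \d*, so that a word ending in s: does not end an entry), and the names entryPattern and entries are illustrative:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string txtData = "1080s: 33, 6&apos;2&quot; meg: test. 1748s: I THINK IM GONNA "
    + "<span class=\"highlight\" >PICK</span> 1749s: TWO COMPLETE OPPOSITES.";

// \d+s:            an entry starts with one or more digits followed by "s:"
// (?:(?!\d+s:).)*  then any characters, but only while the text at the current
//                  position does NOT begin a new "<digits>s:" marker
var entryPattern = new Regex(@"\d+s:(?:(?!\d+s:).)*");

string[] entries = entryPattern.Matches(txtData)
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

foreach (var entry in entries)
{
    // Brackets make the trailing spaces visible.
    Console.WriteLine($"[{entry}]");
}
```

This returns the entries directly as matches, each keeping its trailing space, in the same form as the three matches listed above.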


Solution

  • You can use a regex positive lookahead, as suggested by @Ilya:

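    using System.Linq;
    using System.Text.RegularExpressions;
    // Split just before each "<digits>s:" marker; the lookahead consumes no text.
    var pattern = @"\b(?=\s*\d{0,4}s:)";
    var lines = Regex.Split(input, pattern)
        .Where(s => !string.IsNullOrWhiteSpace(s)) // drop empty or blank pieces
        .ToArray();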
    

    Explanation
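    \b(?=\s*\d{0,4}s:) -> Match a zero-width position at a word boundary, but only where the positive lookahead finds a marker ahead. The lookahead consumes no text, so Split keeps each marker at the start of the piece that follows it. The marker is defined as any number of whitespace characters, followed by 0 to 4 digits, followed by s, and then followed by :.
    Because the pattern also matches at the very start of the input, Split returns an empty first element, so filter out empty or whitespace-only entries after splitting.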
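A self-contained version of the answer's approach, runnable as C# 9+ top-level statements. It assumes the input is the sample string from the question and filters with string.IsNullOrWhiteSpace, which treats whitespace-only pieces as empty:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Sample input from the question.
string input = "1080s: 33, 6&apos;2&quot; meg: test. 1748s: I THINK IM GONNA "
    + "<span class=\"highlight\" >PICK</span> 1749s: TWO COMPLETE OPPOSITES.";

// Zero-width split point: a word boundary followed by optional whitespace,
// 0 to 4 digits and "s:". Nothing is consumed, so markers stay in the output.
string[] lines = Regex.Split(input, @"\b(?=\s*\d{0,4}s:)")
    .Where(s => !string.IsNullOrWhiteSpace(s)) // the split at index 0 yields ""
    .ToArray();

foreach (var line in lines)
{
    // Brackets make the trailing spaces visible.
    Console.WriteLine($"[{line}]");
}
```

On the sample input this yields the three entries from the question. Because of the {0,4} quantifier, a marker with five or more digits (for example 12345s:) would not produce a split.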