Search code examples
sqlregex

Regex to match up to 9 num. chars followed by 'ml' and those 9 num. chars don't fall between 1900-2199, but not a Quote char followed by 2 num. digit


I'm using the below regex to find most of the matches I'm looking for where I'm trying to mark any 'ml' proceeded by up to a possible 9-digit number, as long as that number isn't in the 1900-2100 range. The issue I'm having, and I don't even know if this is possible in the same regex is '19750ml'. Any time there is a QUOTE character followed by a 2-digit number, I don't want that part selected. This regex is matching on the '0ml' as the value 1975 proceeds it, but in this single quote followed by the 2-digit number, I would like it to mark '750ml'. In other words, can the QUOTE followed by the 2-digit number have the highest priority? This is being used in Microsoft SQL 2019 as a function.
(?i)(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d{2}))(?<!\b(?:1(?=9\d{2}))|2(?=[01]\d{2}))(?!\b(?:19|2[01])\d\d)\d{0,9}ml$

Here are some examples and what the above regex is matching on:

Mary had a little lamb 1980750ml
Test 19819ml
Test 198218ml
Test 2123456ml
Test 20349876ml
Test 209912345ml
Test 1999123456ml
Test 987654321ml
Test '19750ml <--- This guy is my issue
Test 1988ml
Test 9999ml
Test 2000ml
Test 100ml
Test '2529ml <--- I would like it to mark '29ml' keeping the quote followed by 2 numeric digits


Solution

  • You could extend the regex with these additional restrictions:

    • The match should not start immediately after a quote when the match starts with two digits.

    • The match should not start immediately after a quote and digit when the match starts with a digit.

    For encoding these two constraints in the regex we can use (?<!'(?=\d\d)|'\d(?=\d)) as an additional look-behind to the ones already in the regex.

    We could add this allowance (as exception to already existing rules):

    • The match may start when it is preceded by a quote and two digits.

    We can encode this with (?<='\d\d) as an alternative to all other look-behind restrictions.

    This leads to this regex:

    (?i)(?:(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d\d))(?<!\b(?:1(?=9\d\d))|2(?=[01]\d\d))(?!\b(?:19|2[01])\d\d)(?<!'(?=\d\d)|'\d(?=\d))|(?<='\d\d))\d{0,9}ml$
    

    See it on regex101