I'm using the below regex to find most of the matches I'm looking for where I'm trying to mark any 'ml' proceeded by up to a possible 9-digit number, as long as that number isn't in the 1900-2100 range. The issue I'm having, and I don't even know if this is possible in the same regex is '19750ml'. Any time there is a QUOTE character followed by a 2-digit number, I don't want that part selected. This regex is matching on the '0ml' as the value 1975 proceeds it, but in this single quote followed by the 2-digit number, I would like it to mark '750ml'. In other words, can the QUOTE followed by the 2-digit number have the highest priority? This is being used in Microsoft SQL 2019 as a function.
(?i)(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d{2}))(?<!\b(?:1(?=9\d{2}))|2(?=[01]\d{2}))(?!\b(?:19|2[01])\d\d)\d{0,9}ml$
Here are some examples and what the above regex is matching on:
Mary had a little lamb 1980750ml
Test 19819ml
Test 198218ml
Test 2123456ml
Test 20349876ml
Test 209912345ml
Test 1999123456ml
Test 987654321ml
Test '19750ml <--- This guy is my issue
Test 1988ml
Test 9999ml
Test 2000ml
Test 100ml
Test '2529ml <--- I would like it to mark '29ml' keeping the quote followed by 2 numeric digits
You could extend the regex with these additional restrictions:
The match should not start immediately after a quote when the match starts with two digits.
The match should not start immediately after a quote and digit when the match starts with a digit.
For encoding these two constraints in the regex we can use (?<!'(?=\d\d)|'\d(?=\d))
as an additional look-behind to the ones already in the regex.
We could add this allowance (as exception to already existing rules):
We can encode this with (?<='\d\d)
as an alternative to all other look-behind restrictions.
This leads to this regex:
(?i)(?:(?<!\b(?:19|2[01])\d(?=\d))(?<!\b(?:19|2[01])(?=\d\d))(?<!\b(?:1(?=9\d\d))|2(?=[01]\d\d))(?!\b(?:19|2[01])\d\d)(?<!'(?=\d\d)|'\d(?=\d))|(?<='\d\d))\d{0,9}ml$
See it on regex101