Given the following text:
somerandomtext06251/750/somerandomtext/21399/10 79/20 8301
How do I extract 06251/750, 79/20 and 8301, and ignore 21399/10?
The general rules:
I started with the following match pattern:
(?<invnr>\d{2,}/?\d{2,})
In general it works, but it has one problem: it also matches 21399/10. So I added a negative lookbehind:
(?<!/)(?<invnr>\d{2,}/?\d{2,})
Now it skips the first digit of 21399/10 (because that digit is preceded by /), but it still captures the remaining characters, 1399/10. I need to skip 21399/10 entirely.
How do I make the lookbehind discard the entire match and move on to the next one, instead of skipping just one digit?
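The behaviour described above can be reproduced with a short sketch. It uses Python's re module rather than .NET (an assumption made for illustration). Python spells named groups (?P<name>...) where .NET writes (?<name>...), but the lookbehind behaves the same way here. The helper name invnrs is hypothetical.

```python
import re

# Sample input from the question.
text = "somerandomtext06251/750/somerandomtext/21399/10 79/20 8301"

# First attempt, translated to Python's named-group syntax.
first = re.compile(r"(?P<invnr>\d{2,}/?\d{2,})")

# Second attempt: the lookbehind only rejects a start position right after "/".
second = re.compile(r"(?<!/)(?P<invnr>\d{2,}/?\d{2,})")

def invnrs(pattern, s):
    """Hypothetical helper: return the invnr group of every match."""
    return [m.group("invnr") for m in pattern.finditer(s)]

print(invnrs(first, text))   # ['06251/750', '21399/10', '79/20', '8301']
print(invnrs(second, text))  # ['06251/750', '1399/10', '79/20', '8301']
```

The second result shows the problem. The engine cannot start at "2" because of the "/", so it retries one position later at "1". Only the "/" is forbidden there, so the attempt succeeds.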
You can add a digit pattern to the negative lookbehind (by combining it with / in a character class, [/\d]) so that a match cannot start immediately after a digit:
(?<![/\d])\d{2,}(?:/\d{2,})?
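A quick check of this pattern on the sample text follows. It is sketched with Python's re module rather than .NET (an assumption), but the pattern uses only constructs both engines share. The lookbehind is a single character wide, so Python's fixed-width lookbehind rule is satisfied.

```python
import re

text = "somerandomtext06251/750/somerandomtext/21399/10 79/20 8301"

# A match may not start right after "/" or after another digit, so no
# suffix of 21399/10 (such as 1399/10 or 399/10) can be matched on its own.
pattern = re.compile(r"(?<![/\d])\d{2,}(?:/\d{2,})?")

print(pattern.findall(text))  # ['06251/750', '79/20', '8301']
```

To keep the question's named group in .NET, the whole body can be wrapped in it: (?<![/\d])(?<invnr>\d{2,}(?:/\d{2,})?).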
Details
(?<![/\d]) - a negative lookbehind that fails the match if there is a / or a digit immediately to the left of the current location
\d{2,} - two or more digits
(?:/\d{2,})? - an optional sequence of a / and two or more digits.
If you need to make sure you only match ASCII digits, pass the RegexOptions.ECMAScript option to the .NET Regex constructor or method you call, or use [0-9] instead of \d.
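The ASCII-digit point can be illustrated in Python, again as an assumed stand-in for .NET. In Python, \d in a str pattern matches any Unicode decimal digit by default, as .NET's \d does. The closest Python analogue to the option above is the re.ASCII flag. This is only an analogy: RegexOptions.ECMAScript changes other behaviour in .NET as well. The sample string with Arabic-Indic digits is made up for illustration.

```python
import re

# Hypothetical sample: Arabic-Indic "79/20" followed by ASCII "79/20".
sample = "ref \u0667\u0669/\u0662\u0660 and 79/20"

# Default str pattern: \d matches Unicode decimal digits, so both numbers match.
unicode_digits = re.compile(r"(?<![/\d])\d{2,}(?:/\d{2,})?")

# re.ASCII restricts \d to [0-9].
ascii_flag = re.compile(r"(?<![/\d])\d{2,}(?:/\d{2,})?", re.ASCII)

# Spelling the digits out as [0-9] has the same effect without a flag.
ascii_class = re.compile(r"(?<![/0-9])[0-9]{2,}(?:/[0-9]{2,})?")

print(unicode_digits.findall(sample))  # both the Arabic-Indic and the ASCII number
print(ascii_flag.findall(sample))      # ['79/20']
print(ascii_class.findall(sample))     # ['79/20']
```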
Note that your \d{2,}/?\d{2,} is a bit off: when the / is absent, the two \d{2,} parts together require at least four digits, so it won't match 2- or 3-digit sequences.
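The four-digit minimum can be checked by testing whole tokens with fullmatch. This is a minimal Python sketch; the token list is chosen for illustration. The second pattern is the answer's pattern without its lookbehind, since the lookbehind does not matter when the whole string is matched.

```python
import re

# The question's pattern: without "/", the two \d{2,} parts need 4+ digits.
original = re.compile(r"\d{2,}/?\d{2,}")

# The answer's body: the "/" part is optional as a whole, so 2 digits suffice.
fixed = re.compile(r"\d{2,}(?:/\d{2,})?")

for token in ["79", "123", "8301", "79/20"]:
    print(token, bool(original.fullmatch(token)), bool(fixed.fullmatch(token)))
# 79 False True
# 123 False True
# 8301 True True
# 79/20 True True
```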