I currently have the following regex
ZL[^0-9].{16}_.{3}PAD_N.{26}\.PIC
which matches filenames like
ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC
but I would like to change the regex so that the 9 characters at the position of SALP09217 cannot fall in the ranges
SALP00000-00899, SALP01000-03099 and SALP05000-06999
(note that SALT00000-00899, or any other substring that does not start with SALP, is allowed; only values that start with SALP are to be excluded).
The following regex works partially
ZL[^0-9].{16}_.{3}PAD_N.{7}(?!(SALP00[0-8][0-9][0-9])|(SALP0[1-3]0[0-9][0-9])|(SALP0[5-6][0-9][0-9][0-9])).*\.PIC
but it also accepts strings longer than the original regex would allow. For example, it allows
ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC
which is correct but also
ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01LARGER.PIC
which is not.
The "ideal" regex would be
ZL[^0-9].{16}_.{3}PAD_N.{7}(?!(SALP00[0-8][0-9][0-9])|(SALP0[1-3]0[0-9][0-9])|(SALP0[5-6][0-9][0-9][0-9])).{10}\.PIC
but
ZLF_1177_0771428479_534PAD_N0530130SALP09217_1100LMV01.PIC
will not be a match.
Any suggestions?
The negative lookahead can be much simpler than the one in Wiktor's answer.
Given exclusion ranges:
SALP00000-00899
SALP01000-03099
SALP05000-06999
it is clear that all start with SALP0 and end with [0-9]{2}.
Then the remaining two digits (the second and third after SALP) are:
0[0-8]
1[0-9] 2[0-9] 30
5[0-9] 6[0-9]
which can be regrouped:
0[0-8]
1[0-9] 2[0-9] 5[0-9] 6[0-9]
30
and combined into: 0[0-8]|[1256][0-9]|30
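As a sanity check on the regrouping, the alternation can be compared against the three ranges by brute force. This is only a sketch in Python (the question does not say which regex engine is used), and the names EXCLUDED_RANGES, MIDDLE, excluded_by_ranges and excluded_by_pattern are made up for illustration:

```python
import re

# Exclusion ranges from the question (inclusive), as the number after "SALP".
EXCLUDED_RANGES = [(0, 899), (1000, 3099), (5000, 6999)]

# The combined alternation for the second and third digits.
MIDDLE = re.compile(r"0[0-8]|[1256][0-9]|30")


def excluded_by_ranges(n: int) -> bool:
    """True if n falls inside one of the ranges listed in the question."""
    return any(lo <= n <= hi for lo, hi in EXCLUDED_RANGES)


def excluded_by_pattern(n: int) -> bool:
    """True if the five digits fit 0 + (alternation) + any two digits."""
    digits = f"{n:05d}"
    # The first digit must be 0; the last two digits are unconstrained.
    return digits[0] == "0" and MIDDLE.fullmatch(digits[1:3]) is not None


# Every five-digit value on which the two definitions disagree.
mismatches = [
    n for n in range(100_000)
    if excluded_by_ranges(n) != excluded_by_pattern(n)
]
print(mismatches)  # an empty list means the pattern and the ranges agree
```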
So the whole negative lookahead is just:
(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2})
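It is incorporated by splitting .{26} in two at the appropriate offset, as stated: .{7} before the lookahead and .{19} after it. Note that lookarounds consume no characters, so the 9 characters at the SALP position still have to be matched by the trailing .{19} (this is why the .{10} attempt in the question fails), and the total length does not change: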
ZL[^0-9].{16}_.{3}PAD_N.{7}(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2}).{19}\.PIC
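A quick way to try this out is the Python sketch below. It assumes the pattern must cover the whole filename, so it uses re.fullmatch; the question does not say how the regex is applied. The sample names other than the two from the question are invented for the test, and PATTERN, TOO_SHORT, samples and results are illustrative names:

```python
import re

# The final regex, split across lines only for readability.
PATTERN = re.compile(
    r"ZL[^0-9].{16}_.{3}PAD_N.{7}"
    r"(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2})"
    r".{19}\.PIC"
)

# The same regex with .{10} after the lookahead, as tried in the question.
TOO_SHORT = re.compile(
    r"ZL[^0-9].{16}_.{3}PAD_N.{7}"
    r"(?!SALP0(?:0[0-8]|[1256][0-9]|30)[0-9]{2})"
    r".{10}\.PIC"
)

PREFIX = "ZLF_1177_0771428479_534PAD_N0530130"
samples = [
    PREFIX + "SALP09217_1100LMV01.PIC",        # from the question: allowed
    PREFIX + "SALP09217_1100LMV01LARGER.PIC",  # from the question: too long
    PREFIX + "SALP00450_1100LMV01.PIC",        # inside SALP00000-00899
    PREFIX + "SALP01500_1100LMV01.PIC",        # inside SALP01000-03099
    PREFIX + "SALP06999_1100LMV01.PIC",        # inside SALP05000-06999
    PREFIX + "SALT00450_1100LMV01.PIC",        # SALT prefix: allowed
]

# Map each sample name to whether the final regex accepts it.
results = {name: PATTERN.fullmatch(name) is not None for name in samples}
for name, matched in results.items():
    print(matched, name)

# The .{10} variant rejects even the valid name, because the lookahead
# consumed nothing and the SALP field still has to be matched.
print(TOO_SHORT.fullmatch(samples[0]) is not None)
```

Only the first and last samples should be accepted by PATTERN under these assumptions.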