regex notepad++

Match a specific string in the first 3 characters and another specific string anywhere from the 4th character onward


Problem: I want to match a line if its first 3 characters are YZA, YZB, or YZC and the string NUM12345 appears anywhere after those first 3 characters.

The regex I am using, which does not match lines in the format described above:

^YZ[ABC]\sNUM12345

Here are the sample lines:

YZC/GH/A1/M,KNUM12345
YZB/M,SD-GG,K*NUM12345/A2
YZA/A1/M,SD-GG,KNUM12345/A2
YZB/M,SD-GG,K*NUM12345/A2/AA
YZA/A1/M,SD-GG,KNUM12345/A2/A2A
YZW/GH/A1/M,KNUM12345
YZR/M,SD-GG,K*NUM12345/A2
YZS/A1/M,SD-GG,KNUM12345/A2
YZT/M,SD-GG,K*NUM12345/A2/AA
YZJ/A1/M,SD-GG,KNUM12345/A2/A2A

The lines I am trying to match:

YZC/GH/A1/M,KNUM12345
YZB/M,SD-GG,K*NUM12345/A2
YZA/A1/M,SD-GG,KNUM12345/A2
YZB/M,SD-GG,K*NUM12345/A2/AA
YZA/A1/M,SD-GG,KNUM12345/A2/A2A

Can anyone help correct the pattern above?


Solution

  • You can use

    ^YZ[ABC].*NUM12345.*
    

    If you want to remove every line that does not match the pattern above, replace all matches of the following pattern with an empty string:

    ^(?!YZ[ABC].*NUM12345).*\R*
    

    Details:

    • ^YZ[ABC].*NUM12345.* - start of a line, YZ, then A, B or C, then any zero or more chars other than line break chars (as many as possible), then NUM12345, then again any zero or more chars other than line break chars (as many as possible), so the match runs to the end of the line.
    • ^(?!YZ[ABC].*NUM12345).*\R* - start of a line, then a negative lookahead that fails the match if the line matches the pattern above, then any zero or more chars other than line break chars (as many as possible), then zero or more line break sequences.
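
To check the patterns outside Notepad++, here is a minimal Python sketch run against the sample lines from the question. It rests on a few assumptions that are not part of the answer: the names SAMPLE, ORIGINAL, KEEP and DROP are made up for illustration, re.MULTILINE is used so that ^ matches at the start of every line, and since Python's re module has no \R, the group (?:\r?\n)* stands in for it (this only covers LF and CRLF line endings). It also shows why the pattern in the question finds nothing: \s demands a whitespace character right after the third character, and every sample line has / there.

```python
import re

# Sample lines copied from the question.
SAMPLE = """\
YZC/GH/A1/M,KNUM12345
YZB/M,SD-GG,K*NUM12345/A2
YZA/A1/M,SD-GG,KNUM12345/A2
YZB/M,SD-GG,K*NUM12345/A2/AA
YZA/A1/M,SD-GG,KNUM12345/A2/A2A
YZW/GH/A1/M,KNUM12345
YZR/M,SD-GG,K*NUM12345/A2
YZS/A1/M,SD-GG,KNUM12345/A2
YZT/M,SD-GG,K*NUM12345/A2/AA
YZJ/A1/M,SD-GG,KNUM12345/A2/A2A
"""

# Pattern from the question: \s requires whitespace after the 3rd character.
ORIGINAL = re.compile(r"^YZ[ABC]\sNUM12345", re.MULTILINE)

# Pattern from the answer: .* allows anything between the prefix and NUM12345.
KEEP = re.compile(r"^YZ[ABC].*NUM12345.*", re.MULTILINE)

# Removal pattern from the answer, with (?:\r?\n)* substituted for \R*
# (assumption: Python's re does not support \R).
DROP = re.compile(r"^(?!YZ[ABC].*NUM12345).*(?:\r?\n)*", re.MULTILINE)

original_hits = ORIGINAL.findall(SAMPLE)  # matches found by the question's pattern
matched = KEEP.findall(SAMPLE)            # full lines found by the answer's pattern
remaining = DROP.sub("", SAMPLE)          # text left after deleting non-matching lines

print(len(original_hits), "lines matched by the original pattern")
print(len(matched), "lines matched by the corrected pattern")
print(remaining, end="")
```

The text left in remaining should be the YZA/YZB/YZC lines only, which is what the replace operation in Notepad++ is meant to leave behind.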