Regex to find instances of a string pattern that is not preceded by a carriage return/line feed


I'm working in Notepad++.

In the file I'm working with, every line should start with the pattern [0-9][0-9]-[0-9][0-9][0-9][0-9], immediately followed by a pipe. (One caveat: up to three capital letters can follow the four digits, e.g. 00-1324A| or 12-3456STR|.)

There are instances in the file where that pattern is in the middle of a line, and needs to be moved to the next line.

Example:

00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2

As I noted within the example, 99-8656ST needs to be moved to the next line, resulting in this:

00-1234REV|The quick brown fox jumped over the lazy dog|Test
11-6544|FooBar|text
99-8656ST|This needs to be on the next line|some text
45-8737|Peter pipe picked a peck of pickled peppers|TEST2

I currently have this regex: (?<=[^\d\r\n])\d{2}-\d{4}(?!\d) but that is matching on parts of social security numbers in the middle of a line:

123-45-6789

My regex will match on 45-6789.
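To make the problem reproducible outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module treats these lookarounds the same way Notepad++ does for this pattern, and the sample line is made up for illustration.

```python
import re

# The regex from the question, unchanged.
ORIGINAL = re.compile(r"(?<=[^\d\r\n])\d{2}-\d{4}(?!\d)")

# Hypothetical line containing both an SSN and a misplaced record ID.
line = "11-6544|SSN 123-45-6789|text99-8656ST|moved"

matches = [m.group(0) for m in ORIGINAL.finditer(line)]
print(matches)  # ['45-6789', '99-8656']
```

The first hit is the unwanted one: the hyphen before 45 satisfies the "not a digit" lookbehind, so the tail of the SSN is matched.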


Solution

  • Since purely numeric boundaries do not work here, you can add a check for a digit + hyphen on the left. The right-hand boundary is clear: it is zero to three uppercase letters followed by a pipe.

    That means, you can use

    (?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)
    

    Details:

    • (?<=[^\d\r\n]) - immediately on the left, there must be a char other than a digit, CR, LF
    • (?<!\d-) - immediately on the left, there should be no digit + -
    • \d{2}-\d{4} - two digits, -, four digits
    • (?=[A-Z]{0,3}\|) - immediately on the right, there must be 0 to 3 uppercase letters and then a literal | char.
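
The pattern only finds the misplaced IDs; the line break still has to be inserted by the replacement. As a rough sketch in Python (the helper name `split_misplaced_ids` and the extra SSN line are assumptions for illustration), the match can be replaced with a newline plus itself. In Notepad++ the equivalent is presumably to put `\r\n$0` in the "Replace with" field, which the answer does not state explicitly.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(?<=[^\d\r\n])(?<!\d-)\d{2}-\d{4}(?=[A-Z]{0,3}\|)")

def split_misplaced_ids(text: str, newline: str = "\n") -> str:
    """Insert a line break before every ID found in the middle of a line."""
    return PATTERN.sub(lambda m: newline + m.group(0), text)

sample = (
    "00-1234REV|The quick brown fox jumped over the lazy dog|Test\n"
    "11-6544|FooBar|text99-8656ST|This needs to be on the next line|some text\n"
    "45-8737|Peter pipe picked a peck of pickled peppers|TEST2\n"
    "77-0001|SSN 123-45-6789|end\n"  # hypothetical line: SSN directly before a pipe
)

fixed = split_misplaced_ids(sample)
print(fixed)
```

With this sample, only 99-8656ST moves to its own line; the SSN is left alone because it is preceded by a digit and a hyphen.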

    If a single hyphen or digit immediately on the left should be enough to rule out a match, then replace (?<=[^\d\r\n])(?<!\d-) with (?<=[^\r\n\d-]).
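
The two left-hand boundaries differ only when a hyphen precedes the ID without a digit in front of it. A small Python comparison, with made-up test strings and the assumed names `TWO_STEP` and `SINGLE_CLASS`, shows the difference:

```python
import re

RIGHT = r"\d{2}-\d{4}(?=[A-Z]{0,3}\|)"
TWO_STEP = re.compile(r"(?<=[^\d\r\n])(?<!\d-)" + RIGHT)   # rejects only digit + hyphen
SINGLE_CLASS = re.compile(r"(?<=[^\r\n\d-])" + RIGHT)      # rejects any hyphen or digit

def find(pattern: re.Pattern, text: str) -> list[str]:
    return [m.group(0) for m in pattern.finditer(text)]

hyphen_only = "11-6544|ref-99-8656ST|x"   # hyphen before the ID, no digit before it
ssn = "11-6544|123-45-6789|x"             # digit + hyphen before the candidate

print(find(TWO_STEP, hyphen_only))      # ['99-8656']
print(find(SINGLE_CLASS, hyphen_only))  # []
print(find(TWO_STEP, ssn))              # []
print(find(SINGLE_CLASS, ssn))          # []
```

Which variant fits depends on whether text such as `ref-99-8656ST|` should count as a misplaced ID in the actual data.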