regex, notepad++

Regex in Notepad++ to remove certain CRLFs


Given this sample data:

00-1234T|`CRLF`
Data|Commments|`CRLF`
12-3456|Some data|Notes|`CRLF`
65-8436ZZ|Data|`CRLF`
|`CRLF`
45-4576AA|Some data|Comments|`CRLF`
98-4392REV|Data|`CRLF`
|`CRLF`
00-5432|Some Data|Some Comments|

(I added the `CRLF` markers at the end of each line to show more clearly what is there and what needs to be replaced.)

Each record should have exactly three pipes, all on one line, with a CRLF after the third pipe. So the records on lines 1, 4, and 7 (before the find/replace) need to be fixed: any CRLF that comes before a record's third pipe needs to be replaced with a placeholder, "#CRLF#".
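The three-pipes rule can also be applied without a regex. The sketch below is a Python illustration, not the Notepad++ approach used in the answer. It assumes CRLF line endings, exactly three pipes per record, and no pipe characters inside field values. `join_by_pipe_count` and `PLACEHOLDER` are hypothetical names.

```python
# Sketch: rebuild records by counting pipes instead of using a regex.
# Assumptions: CRLF line endings, exactly three pipes per record,
# and no pipe characters inside field values.

PLACEHOLDER = "#CRLF#"

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Commments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def join_by_pipe_count(text: str, pipes_per_record: int = 3) -> str:
    records = []
    pending = []  # physical lines that belong to the current record
    for line in text.split("\r\n"):
        pending.append(line)
        # Once the collected pieces hold enough pipes, the record is complete.
        if sum(piece.count("|") for piece in pending) >= pipes_per_record:
            records.append(PLACEHOLDER.join(pending))
            pending = []
    if pending:
        # Keep an incomplete trailing fragment rather than silently dropping it.
        records.append(PLACEHOLDER.join(pending))
    return "\r\n".join(records)


if __name__ == "__main__":
    for record in join_by_pipe_count(SAMPLE).split("\r\n"):
        print(record)
```

Counting pipes keys on the record's structure rather than on its first column, so it does not depend on the ID format described in the update. However, it breaks if a field ever contains a literal pipe.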

The closest I've been able to come up with is ^((?:[^\v|]*\|){3})(.+), which matches (highlights) lines 3 & 4, 6 & 7, and 9 & 10. What I actually need is to find the CRLFs in lines 2, 5, and 8 and replace them with "#CRLF#".

[UPDATE]

After sleeping on this question, I realized I should add a detail that makes it easier to find the beginning of a record, whether it spans one line or several: the first column always starts with the pattern [0-9][0-9]-[0-9][0-9][0-9][0-9], optionally followed by up to three alphanumeric characters.
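As a quick check of that record-start rule, this Python sketch lists which lines of the sample data begin a record. It assumes that "alphanumeric" means ASCII letters and digits, and that the first column ends at the first pipe. `record_start_lines` is a hypothetical helper.

```python
import re

# Record start per the update: 2 digits, a dash, 4 digits, then up to three
# alphanumeric characters (assumed to mean ASCII letters/digits), then the
# pipe that closes the first column.
RECORD_START = re.compile(r"[0-9]{2}-[0-9]{4}[A-Za-z0-9]{0,3}\|")

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Commments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def record_start_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that begin a new record."""
    return [
        number
        for number, line in enumerate(text.split("\r\n"), start=1)
        if RECORD_START.match(line)  # match() anchors at the start of the line
    ]


if __name__ == "__main__":
    print(record_start_lines(SAMPLE))  # [1, 3, 4, 6, 7, 9]
```

Every line not in that list (here 2, 5, and 8) is a continuation. The line break just before each of those lines is the one that should become "#CRLF#".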

I modified the sample data above to reflect that.


Solution

    • Ctrl+H
    • Find what: \R(?!\d\d-\d{4}\w{0,3}\|)
    • Replace with: #CRLF#
    • CHECK Wrap around
    • CHECK Regular expression
    • Replace all
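The same replacement can be done outside Notepad++. This is a Python sketch of it. Python's `re` module does not support `\R`, so the line break is written out as `(?:\r\n|\r(?!\n)|\n)`. The `\r(?!\n)` branch is meant to stop the engine from backtracking to a bare `\r` in the middle of a CRLF, roughly mimicking the all-or-nothing behaviour of `\R`. `fix_records` is a hypothetical name.

```python
import re

# A line break that is NOT followed by the start of a record.
# Python's re has no \R, so the break is spelled out; "\r(?!\n)" keeps a bare
# "\r" from matching the first half of a CRLF after backtracking.
BREAK_INSIDE_RECORD = re.compile(r"(?:\r\n|\r(?!\n)|\n)(?!\d\d-\d{4}\w{0,3}\|)")

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Commments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def fix_records(text: str, placeholder: str = "#CRLF#") -> str:
    # A function replacement inserts the placeholder literally (no escape processing).
    return BREAK_INSIDE_RECORD.sub(lambda _match: placeholder, text)


if __name__ == "__main__":
    print(fix_records(SAMPLE).replace("\r\n", "\n"))
```

As with the Notepad++ pattern, a line break at the very end of the file is not followed by a record start, so it would be replaced as well.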

    Explanation:

    \R              # any kind of linebreak (e.g. \r, \n, \r\n);
                    # to match only Windows EOL, use \r\n instead
    (?!             # negative lookahead, make sure the linebreak is NOT followed by:
        \d\d-\d{4}      # 2 digits, a dash, 4 digits
        \w{0,3}         # 0 to 3 word characters
        \|              # a pipe
    )               # end lookahead
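To see the lookahead at work, the same pattern can be written in Python with `re.VERBOSE`, keeping one comment per piece as in the explanation. This sketch reports which lines have a trailing break that would be replaced. As before, `\R` is spelled out because Python's `re` lacks it. The helper `lines_with_replaced_break` is hypothetical, and the line numbering assumes CRLF or LF line endings.

```python
import re

# The answer's pattern in verbose form; \R is replaced by an explicit
# alternation because Python's re module does not support it.
PATTERN = re.compile(
    r"""
    (?:\r\n|\r(?!\n)|\n)   # a line break: CRLF, lone CR, or LF
    (?!                    # negative lookahead: the next line must NOT start with
        \d\d-\d{4}         #   2 digits, a dash, 4 digits
        \w{0,3}            #   0 to 3 word characters
        \|                 #   a pipe
    )                      # end lookahead
    """,
    re.VERBOSE,
)

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Commments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def lines_with_replaced_break(text: str) -> list[int]:
    """Return the 1-based numbers of lines whose trailing break would be replaced."""
    # Counting "\n" before the match start gives the index of the line it ends.
    return [text.count("\n", 0, m.start()) + 1 for m in PATTERN.finditer(text)]


if __name__ == "__main__":
    print(lines_with_replaced_break(SAMPLE))  # [1, 4, 7]
```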
    
