Given this sample data:
00-1234T|`CRLF`
Data|Comments|`CRLF`
12-3456|Some data|Notes|`CRLF`
65-8436ZZ|Data|`CRLF`
|`CRLF`
45-4576AA|Some data|Comments|`CRLF`
98-4392REV|Data|`CRLF`
|`CRLF`
00-5432|Some Data|Some Comments|
(I added the `CRLF`s to each line to illustrate more clearly what is there and what needs to be replaced.)
Each record should have exactly three pipes on a single line, with a CRLF after the third pipe. So lines 1, 4, and 7 (pre-find/replace) need to be fixed: any CRLF that comes before the third pipe of a record needs to be replaced with a placeholder, "#CRLF#".
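The three-pipe rule itself can be expressed without a regex. The sketch below is only an illustration, not the approach taken in this thread; it assumes CRLF line endings and no pipes inside field values, and the names `join_records`, `PLACEHOLDER`, and `SAMPLE` are hypothetical. It accumulates physical lines until a record holds three pipes, then joins the pieces with the placeholder.

```python
# Hypothetical sketch: assumes CRLF line endings and no "|" inside field values.
PLACEHOLDER = "#CRLF#"

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Comments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def join_records(text: str, pipes_per_record: int = 3) -> str:
    """Glue physical lines together until each record contains `pipes_per_record` pipes."""
    records = []
    pieces = []
    pipes = 0
    for line in text.split("\r\n"):
        pieces.append(line)
        pipes += line.count("|")
        if pipes >= pipes_per_record:
            # Breaks inside the record become the placeholder; the record ends here.
            records.append(PLACEHOLDER.join(pieces))
            pieces, pipes = [], 0
    if pieces:
        # Keep any leftover lines, e.g. an incomplete final record.
        records.append(PLACEHOLDER.join(pieces))
    return "\r\n".join(records)


print(join_records(SAMPLE))
```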
The closest I've been able to come up with is `^((?:[^\v|]*\|){3})(.+)`, which will match (highlight) lines 3 & 4, 6 & 7, and 9 & 10. My expectation (requirement) is to find the CRLFs at the end of lines 1, 4, and 7 and replace them with "#CRLF#".
[UPDATE]
After sleeping on this question, I woke up realizing that, to find the beginning of a given record more reliably (whether it sits on one line or is spread over several), I should add that the first column always starts with the pattern `[0-9][0-9]-[0-9][0-9][0-9][0-9]`, possibly followed by up to three alphanumeric characters.
I modified the sample data above to reflect that.
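The start-of-record rule from the update translates directly into a pattern tried at the beginning of a line. A minimal Python check, where the helper name `starts_record` is hypothetical; note that `\w` also accepts an underscore, which is slightly broader than "alphanumeric".

```python
import re

# Start-of-record rule: 2 digits, a dash, 4 digits, up to 3 word characters,
# then the first pipe. re.match only tries the start of the string.
RECORD_START = re.compile(r"\d\d-\d{4}\w{0,3}\|")


def starts_record(line: str) -> bool:
    """Return True if this physical line begins a new record."""
    return RECORD_START.match(line) is not None


for line in ["00-1234T|", "Data|Comments|", "65-8436ZZ|Data|", "|", "98-4392REV|Data|"]:
    print(f"{line!r:22} starts a record: {starts_record(line)}")
```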
Find: `\R(?!\d\d-\d{4}\w{0,3}\|)`
Replace with: `#CRLF#`
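To try the same substitution outside the editor, here is a sketch in Python. Python's `re` module does not support `\R` (it rejects that escape), so this sketch assumes CRLF or LF line endings and uses `\r?\n` in its place; `fix_breaks` and `SAMPLE` are hypothetical names.

```python
import re

# \r?\n stands in for \R here and covers CRLF and LF endings (not a lone CR).
LINE_BREAK_INSIDE_RECORD = re.compile(r"\r?\n(?!\d\d-\d{4}\w{0,3}\|)")

SAMPLE = (
    "00-1234T|\r\n"
    "Data|Comments|\r\n"
    "12-3456|Some data|Notes|\r\n"
    "65-8436ZZ|Data|\r\n"
    "|\r\n"
    "45-4576AA|Some data|Comments|\r\n"
    "98-4392REV|Data|\r\n"
    "|\r\n"
    "00-5432|Some Data|Some Comments|"
)


def fix_breaks(text: str, placeholder: str = "#CRLF#") -> str:
    """Replace every line break that is not followed by the start of a record."""
    # A function replacement keeps backslashes in the placeholder from being read as escapes.
    return LINE_BREAK_INSIDE_RECORD.sub(lambda _match: placeholder, text)


print(fix_breaks(SAMPLE))
```

Like the editor regex, this also replaces a line break at the very end of a file, since nothing follows it; the sample has no trailing break, so it is unaffected.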
Explanation:
\R              # any kind of line break (e.g. \r\n, \n, \r); to match only Windows EOLs, use \r\n
(?!             # negative lookahead: make sure the line break is NOT followed by
  \d\d-\d{4}    #   2 digits, a dash, 4 digits
  \w{0,3}       #   0 to 3 word characters
  \|            #   a pipe
)               # end lookahead
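The commented breakdown above maps one-to-one onto Python's `re.VERBOSE` mode, where whitespace and `#` comments inside the pattern are ignored. A sketch, again assuming CRLF or LF endings and substituting `\r?\n` for the editor's line-break escape; `PATTERN` is a hypothetical name.

```python
import re

# Same pattern in verbose form; whitespace and comments are ignored by the engine.
PATTERN = re.compile(
    r"""
    \r?\n            # line break: CRLF or LF
    (?!              # negative lookahead: the next line must NOT start with
        \d\d-\d{4}   #   2 digits, a dash, 4 digits
        \w{0,3}      #   0 to 3 word characters
        \|           #   a literal pipe
    )                # end lookahead
    """,
    re.VERBOSE,
)

for text in ("00-1234T|\r\nData|Comments|", "Data|Comments|\r\n12-3456|Some data|Notes|"):
    print(repr(text), "->", "break replaced" if PATTERN.search(text) else "break kept")
```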
Screenshot (before):
Screenshot (after):