I am trying to search with a regex that uses a lookahead, but it's not working in pcregrep or grep.
I want to search for bits of sections.
Generally I use the following in Sublime Text's find, and it works well:
(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)
Now I want to find the count of such occurrences, so I am trying to use grep or pcregrep; neither is working.
pcregrep -c "(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)" file.txt
zsh: event not found: PQXY|OFEJ).)
And with grep:
$ grep -c -zoP "(?s)(^PQXY(?:(?!PQXY|OFEJTRANS).)*OFEJTRANS\n)" CB_raw_testing_21_feb_CORRECTIONS_0002.txt
zsh: event not found: PQXY|OFEJTRANS).)
How can I do this?
Answer based on @paxdiablo and @anubha.
The main error was using double quotes instead of single quotes, as addressed by @paxdiablo:
$ pcregrep -c -M '(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)' file.txt
0
The regex solution is to add (?s), based on @anubha. Of course, \n also works instead of (\R|\z):
$ pcregrep -c -M '(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)' file.txt
11726
zsh: event not found: PQXY|OFEJ).)
Since this is zsh raising the error, it's almost certainly because it's trying to process the stuff within the double quotes (the ! triggers history expansion). To protect it from that, you should use single quotes, such as:
pcregrep -c '(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)' file.txt
I don't have pcregrep installed, but here's a transcript showing the problem with just echo:
pax> echo "(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ)"
zsh: event not found: PQXY|OFEJ).)
pax> echo '(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ)'
(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ)
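If the pattern gets reused, one way to keep the quoting in a single place is to store it in a variable first. This is just a minimal sketch, assuming a POSIX-style shell; the variable name pattern is made up for illustration:

```shell
# Single quotes keep every character literal, including the exclamation
# mark and the backslash, so the shell never sees a history event.
pattern='(?s)(^PQXY(?:(?!PQXY|OFEJ).)*OFEJ\n)'

# Expanding the variable inside double quotes is safe: history expansion
# only looks at the text as typed, and this line has no exclamation mark.
printf '%s\n' "$pattern"
```

The variable could then be passed on as in pcregrep -c -M "$pattern" file.txt. If you would rather turn the feature off altogether, setopt no_bang_hist in zsh (or set +H in bash) should disable history expansion for the session.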
In terms of solving the problem rather than using a specific tool, I would actually opt for awk (a) in this case. You can do something like:
awk '/^PQXY/ { s = $0; c = 1; next}
/OFEJ$/ { if (c == 1) { print s""ORS""$0; c = 0 }; next }
/OFEJ|PQXY/ { c = 0; next }
c == 1 { s = s""ORS""$0 }' inputFile
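This works by using a string (s) and a flag (c) to control the lines collected and the state; initially they are an empty string and zero.
Then, for each line:
- If it starts with PQXY, store the line and start collecting, then go to next input line.
- If it ends with OFEJ and you're collecting, output the collected section and stop collecting, then go to next input line.
- If it contains OFEJ or PQXY anywhere else, stop collecting, then go to next input line.
- Otherwise, if you're collecting, append the line to the collected section.
I've tested this with some limited test data and it seems to work okay. Here's the bash script (b) I used for testing; you can add as many test cases as you need to be comfortable it solves your problem.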
for i in \
    "PQXY 1\nabc\n2 OFEJ\n" \
    "PQXY 1\nabc\n2 OFEJx\n" \
    "PQXY 1\nabc\n PQXY \n2 OFEJ\n" \
    "PQXY 1\nabc\n OFEJ \n2 OFEJ\n" \
    "PQXY 1\nabc\ndef\nPQXY 2\n2 OFEJ\n" \
; do
    # Show the raw test case, then the (indented) sections awk extracts from it.
    echo "$i:"
    printf "$i" | awk '
        /^PQXY/ { s = $0; c = 1; next}
        /OFEJ$/ { if (c == 1) { print s""ORS""$0; c = 0 }; next }
        /OFEJ|PQXY/ { c = 0; next }
        c == 1 { s = s""ORS""$0 }' | sed 's/^/ /'
done
Here's the output so you can see it in action:
PQXY 1\nabc\n2 OFEJ\n:
 PQXY 1
 abc
 2 OFEJ
PQXY 1\nabc\n2 OFEJx\n:
PQXY 1\nabc\n PQXY \n2 OFEJ\n:
PQXY 1\nabc\n OFEJ \n2 OFEJ\n:
PQXY 1\nabc\ndef\nPQXY 2\n2 OFEJ\n:
 PQXY 2
 2 OFEJ
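Since the question asks for a count rather than the sections themselves, the same state machine can be turned into a counter. A minimal sketch, assuming the same start and end rules as the awk script above; the function name count_sections and the counter n are made up for illustration:

```shell
# Count complete PQXY ... OFEJ sections instead of printing them.
# Reads the files given as arguments, or standard input if there are none.
count_sections() {
    awk '
        /^PQXY/     { c = 1; next }                   # start (or restart) a section
        /OFEJ$/     { if (c == 1) n++; c = 0; next }  # close the section if one is open
        /OFEJ|PQXY/ { c = 0; next }                   # stray marker: abandon the section
        END         { print n + 0 }                   # n + 0 prints 0 when nothing matched
    ' "$@"
}
```

You would call it as count_sections file.txt. The collected text is no longer needed, so only the flag is kept.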
(a) In my experience, if you've tried three things with a grep-style regex without success, it's usually faster to move to a more advanced tool :-)
(b) Yes, I know it's written in bash rather than zsh, but that's because:
- it's the awk part that matters and it works, hence the language used is irrelevant; and
- I'm more at home in bash than zsh :-)