
Why is the positive lookahead in my pcregrep Regex not working?


I wrote a Regex using pcregrep, and everything behaved as expected until I added a positive lookahead.

Scenario:

I have the following text file:

a
b
c
a
c

Goal:

I want to use a Regex with pcregrep to return a line containing a and a line containing c, where a line containing b sits between them but is not returned. So it would match the first three lines (a, b, c) and return only the first (a) and third (c) lines. It would not match the fourth and fifth lines because there is no b line between them. So the output would be:

a
c

What I've tried

If I run pcregrep -M 'a\nb\nc\n' (command 1), this captures and returns:

a
b
c

as expected. So I now want to modify this to capture the b line with a positive lookahead. I tried this: pcregrep -M 'a\n(?=(b\n))c\n' (command 2). However, this returns nothing.

My question:

Why does command 2 not return the expected output, where command 1 does? How can I return the desired result? I know there are ways to do this other than pcregrep, but please note that I want to use pcregrep because I'll be extending the functionality to solve similar problems.


Solution

  • You can use two capture groups with the -o option:

    pcregrep -M -o1 -o2 '(a\n)b\n(c)\n' file
    

    a
    c
    

    Details:

    • (...): a capturing group; the text it matches can be referenced by number
    • -o1 -o2: prints only the text matched by capture groups #1 and #2

    Note that your regex a\n(?=(b\n))c\n won't work because a lookahead is a zero-width assertion: it checks what follows without consuming any characters. Your regex correctly asserts that b\n follows a\n, but the match position stays right after a\n, so the engine then tries to match c\n at the spot where b\n actually is, and the match fails.
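
The zero-width behaviour can be checked outside pcregrep. The following is a minimal sketch using Python's re module rather than pcregrep itself, on the assumption that both engines treat these particular patterns the same way (lookahead and capturing groups are standard in both). The variable names and the inline sample text are illustrative only.

```python
import re

# Same content as the sample file in the question.
text = "a\nb\nc\na\nc\n"

# Command 1's pattern: matches the a, b and c lines as one block.
plain = re.search(r"a\nb\nc\n", text)

# Command 2's pattern: the lookahead requires "b\n" right after "a\n"
# but consumes nothing, so "c\n" must then match at that same position,
# which is impossible because the text there starts with "b".
lookahead = re.search(r"a\n(?=(b\n))c\n", text)

# Capture-group pattern from the solution: "b\n" is consumed by the match
# but sits outside both groups, so it can be left out of the output.
groups = re.search(r"(a\n)b\n(c)\n", text)

print(repr(plain.group(0)))
print(lookahead)
print(groups.group(1) + groups.group(2))
```

The second pattern finds no match anywhere in the text, while the third consumes the b line and still lets you print only groups 1 and 2, which is what -o1 -o2 does in pcregrep.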