Use of capture groups within lookarounds


Suppose we are given the following string:

a, b, c, d
e, f, g, h
i, j, k, l

I wish to convert that to the following string using a PCRE regex:

ab, ac, ad
ef, eg, eh
ij, ik, il

More generally, each of these letters can be regarded as a placeholder for a string of word characters, and there can be an arbitrary number of them per line and an arbitrary number of lines.

If this cannot be done, can the following string be produced?

a, ab, ac, ad
e, ef, eg, eh
i, ij, ik, il

Please demonstrate your regex using the "SUBSTITUTION" facility (which can include back-references such as $1) at regex101.com. I would particularly appreciate an explanation of how the PCRE engine is stepping through the string.

If this cannot be done with a PCRE regex, I would like an explanation of why not.

I am asking this question to improve my understanding of how capture groups within lookarounds work.


Solution

  • This can only be done with a regex engine that supports variable-width lookbehind patterns, which PCRE does not. Every word after the first has to look back to the first word on its line, and the distance between the two varies from word to word, so a fixed-width lookbehind cannot reach it.

    If your regex engine supports variable-width lookbehind patterns, you can search for the following, with the multiline flag set so that ^ matches at the start of each line:

    (?<=(\w+),.*)(\w+)|^\w+,\s*
    

    and replace the matches with:

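    $1$2

    In the first alternative, the greedy .* in the lookbehind reaches back as far as it can along the line, so group 1 captures the first word on the line and group 2 captures the current word. In the second alternative, which matches the leading word and its comma, neither group participates, so $1$2 expands to an empty string (as it does in JavaScript and .NET) and the leading word is removed.
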
    Demo: https://regex101.com/r/XZhZyW/5/
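
To try the substitution outside regex101, here is a minimal TypeScript sketch. It assumes a JavaScript runtime with ES2018 lookbehind support (for example a recent Node.js); JavaScript is one engine that accepts variable-width lookbehind, and it is not PCRE. The variable names are illustrative, and the second pattern, which keeps the leading word to produce the alternative output from the question, is an assumption rather than part of the answer above.

```typescript
// Sketch only: relies on ES2018 lookbehind in the JavaScript engine, not on PCRE.

const input: string = ["a, b, c, d", "e, f, g, h", "i, j, k, l"].join("\n");

// Pattern from the answer: (?<=(\w+),.*)(\w+)|^\w+,\s*
// It is built from a string, so each backslash is doubled.
// g = replace every match, m = let ^ match at the start of each line.
const dropFirstWord = new RegExp("(?<=(\\w+),.*)(\\w+)|^\\w+,\\s*", "gm");

// Hypothetical variant for the second requested output: only the
// lookbehind branch, so the leading word on each line is left untouched.
const keepFirstWord = new RegExp("(?<=(\\w+),.*)(\\w+)", "g");

// In a JavaScript replacement string, a group that did not participate
// in the match expands to the empty string.
const prefixed: string = input.replace(dropFirstWord, "$1$2");
const prefixedKeepingFirst: string = input.replace(keepFirstWord, "$1$2");

console.log(prefixed);
console.log(prefixedKeepingFirst);
```

Because . does not match a newline by default, the lookbehind never reaches past the start of the current line, so each word is prefixed only with the first word of its own line.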