Tags: regex, pcre, backreference

PCRE2 - Match every word whose suffix matches a backreference


Given the string below,

ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555

the goal is to match every word whose suffix is the same number that appears after foo (38 in this case).

There is only one substring that begins with foo and ends with a number, and all of the expected matches appear after it.

Expected matches:

gee38
jay38
em38
ou38
q38
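To pin down the expected output, here is a two-pass sketch in Java. It assumes java.util.regex as a stand-in for PCRE2, and the class and method names are hypothetical. It is not the single-regex, single-run solution being asked for; it only reproduces the expected list. It first reads the number after foo, then collects the later words that end in that number, requiring at least one word character before it so the bare 38 is skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical two-pass reference (not the single-run regex the question asks for).
class TwoPassReference {
    static List<String> expectedMatches(String text) {
        List<String> words = new ArrayList<>();
        // Pass 1: the first "foo" that is followed by digits ("ceefooh" is skipped).
        Matcher foo = Pattern.compile("foo(\\d+)").matcher(text);
        if (!foo.find()) {
            return words;
        }
        String number = foo.group(1);
        // Pass 2: words made of one or more word characters followed by the number.
        Pattern suffixed = Pattern.compile("\\b\\w+" + Pattern.quote(number) + "\\b");
        Matcher word = suffixed.matcher(text);
        word.region(foo.end(), text.length()); // search only after "foo<number>"
        while (word.find()) {
            words.add(word.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 "
                + "el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555";
        System.out.println(expectedMatches(s)); // [gee38, jay38, em38, ou38, q38]
    }
}
```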

I've tried foo(\d+).*?(\w+\1)\b and foo(\d+).*(\w+\1)\b, but neither matches them all: the lazy version matches only the first one (gee38), and the greedy version only the last one (q38).
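Why each attempt yields only one word can be checked with a small Java sketch (again assuming java.util.regex as a stand-in for PCRE2; the helper name is hypothetical). Both patterns must start with foo, and the subject contains only one foo followed by a number, so a global search can produce at most one match. The lazy .*? stops at the first word ending in 38, while the greedy .* runs to the end of the string and backtracks to the last one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AttemptsDemo {
    // Collects group 2 from every match, searching on after each one (like regex101's g flag).
    static List<String> allGroup2(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        String s = "ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 "
                + "el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555";
        // Lazy: stops at the first word ending in the number; no "foo" follows it.
        System.out.println(allGroup2("foo(\\d+).*?(\\w+\\1)\\b", s)); // [gee38]
        // Greedy: .* runs to the end, then backtracks to the last such word.
        System.out.println(allGroup2("foo(\\d+).*(\\w+\\1)\\b", s)); // [q38]
    }
}
```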

Is it possible to match all of them with a single regex and, importantly, in a single run?

The PCRE2 engine I use behaves the same way as https://regex101.com/r/uFEDOE/1, so if a regex can match multiple substrings on regex101, my engine can too.


Solution

  • (?:foo|\G(?!^))(\d+).*?(?=(\w+))\w+(?=\1\b)

    There may be room to optimize it for size or performance.

    @Niko Gambt, let me know if any particular optimization matters to you.
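The pattern above can be exercised in Java as a sketch. java.util.regex supports \G, lookahead and backreferences and is assumed here to behave like PCRE2 for this pattern; the class and method names are hypothetical. The first match starts at foo, captures 38 in group 1, and stops just before the suffix of gee38, while the lookahead (?=(\w+)) has already captured the whole word gee38 in group 2. Because the match ends right before 38, \G(?!^) lets the next match start exactly there (the (?!^) keeps \G from matching at the very start of the subject), and (\d+) captures 38 again. The chain continues through jay38, em38, ou38 and q38 and stops when no later word ends in 38. The wanted words are therefore in group 2, not in the match text itself (the first match text is foo38 ee 37 ef gee).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GAnchorDemo {
    // The answer's pattern. Group 1: the number (after "foo" first, then re-read at \G).
    // Group 2: the whole word captured by the lookahead, e.g. "gee38".
    static final Pattern CHAINED =
            Pattern.compile("(?:foo|\\G(?!^))(\\d+).*?(?=(\\w+))\\w+(?=\\1\\b)");

    static List<String> chainedWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = CHAINED.matcher(text);
        while (m.find()) {
            // The match itself ends just before the suffix (e.g. "foo38 ee 37 ef gee"),
            // so the next find() can resume there through \G.
            words.add(m.group(2));
        }
        return words;
    }

    public static void main(String[] args) {
        String s = "ay bee ceefooh deefoo38 ee 37 ef gee38 aitch 38 eye19 jay38 kay 99 "
                + "el88 em38 en 29 ou38 38 pee 12 q38 arr 999 esss 555";
        System.out.println(chainedWords(s)); // [gee38, jay38, em38, ou38, q38]
    }
}
```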