regex pcre pcregrep

PCRE regex behaves differently when moved to subroutine


Using PCRE v8.42, I am trying to abstract a regex into a named subroutine, but the pattern seems to behave differently once it is inside the subroutine.

This outputs 10/:

echo '10/' | pcregrep '(?:0?[1-9]|1[0-2])\/' 

This outputs nothing:

echo '10/' | pcregrep '(?(DEFINE)(?<MONTHNUM>(?:0?[1-9]|1[0-2])))(?&MONTHNUM)\/'

Are these two regular expressions not equivalent?


Solution

  • In PCRE 8.x and in PCRE2 versions prior to 10.30, every subroutine call is treated as an atomic group. Your (?(DEFINE)(?<MONTHNUM>(?:0?[1-9]|1[0-2])))(?&MONTHNUM)\/ regex is therefore equivalent to (?>0?[1-9]|1[0-2])\/. See this regex demo, where 10/ is not matched, as expected.

    There is no match because 0?[1-9] matches the 1 in 10/, and since backtracking into the group is not allowed, the second alternative is never tried. The whole match then fails because the 1 is not followed by /.

    You need to make sure the longer alternative comes first:

    (?(DEFINE)(?<MONTHNUM>(?:1[0-2]|0?[1-9])))(?&MONTHNUM)/
    

    See the regex demo. Note that in the pcregrep pattern, you do not need to escape /.

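    Alternatively, you can use PCRE2 v10.30 or newer (its grep tool is pcre2grep), where subroutine calls are no longer atomic and the original alternative order works.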
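
The atomic behaviour described above can be reproduced outside PCRE. The following is only a rough sketch in Python, not the pcregrep patterns themselves: Python's re module has no (?(DEFINE)...) or (?&name), so the atomic subroutine call is emulated with the lookahead-plus-backreference idiom (?=(X))\1, which behaves like (?>X) because a lookahead is not re-entered once it has succeeded. The names PLAIN, ATOMIC_SHORT_FIRST, ATOMIC_LONG_FIRST and first_match are made up for this illustration.

```python
import re

# Ordinary alternation with backtracking, like the first pcregrep command.
PLAIN = re.compile(r"(?:0?[1-9]|1[0-2])/")

# Emulated atomic group: the lookahead captures the first alternative that
# matches and is never retried, so (?=(X))\1 acts like (?>X).
# Assumption: this stands in for the atomic subroutine call in old PCRE.
ATOMIC_SHORT_FIRST = re.compile(r"(?=(0?[1-9]|1[0-2]))\1/")

# Same emulation with the longer alternative listed first (the suggested fix).
ATOMIC_LONG_FIRST = re.compile(r"(?=(1[0-2]|0?[1-9]))\1/")


def first_match(pattern, text):
    """Return the matched text, or None when the pattern does not match."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    for name, pattern in [
        ("plain", PLAIN),
        ("atomic, short alternative first", ATOMIC_SHORT_FIRST),
        ("atomic, long alternative first", ATOMIC_LONG_FIRST),
    ]:
        print(name, "->", first_match(pattern, "10/"))
```

With the short alternative first, the emulated atomic group commits to 1 and the following / cannot match 0, so 10/ is rejected. With 1[0-2] first, the group commits to 10 and the match succeeds.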