Using Recursive Regexes in Raku: how to limit recursion-levels?


Raku has an interesting and exciting recursive-regex notation, <~~>, which recursively calls the regex it appears in at the current position.

So in the REPL, we can do this:

[0] > 'hellohelloworldworld' ~~ m/ helloworld /;
「helloworld」
[1] > 'hellohelloworldworld' ~~ m/ hello <~~>? world /;
「hellohelloworldworld」

Adapting the example in the Raku docs on recursive regexes, we can capture and count various levels of nesting:

~$ raku -pe '#acts like cat here' nest_test.txt
not nested

previous blank
nestA{1}
nestB{nestA{1}2}
nestC{nestB{nestA{1}2}3}
~$ raku -ne 'my $cnt = 0; say m:g/  \{  [  <( <-[{}]>*  )> | <( <-[{}]>* <~~>*? <-[{}]>* )>  ] \} {++$cnt} /, "\t  $cnt -levels nested";'  nest_test.txt
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
(「1」)     1 -levels nested
(「nestA{1}2」)     2 -levels nested
(「nestB{nestA{1}2}3」)     3 -levels nested

(To print only the captured strings, change say to put above.)
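To see why the closure doubles as a level counter, here is a minimal single-string sketch. The variable names $levels and $m are illustrative, and the <( )> markers are dropped for brevity. Each recursive call runs the whole regex, including the trailing { ++$levels } block, so the block fires once for every level that reaches its closing brace. Side effects in regex closures are not undone on backtracking, so on inputs that force backtracking the count could overshoot.

```raku
# Count brace levels in a single string (names here are illustrative).
my $levels = 0;

# <~~>? re-enters this same regex at the current position; every level that
# reaches its closing brace runs the { ++$levels } block once.
my $m = 'nestC{nestB{nestA{1}2}3}' ~~ / \{ <-[{}]>* <~~>? <-[{}]>* \} { ++$levels } /;

put $m;        # {nestB{nestA{1}2}3}
say $levels;   # 3
```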

But I recently ran into an issue while trying to answer a Unix & Linux question: how can the recursion be limited? Let's say we want to capture only below nestB. Is there any way to do this using the <~~> recursive regex syntax?

~$ raku -ne 'my $cnt = 0; say m:g/ nestB  \{  [  <( <-[{}]>*  )> | <( <-[{}]>* <~~>*? <-[{}]>* )>  ] \} {++$cnt} /, "\t  $cnt -levels nested";'  nest_test.txt
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested
()    0 -levels nested

NOTE: Above, I've tried to force some sort of 'frugal recursive behavior' by using <~~>*?. In fact, <~~> (the standard recursive notation), <~~>?, <~~>*, and <~~>*? all give identical results (rakudo-moar-2024.09-01).
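The empty results for the nestB attempt appear to follow from what <~~> recurses into: the entire regex, literal prefix included. Every inner level would therefore also have to begin with nestB{, which never happens in nest_test.txt. A minimal sketch of that behavior, using a hypothetical one-letter prefix x in place of nestB:

```raku
# <~~> re-enters the whole regex, so the literal x must precede every inner '{'.
my $same  = 'x{x{}}' ~~ / x \{ <~~>? \} /;
my $other = 'x{y{}}' ~~ / x \{ <~~>? \} /;

say $same;       # 「x{x{}}」
say so $other;   # False
```

With the nestB prefix in place, no recursive call can succeed on this input, so <~~>, <~~>?, <~~>* and <~~>*? all reduce to the same failed match.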

What is the correct Raku recursive regex syntax?


Solution


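    Increment a dynamic variable inside a <?{ ... }> code assertion. Once the counter passes the limit, the assertion fails, the recursive call fails with it, and <~~>? falls back to matching nothing. For example: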

    my $*cnt;
    say 'a' x 100 ~~ / <?{++$*cnt <= 5}> a <~~>? /; # 「aaaaa」
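Two details of this snippet matter when reusing it. The counter is only ever incremented, so it measures how many times the assertion has run rather than the current depth (the two coincide here because nothing backtracks). And because $*cnt is declared once, a second match in the same scope would start with the counter already past 5 and fail at every position. A minimal sketch that gives each match its own budget by declaring the dynamic variable inside a sub (the sub name limited-as and its $max parameter are hypothetical):

```raku
# Hypothetical helper: match up to $max nested 'a's with a fresh counter per call.
sub limited-as(Str $s, Int $max) {
    # A dynamic variable declared here is visible to the code assertion,
    # because the regex (and each <~~> recursion) runs inside this call.
    my $*cnt = 0;
    # The assertion succeeds for the first $max entries; after that the
    # recursive call fails and <~~>? falls back to matching nothing.
    $s ~~ / <?{ ++$*cnt <= $max }> a <~~>? /;
}

say limited-as('a' x 100, 5);   # 「aaaaa」
say limited-as('a' x 100, 3);   # 「aaa」
```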