How to pin a Raku Grammar token to only match when at the end of a string


I have written this - it works fine:

use Grammar::Tracer;

my @types = <bool i32 i64 u32 u64 f32 f64 str>;

my grammar Lambda {
    token TOP       { <signature> <body> ' as ' <r-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    token body      { '(' <expr> ')' <?before ' as '> }
    token expr      { <-[()]>* }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

Lambda.parse("|a: i32, b: i32| (a + b) as i32");

gives what I need:

TOP
|  signature
|  |  a-sig
|  |  |  a-type
|  |  |  * MATCH "i32"
|  |  * MATCH "a: i32"
|  |  b-sig
|  |  |  b-type
|  |  |  * MATCH "i32"
|  |  * MATCH "b: i32"
|  * MATCH "|a: i32, b: i32| "
|  body
|  |  expr
|  |  * MATCH "a + b"
|  * MATCH "(a + b)"
|  r-type
|  * MATCH "i32"
* MATCH "|a: i32, b: i32| (a + b) as i32"

But I would like to parse this string (and similar ones): |a: str, b: i32| (a.len() as i32 + b) as i32

  • this fails because the body match exits at the len() parens
  • even when I fix that, it exits at the first as i32

I would like to find some way to "pin" the match to the last valid 'as type' before the end of the string.

I would also like to know how to match, but not capture, the other parens.

please


Solution

  • After some trial and error, I managed to work this out (Grammar::Tracer is soooo helpful!)

    Here's the working grammar:

    my @types  = <bool i32 i64 u32 u64 f32 f64 str>;
    
    my grammar Lambda {
        rule  TOP       { <signature> <body> <as-type> }
        rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
        rule  a-sig     { 'a:' <a-type> }
        rule  b-sig     { 'b:' <b-type> }
        rule  as-type   { 'as' <r-type> }
        rule  body      { '(' <expr> ')' <?before <as-type>> }
        rule  expr      { .* <?before ')'> }
        token a-type    { @types }
        token b-type    { @types }
        token r-type    { @types }
    }
    

    The changes I made were:

    • swapped a bunch of tokens to rules (the easiest way to ignore whitespace)
    • added <as-type> to bundle 'as' and the return type into a single matcher at the end of TOP, so that it always matches at the end of the string (.parse has to consume the whole string)
    • <body> has a lookahead assertion, so it always sits directly before an <as-type>
    • <expr> has a lookahead assertion, so it always ends directly before a ')'
    • but <expr> is otherwise greedy with .*, so that it hoovers up the whole expr and does not stop at the first ')'
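
As a rough end-to-end check of the changes listed above, here is a minimal sketch that runs the harder input through a variant of the grammar and reads back the captures. The name LambdaSketch is made up for illustration. One deliberate difference is an assumption of the sketch, not part of the grammar above: <expr> is declared with `regex`, which always backtracks, so the example does not depend on how a greedy .* behaves under the ratcheting of a `rule`.

```raku
# Type names copied from the answer above.
my @types = <bool i32 i64 u32 u64 f32 f64 str>;

# Hypothetical variant of the answer's grammar (LambdaSketch is a made-up name).
# Assumption: <expr> uses `regex`, which always backtracks, so the greedy .*
# can give characters back until the lookahead finds a ')'.
my grammar LambdaSketch {
    rule  TOP       { <signature> <body> <as-type> }
    rule  signature { '|' <a-sig> [',' <b-sig>]? '|' }
    rule  a-sig     { 'a:' <a-type> }
    rule  b-sig     { 'b:' <b-type> }
    rule  as-type   { 'as' <r-type> }
    rule  body      { '(' <expr> ')' <?before <as-type>> }
    regex expr      { .* <?before ')'> }
    token a-type    { @types }
    token b-type    { @types }
    token r-type    { @types }
}

my $src   = '|a: str, b: i32| (a.len() as i32 + b) as i32';
my $match = LambdaSketch.parse($src);

# .parse only succeeds when TOP consumes the whole string, which is what
# pins the final <as-type> to the end of the input.
if $match {
    say ~$match<signature><a-sig><a-type>;  # type of the first argument
    say ~$match<body><expr>;                # text between the outer parens
    say ~$match<as-type><r-type>;           # return type
}
else {
    say 'no match';
}
```

Because .* starts by taking everything up to the end of the string and only then backs off, the lookahead settles on the last ')' in the input, so the inner len() parens and the inner as i32 should stay inside <expr>.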