
PEGKit: Keep trying rules


Suppose I have a rule:

myCoolRule = Word
           | 'myCoolToken' Word otherRule
           ;

If I supply the input "myCoolToken something else", the parser greedily matches myCoolToken as a Word, then hits "something" and complains that it expected EOF. If I reorder the alternatives so that 'myCoolToken' is tried first, everything parses perfectly, at least for that input.

I am wondering whether it is possible to make it keep trying all the alternatives in that rule until one works: it would match Word, fail, come back, and then try the next alternative.

Here are the actual grammar rules causing problems:

columnName = Word;
typeName = Word;

//accepts CAST and cast
cast =  { MATCHES_IGNORE_CASE(LS(1), @"CAST") }? Word ;

checkConstraint = 'CHECK' '('! expr ')'!;

expr        = requiredExp optionalExp*;

requiredExp =  (columnName
               | cast '(' expr as typeName ')'
               ... more but not important
optionalExp ...not important

The input CHECK( CAST( abcd as defy) ) fails to parse, even though it is valid.

Is there a construct, or some other way, to make it try all the alternatives before giving up?


Solution

  • Creator of PEGKit here.

    If I understand your question correctly, no, this is not possible. But this is a feature of PEGKit, not a bug.

    Your question is related to "determinacy" vs "nondeterminacy". PEGKit is a "deterministic" toolkit (which is widely considered a desirable feature for parsing programming languages).

    It seems you are looking for a more "nondeterministic" behavior in this case, but I don't think you should be :).
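For contrast, here is a hypothetical Python sketch of the nondeterministic behavior the question asks for. It is not PEGKit code and not how PEGKit works. Every parser returns the set of all token positions where it could end, so when a later step fails, the results of the other alternatives are still available. The helper names (word, literal, seq, alt, accepts) are made up for illustration, and otherRule is assumed to be a single Word.

```python
# Hypothetical sketch of NONdeterministic choice (this is not how PEGKit works).
# Each parser returns the SET of positions where it could end, so a failure
# later in a sequence can still use any alternative that matched.

def word(tokens, pos):
    # Any single token counts as a Word in this simplified model.
    return {pos + 1} if pos < len(tokens) else set()

def literal(text):
    def parse(tokens, pos):
        return {pos + 1} if pos < len(tokens) and tokens[pos] == text else set()
    return parse

def seq(*parsers):
    def parse(tokens, pos):
        positions = {pos}
        for p in parsers:
            # Continue from EVERY position reached so far.
            positions = {end for start in positions for end in p(tokens, start)}
        return positions
    return parse

def alt(*alternatives):
    # Keep the results of ALL alternatives; their order no longer matters.
    def parse(tokens, pos):
        return set().union(*(a(tokens, pos) for a in alternatives))
    return parse

def accepts(rule, text):
    # The whole input must be consumed (the "expected EOF" check).
    tokens = text.split()
    return len(tokens) in rule(tokens, 0)

other_rule = word  # assumption: otherRule is a single Word
word_first = alt(word, seq(literal("myCoolToken"), word, other_rule))
token_first = alt(seq(literal("myCoolToken"), word, other_rule), word)

print(accepts(word_first, "myCoolToken something else"))   # True
print(accepts(token_first, "myCoolToken something else"))  # True
```

Because alt keeps every alternative's result, both orderings accept the input here, whereas PEGKit's ordered choice makes the Word-first version fail. The flip side is that the parser has to carry several candidate parses forward at once, and the grammar no longer says which alternative wins when more than one works.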

    PEGKit lets you specify the priority of alternatives via the order in which they are listed. So:

    foo = highPriority
        | lowerPriority
        | lowestPriority
        ;
    

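    If the highPriority option matches the current input, the lowerPriority and lowestPriority options will not get a chance to try to match, even if they are somehow a "better" match (e.g. they match more tokens than highPriority). And if a later part of the parse fails after highPriority has matched, the parser does not come back to try them either.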
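A minimal Python sketch of ordered choice, assuming a simplified parser model rather than PEGKit's actual API: each parser takes a token list and a position and returns the position after its match, or None on failure. high_priority and lower_priority mirror the rules above, and the tokens "a" and "b" are made up for illustration.

```python
# Minimal PEG ordered-choice sketch (hypothetical helpers, not PEGKit's API).
# A parser takes (tokens, pos) and returns the position after its match,
# or None on failure.

def literal(text):
    def parse(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == text else None
    return parse

def seq(*parsers):
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse

def ordered_choice(*alternatives):
    # Try alternatives in the listed order and commit to the FIRST one that
    # succeeds; later alternatives are not tried, even if they would match more.
    def parse(tokens, pos):
        for alternative in alternatives:
            end = alternative(tokens, pos)
            if end is not None:
                return end
        return None
    return parse

high_priority = literal("a")                      # matches one token
lower_priority = seq(literal("a"), literal("b"))  # would match two tokens

foo = ordered_choice(high_priority, lower_priority)
foo_reordered = ordered_choice(lower_priority, high_priority)

print(foo(["a", "b"], 0))            # 1: high_priority wins
print(foo_reordered(["a", "b"], 0))  # 2: the longer match now comes first
```

With the input a b, foo stops after one token because high_priority already matched; if an enclosing rule then required end of input, the whole parse would fail without lower_priority ever being tried. Only reordering the alternatives lets the longer match win.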

    Again, this is related to "determinacy" (highPriority is guaranteed to be given primacy) and is widely considered a desirable feature when parsing programming languages.

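    So if you want your cast() expressions to have a higher priority than columnName, simply list the cast() expression as an option before the columnName option. Since the cast rule's predicate only accepts the word CAST, any other word still falls through to columnName.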

    requiredExp =  (cast '(' expr as typeName ')'
                   | columnName
                   ... more but not important
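The following hypothetical Python sketch hand-codes the question's grammar as a recursive-descent parser with PEG ordered choice, to show why the order matters for CHECK( CAST( abcd as defy) ). It is not PEGKit-generated code. Assumptions: optionalExp* and the elided alternatives are left out, the bare as in the cast alternative is treated as a case-insensitive AS keyword, and a Word is any identifier-like token.

```python
# Hypothetical hand-written PEG parser for the question's grammar (not generated
# by PEGKit). Each parser takes (tokens, pos) and returns the position after
# its match, or None on failure.

def tokenize(text):
    # Pad parentheses with spaces so they become separate tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def word(tokens, pos):
    # Word: an identifier-like token, so "(" and ")" are not Words.
    return pos + 1 if pos < len(tokens) and tokens[pos].isidentifier() else None

def literal(text):
    # Exact, case-sensitive token match, like 'CHECK' or '(' in the grammar.
    def parse(tokens, pos):
        return pos + 1 if pos < len(tokens) and tokens[pos] == text else None
    return parse

def keyword(text):
    # Case-insensitive Word, mirroring { MATCHES_IGNORE_CASE(LS(1), @"CAST") }? Word
    def parse(tokens, pos):
        matches = pos < len(tokens) and tokens[pos].upper() == text.upper()
        return word(tokens, pos) if matches else None
    return parse

def seq(*parsers):
    def parse(tokens, pos):
        for p in parsers:
            pos = p(tokens, pos)
            if pos is None:
                return None
        return pos
    return parse

def ordered_choice(*alternatives):
    def parse(tokens, pos):
        for alternative in alternatives:
            end = alternative(tokens, pos)
            if end is not None:
                return end  # commit to the first alternative that matches
        return None
    return parse

def build_check_constraint(cast_first):
    column_name = word
    type_name = word
    cast = keyword("CAST")
    as_kw = keyword("AS")  # assumption: the bare `as` means an AS keyword

    def expr(tokens, pos):
        # expr = requiredExp optionalExp*  (optionalExp* omitted here)
        return required_exp(tokens, pos)

    cast_exp = seq(cast, literal("("), expr, as_kw, type_name, literal(")"))
    if cast_first:
        required_exp = ordered_choice(cast_exp, column_name)
    else:
        required_exp = ordered_choice(column_name, cast_exp)

    # checkConstraint = 'CHECK' '('! expr ')'!;
    return seq(literal("CHECK"), literal("("), expr, literal(")"))

def parses(text, cast_first):
    tokens = tokenize(text)
    return build_check_constraint(cast_first)(tokens, 0) == len(tokens)

print(parses("CHECK( CAST( abcd as defy) )", cast_first=False))  # False
print(parses("CHECK( CAST( abcd as defy) )", cast_first=True))   # True
```

With columnName first, CAST is consumed as a column name and the parser then finds ( where it expects ). With cast first, the predicate rejects ordinary words such as abcd, so ordered choice falls through to columnName. Note that PEG ordered choice does backtrack when an alternative fails partway through: a column actually named cast still parses, because the cast alternative fails at the missing ( and the next alternative is tried from the same position. What it never does is revisit an alternative that already succeeded.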
    


    OK, so that takes care of the syntactic details. However, if you have higher-level semantic constraints that can affect parse-time decisions about which alternative should have the highest priority, you should use a Semantic Predicate, like:

    foo = { shouldChooseOpt1() }? opt1
        | { shouldChooseOpt2() }? opt2
        | defaultOpt
        ;
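A hypothetical Python sketch of the same idea, not PEGKit's API: each alternative can be guarded by a zero-argument predicate, and a false predicate makes that alternative fail before it consumes any input. should_choose_opt1, should_choose_opt2 and the flags dictionary are made-up stand-ins for whatever parser state your real predicates consult; to show which branch won, the choice returns the alternative's name along with the end position.

```python
# Hypothetical sketch of semantic predicates gating ordered choice (not PEGKit's API).

def word(tokens, pos):
    # Any single token counts as a Word in this simplified model.
    return pos + 1 if pos < len(tokens) else None

def ordered_choice(*alternatives):
    # alternatives: (name, predicate, parser); predicate None means "always try".
    def parse(tokens, pos):
        for name, predicate, parser in alternatives:
            if predicate is not None and not predicate():
                continue  # false predicate: skip this alternative entirely
            end = parser(tokens, pos)
            if end is not None:
                return name, end  # commit to the first viable alternative
        return None
    return parse

# Made-up parser state standing in for "higher-level semantic constraints".
flags = {"opt1": False, "opt2": True}

def should_choose_opt1():
    return flags["opt1"]

def should_choose_opt2():
    return flags["opt2"]

# foo = { shouldChooseOpt1() }? opt1 | { shouldChooseOpt2() }? opt2 | defaultOpt ;
# Here opt1, opt2 and defaultOpt all match a single Word, so only the
# predicates decide which one is chosen.
foo = ordered_choice(
    ("opt1", should_choose_opt1, word),
    ("opt2", should_choose_opt2, word),
    ("defaultOpt", None, word),
)

print(foo(["x"], 0))  # ('opt2', 1)
```

Flipping the flags changes which alternative is chosen for the same input, while the listed order still breaks ties when more than one predicate is true.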
    

    More details on Semantic Predicates here.