How to handle semantic failures in TatSu when parsing is correct?


I am trying to create a TatSu parser for a language containing C-like expressions. I have the following grammar rules for the expressions:

identifier =
    /[a-zA-Z][A-Za-z0-9_]*/
    ;

expression =
    or_expr
    ;

or_expr =
    '||'<{and_expr}+
    ;

and_expr =
    '&&'<{bitwise_or_expr}+
    ;

bitwise_or_expr =
    '|'<{bitwise_xor_expr}+
    ;

bitwise_xor_expr =
    '^'<{bitwise_and_expr}+
    ;

bitwise_and_expr =
    '&'<{equality_expr}+
    ;

equality_expr =
    ('==' | '!=')<{comparison_expr}+
    ;

comparison_expr =
    ('<' | '<=' | '>' | '>=')<{bitshift_expr}+
    ;

bitshift_expr =
    ('<<' | '>>')<{additive_expr}+
    ;

additive_expr =
    ('+' | '-')<{multiplicative_expr}+
    ;

multiplicative_expr =
    ('*' | '/' | '%')<{unary_expr}+
    ;

unary_expr =
    '+' ~ atom
    | '-' ~ atom
    | '~' ~ atom
    | '!' ~ atom
    | atom
    ;

atom =
    literal
    | helper_call
    | parenthesized
    | var_or_param
    ;

literal =
    value:float type:`float`
    | value:integer type:`int`
    | value:char type:`char`
    | value:string type:`string`
    | value:bool type:`int`
    | value:null type:`null`
    ;

helper_call =
    function:identifier '(' ~ params:expression_list ')'
    ;

var_or_param =
    identifier
    ;

parenthesized =
    '(' ~ @:expression ')'
    ;

I ran into trouble with the atom rule. When parsing the following statement (the expression is the part between the = and the ;):

lastTime = ts + interval;

I got this exception:

tatsu.exceptions.FailedToken: (27:41) expecting '(' :
                lastTime = ts + interval;
                                        ^
helper_call
atom
unary_expr
multiplicative_expr
...

The parser reported a failure in the helper_call rule, even though the var_or_param rule should have matched just fine. The cause turned out to be an erroneous FailedSemantics raised by the semantic action for var_or_param. Once I fixed that, parsing worked as expected.

This raises a question: if FailedSemantics affects the parsing logic, what is the proper way to alert the user to a semantic error when the parse itself is correct and the parser should not attempt other choices or rules? Examples would be a type mismatch or the use of a variable before its declaration. (Ideally, the error would still show the line number where it occurred.)


Solution

  • FailedSemantics does affect parsing. Inside the parser it is translated into a FailedParse, which is why the parser treated var_or_param as a failed alternative in the question above.

    If parsing should stop at that point, keep using FailedSemantics.

    In other scenarios, how you report the error is up to you.

    TatSu is designed so that most semantic checks are done after the parse has succeeded, through a walker or other means.
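
As a rough illustration of the "check after the parse" approach, here is a minimal sketch in plain Python. It does not use TatSu itself: the node classes (Var, BinOp, Assign), the SemanticError exception and the DeclarationChecker class are all hypothetical names made up for this example, and treating an assignment as a declaration is an assumption about the language. The line numbers are stored as plain integers on the nodes. In a real TatSu setup they would presumably come from the position information the parser records, and that wiring is not shown here. The point is only that the checker runs on a finished tree and collects errors with their line numbers, so it cannot influence which grammar alternative is chosen.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node types standing in for whatever the parser produces.
@dataclass
class Var:
    name: str
    line: int


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"
    line: int


@dataclass
class Assign:
    target: str
    value: "Expr"
    line: int


Expr = Union[Var, BinOp]


class SemanticError(Exception):
    """Reported after parsing; carries the source line of the offending node."""

    def __init__(self, line, message):
        super().__init__(f"line {line}: {message}")
        self.line = line


class DeclarationChecker:
    """Walks a finished tree and collects use-before-declaration errors."""

    def __init__(self, declared=()):
        self.declared = set(declared)
        self.errors = []

    def check(self, node):
        # Dispatch on the node's class name, e.g. check_Var for a Var node.
        getattr(self, "check_" + type(node).__name__)(node)
        return self.errors

    def check_Var(self, node):
        if node.name not in self.declared:
            self.errors.append(
                SemanticError(node.line, f"'{node.name}' used before declaration")
            )

    def check_BinOp(self, node):
        self.check(node.left)
        self.check(node.right)

    def check_Assign(self, node):
        # Check the right-hand side first, then treat the target as declared
        # (an assumption about the language, made for this sketch).
        self.check(node.value)
        self.declared.add(node.target)


# Tree for `lastTime = ts + interval;` on line 27, with only `ts` declared.
tree = Assign("lastTime", BinOp("+", Var("ts", 27), Var("interval", 27), 27), 27)
checker = DeclarationChecker(declared={"ts"})
errors = checker.check(tree)
for error in errors:
    print(error)
```

Because the errors are collected in a list instead of being raised one at a time, a single pass can report every problem it finds, each with its own line number.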