Search code examples
expressionoperator-precedencerascaldisambiguation

Expression precedence and disambiguation


I'm currently working on making Rust-lang parsable with Rascal. The originally used syntax is made for Bison and so I'm translating it to be usable. The stumbling blocks that I have reached are the expressions. Rust has four types of expressions in its Bison file and those are nonblock_expr, expr, nonparen_expr and expr_nostruct. My existing translations are full of ambiguities but even after reading the Rascal documentation about the topic and the available syntax for Rascal and Java, I'm not sure how to solve this problem. I would like to fix the ambiguities and better understand how this problem could be fixed further.

Here is an example of my translation:

syntax Expression
    = Literal
    > Path_expression
    | "self"
    | Macro_expression
    | Path_expression "{" Structure_expression_fields "}"
    | Expression "." Path_generic_args_with_colons
    //> left Expression "." Literal_integer
    | Expression "[" Expression? "]"
    | Expression "(" (Expressions ","?)? ")"
    | "(" (Expressions ","?)? ")"
    | "[" Vector_expression "]"
    | "continue"
    | "continue" Identifier
    | "return"
    | "return" Expression
    | "break"
    | "break" Identifier
    > left  ( Expression "*" Expression
            | Expression "/" Expression
            | Expression "%" Expression
            )
    > left  ( Expression "+" Expression
            | Expression "-" Expression
            > Expression "\<\<" Expression
            | Expression "\>\>" Expression
            > Expression "&" Expression
            > Expression "^" Expression
            > Expression "|" Expression
            > Expression "\<" Expression
            | Expression "\>" Expression
            | Expression "\<=" Expression
            | Expression "\>=" Expression
            > Expression "==" Expression
            | Expression "!=" Expression
            > Expression "||" Expression
            > Expression "&&" Expression
            )
    > right Expression "\<-" Expression
    > right ( Expression "=" Expression
            | Expression "\<\<=" Expression
            | Expression "\>\>=" Expression
            | Expression "-=" Expression
            | Expression "&=" Expression
            | Expression "|=" Expression
            | Expression "+=" Expression
            | Expression "*=" Expression
            | Expression "/=" Expression
            | Expression "^=" Expression
            | Expression "%=" Expression
            )
    | Expression ".."
    | Expression ".." Expression
    | ".." Expression
    | ".."
    | Expression "as" Type
    | "box" Nonparen_expression
    > "box" "(" Expression? ")" Expression
    | Expression_qualified_path
    | Block_expression
    | Block
    | Nonblock_prefix_expression
    ;

Sources:

Rust Bison file: Github Rust

Oxidize Rascal file: Github Oxidize

Added ambiguity example This is an example of an input file (just the ambiguous part of a file being in the parameters of parse).

fn main() {
    let mut config = Config::parse(&flags.build, flags.config.clone());
}

The output parse tree as seen bellow seems to be confused about the construction of the prefixed expression (&). This ambiguity can't decide between the Nonblock_prefix_expression and Expression "." Path_generic_args_with_colons. I expect it to parse through the Nonblock_prefix_expression. enter image description here


Solution

    • Have a look at http://www.meta-environment.org/doc/books/syntax/sdf-disambiguation/sdf-disambiguation.pdf for the general concept of disambiguation and how to approach resolving ambiguity in a step-by-step fashion. This manual was written for SDF2, which has many commonalities with Rascal. Watch out for the differences though, {prefer} and {avoid} are not present in Rascal, but can be simulated with a library (not advised).
    • The DrAmbiguity tool can help automating diagnostics for ambiguity: import analysis::grammars::Ambiguity; and diagnose(yourAmbiguousTree), it explains in a pretty raw form what the differences and commonalities are between different alternative parse trees for the same sub-string. The differences are the cause of the ambiguity: rules and orders of rule application which are used in the one alternative but not in the other. The cause can be removed by applying disambiguation filters which remove only one of the two alternatives (so they focus on the differences)
    • If you have concrete ambiguities with example input sentence and output trees (as small as possible), we can help on stackoverflow.com