Search code examples
parsingdata-structuresrustebnf

How to represent recursive EBNF grammar in Rust data structures?


Let's say I have the following example EBNF grammar. This is not a perfect grammar, but it should demonstrate the problem correctly.

Statement = FunctionDefinition | Assignment | Expr ;
Expr = { Term | "(" , Expr , ")" } ;

Assignment = Word , "=" , Expr ;
FunctionDefinition = Word , { Word } , "=" , Expr ;

Term = Word | Number

Where a Word is some number of letters and numbers and a Number is a valid numeric literal.

I can start to represent this in Rust like so:

enum Statement {
    FunctionDefinition {
        name: String,
        params: Vec<String>,
        body: Expr,
    },
    Assignment {
        name: String,
        body: Expr,
    },
    //TODO: Expr
}

There is already a problem here though. How do I add Expr? Expr should probably have its own definition since it is used in several other places as well. Giving Expr its own separate definition and then adding it to this enum will redefine it.

If I continue anyway and start trying to define Expr, I run into even more problems:

type Expr = Vec<...?...>;
// or maybe...
struct Expr {
    terms: Vec<Expr>, // but what about Term??
}

I tried to use type because Expr does not necessarily need to be its own struct or enum since it is just a collection of Terms or other Exprs. It is difficult to recursively define this though. If I try to use an enum to emulate a union type for Expr and Term, then I have to redefine Expr in that enum and define Term within the enum which makes Term unavailable in the other structures I will need.


Solution

  • Expr can be a type alias, but you need to define an enum to represent the alternation. Term also needs to be a separate enum.

    enum Statement {
        FunctionDefinition {
            name: String,
            params: Vec<String>,
            body: Expr,
        },
        Assignment {
            name: String,
            body: Expr,
        },
        Expr(Expr),
    }
    
    type Expr = Vec<ExprItem>;
    
    enum ExprItem {
        Term(Term),
        Parenthesized(Expr),
    }
    
    enum Term {
        Word(String),
        Number(f64),
    }