Search code examples
pythonalgorithmparsingprogramming-languages

What is the general way to implement operators precedence in Python


Suppose I would like to write a fairly simple programming language, and I want to implement operators such like 2 + 3 * 2 = 8

What is the general way to implement things like this?


Solution

  • I'm not sure how much detail you're interested in, but it sounds like you're looking to implement a parser. There's typically two steps:

    The lexer reads over the text and converts it to tokens. For example, it might read "2 + 3 * 2" and convert it to INTEGER PLUS INTEGER STAR INTEGER

    The parser reads in the tokens and tries to match them to rules. For example, you might have these rules:

    Expr := Sum | Product | INTEGER;
    Sum := Expr PLUS Expr;
    Product := Expr STAR Expr;
    

    It reads the tokens and tries to apply the rules such that the start rule maps to the tokens its read in. In this case, it might do:

    Expr := Sum
    Expr := Expr PLUS Expr
    Expr := INTEGER(2) PLUS Expr
    Expr := INTEGER(2) PLUS Product
    Expr := INTEGER(2) PLUS Expr STAR Expr
    Expr := INTEGER(2) PLUS Integer(3) STAR Expr
    Expr := INTEGER(2) PLUS Integer(3) STAR Integer(2)
    

    There are many types of parsers. In this example I read from left to right, and started from the initial expression, working down until I'd replaced everything with a token, so this would be an LL parser. As it does this replacement, it can generate an abstract syntax tree that represents the data. The tree for this might look something like:

    Screenshot of the AST

    You can see that the Product rule is a child of the Sum rule, so it will end up happening first: 2 + (3 * 2). If the expression had been parsed differently we might've ended up with this tree:

    Screenshot of the AST

    Now we're calculating (2 + 3) * 2. It all comes down to which way the parser generates the tree


    If you actually want to parse expressions, odds are you don't want to write the parser by hand. There are parser generators that take a configuration (called a grammar) similar to the one I used above, and generate the actual parser code. Parser generators will let you specify which rule should take priority, so for example:

    Expr := Sum | Product | INTEGER;
    Sum := Expr PLUS Expr; [2]
    Product := Expr STAR Expr; [1]
    

    I labeled the Product rule as priority 1, and Sum as priority 2, so given the choice the generated parser will favor Product. You can also design the grammar itself such that the priority is built-in (this is the more common approach). For example:

    Expr := Sum | INTEGER;
    Sum := Expr PLUS Product;
    Product := Term STAR INTEGER;
    

    This forces the Products to be under the Sums in the AST. Naturally this grammar is very limited (for example, it wouldn't match 2 * 3 + 2), but a comprehensive grammar can be written that still embeds an order of operations automatically