Search code examples
bnf

How do you abstract some expression to BNF?


For example :

waldo:=fern+alpha/-beta^gamma;

The above arithmetical expression may be abstracted by this BNF(there may be some difference from standard BNF,but lets ignore it for now):

AEXP = AS $AS ;

AS = .ID ':=' EX1 ';' ;

EX1 = EX2 $( '+' EX2 / '-' EX2 ) ;

EX2 = EX3 $( '*' EX3 / '/' EX3 ) ;

EX3 = EX4 $( '^' EX3 ) ;

EX4 = '+' EX5 / '-' EX5 / EX5 ;

EX5 = .ID / .NUMBER / '(' EX1 ')' ;

.END

But the EX1~EX5 abstraction is not so intuitive to me.(I don't quite understand how they are crafted in the first place)

Is there any steps to follow when normalizing such expressions?


Solution

  • You can translate this notation to EBNF directly.

    Naming categories EX1 through EX5 is not an uncommon way of specifying operator precedence. In fact it is a good one, IMHO, especially in some languages that have 15 or more precedence levels, like C and C++ do. :)

    You can rename them to expression, term, factor, primary, etc. (or whatever terms make sense to you).

    ADDENDUM

    If you need a translation of the above into more traditional EBNF, here is how I would do it:

    AEXP => AS+
    AS   => id ':=' EX1 ';'
    EX1  => EX2 (('+' | '-') EX2)*
    EX2  => EX3 (('*' | '/') EX3)*
    EX3  => EX4 ('^' EX3)*
    EX4  => ('+'|'-')? EX5
    EX5  => id | number | '(' EX1 ')'
    

    I use '*' for zero or more, '+' for one or more, and '?' for optional. It is pretty cool how operator precedence is handled here, I think.

    ADDENDUM 2:

    Please note: It appears that the rule for EX3 is wrong. The way it stands now you can get parse trees like this

                      EX3
                       |
         +---+----+----+----+---------+
         |   |    |    |    |    |    |
        EX4  ^   EX3   ^   EX3   ^   EX3
                / | \               / | \
             EX4  ^  EX3         EX4  ^  EX3
    

    So writing a^b^c^d^e^f could mean a^(b^c)^d^(e^f). But in fact there are other ways to make this tree. The grammar is ambiguous.

    It appears the designer of the grammar wanted to make the ^ operator right-associative. But to do so, the rule should have been

    EX3 => EX4 ('^' EX3)?
    

    Now the grammar is no longer ambiguous. Look how the derivation of a^b^c^d^e^f MUST now proceed:

              EX3
             / | \
          EX4  ^  EX3
                 / | \
              EX4  ^  EX3
                     / | \
                  EX4  ^  EX3
                         / | \
                      EX4  ^  EX3
                             / | \
                          EX4  ^  EX3
    

    Now a^b^c^d^e^f can ONLY parse as a^(b^(c^(d^(e^f))))

    An alternative is to rewrite the rule as EX3 => EX4 ('^' EX4)* and have a side rule saying "OBTW the caret is right associative."