Search code examples
parsingvb6grammarrecursive-descentleft-recursion

Practical solution to fix a Grammar Problem


We have little snippets of vb6 code (the only use a subset of features) that gets wirtten by non-programmers. These are called rules. For the people writing these they are hard to debug so somebody wrote a kind of add hoc parser to be able to evaluete the subexpressions and thereby show better where the problem is.

This addhoc parser is very bad and does not really work woll. So Im trying to write a real parser (because im writting it by hand (no parser generator I could understand with vb6 backends) I want to go with recursive decent parser). I had to reverse-engineer the grammer because I could find anything. (Eventully I found something http://www.notebar.com/GoldParserEngine.html but its LALR and its way bigger then i need)

Here is the grammer for the subset of VB.

<Rule>                 ::=  expr rule | e
<Expr>                 ::= ( expr )
                           | Not_List CompareExpr <and_or> expr
                           | Not_List CompareExpr

<and_or>                   ::= Or | And

<Not_List>             ::= Not Not_List | e

<CompareExpr>          ::= ConcatExpr comp CompareExpr
                           |ConcatExpr

<ConcatExpr>           ::= term  term_tail & ConcatExpr
                           |term term_tail

<term>                 ::= factor factor_tail
<term_tail>            ::= add_op term term_tail | e

<factor>               ::= add_op Value | Value
<factor_tail>          ::= multi_op  factor factor_tail | e

<Value>                ::= ConstExpr | function | expr

<ConstExpr>            ::= <bool> | number | string | Nothing
<bool>                 ::= True | False
<Nothing>              ::= Nothing | Null | Empty

<function>             ::= id | id ( ) | id ( arg_list )
<arg_list>             ::= expr , arg_list | expr

<add_op>               ::= + | -
<multi_op>             ::= * | /
<comp>                 ::= > | < | <= | => |  =< | >= |  = | <>

All in all it works pretty good here are some simple examples:

my_function(1, 2 , 3)  

looks like

(Programm
 (rule
  (expr
   (Not_List)
   (CompareExpr
    (ConcatExpr
     (term
      (factor
       (value
        (function
         my_function
         (arg_list
          (expr
           (Not_List)
           (CompareExpr
            (ConcatExpr (term (factor (value 1))) (term_tail))))
          (arg_list
           (expr
            (Not_List)
            (CompareExpr
             (ConcatExpr (term (factor (value 2))) (term_tail))))
           (arg_list
            (expr
             (Not_List)
             (CompareExpr
              (ConcatExpr (term (factor (value 3))) (term_tail))))
            (arg_list))))))))
     (term_tail))))
  (rule)))

Now whats my problem?

if you have code that looks like this (( true OR false ) AND true) I have a infinit recursion but the real problem is that in the (true OR false) AND true (after the first ( expr ) ) is understood as only (true or false).

Here is the Parstree: Parse Tree

So how to solve this. Should I change the grammer somehow or use some implmentation hack?

Something hard exmplale in case you need it.

(( f1 OR f1 ) AND (( f3="ALL" OR f4="test" OR f5="ALL" OR f6="make" OR f9(1, 2) ) AND ( f7>1 OR f8>1 )) OR f8 <> "")

Solution

  • You have several issues that I see.

    You are treating OR and AND as equal precedence operators. You should have separate rules for OR, and for AND. Otherwise you will the wrong precedence (therefore evaluation) for the expression A OR B AND C.

    So as a first step, I'd revise your rules as follows:

    <Expr>  ::= ( expr )                        
                | Not_List AndExpr Or  Expr   
                | Not_List AndExpr
    
    <AndExpr>  ::=                        
                | CompareExpr And  AndExpr   
                | Not_List CompareExpr
    

    Next problem is that you have ( expr ) at the top level of your list. What if I write:

     A AND (B OR C)
    

    To fix this, change these two rules:

    <Expr>  ::= Not_List AndExpr Or Expr   
                | Not_List AndExpr
    
    <Value> ::= ConstExpr | function | ( expr )
    

    I think your implementation of Not is not appropriate. Not is an operator, just with one operand, so its "tree" should have a Not node and a child which is the expression be Notted. What you have a list of Nots with no operands. Try this instead:

    <Expr>  ::= AndExpr Or Expr   
                | AndExpr
    
    <Value> ::= ConstExpr | function | ( expr ) | Not Value
    

    I haven't looked, but I think VB6 expressions have other messy things in them.

    If you notice, the style of Expr and AndExpr I have written use right recursion to avoid left recursion. You should change your Concat, Sum, and Factor rules to follow a similar style; what you have is pretty complicated and hard to follow.