Search code examples
javascriptsqlparsingjison

Jison: produce AST node with multiple children for AND and OR


I am working on a simple SQL to Mongo query criteria generation in JavaScript. I am using Jison to parse the SQL's where clause.

The following grammar returns an AST in form of binary tree where ORs and ANDs are nested. What I want is to obtain an AST where the OR node has all of the terms in single node (flat tree).

/* lexical grammar */
/* http://stackoverflow.com/questions/8467150/how-to-get-abstract-syntax-tree-ast-out-of-jison-parser */
%lex
%%

\s+                   /* skip whitespace */
[0-9]+("."[0-9]+)?\b                return 'NUMBER'
'AND'                 return 'AND'
'OR'                  return 'OR'
'NOT'                 return 'NOT'
'BETWEEN'             return 'BETWEEN'
L?\"(\\.|[^\\"])*\"                    return 'STRING_LITERAL'
'('                   return 'LPAREN'
')'                   return 'RPAREN'
'!='                  return 'NEQ'
'>='                  return 'GE'
'<='                  return 'LE'
'='                   return 'EQ'
'>'                   return 'GT'
'<'                   return 'LT'
'IN'                  return 'IN'
'NIN'                 return 'NIN'
'+'                   return 'PLUS'
'-'                   return 'MINUS'
','                   return 'COMMA'
[_a-zA-Z][_\.a-zA-Z0-9]{0,30}            return 'IDEN'
<<EOF>>               return 'EOF'
.                     return 'INVALID'

/lex

%left OR
%left AND
%right NOT
%left NEQ EQ
%left GT LE LT GE
$left PLUS MINUS

%start start
%% /* language grammar */

start
    :  search_condition EOF
        {return $1;}
    ;

search_condition
    : search_condition OR boolean_term
        {$$ = {
            'or': [ $1, $3 ]
            };
        }
    | boolean_term
    ;

boolean_term
    : boolean_factor
    | boolean_term AND boolean_factor
        {$$ = {
            'and': [ $1, $3 ]
            };
        }
    ;

boolean_factor
    : boolean_test
    ;

boolean_test
    : boolean_primary
    ;

boolean_primary
    : predicate
    | LPAREN search_condition RPAREN
        {$$ = $2}
    ;

predicate
    : comparison_predicate
    | in_predicate
    | nin_predicate
    | between_predicate
    ;

comparison_predicate
    : IDEN comp_op value_expression
        {$$ = {
            var: $1,
            op: $2,
            val: $3
            };
        }
    ;

value_expression
    : NUMBER
    | STRING_LITERAL
    ;

comp_op
    : EQ
    | NEQ
    | GT
    | GE
    | LT
    | LE
    ;

in_predicate
    : IDEN IN in_predicate_value
    {$$ = {
            in: $3
            };
        }
    ;

nin_predicate
    : IDEN NIN in_predicate_value
    {$$ = {
            nin: $3
            };
        }
    ;

in_predicate_value
    : LPAREN in_value_list RPAREN
    {$$ = [$2];}
    ;

in_value_list
    : in_value_list_element
        {$$ = []; $$.push($1); }
    | in_value_list COMMA in_value_list_element
        {$1.push($3); $$ = $1; }
    ;

in_value_list_element
    : value_expression
        {$$ = $1;}
    ;

between_predicate
    : IDEN BETWEEN value_expression AND value_expression
    {$$ = {
            between: {
                from: $3,
                to: $5
            }

            };
        }
    ;

And when I parse the following

var ast = parser.parse('a=1 OR b=2 OR c=3 OR d=4 ');

It returns

{
  "or": [
    {
      "or": [
        {
          "or": [
            {
              "var": "a",
              "op": "=",
              "val": "1"
            },
            {
              "var": "b",
              "op": "=",
              "val": "2"
            }
          ]
        },
        {
          "var": "c",
          "op": "=",
          "val": "3"
        }
      ]
    },
    {
      "var": "d",
      "op": "=",
      "val": "4"
    }
  ]
}

But I want it to return

{
  "or": [
    {
      "var": "a",
      "op": "=",
      "val": "1"
    },
    {
      "var": "b",
      "op": "=",
      "val": "2"
    },
    {
      "var": "c",
      "op": "=",
      "val": "3"
    },
    {
      "var": "d",
      "op": "=",
      "val": "4"
    }
  ]
}

Is that possible using Jison? If so, what changes is needed?


Solution

  • You just need to fix the actions.

    First, change the actions in the search_condition rule as follows:

    search_condition
        : search_condition OR boolean_term
            { $1['or'].push($3); $$ = $1; }
        | boolean_term
            { $$ = { 'or': [ $1 ] }; }
        ;
    

    This ensures that a search_condition always produces an or node, even if that node includes only one element. Since the base production creates a (singular) or node, the recursive production can freely append to it.

    If you want to get rid of the degenerate or nodes (in the case that the search_condition did not contain an OR operator), you could do so in a wrapper (or directly in the start production):

    start
        :  simplified_search_condition EOF
           { return $1; }
        ;
    
    simplified_search_condition
        :  search_condition EOF
           { $$ = $1['or'].length == 1 ? $1['or'] : $1; }
        ;