Tags: parsing, ocaml, ocamlyacc

Parse two kinds of statements, with a priority


I would like to parse both f(arg).method and f(arg) as block_statement, with the first taking priority over the second.

With the following rules in parser.mly, the parser rejects f(arg) but accepts f(arg).method, producing:

  (* f(arg).method *)
  BS_MAE MAE_LE_UN (
    LE_IE IE_LE_AL (
      LE_SNE SNE_I f,
      AL_I arg),
    UN_I method)

(* parser.mly: *)

block_statement:
| member_access_expression { BS_MAE $1 }

simple_name_expression: | IDENTIFIER { SNE_I $1 }
member_access_expression: | l_expression DOT unrestricted_name { MAE_LE_UN ($1, $3) }
unrestricted_name: | IDENTIFIER { UN_I $1 }
index_expression: | l_expression LPAREN argument_list RPAREN { IE_LE_AL ($1, $3) }
expression: | l_expression { E_LE $1 }

l_expression:
| simple_name_expression { LE_SNE $1 } 
| index_expression { LE_IE $1 } 

call_statement: 
| simple_name_expression argument_list { CallS_SNE_AL ($1, $2) }
| member_access_expression argument_list { CallS_MAE_AL ($1, $2) }

argument_list: | IDENTIFIER { AL_I $1 }

But if we add another alternative, | IDENTIFIER LPAREN expression RPAREN { BS_I_E ($1, $3) }, to block_statement, the parser accepts f(arg) and produces:

  BS_I_E (
    f,
    E_LE LE_SNE SNE_I arg)

However, f(arg).method can then no longer be parsed: the parser raises an error after reading the "." token.

I don't know how to make the parser read further so that f(arg).method is parsed as a whole when possible. I need the parser to handle both statements. Could anyone help?


Solution

  • I would try a grammar with a structure along the lines of:

    block:
    | expr
    
    expr:
    | expr LPAREN argument_list RPAREN
    | expr DOT unrestricted_name
    | simple_expr
    
    simple_expr:
    | IDENTIFIER
    

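    Note that if you want to parse a full sentence, and not just a valid prefix of the input, your top-level rule should require the EOF token (to force the parser to go to the end of the input). The typed %start declaration and the b=block binding used here are Menhir syntax; with ocamlyacc you would write %start main with a separate %type declaration and refer to the value as $1: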

    %start <block> main
    
    main:
    | b=block EOF { b }
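
To check that this grammar shape accepts both inputs, here is a minimal hand-written sketch in plain OCaml. It is not generated by ocamlyacc or Menhir, and it is not the asker's AST: the token type, the expr constructors (Ident, Call, Member), Parse_error, parse_expr and parse_main are all hypothetical names chosen for illustration, and argument_list is assumed to be a single IDENTIFIER, as in the question. The loop over suffixes stands in for the two left-recursive expr productions.

```ocaml
(* Tokens a lexer is assumed to produce; names follow the grammar above. *)
type token = IDENTIFIER of string | LPAREN | RPAREN | DOT | EOF

(* Hypothetical AST for the unified expr rule. *)
type expr =
  | Ident of string           (* simple_expr: IDENTIFIER *)
  | Call of expr * string     (* expr LPAREN argument_list RPAREN *)
  | Member of expr * string   (* expr DOT unrestricted_name *)

exception Parse_error of string

(* expr: read one IDENTIFIER, then fold any number of "(arg)" or ".name"
   suffixes onto the expression built so far. Returns the expression and
   the tokens that were not consumed. *)
let parse_expr tokens =
  let rec suffixes left = function
    | LPAREN :: IDENTIFIER arg :: RPAREN :: rest ->
        suffixes (Call (left, arg)) rest
    | DOT :: IDENTIFIER name :: rest ->
        suffixes (Member (left, name)) rest
    | rest -> (left, rest)
  in
  match tokens with
  | IDENTIFIER id :: rest -> suffixes (Ident id) rest
  | _ -> raise (Parse_error "expected an identifier")

(* main: block EOF. Requiring EOF rejects input with leftover tokens. *)
let parse_main tokens =
  match parse_expr tokens with
  | e, [EOF] -> e
  | _ -> raise (Parse_error "expected end of input")
```

With this shape, f(arg) comes out as Call (Ident "f", "arg") and f(arg).method as Member (Call (Ident "f", "arg"), "method"). Because the member access is just one more suffix on the same expression, no separate block_statement alternative starting with IDENTIFIER LPAREN is needed, which is the alternative that made the parser commit too early in the question.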