parsing, compiler-construction, abstract-syntax-tree

Parse grammar for a member expression


What is the correct grammar for a standard member expression?

E.g. the AST for the code:

test.test.function()

would be

MemberExpression("test", MemberExpression("test", MethodCall("function")))

And likewise for a variable:

test.test.test.variable

MemberExpression("test", MemberExpression("test", MemberExpression("test", Variable("variable"))))


Solution

  • Depends on the language, surely :-) But it's pretty straight-up in most grammars (see below).

    One comment, though. As indicated by the grammars below, member access (like function calls and, usually, subscripting) acts like a postfix operator; what follows the dot (or arrow, in C-like languages) is an identifier naming a member. It is not an expression; the only expression in the member lookup is on the left-hand side of the operator. So a.b.c should correspond to an AST node something like:

    MemberLookup(MemberLookup(Variable("a"), "b"), "c") 
    

    and a.b.func(2, c) should be turned into:

    MethodCall(MemberLookup(Variable("a"), "b"),
               "func",
               List(Number(2), Variable("c")))
    

    or, perhaps,

    Apply(MemberLookup(MemberLookup(Variable("a"), "b"), "func"),
          List(Number(2), Variable("c")))
    

    (The difference has to do with the implicit self/this argument; there are various strategies for handling this. Contrast Java, Python and Lua for three completely different strategies.)
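
    The node shapes above can be sketched as plain data classes. This is only an illustration: the class and field names follow the pseudo-AST in this answer, `receiver_of` is a hypothetical helper, and argument lists are stored as tuples rather than a `List` node. It shows that both call encodings keep the same receiver expression, just in different places.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class MemberLookup:
    obj: "Expr"    # the only sub-expression: the left-hand side
    member: str    # a plain member name, not an expression


@dataclass(frozen=True)
class MethodCall:
    receiver: "Expr"          # object the method is looked up on
    method: str
    args: Tuple["Expr", ...]


@dataclass(frozen=True)
class Apply:
    callee: "Expr"            # any expression that evaluates to something callable
    args: Tuple["Expr", ...]


Expr = Union[Variable, Number, MemberLookup, MethodCall, Apply]

# a.b.c nests to the left: (a.b).c
abc = MemberLookup(MemberLookup(Variable("a"), "b"), "c")

# a.b.func(2, c) with the receiver kept explicit in the call node
call_1 = MethodCall(MemberLookup(Variable("a"), "b"), "func",
                    (Number(2), Variable("c")))

# a.b.func(2, c) as an ordinary application of the looked-up member
call_2 = Apply(MemberLookup(MemberLookup(Variable("a"), "b"), "func"),
               (Number(2), Variable("c")))


def receiver_of(call: Expr) -> Optional[Expr]:
    """Return the expression that would become self/this, if there is one."""
    if isinstance(call, MethodCall):
        return call.receiver
    if isinstance(call, Apply) and isinstance(call.callee, MemberLookup):
        return call.callee.obj
    return None
```

    With `MethodCall` the receiver is available directly; with `Apply` a later pass has to recover it from the callee (or the runtime has to bind it during lookup), which is roughly where the language strategies start to differ.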

    Anyway, here are a couple of simple grammar fragments:

    C

    Here's an excerpt from the C grammar (as found in Appendix A of the C standard):

    postfix-expression:
      primary-expression
      postfix-expression '[' expression ']'
      postfix-expression '(' argument-expression-list_opt ')'
      postfix-expression '.' identifier
      postfix-expression '->' identifier
      postfix-expression '++'
      postfix-expression '--'
    

    I included more than just the member access productions, because the excerpt shows that .identifier and ->identifier are handled just like any other postfix operator, which is a useful insight. The same production also includes two postfix bracketed operators, subscripting ([...]) and function call ((...)), which seem relevant here. I left out compound literals (which I would have put into primary-expression).
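
    A left-recursive production like postfix-expression is usually handled in a hand-written parser by parsing one primary and then looping over postfix operators. Here is a minimal sketch of that idea, not a real C parser: the tuple-based node shapes and the names `tokenize`, `Parser` and `parse` are made up for illustration, and as a simplification every nested expression (arguments, subscripts, parenthesised expressions) is itself just a postfix expression.

```python
import re

# One token per match: multi-character operators first, then names, numbers, punctuation.
TOKEN = re.compile(r"\s*(->|\+\+|--|[A-Za-z_]\w*|\d+|[.()\[\],])")


def tokenize(src):
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def is_name(tok):
    return tok[0].isalpha() or tok[0] == "_"


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.i = 0

    def peek(self):
        return self.tokens[self.i] if self.i < len(self.tokens) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.i += 1
        return tok

    def expect(self, wanted):
        got = self.next()
        if got != wanted:
            raise SyntaxError(f"expected {wanted!r}, got {got!r}")

    def primary(self):
        tok = self.next()
        if tok.isdigit():
            return ("Number", int(tok))
        if is_name(tok):
            return ("Variable", tok)
        if tok == "(":
            inner = self.postfix()
            self.expect(")")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    def postfix(self):
        node = self.primary()
        # Each iteration wraps the node built so far, so the tree nests to the left.
        while True:
            tok = self.peek()
            if tok in (".", "->"):
                self.next()
                name = self.next()
                if not is_name(name):
                    raise SyntaxError(f"expected member name, got {name!r}")
                kind = "MemberLookup" if tok == "." else "ArrowLookup"
                node = (kind, node, name)      # right-hand side is a name, not an expression
            elif tok == "(":
                self.next()
                args = []
                if self.peek() != ")":
                    args.append(self.postfix())
                    while self.peek() == ",":
                        self.next()
                        args.append(self.postfix())
                self.expect(")")
                node = ("Apply", node, args)
            elif tok == "[":
                self.next()
                index = self.postfix()
                self.expect("]")
                node = ("Subscript", node, index)
            elif tok in ("++", "--"):
                self.next()
                node = ("Postfix" + tok, node)
            else:
                return node


def parse(src):
    parser = Parser(tokenize(src))
    node = parser.postfix()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected trailing token {parser.peek()!r}")
    return node
```

    For example, `parse("a.b.c")` yields `("MemberLookup", ("MemberLookup", ("Variable", "a"), "b"), "c")`, matching the left-nested shape described earlier.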

    Python

    The comparable excerpt from the Python 3.9 docs:

    primary:
      primary '.' NAME
      primary '(' [arguments] ')'
      primary '[' slices ']'
      primary genexp
      atom
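
    You can check how Python itself nests these with the standard `ast` module. The sketch below only uses `ast.parse` and the `Attribute`, `Call`, `Name` and `Constant` node classes; the `shape` and `parse_shape` helpers are hypothetical names that flatten the tree into tuples so the nesting is easy to read.

```python
import ast


def shape(node):
    """Reduce a small subset of Python expression nodes to nested tuples."""
    if isinstance(node, ast.Attribute):
        # node.value is the left-hand expression; node.attr is a plain string
        return ("Attribute", shape(node.value), node.attr)
    if isinstance(node, ast.Call):
        return ("Call", shape(node.func), [shape(arg) for arg in node.args])
    if isinstance(node, ast.Name):
        return ("Name", node.id)
    if isinstance(node, ast.Constant):
        return ("Constant", node.value)
    raise TypeError(f"unsupported node: {type(node).__name__}")


def parse_shape(src):
    # mode="eval" parses a single expression; .body is its root node
    return shape(ast.parse(src, mode="eval").body)


print(parse_shape("a.b.c"))
print(parse_shape("a.b.func(2, c)"))
```

    Python takes the "Apply" route from above: the method call is an ordinary `Call` whose callee is an `Attribute` node, and the attribute name is stored as a string rather than as a sub-expression.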