What is the correct grammar for a standard member expression?
E.g. the AST for the code:
test.test.function()
would be
MemberExpression("test", MemberExpression("test", MethodCall("function")))
And likewise for a variable:
test.test.test.variable
MemberExpression("test", MemberExpression("test", MemberExpression("test", Variable("variable"))))
Depends on the language, surely :-) But it's pretty straight-up in most grammars (see below).
One comment, though. As indicated by the grammars below, member access (like function calls and, usually, subscripting) acts like a postfix operator; the token after the dot (or arrow, in C-like languages) is just an identifier naming a member. It is not an expression; the only expression in the member lookup is on the left-hand side of the operator. So a.b.c should correspond to an AST node something like:
MemberLookup(MemberLookup(Variable("a"), "b"), "c")
and a.b.func(2, c) should be turned into:
MethodCall(MemberLookup(Variable("a"), "b"),
"func",
List(Number(2), Variable("c")))
or, perhaps,
Apply(MemberLookup(MemberLookup(Variable("a"), "b"), "func"),
      List(Number(2), Variable("c")))
(The difference has to do with the implicit self/this argument; there are various strategies for handling it. Contrast Java, Python and Lua for three completely different strategies.)
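To make the shape of those trees concrete, here is a minimal Python sketch of the node types. The class and field names (obj, member, func, args) are my own choices for illustration, not taken from any particular compiler, and a tuple stands in for List(...). The point it shows is that a MemberLookup holds exactly one sub-expression (the left-hand side) plus a plain string for the member name, so chains nest to the left.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Variable:
    name: str


@dataclass(frozen=True)
class Number:
    value: int


@dataclass(frozen=True)
class MemberLookup:
    obj: "Expr"   # the only sub-expression: whatever is left of the dot
    member: str   # a plain name, not an expression


@dataclass(frozen=True)
class Apply:
    func: "Expr"              # the callee, itself an arbitrary expression
    args: Tuple["Expr", ...]  # evaluated argument expressions


Expr = Union[Variable, Number, MemberLookup, Apply]

# a.b.c nests to the left: (a.b).c
abc = MemberLookup(MemberLookup(Variable("a"), "b"), "c")

# a.b.func(2, c) in the "Apply" style: the callee is the lookup (a.b).func
call = Apply(
    MemberLookup(MemberLookup(Variable("a"), "b"), "func"),
    (Number(2), Variable("c")),
)
```

With this representation, the outermost node of a.b.c is the last lookup (.c), and the innermost is the variable a, which is the reverse of the nesting shown in the question.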
Anyway, here are a couple of simple grammar fragments. First, an excerpt from the C grammar (as found in Appendix A of the C standard):
postfix-expression:
primary-expression
postfix-expression '[' expression ']'
postfix-expression '(' argument-expression-list_opt ')'
postfix-expression '.' identifier
postfix-expression '->' identifier
postfix-expression '++'
postfix-expression '--'
I included more than just the member access operators, because it shows that .identifier and ->identifier are handled just like any other postfix operator, which is a useful insight. The same production also includes two postfix bracketed operators, subscripting ([...]) and function call ((...)), which seem relevant here. But I left out compound literals (which I would have put into primary-expression).
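If you're writing the parser by hand, the left-recursive postfix-expression production is usually implemented as a loop: parse one primary, then keep wrapping it for as long as a postfix operator follows. Here is a rough Python sketch under some simplifying assumptions: it handles only '.', '(...)' and '[...]' (no '->', '++' or '--'), arguments are themselves just postfix expressions, and the tuple-shaped AST and all names (tokenize, Parser, parse) are made up for the example.

```python
import re


def tokenize(src):
    # Numbers, identifiers, or any single non-space character.
    return re.findall(r"\d+|[A-Za-z_]\w*|\S", src)


class Parser:
    def __init__(self, src):
        self.toks = tokenize(src)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def next(self):
        tok = self.peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return tok

    def expect(self, tok):
        got = self.next()
        if got != tok:
            raise SyntaxError(f"expected {tok!r}, got {got!r}")

    def primary(self):
        tok = self.next()
        if tok.isdigit():
            return ("num", int(tok))
        if tok.isidentifier():
            return ("var", tok)
        if tok == "(":
            inner = self.postfix()
            self.expect(")")
            return inner
        raise SyntaxError(f"unexpected token {tok!r}")

    def postfix(self):
        node = self.primary()
        while True:  # each iteration applies one postfix operator to `node`
            tok = self.peek()
            if tok == ".":
                self.next()
                name = self.next()
                if not name.isidentifier():
                    raise SyntaxError(f"expected member name, got {name!r}")
                node = ("member", node, name)  # the name is not an expression
            elif tok == "(":
                self.next()
                args = []
                if self.peek() != ")":
                    args.append(self.postfix())
                    while self.peek() == ",":
                        self.next()
                        args.append(self.postfix())
                self.expect(")")
                node = ("call", node, args)
            elif tok == "[":
                self.next()
                index = self.postfix()
                self.expect("]")
                node = ("index", node, index)
            else:
                return node


def parse(src):
    return Parser(src).postfix()
```

Because each iteration wraps the node built so far, parse("a.b.c") comes out left-nested, with the lookup of c outermost, matching the MemberLookup tree earlier in this answer.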
The comparable excerpt from the Python 3.9 docs:
primary:
primary '.' NAME
primary '(' [arguments] ')'
primary '[' slices ']'
primary genexp
atom
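You can see the result of that production directly with Python's standard ast module, which represents member access as an Attribute node whose value field is the left-hand expression and whose attr field is a plain string. A small check (the variable name tree is just for the example):

```python
import ast

# Parse a single expression and take the expression node itself.
tree = ast.parse("a.b.func(2, c)", mode="eval").body

# The call wraps the attribute chain; the chain nests to the left.
callee = tree.func          # Attribute: (a.b).func
inner = callee.value        # Attribute: a.b
base = inner.value          # Name: a

print(ast.dump(tree))
```

Here tree is an ast.Call, its func is the Attribute for .func, and the Name node for a sits at the bottom of the chain, which is the "Apply" style described above.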