I am trying to wrap my head around compiling an assignment expression to bytecode. I am writing my own language, and this part has really got me stumped. My language takes the source code and converts it into tokens, which are converted directly to bytecode. For example, something like:
a + 2
becomes
TOKEN_NAME
TOKEN_ADD
TOKEN_INT
This would then be parsed and converted to bytecode, which would look something like:
LOAD_VARIABLE (this is the a)
LOAD_CONSTANT (this is the 2)
ADD
This is pretty straightforward. But consider an assignment expression such as:
a[0][1] = 2
which would become
TOKEN_NAME
TOKEN_L_BRACKET
TOKEN_INT
TOKEN_R_BRACKET
TOKEN_L_BRACKET
TOKEN_INT
TOKEN_R_BRACKET
TOKEN_ASSIGN
TOKEN_INT
I need to load a, subscript that object with 0, and then store 2 into subscript 1 of the result. I should add that the parser is effectively LL(1), which makes this particularly difficult.
I cannot think of a way to make sure the last part of the left-hand side expression (the part I am assigning to) isn't loaded, but instead has the value (2) stored into it.
If any of this is unclear, please leave a comment and I will be happy to clarify my program. (It's pretty hard to make an MCVE for an entire interpreter for a programming language!)
Thanks in advance.
You can use a simple backtracking method to construct references:
- LOAD_VALUE converts to GET_VALUE_REF
- LOAD_PROPERTY converts to GET_PROPERTY_REF (LOAD_PROPERTY is generated by a.b)
- LOAD_ELEMENT converts to GET_ELEMENT_REF (LOAD_ELEMENT is generated by a[b])

This method is sufficient for the most common semantics. For C you would add support for the dereferencing operator *: GET_POINTER_VALUE converts to GET_POINTER_REF, which is essentially a no-op.
To implement this you need to keep track of the last opcode generated by the compiler, so that it can be patched into a different opcode once the parser sees that the expression is an assignment target.
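The patching step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of any particular compiler: the `Emitter` class, `compile_assignment`, and the `(kind, value)` token tuples are hypothetical, and the sketch uses `LOAD_VARIABLE` and `GET_ELEMENT` (the names from the listings in this answer) for what the list above calls `LOAD_VALUE` and `LOAD_ELEMENT`. It handles only the narrow grammar `NAME ('[' INT ']')* ['=' INT]`.

```python
# Maps each "load" opcode to its reference-producing counterpart.
# Assumption: LOAD_VARIABLE and GET_ELEMENT are the same operations
# that the prose calls LOAD_VALUE and LOAD_ELEMENT.
REF_FORM = {
    "LOAD_VALUE": "GET_VALUE_REF",
    "LOAD_VARIABLE": "GET_VALUE_REF",
    "LOAD_PROPERTY": "GET_PROPERTY_REF",
    "GET_ELEMENT": "GET_ELEMENT_REF",
}


class Emitter:
    """Hypothetical bytecode buffer that can rewrite its last instruction."""

    def __init__(self):
        self.code = []  # list of (opcode, operand) pairs

    def emit(self, opcode, operand=None):
        self.code.append((opcode, operand))

    def patch_last_to_ref(self):
        # Rewrite the most recent load into its reference form.
        opcode, operand = self.code[-1]
        if opcode not in REF_FORM:
            raise SyntaxError(f"cannot assign to the result of {opcode}")
        self.code[-1] = (REF_FORM[opcode], operand)


def compile_assignment(tokens):
    """Compile NAME ('[' INT ']')* ['=' INT] with one token of lookahead."""
    out = Emitter()
    pos = 0
    kind, value = tokens[pos]
    if kind != "NAME":
        raise SyntaxError("expected a name")
    out.emit("LOAD_VARIABLE", value)
    pos += 1
    # Every subscript is first compiled as an ordinary load.
    while pos < len(tokens) and tokens[pos][0] == "L_BRACKET":
        out.emit("LOAD_CONSTANT", tokens[pos + 1][1])
        out.emit("GET_ELEMENT")
        pos += 3  # skip '[', INT, ']'
    # Only when '=' shows up do we go back and patch the last load.
    if pos < len(tokens) and tokens[pos][0] == "ASSIGN":
        out.patch_last_to_ref()
        out.emit("LOAD_CONSTANT", tokens[pos + 1][1])
        out.emit("STORE_REF")
    return out.code


# Tokens for: a[0][2] = 3
tokens = [
    ("NAME", "a"),
    ("L_BRACKET", None), ("INT", 0), ("R_BRACKET", None),
    ("L_BRACKET", None), ("INT", 2), ("R_BRACKET", None),
    ("ASSIGN", None), ("INT", 3),
]
for opcode, operand in compile_assignment(tokens):
    print(opcode if operand is None else f"{opcode} {operand}")
```

Because only the most recent instruction is ever rewritten, the parser never needs more than one token of lookahead, which is why this approach fits an LL(1) parser.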
The expression a[0][2] would compile to:
LOAD_VARIABLE a (this is the a)
LOAD_CONSTANT 0 (this is the 0)
GET_ELEMENT
LOAD_CONSTANT 2 (this is the 2)
GET_ELEMENT
The assignment a[0][2] = 3 converts to:
LOAD_VARIABLE a
LOAD_CONSTANT 0
GET_ELEMENT
LOAD_CONSTANT 2
GET_ELEMENT_REF
LOAD_CONSTANT 3
STORE_REF
You can also generate specific stores directly if you don't need a reference (you need a reference for a[b] += c, for example).
a[0][2] = 3 then converts to:
LOAD_VARIABLE a
LOAD_CONSTANT 0
GET_ELEMENT
LOAD_CONSTANT 2
LOAD_CONSTANT 3
STORE_ELEMENT (uses 3 stack slots)
A compound assignment such as a[0][2] += 3 instead produces:
LOAD_VARIABLE a
LOAD_CONSTANT 0
GET_ELEMENT
LOAD_CONSTANT 2
GET_ELEMENT_REF
LOAD_REF
LOAD_CONSTANT 3
ADD
STORE_REF
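To check that both listings leave the stack balanced, here is a minimal stack-machine sketch that executes them. Its semantics are assumptions made for illustration, since the answer does not define them precisely: a reference is modelled as a `(container, key)` tuple, `LOAD_REF` reads through the reference on top of the stack without popping it, and `STORE_ELEMENT` pops the value, the key, and the container (the three stack slots mentioned above). The `run` function and the sample data are hypothetical.

```python
def run(code, env):
    """Execute (opcode, operand) pairs; return whatever is left on the stack."""
    stack = []
    for opcode, operand in code:
        if opcode == "LOAD_VARIABLE":
            stack.append(env[operand])
        elif opcode == "LOAD_CONSTANT":
            stack.append(operand)
        elif opcode == "GET_ELEMENT":
            key = stack.pop()
            obj = stack.pop()
            stack.append(obj[key])
        elif opcode == "GET_ELEMENT_REF":
            # Assumption: a reference is simply a (container, key) pair.
            key = stack.pop()
            obj = stack.pop()
            stack.append((obj, key))
        elif opcode == "LOAD_REF":
            # Assumption: the reference stays on the stack for STORE_REF.
            obj, key = stack[-1]
            stack.append(obj[key])
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "STORE_REF":
            value = stack.pop()
            obj, key = stack.pop()
            obj[key] = value
        elif opcode == "STORE_ELEMENT":
            value = stack.pop()
            key = stack.pop()
            obj = stack.pop()
            obj[key] = value
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return stack


# Shared prefix of both listings: load a[0], then push the index 2.
PREFIX = [
    ("LOAD_VARIABLE", "a"),
    ("LOAD_CONSTANT", 0),
    ("GET_ELEMENT", None),
    ("LOAD_CONSTANT", 2),
]
# a[0][2] = 3 using the direct store
DIRECT_STORE = PREFIX + [("LOAD_CONSTANT", 3), ("STORE_ELEMENT", None)]
# a[0][2] += 3 using a reference
COMPOUND = PREFIX + [
    ("GET_ELEMENT_REF", None),
    ("LOAD_REF", None),
    ("LOAD_CONSTANT", 3),
    ("ADD", None),
    ("STORE_REF", None),
]

env = {"a": [[10, 20, 30]]}
run(DIRECT_STORE, env)
print(env["a"])  # a[0][2] has been overwritten with 3
run(COMPOUND, env)
print(env["a"])  # a[0][2] has been incremented by 3
```

In this model the reference form costs one extra instruction for a plain store, which is the reason for emitting `STORE_ELEMENT` directly when no read-modify-write is involved.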